From
Wikipedia, the free encyclopedia
Search engine optimization (SEO)
is a set of methods aimed at improving the ranking of a website in search engine
listings, and could be considered a subset of search engine marketing. The term
SEO also refers to "search engine optimizers," an industry of consultants
who carry out optimization projects on behalf of clients' sites. Some commentators,
and even some SEOs, break down methods used by practitioners into categories such
as "white hat SEO" (methods generally approved by search engines, such
as building content and improving site quality), or "black hat SEO"
(tricks such as cloaking and spamdexing). White hatters charge that black hat
methods are an attempt to manipulate search rankings unfairly. Black hatters counter
that all SEO is an attempt to manipulate rankings, and that the particular methods
one uses to rank well are irrelevant.
Search engines display different kinds of listings in the search engine results
pages (SERPs), including pay per click advertisements, paid inclusion listings,
and organic search results. SEO is primarily concerned with advancing the goals
of a website by improving the number and position of its organic search results
for a wide variety of relevant keywords.
SEO strategies may increase both the number and quality of visitors. Search engine
optimization is sometimes offered as a stand-alone service, or as a part of a
larger marketing effort, and can often be very effective when incorporated into
the initial development and design of a site.
For competitive, high-volume search terms, the cost of pay per click advertising
can be substantial. Ranking well in the organic search results can provide the
same targeted traffic at potentially significant savings. Site owners may choose
to optimize their sites for organic search if the cost of optimization is less
than the cost of advertising.
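The comparison between optimization and advertising costs is simple arithmetic. The following is a minimal sketch with entirely hypothetical figures; it assumes a one-time optimization cost and that organic listings would deliver the same number of clicks as the paid campaign, neither of which is guaranteed in practice.

```python
def months_to_break_even(optimization_cost, cost_per_click, clicks_per_month):
    """Months of equivalent paid clicks needed to equal a one-time optimization cost."""
    monthly_ad_spend = cost_per_click * clicks_per_month
    if monthly_ad_spend <= 0:
        raise ValueError("monthly ad spend must be positive")
    return optimization_cost / monthly_ad_spend


# Hypothetical figures for illustration only (not taken from any source).
print(months_to_break_even(optimization_cost=6000, cost_per_click=2, clicks_per_month=1000))
```

Under these assumptions, optimization pays for itself once the site has run longer than the number of months returned.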
Not all sites have identical goals for search optimization. Some sites seek any
and all traffic, and may be optimized to rank highly for common search phrases.
A broad search optimization strategy can work for a site that has broad interest,
such as a periodical, a directory, or a site that displays advertising with a CPM
revenue model. In contrast, many businesses try to optimize their sites for large
numbers of highly specific keywords that indicate readiness to buy. Overly broad
search optimization can hinder marketing strategy by generating a large volume of
low-quality inquiries that cost money to handle, yet result in little business.
Focusing on desirable traffic generates better quality sales leads, resulting in
more sales.
Search engine optimization can be very effective when used as part of a smart
niche marketing strategy.
Contents
1 History
1.1 Early search engines
1.2 Organic search engines
2 The relationship between SEO and the search engines
3 Getting into search engines' listings
4 White hat methods
5 Black hat methods
6 SEO and Marketing
7 Legal issues
8 See also
9 References
10 External links
10.1 Additional research resources
10.2 Search engines' guidelines
10.3 Sources of background information
History
Early search engines
Webmasters and content providers began optimizing
sites for search engines in the mid-1990s, as the first search engines were cataloging
the early Web. Initially, all a webmaster needed to do was submit a site to the
various engines which would run spiders, programs to "crawl" the site,
and store the collected data. By default, the engines scanned an entire
webpage for words matching the search, so a page with many different words
matched more searches, and a webpage containing a dictionary-type listing would
match almost all searches, limited only by unique names. The search engines then
sorted the information by topic, and served results based on pages they had crawled.
As the number of documents online kept growing, and more webmasters realized the
value of organic search listings, some popular search engines began to sort their
listings so they could display the most relevant pages first. This was the start
of a friction between search engines and webmasters that continues to this day.
At first, search engines were guided by the webmasters themselves. Early versions
of search algorithms relied on webmaster-provided information such as category
and keyword meta tags, or index files in engines like ALIWEB. Meta tags provided
a guide to each page's content. When some webmasters began to abuse meta tags,
causing their pages to rank for irrelevant searches, search engines abandoned
their consideration of meta tags and instead developed more complex ranking algorithms.
These took into account a more diverse set of factors that gave weight to a limited
number of words (countering dictionary-type pages), including:
Text within the title tag
Domain name
URL directories and file names
HTML tags: headings, emphasized (<em>) and strongly emphasized (<strong>) text
Term frequency, both in the document and globally, often misunderstood and mistakenly referred to as keyword density
Keyword proximity
Keyword adjacency
Keyword sequence
Alt attributes for images
Text within NOFRAMES tags
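Several of the factors listed above are simple to compute. The following is a minimal sketch, not any search engine's actual method, of within-document term frequency and keyword proximity. The tokenizer, the function names, and the sample text are assumptions made for illustration, and the global (corpus-wide) side of term frequency is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(text, term):
    """Fraction of the words in `text` that equal `term` (document-level only)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return Counter(words)[term.lower()] / len(words)


def keyword_proximity(text, first, second):
    """Smallest distance, in words, between any occurrences of the two terms.

    Returns None when either term is absent from the text.
    """
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    if not first_positions or not second_positions:
        return None
    return min(abs(a - b) for a in first_positions for b in second_positions)


# Hypothetical page text for illustration.
page = "Cheap flights to Rome. Book cheap hotels and flights today."
print(term_frequency(page, "flights"))
print(keyword_proximity(page, "cheap", "flights"))
```

A proximity of 1 means the two terms are adjacent somewhere in the text, which is how keyword adjacency could be treated as a special case of proximity in a sketch like this.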
Pringle et al. (1998) also defined a number of attributes within the HTML source
of a page which were often manipulated by web content providers attempting to rank
well in search engines. But by relying so extensively on factors that were still
within the webmasters' exclusive control, search engines continued to suffer from
abuse and ranking manipulation. In order to provide better results to their users,
search engines had to adapt to ensure their SERPs showed the most relevant search
results, rather than useless pages stuffed with numerous keywords by unscrupulous
webmasters using a bait-and-switch lure to display unrelated webpages. This led to
the rise of a new kind of search engine.
Organic search engines
Google was started by two PhD students at Stanford University, Sergey Brin and Larry
Page, and brought a new concept to evaluating web pages. This concept, called
PageRank, has been important to the Google algorithm from the start. PageRank
relies heavily on incoming links and uses the logic that each link to a page is
a vote for that page's value. The more incoming links a page has, the more "worthy"
it is. The value of each incoming link itself varies directly with the PageRank
of the page it comes from and inversely with the number of outgoing links on that
page.
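The voting logic described above can be illustrated with a small iterative calculation. This is a minimal sketch of the general idea, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the four-page graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link's value rises with the source page's rank and
                # falls with the number of outgoing links on that page.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "c" receives links from three pages, "d" from none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page nobody links to ends up with the lowest.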
With help from PageRank, Google proved to be very good at serving
relevant results. Google became the most popular and successful search engine.
Because PageRank measured an off-site factor, Google felt it would be more difficult
to manipulate than on-page factors.
However, webmasters had already developed
link building tools and schemes to influence the Inktomi search engine. These
methods proved to be equally applicable to Google's algorithm. Many sites focused
on exchanging, buying, and selling links on a massive scale. PageRank's reliance
on the link as a vote of confidence in a page's value was undermined as many webmasters
sought to garner links purely to influence Google into sending them more traffic,
irrespective of whether the link was useful to human site visitors.
Further complicating the situation, the default was still to scan an entire
webpage for words matching the search, and a webpage containing a dictionary-type
listing would still match almost all searches (except special names), now at an even
higher priority when boosted by link rank. Dictionary pages and link schemes could
severely skew search results.
It was time for Google, and other search engines, to look at a wider range of
off-site factors. There were other reasons to develop more intelligent algorithms.
The Internet was reaching a vast population of non-technical users who were often
unable to use advanced querying techniques to reach the information they were
seeking, and the sheer volume and complexity of the indexed data was vastly
different from that of the early days. Search engines had to develop predictive,
semantic, linguistic and heuristic algorithms. Around the same time as the work
that led to Google, IBM had begun work on the Clever Project, and Jon
Kleinberg was developing the HITS algorithm.
A proxy for the PageRank metric is still displayed in the Google Toolbar, but PageRank
is only one of more than 100 factors that Google considers in ranking pages.
Today, most search engines keep their methods and ranking algorithms secret, both
to compete at finding the most valuable search results and to deter spam pages from
clogging those results. A search engine may use hundreds of factors in ranking the
listings on its SERPs; the factors themselves and the weight each carries may change
continually. Algorithms can differ widely: a webpage that ranks #1 in a particular
search engine could rank #200 in another search engine.
Much current SEO thinking on what
works and what doesn't is largely speculation and informed guesses. Some SEOs
have carried out controlled experiments to gauge the effects of different approaches
to search optimization.
The following factors are speculation on some of the considerations
search engines may presently be using or which could be built into their algorithms.
A number of these are taken from one of Google's patent applications, and may
give some indication as to what is in the pipeline. Some are pure speculation.
It's also good to keep in mind that Google has over 180 patents and patent applications
assigned to it at the US Patent and Trademark Office (USPTO), and a number of
those include possible insights into other factors, and other directions that
the search engine may follow, some of which may not be consistent with this list.
Age of site
Length of time the domain has been registered
Age of content
Frequency of content: regularity with which new content is added
Text size: number of words above 200-250 (not affecting Google in 2005)
Age of link and reputation of linking site (authority)
Standard on-site factors
Negative scoring for on-site factors (for example, a dampening for websites with extensive keyword meta-tags indicative of having been optimized, or "SEO-ed")
Uniqueness of content
Related terms used in content (the terms that the search engine associates as being related to the main content of the page)
Google PageRank (only used in Google's algorithm)
External links, the anchor text in those external links and in the sites/pages containing those links
Citations and research sources (indicating the content is of research quality)
Stem-related terms in the search engine's database (for example, finance/financing)
Incoming backlinks and anchor text of incoming backlinks
Negative scoring for some incoming backlinks (perhaps those coming from low value pages, reciprocated backlinks, etc.)
Rate of acquisition of backlinks: too many too fast could indicate "unnatural" link buying activity
Text surrounding outward links and incoming backlinks. A link following the words "Sponsored Links" could be ignored
Use of "rel=nofollow" to suggest that the search engine should ignore the link
Depth of document in site
Metrics collected from other sources, such as monitoring how frequently users hit the back button when SERPs send them to a particular page
Metrics collected from sources like the Google Toolbar, Google Analytics, Google AdWords/Adsense programs, etc.
Metrics collected in data-sharing arrangements with third parties (like providers of statistical programs used to monitor site traffic)
Rate of removal of incoming links to the site
Use of sub-domains, use of keywords in sub-domains and volume of content on sub-domains, and negative scoring for such activity
Semantic connections of hosted documents
Rate of document addition or change
IP of hosting service and the number/quality of other sites hosted on that IP
Other affiliations of linking site with the linked site (do they share an IP? have a common postal address on the "contact us" page?)
Technical matters like use of 301 or 302 to redirect moved pages, showing a 404 server header rather than a 200 server header for pages that don't exist, proper use of robots.txt
Hosting uptime
Whether the site serves different content to different categories of users (cloaking)
Broken outgoing links not rectified promptly
Unsafe or illegal content
Quality of HTML coding, presence of coding errors
Actual click-through rates observed by the search engines for listings displayed on their SERPs
Hand ranking by humans of the most frequently accessed SERPs
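Two of the link-related items in the list, anchor text and the "rel=nofollow" hint, can be read directly from a page's HTML. The following is a minimal sketch using Python's standard html.parser module; the class name, the sample markup, and the decision to report each link as a (URL, anchor text, followed) tuple are assumptions for illustration, and how a real search engine weighs these signals is not known.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text, followed?) for each <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._nofollow = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self._href = attributes.get("href")
            # rel may hold several space-separated tokens, or be missing.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            self._nofollow = "nofollow" in rel_tokens
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor_text = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor_text, not self._nofollow))
            self._href = None


# Hypothetical markup for illustration.
sample_html = (
    '<p><a href="/guide">SEO guide</a> and '
    '<a href="http://example.com/ad" rel="nofollow">sponsor</a></p>'
)
collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```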
The relationship between SEO and the search engines
The first mentions of Search Engine Optimization don't appear on Usenet until 1997,
a few years after the launch of the first Internet search engines. The operators
of search engines recognized quickly that some people from the webmaster community
were making efforts to rank well in their search engines, and even manipulating
the page rankings in search results. In some early search engines, such as Infoseek,
ranking first was as easy as grabbing the source code of the top-ranked page,
placing it on your website, and submitting a URL to instantly index and rank that
page.
Due to the high value and targeting of search results, there is potential
for an adversarial relationship between search engines and SEOs. In 2005, an annual
conference named AirWeb was created to discuss bridging the gap and minimizing
the sometimes damaging effects of aggressive web content providers.
Some more aggressive site owners and SEOs generate automated sites or employ techniques
which eventually get domains banned from the search engines. Many search engine
optimization companies, which sell services, employ long-term, low-risk strategies,
and most SEO firms that do employ high-risk strategies do so on their own affiliate,
lead-generation, or content sites, instead of risking client websites.
Some SEO companies employ aggressive techniques that get their client websites banned
from the search results. The Wall Street Journal profiled a company, Traffic Power,
which allegedly used high-risk techniques and failed to disclose those risks to its
clients. Wired reported that the same company sued a blogger for mentioning that it
was banned. Google's Matt Cutts later confirmed that Google did in fact ban Traffic
Power and some of its clients.
Google has enforced webpage restrictions for years, such as a ban on hidden text
(text whose foreground color matches the background). In 2006, Google could
automatically punish a non-conforming website by blocking it from search results
from the next day for 30-35 days (or longer), pending a reinclusion request. If the
site was reinstated, Google could revert its index to old, expired, or deleted
webpages from a year earlier, delaying the re-indexing of the current website for
a total of 2-4 months.
Yahoo! and MSN Search do not automatically punish entire websites for small amounts of
accidental hidden text. Google's market share of daily searches has fallen rapidly
from 75% to 56% over the past few years, as other search engines find many valuable
webpages that Google has banned and cannot display because of its more limited
index. In early 2006, MSN Search typically re-indexed small websites every 14 days,
and Yahoo! also re-indexed quickly, much faster than Google, but all three (MSN,
Yahoo!, and Google) could require more than a month to index a new page (a new file
name) on an old website.
Some search engines have also reached
out to the SEO industry, and are frequent sponsors and guests at SEO conferences
and seminars. In fact, with the advent of paid inclusion, some search engines
now have a vested interest in the health of the optimization community. All of
the main search engines provide information/guidelines to help with site optimization:
Google's, Yahoo!'s, MSN's and Ask.com's. Google has a Sitemaps program to help
webmasters learn if Google is having any problems indexing their website and also
provides data on Google traffic to the website. Yahoo! has SiteExplorer that provides
a way to submit your URLs for free (like MSN/Google), determine how many pages
are in the Yahoo! index and drill down on inlinks to deep pages. Yahoo! has an
Ambassador Program and Google has a program for qualifying Google Advertising
Professionals.
Getting into search engines' listings
New sites do not need to be "submitted" to search engines to be listed. A simple
link from an established site will get
the search engines to visit the new site and begin to spider its contents. It
can take a few days or even weeks from the acquisition of a link from such an
established site for all the main search engine spiders to commence visiting and
indexing the new site.
Once the search engine has found the new site, it will
generally visit and start to index the pages on the site, as long as all the pages
are linked to with anchor tag hyperlinks. Pages which are accessible only through
Flash or JavaScript links may not be findable by the spiders.
Search engine crawlers may look at a number of different factors when crawling a
site, and many pages from a site may not be indexed by the search engines until they
gain more PageRank, links, or traffic. Distance of pages from the root directory of
a site may also be a factor in whether or not pages get crawled, as well as other
importance metrics. Cho et al. (1998) described some standards for those decisions
as to which pages are visited and sent by a crawler to be included in a search
engine's index.
Webmasters can instruct spiders to not index certain files
or directories through the standard robots.txt file in the root directory of the
domain. Standard practice requires a search engine to check this file upon visiting
the domain, though a search engine crawler will keep a cached copy of this file
as it visits the pages of a site, and may not update that copy as quickly as a
webmaster does. The web developer can use this feature to prevent pages such as
shopping carts or other dynamic, user-specific content from appearing in search
engine results, as well as keeping spiders from endless loops and other spider
traps.
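How a well-behaved crawler consults these rules can be sketched with Python's standard urllib.robotparser module. The robots.txt content, the crawler name "ExampleBot", and the URLs below are hypothetical; the rules are parsed from a string here so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all crawlers out of cart and search pages.
robots_txt = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A crawler would ask before fetching each URL.
print(parser.can_fetch("ExampleBot", "http://www.example.com/products/widget.html"))
print(parser.can_fetch("ExampleBot", "http://www.example.com/cart/checkout"))
```

The first URL is allowed and the second is refused. Compliance is voluntary: the file only tells cooperative spiders what the webmaster would like them to skip.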
For those search engines that have their own paid submission (like Yahoo!),
it may save some time to pay a nominal fee for submission, though Yahoo!'s paid
submission program does not guarantee inclusion in their search results.
White hat methods
White hat methods of SEO involve following the search
engines' guidelines as to what is and what isn't acceptable. Their advice generally
is to create content for the user, not the search engines; to make that content
easily accessible to their spiders; and to not try to game the system. Often,
webmasters make critical mistakes when designing or setting up their websites,
inadvertently "poisoning" them so that they will not rank well. White
hat SEOs attempt to discover and correct mistakes, such as machine-unreadable
menus, broken links, temporary redirects, or a poor navigation structure.
Because search engines are text-centric, many of the same methods that are useful for
web accessibility are also advantageous for SEO. Methods are available for optimizing
graphical content, including ALT attributes, and adding a text caption. Even Flash
animations can be optimized by designing the page to include alternative content
in case the visitor cannot read Flash.
Some methods considered proper by the search engines:
Using a unique and relevant title to name each page.
Editing web pages to replace vague wording with specific terminology relevant to the subject of the page, which the site's intended audiences will expect to see on the pages and will use in searches to find them.
Increasing the amount of unique content on the site.
Writing quality content for the website visitors instead of the search engines.
Using a reasonably-sized, accurate description meta tag without excessive use of keywords, exclamation marks or off topic terms.
Ensuring that all pages are accessible via anchor tag hyperlinks, and not only via Java, JavaScript or Macromedia Flash applications or meta refresh redirection; this can be done through the use of text-based links in site navigation and also via a page listing all the contents of the site (a site map).
Allowing search engine spiders to crawl pages without having to accept session IDs or cookies.
Developing "link bait" strategies. High quality websites that offer interesting content or novel features tend to accumulate large numbers of backlinks.
Participating in a web ring with other quality websites.
Writing useful, informational articles under a Creative Commons or other open source license, in exchange for attribution to the author by hyperlink.
Black hat methods
Main article: Spamdexing
"Black hat" SEO refers to methods of trying to improve rankings that are disapproved of
by the search engines, typically because they consider such methods deceptive,
and unrelated to providing quality content to site visitors. Search engines often
penalize sites they discover using black hat methods, by reducing their rankings
or eliminating their listings from the SERPs altogether. Such penalties are usually
applied automatically by the search engines' algorithms, because the Internet
is too large to make manual policing of websites feasible.
Spamdexing is the
promotion of irrelevant, chiefly commercial, pages through deceptive techniques
and the abuse of the search algorithms. Over time a widespread consensus has developed
in the industry as to what are and are not acceptable means of boosting one's
search engine placement and resultant traffic.
Spamdexing often gets confused
with white hat search engine optimization techniques, which do not involve deceit.
Spamming involves getting websites more exposure than they deserve for their keywords,
leading to unsatisfactory search results. Optimization involves getting websites
the rank they deserve on the most targeted keywords, leading to satisfactory search
experiences.
Search engines may take action against sites they discover
using unethical SEO methods. In February 2006, Google removed both
BMW Germany and Ricoh Germany from its listings for use of these practices.
Cloaking is the practice of serving one version
of a page to search engine spiders/bots and another version to human visitors.
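The definition of cloaking suggests a simple check: request the same page as a crawler and as a browser, then compare the two responses. The following is a minimal sketch of that idea only. The `serve` callables stand in for real HTTP requests, and the user-agent strings and the 0.9 similarity threshold are arbitrary assumptions; how search engines actually detect cloaking is not described here.

```python
from difflib import SequenceMatcher


def looks_cloaked(serve, threshold=0.9):
    """Return True if the crawler's copy of a page differs markedly from the browser's.

    `serve` is any function that takes a user-agent string and returns page text.
    """
    crawler_page = serve("ExampleBot/1.0")
    browser_page = serve("Mozilla/5.0")
    similarity = SequenceMatcher(None, crawler_page, browser_page).ratio()
    return similarity < threshold


def honest_site(user_agent):
    # Hypothetical site that serves everyone the same page.
    return "<h1>Garden tools</h1><p>Spades, rakes and hoes.</p>"


def cloaking_site(user_agent):
    # Hypothetical site that shows crawlers one page and visitors another.
    if "Bot" in user_agent:
        return "<h1>Garden tools</h1><p>Spades, rakes and hoes.</p>"
    return "<h1>Casino bonus</h1><p>Click here to win big today!</p>"


print(looks_cloaked(honest_site))
print(looks_cloaked(cloaking_site))
```

A fuzzy comparison is used rather than strict equality because legitimate pages often vary slightly between requests, for example in dates or rotating advertisements.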
SEO and Marketing
There is a considerable body of practitioners
of SEO who see search engines as just another visitor to a site, and try to make
the site as accessible to those visitors as to any other who would come to the
pages. They often see the white hat/black hat dichotomy mentioned above as a false
dilemma. The focus of their work isn't primarily to rank the highest for certain
terms in search engines, but rather to help site owners fulfill the business
objectives of their sites. Indeed, ranking well for a few terms among the many
possibilities does not guarantee more sales. A successful Internet marketing campaign
may drive traffic from organic search results to pages, but it also may involve the
use of paid advertising on search engines and other pages, building high quality web
pages to engage and persuade, addressing technical issues that may keep search
engines from crawling and indexing those sites, setting up analytics programs
to enable site owners to measure their successes, and making sites accessible
and usable.
SEOs may work in-house for an organization, or as consultants,
and search engine optimization may be only part of their daily functions. Often
their education in how search engines function comes from interacting and discussing
the topics on forums, through blogs, at popular conferences and seminars, and
by experimentation on their own sites. There are few college courses that cover
online marketing from an ecommerce perspective that can keep up with the changes
that the web sees on a daily basis.
While endeavoring to meet the guidelines
posted by search engines can help build a solid foundation for success on the
web, such efforts are only a start. Many see search engine marketing as a larger
umbrella under which search engine optimization fits, but it's possible that many
who focused primarily on SEO in the past are incorporating more and more marketing
ideas into their efforts, recognizing that search engines themselves have expanded
their coverage to include RSS feeds,
video search, local results, mapping, and other novel services.
Legal issues
In 2002, SearchKing filed suit in an Oklahoma court against the search engine Google.
SearchKing's claim was that Google's tactics to prevent spamdexing constituted
an unfair business practice. This may be compared to lawsuits which email spammers
have filed against spam-fighters, as in various cases against MAPS and other DNSBLs.
In January of 2003, the court pronounced a summary judgment in Google's favor.