One of the things that motivated me to go into SEO professionally in the first place was an interest in how search engines actually work. Like almost everyone outside academia or Silicon Valley, I had always thought of search engines as something almost magical; some robot wizards must be flying around in cyberspace making all of this work.
If you're reading this, there's a good chance that you think of search engines the same way that I did. In this post, I'm going to tell a very basic story of how a search engine actually approaches and views a given website. Hopefully this will demystify part of the process, and give you a few pointers on how you can make sure that your site can be properly viewed and understood by search engines.
In all of this, the key thing to remember is that search engines are run by computer programs, and not people. As a result, any site that wants to be visible in search results needs to be able to communicate with computers first and foremost.
How Google Works (Dramatization)
Spiders, Scrapers, Robots, and Draculas: The Mysterious Agents of Search Engines
On the most basic level, a search engine is nothing more than a bunch of computer programs, each with its own specific role. These programs work together to allow the search engine to find websites, remember something about their content, and eventually rank them for visitors. One of the most surprising things about SEO and search engines is that none of these programs is all that complicated on its own. Instead, each contributes in a small, well-defined way to a very complex whole.
Spiders are programs that go out and find new websites. They got their name because they 'crawl' the world wide 'web'. (Computer engineers really like grabbing on to puns and never letting go.)
If you've ever spent a mindless session browsing Wikipedia, you might have some sense of what a spider does: it simply follows one link to the next and makes a note of its current address. In the same way that you might mindlessly click on Wikipedia links and find yourself re-visiting the second season of Family Matters without ever actually learning or remembering anything along the way, a modern spider's main role is just to follow the links from one page to the next, making small notes about content and moving on.
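To make the link-following idea concrete, here is a minimal sketch of a spider in Python. It is a toy under stated assumptions, not how any real search engine is built: the four-page site in PAGES is made up and held in memory rather than fetched over the network, and the names LinkExtractor and crawl are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site, stored in memory instead of fetched over HTTP.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/about">About</a> <a href="/blog/post-1">Post</a>',
    "/blog/post-1": "<p>No links here.</p>",
}


def crawl(start, pages):
    """Follow links breadth-first from start; return addresses in visit order."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        visited.append(url)  # "make note of the current address"
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("/", PAGES))
# prints ['/', '/about', '/blog', '/blog/post-1']
```

Notice that the spider never "understands" any page. It only pulls out the links, remembers where it has already been, and moves on.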
Spiders and You: Not as Terrifying Online as in Real Life
Spiders help tell search engines what's 'out there' on the internet, so if you're a site owner, it's imperative that they can find everything you want to share on your site. Spiders aren't that bright, and they're pretty lazy, so it's up to you to make sure that they can find everything easily and quickly.
You can make this easier on them by doing things like making sure that the pages on your site link to one another wherever possible. If the spider can't move from one page to another, it will get stuck and give up.
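If you want to check your own site for this problem, one rough approach is to list which pages link to which, then see what a link-follower could actually reach from the home page. The sketch below assumes a made-up site map (SITE_LINKS) and hypothetical helper names; a real audit would build that map from your actual pages.

```python
# Hypothetical site map: each page and the pages it links to.
SITE_LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/widget"],
    "/products/widget": [],
    "/old-landing-page": ["/"],  # nothing links TO this page
}


def reachable_from(start, links):
    """Return every page a link-following spider could reach from start."""
    seen = {start}
    stack = [start]
    while stack:
        page = stack.pop()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def unreachable_pages(start, links):
    """Pages that exist on the site but cannot be reached by following links."""
    return sorted(set(links) - reachable_from(start, links))


print(unreachable_pages("/", SITE_LINKS))
# prints ['/old-landing-page']
```

Any page that shows up in that list is one a spider starting at your home page would never find by following links, no matter how good its content is.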
Scrapers & Bots
Continuing the parade of search engine components with off-putting names, scrapers (also known as bots or robots) are more complicated and specialized programs that help search engines understand what your site is really about. In contrast to their frenetic cousins (the spiders), scrapers are much more thorough. They don't just click on a link because it's there; they take the time to read through the content and get an idea of what it's about. In other words, they aren't content to learn the names of the episodes of Season Two of Family Matters; they actually take the time to remember "The Crash Course", the episode where Eddie crashed into the front of the Winslow house (Carl was NOT happy about this).
After reviewing the content of your site, scrapers will try to categorize what the page is about, allowing the search engine to know when it might be a helpful result for users. They also slice up the text on your site and save a copy for further review. Finally, they'll try to make connections between your site and what the search engine already knows about the world by looking for similar pages and content. While they aren't directly involved in ranking pages for search results, they are perhaps the most important connection between a site and a search engine.
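The "slice up the text and save a copy" step can be pictured with a very small word index. This is only an illustration using a standard textbook technique (an inverted index, which maps each word to the pages that contain it); the post makes no claim about what any particular search engine does internally, and the page texts, stop-word list, and function names here are all invented for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical page texts, keyed by address.
DOCS = {
    "/family-matters": "Family Matters season two episode guide and episode recaps",
    "/space-jam": "Space Jam is a basketball movie",
}

# A tiny, made-up list of words too common to be worth remembering.
STOP_WORDS = {"a", "an", "and", "is", "the", "of"}


def tokenize(text):
    """Slice text into lowercase words, dropping the stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Map each word to the pages it appears on and how often."""
    index = defaultdict(dict)
    for url, text in docs.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index


index = build_index(DOCS)
print(index["episode"])
# prints {'/family-matters': 2}
print(sorted(index["basketball"]))
# prints ['/space-jam']
```

With something like this saved, answering "which pages mention basketball?" becomes a quick lookup instead of a re-read of every page.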
(editor's note: We aren't really sure why Tim is so obsessed with Family Matters. For the record, we asked him to tone it down for this post, but the alternative was a surprisingly-detailed and strangely well-researched essay about the 1996 intergalactic Looney Tunes-basketball family extravaganza 'Space Jam'. It did not make any sense to anyone.)
Getting Your Site Scraped Right
If you are interested in attracting traffic from search engines, you need to make sure that the scrapers can easily read your site. Because scrapers work almost entirely with actual text, it's important to make sure that your content does not appear only inside images. It's also important to keep in mind that the scrapers don't treat all content on your site the same: they are particularly interested in things like title and heading tags, as well as bolded text.
Additionally, signing up for tools like Google Webmaster Tools or Bing Webmaster Tools will help alert you to any problems that the robots are having while looking at your site.
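For a rough sense of what a text-only reader gets out of a page, the sketch below pulls out the title, headings, and bolded text, and counts the images it cannot read. The sample page and the EmphasisExtractor class are made up for illustration, and the choice of which tags to treat as "emphasized" is an assumption based on the ones named above.

```python
from html.parser import HTMLParser

# Tags treated as emphasized here: titles, headings, and bolded text.
EMPHASIZED_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "b", "strong"}


class EmphasisExtractor(HTMLParser):
    """Records text inside emphasized tags and counts <img> tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of emphasized tags currently open
        self.found = []      # (tag, text) pairs, in page order
        self.images = 0      # images carry no readable text for this parser

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIZED_TAGS:
            self.open_tags.append(tag)
        elif tag == "img":
            self.images += 1

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            self.found.append((self.open_tags[-1], text))


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<html><head><title>Widget Shop</title></head>
<body><h1>Hand-made widgets</h1>
<p>We sell <b>blue widgets</b> and more.</p>
<img src="sale-banner.png">
</body></html>
"""

extractor = EmphasisExtractor()
extractor.feed(SAMPLE_PAGE)
extractor.close()

print(extractor.found)
print(extractor.images)
```

Whatever words are painted into sale-banner.png never show up in the output; the image is simply counted and skipped, which is why important content should also exist as real text.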
Draculas aren't actually a thing at all (in search engines or real life). But it sounded cool, and anyway, Rule of Threes, right?
Updated 2/24/2012: Are you interested in 'what' a search engine sees on your site? I wrote a follow-up post addressing that exact question on this blog.