Sunday, May 27, 2007

Robots Deliver

We’re going to start with the basics of how search engines work. A major component is the robot, or spider: software that slurps up your site’s text and brings it back to be analyzed by a powerful central "engine." This activity is referred to as crawling or spidering. There are lots of different metaphors for how robots work, but we think ants make the best one. Think of a search engine robot as an explorer ant, leaving the colony with one thought on its mind: Find food. In this case, the "food" is HTML text, preferably lots of it, and to find it, the ant needs to travel along easy, obstacle-free paths: HTML links. Following these paths with insect-like single-mindedness, the ant (search engine robot) carries the food (text) back to its colony and stores it in its anthill (search engine database). Thousands and thousands of the little guys are exploring and gathering simultaneously all over the Internet. If a path is absent or blocked, the ant gives up and goes somewhere else. If there’s no food, the ant brings nothing back.
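If you like to see ideas in code, here is a minimal sketch of the ant at work. It is only an illustration, not how any real search engine is built: the three-page site in `PAGES`, the `TextAndLinks` class, and the `crawl` function are all made up for this example, and the pages live in memory so nothing touches the network.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical site, kept in memory so the sketch runs offline.
PAGES = {
    "/": '<h1>Gum Shop</h1><a href="/grape">Grape</a> <a href="/missing">Old page</a>',
    "/grape": '<p>Grape bubble gum in bulk.</p><a href="/">Home</a>',
    "/orphan": "<p>No page links here, so the robot never finds it.</p>",
}


class TextAndLinks(HTMLParser):
    """Collects visible text (the food) and link targets (the paths)."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(start):
    """Follow links breadth-first from `start`, storing each page's text."""
    database = {}          # the anthill: URL -> gathered text
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue       # absent or blocked path: give up, go elsewhere
        parser = TextAndLinks()
        parser.feed(html)
        parser.close()
        database[url] = " ".join(parser.text)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database


database = crawl("/")
```

Notice what ends up in `database`: the home page and the grape page, because links lead to them. The `/missing` link goes nowhere, so the robot moves on, and `/orphan` has plenty of text but no path leading to it, so it is never gathered.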

So when you think of a search engine, think of a database that holds pieces of text gathered from millions of sites all over the web.


What sets that engine in motion? A search. When a web surfer enters the term "grape bubble gum" into the search engine, all of the sites that might be relevant for that term are brought to the forefront. The search engine sifts through its database for sites containing any of the search words, including ones about "grape growers," "stock market bubble," and "gum disease." It uses a secret formula (a.k.a. a search ranking algorithm) to sort the results, and in a fraction of a second, a list of relevant sites, many containing the exact phrase "grape bubble gum," is returned in the results page.
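The lookup-then-sort idea can be sketched in a few lines. Real ranking formulas are secret and far more involved, so treat everything here as an assumption made for illustration: the sites in `DATABASE`, the helper names, and especially the scoring rule (one point per matching word, plus an arbitrary bonus of 10 for the exact phrase).

```python
import re
from collections import defaultdict

# Hypothetical text already gathered by the robots, keyed by site.
DATABASE = {
    "gumshop.example": "We sell grape bubble gum by the case.",
    "vineyard.example": "Grape growers share harvest tips.",
    "finance.example": "Is the stock market bubble about to burst?",
    "dentist.example": "Brushing daily helps prevent gum disease.",
    "cars.example": "Used cars at fair prices.",
}


def tokenize(text):
    """Lowercase the text and split it into plain words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(database):
    """Map each word to the set of sites whose text contains it."""
    index = defaultdict(set)
    for site, text in database.items():
        for word in tokenize(text):
            index[word].add(site)
    return index


def search(query, database, index):
    """Return candidate sites, best match first (made-up scoring)."""
    words = tokenize(query)
    candidates = set()
    for word in words:
        candidates |= index.get(word, set())
    phrase = " ".join(words)
    scored = []
    for site in candidates:
        site_words = tokenize(database[site])
        score = sum(1 for word in words if word in site_words)
        if phrase in " ".join(site_words):
            score += 10  # arbitrary bonus for the exact phrase
        scored.append((score, site))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [site for _, site in scored]


INDEX = build_index(DATABASE)
results = search("grape bubble gum", DATABASE, INDEX)
```

With this toy data, the gum shop comes out on top because it contains the exact phrase, the grape growers, stock market, and gum disease sites trail behind with one matching word each, and the used car site never makes the list at all.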


There are lots of things that factor into the way robot search engines determine the rank for their main search results. But, just for a start, in order to be in the running for ranks, you need to provide HTML text to feed the search engines and HTML links as clear paths to the food. Keeping those robots well-fed and happy is going to be one of the biggest priorities