Monday, March 20, 2006

The "real" search engine optimization (internal site search)

One thing that has bothered me for years in this industry is the insistence of lazy site programmers that site search doesn't matter. Numerous studies indicate that people prefer search to navigation when looking for information, at least from the macroscopic point of view.

Once there, they use navigation, but search cannot be ignored. We can see this in action by looking at several prominent sites that have garbage search engines --

All Invision Power Board sites.

Try it for something non-trivial. People expect it to work. And it usually doesn't.

I'm also amazed at how many programmers declare defeat so easily and just use Google search for internal search results. This is undesirable for several reasons. For one, it's unprofessional. More importantly, Google doesn't understand the structure of your site as well as you do. Perhaps you'd like to separate or filter out news from the rest of your content, or you'd like searches for model numbers with dashes (Google chokes on these sometimes) to pull up products easily. For this, you need your own search engine.
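To make the model number point concrete, here is a minimal sketch in Python of one way to handle dashes yourself: strip them at both index time and query time so that "XJ-900" and "xj900" become the same token. The function names and the model number are made up for illustration, and this is only one possible normalization rule, not a description of what any particular engine does.

```python
import re

def normalize_token(token):
    """Lowercase a token and drop dashes so 'XJ-900' and 'xj900' match."""
    return token.lower().replace("-", "")

def tokenize(text):
    """Split text into normalized tokens, keeping dashed runs together."""
    # Letters and digits joined by inner dashes are treated as one token.
    raw = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    return [normalize_token(t) for t in raw]

# The same function must be applied to indexed text and to user queries.
indexed = tokenize("The XJ-900 printer")
query = tokenize("xj900")
```

The trade-off is that ordinary hyphenated words get merged too ("well-known" becomes "wellknown"), which is usually harmless as long as queries go through the same function.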

External search engines, even those that you install on your own server, have many of the same issues. Some also seem to index the ads on their pages. At least one site seems to do this because it uses Yahoo, and Yahoo cannot filter out ad content reliably (see above).

The proper approach in my opinion, and the one that provides the most control, is a fulltext search engine -- either the one that comes with MySQL, or the one that comes with PostgreSQL. I cannot speak for the Microsoft people, but I'm sure there is an equivalent solution. This allows you to select exactly what is indexed and what isn't, and gives you a certain degree of control over how results are ranked. Sometimes this requires a bit of ingenuity, but it's all possible.
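The "select exactly what is indexed" idea can be sketched without any database at all. The Python below uses a tiny in-memory inverted index as a stand-in for a database fulltext index. The documents, the "kind" field and the function names are all hypothetical, and a real setup would do this with the fulltext features of MySQL or PostgreSQL rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical documents; "kind" marks which part of the site they belong to.
DOCS = [
    {"id": 1, "kind": "product", "title": "Laser printer", "body": "Fast office printer"},
    {"id": 2, "kind": "news", "title": "Printer recall", "body": "A printer was recalled"},
    {"id": 3, "kind": "ad", "title": "Buy ink", "body": "Cheap printer ink"},
]

def build_index(docs, kinds):
    """Map each word to the ids of documents whose kind is allowed."""
    index = defaultdict(set)
    for doc in docs:
        if doc["kind"] not in kinds:
            continue  # skipped content never reaches the index
        for word in (doc["title"] + " " + doc["body"]).lower().split():
            index[word].add(doc["id"])
    return index

def search(index, word):
    """Return the sorted ids of documents containing the word."""
    return sorted(index.get(word.lower(), set()))

product_index = build_index(DOCS, {"product"})       # products only
site_index = build_index(DOCS, {"product", "news"})  # everything except ads
```

Because ads are never indexed, a search for "ink" finds nothing in either index, while "printer" finds only the product in the first index and the product plus the news item in the second.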

I have implemented:
Weighted fields: Title counts 5x as much as the body.
Stemming: ex. suicide = suicides = suicidal
Copy highlighting: The relevant text gets highlighted in your search results.

... among other things.
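The three features above can be sketched in a few lines of Python. This is not the code behind the site mentioned in this post. It is a toy under stated assumptions: the suffix list is a crude stand-in for a real stemmer, scoring is a plain count of matching words, and the `<b>` tags and all function names are chosen for illustration only.

```python
import re

TITLE_WEIGHT = 5  # a title match counts 5x as much as a body match
SUFFIXES = ("al", "es", "s", "e")  # crude and illustrative, not a real stemmer

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def words(text):
    """Return the alphabetic words in the text."""
    return re.findall(r"[A-Za-z]+", text)

def score(query, title, body):
    """Count stem matches for a one-word query, weighting the title."""
    target = stem(query)
    title_hits = sum(1 for w in words(title) if stem(w) == target)
    body_hits = sum(1 for w in words(body) if stem(w) == target)
    return TITLE_WEIGHT * title_hits + body_hits

def highlight(query, text):
    """Wrap every word sharing the query's stem in <b> tags."""
    target = stem(query)

    def mark(match):
        word = match.group(0)
        return "<b>" + word + "</b>" if stem(word) == target else word

    return re.sub(r"[A-Za-z]+", mark, text)
```

With this, `score("suicide", "Suicidal thoughts", "Help for suicides and more")` counts one title match and one body match, and `highlight` marks "Suicidal" even though the query was "suicide". A suffix list this simple will mis-stem plenty of ordinary words, so a proper stemming algorithm is the better choice for anything real.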

You can see the fruits of this labor on Lawyer Seek.

Example queries:
First result ->
First result ->

People like things that work. If your search engine doesn't work, it will yield false negatives. That's worse than not having one at all in my opinion.


