Friday, March 24, 2006

DMOZ is dead

Let's face it. The internet is a very big place. Long ago in a place called "1996," directories attempted to organize the resources of the WWW into a directory structure. It worked, kind of. That was then. This is now.

The problem is that they're still used as authoritative resources by search engines, and many of them are volunteer-driven. DMOZ.org is one of these. Thousands of people complain daily that their submissions never get evaluated. I'm one of them.

Furthermore, DMOZ.org makes things even harder by disabling an editor's account after 90 days of inactivity. This is volunteering. If you log in more often than that, I question your motives.

And therein lies the last problem. DMOZ asks you to declare your affiliations. I'm sure most editors do not. And I'm sure that in the most competitive areas (law, casinos, etc.), the majority of the editors are editing the categories, at least subtly, in a bad way. I've even heard stories of sites getting deleted by competitors.

So what makes you think that's not why your site isn't getting added?

So DMOZ is busy disabling the accounts of the volunteers, while more and more cronies are controlling the most influential directory around. Oh yeah, and Zeal.com just closed shop. Bye-bye Zeal. There goes another quality directory.

I'm just another SEM waiting for a site to get added for the company I'm working for: http://www.seegerweiss.com.

Personally, I think directories are so 1996 ...

Tuesday, March 21, 2006

Copywriting for queries, not English

Another major issue becomes obvious once you realize that popular queries aren't necessarily English.

For example, when a user is looking for a "Vioxx attorney," one must consider whether this is actually English. The answer is yes, and no. Ask yourself a simple question. "Would I actually call up a law firm and say: 'Hello, do you have a Vioxx attorney?'"

Maybe.

But I'd be more likely to phrase the question as: "Hello, do you have attorneys who handle Vioxx?"

(BORING ENGLISH GRAMMARIAN RANT)

English is a funny language. Most of its words are derived from Latin and the Romance languages (Western Europe), but its grammar comes from the Germanic languages (unsurprisingly, German is one of these). One of the neat things that Germanic languages permit is the compound noun, where one noun simply modifies another, as in "school bus." Semitic languages (Hebrew, Arabic, etc.) have a related device, the "construct state."

English is also highly erratic.

So users search for "Vioxx lawyers." They also search for "MP3 player -sony" to find music players that don't pay homage to the RIAA. Just because someone types a bunch of keywords into a query box, and they happen to form a valid language construction, doesn't mean it's (desirable) English. You can say it once, maybe, and get away with it. But after that it starts to look spammy and unprofessional.

In Spanish (and other Romance languages), we would say (translated) "bus of school," not "school bus." This sounds silly in English, but that is not always the case.

Consider two of my favorite salad items:

1) Hearts of Palm
2) Artichoke Hearts

Notice that the first is in a "romance construction" whereas the second is in a "germanic construction."

(/BORING ENGLISH GRAMMARIAN RANT)

So why is this relevant?

Nobody says "Vioxx Lawyer!" Saying it more than once in your copy will look stupid. And, yes, keyword density counts. Not to the extent that it's worth measuring and calculating, but you'd be wise to include the keyword 3-4 times in your copy, in various inflections (tense, plural/singular, etc.).
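You don't need a density percentage, but a quick count of how often the keyword and its inflections show up is an easy way to check you've landed in the 3-4 range. A minimal Python sketch; the sample copy and the list of forms are made up for illustration, and counting case-insensitive whole-word matches is an assumption about what should count as a mention.

```python
import re

def count_mentions(copy, forms):
    """Count case-insensitive, whole-phrase matches for each keyword form."""
    text = copy.lower()
    return {
        # \b on both sides: "attorney" does not also match inside "attorneys"
        form: len(re.findall(r"\b" + re.escape(form.lower()) + r"\b", text))
        for form in forms
    }

# Made-up sample copy, for illustration only.
copy = ("Contact us regarding Duragesic; attorney consultations are free. "
        "Our attorneys handle Duragesic cases. "
        "A Duragesic attorney can review your claim.")
counts = count_mentions(copy, ["attorney", "attorneys"])
print(counts, sum(counts.values()))  # {'attorney': 2, 'attorneys': 1} 3
```

Keep the forms non-overlapping (as here, where the word boundaries keep "attorney" and "attorneys" apart); otherwise one phrase can be counted twice.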

So here is how to do it:

We construct two clauses (see: http://www.lawyerseek.com/Practice/Pharmaceutical-Injury-C1/Duragesic-P64/):

Contact us regarding Duragesic; attorney consultations are free

... and hope for the best.

Lousy copy scares clients away. This won't. English is a beautiful but highly erratic language, and when you write copy, that's important to keep in mind.

Monday, March 20, 2006

The "real" search engine optimization (internal site search)

One thing that has bothered me for years in this industry is the insistence of lazy site programmers that site search doesn't matter. Numerous studies indicate that people prefer search over navigation when looking for information, at least at the macroscopic level.

Once there, they use navigation, but search cannot be ignored. We can see this in action by looking at several prominent sites that have garbage search engines --

CDW
All Invision Power Board sites.

Try it for something non-trivial. People expect it to work. And it usually doesn't.

I'm also amazed how many programmers declare defeat so easily and just use Google for their internal search results. This is undesirable for several reasons. For one, it's unprofessional. More importantly, Google doesn't understand the structure of your site as well as you do. Perhaps you'd like to separate or filter out news from the rest of your content, or you'd like model numbers with dashes (Google sometimes chokes on these) to pull up products easily. For this, you need your own search engine.
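One way to make dashed model numbers behave is to normalize them the same way at index time and at query time, so "XR-200", "xr 200" and "XR200" all land on the same key. A minimal Python sketch, assuming a simple model-number-to-URL lookup; the catalog entries and URLs are hypothetical, and stripping spaces, dashes, underscores, slashes and dots is an assumption you'd tune to your own product data.

```python
import re

def normalize_model_number(text):
    """Drop separators and case so 'XR-200', 'xr 200' and 'XR200' compare equal."""
    return re.sub(r"[\s\-_/.]", "", text).lower()

def build_model_index(catalog):
    """Map normalized model number -> product URL."""
    return {normalize_model_number(model): url for model, url in catalog.items()}

def find_product(query, index, max_tokens=3):
    """Return the URL for a model number found anywhere in the query, else None."""
    tokens = query.split()
    # Try longer runs of adjacent tokens first, so a model number typed
    # with spaces ("xr 2000") is matched as a whole.
    for size in range(min(max_tokens, len(tokens)), 0, -1):
        for start in range(len(tokens) - size + 1):
            key = normalize_model_number("".join(tokens[start:start + size]))
            if key in index:
                return index[key]
    return None

# Hypothetical catalog, for illustration only.
index = build_model_index({"XR-200": "/products/xr-200/",
                           "XR-2000": "/products/xr-2000/"})
print(find_product("xr 2000 toner", index))    # /products/xr-2000/
print(find_product("XR200 cartridge", index))  # /products/xr-200/
```

In a real site search this lookup would sit in front of the full-text query: if a model number is recognized, send the visitor straight to that product.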

External search engines, even those that you install on your own server, have many of the same issues. Some also seem to index the ads on their pages. CNN.com seems to do this because they use Yahoo, and Yahoo cannot filter out ad content reliably (see above).
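If you run your own indexer, keeping ads out is simply a matter of never feeding them to the index. A minimal sketch using Python's built-in html.parser, assuming your ad markup is wrapped in elements with a class such as "ad" (a hypothetical convention; use whatever your templates actually emit) and that the HTML is reasonably well formed, since stray closing tags inside a skipped block would throw off the depth count.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class IndexableTextParser(HTMLParser):
    """Collect page text, skipping scripts, styles and anything marked as an ad."""

    def __init__(self, skip_classes=("ad",)):
        super().__init__()
        self.skip_classes = set(skip_classes)  # hypothetical ad class names
        self.skip_depth = 0  # > 0 while inside an element we are skipping
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or tag in ("script", "style") or classes & self.skip_classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def indexable_text(html, skip_classes=("ad",)):
    """Return the text you would actually want in the search index."""
    parser = IndexableTextParser(skip_classes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = ('<div><h1>Vioxx</h1>'
        '<div class="ad banner"><p>Call <b>now</b><br>for a quote</p></div>'
        '<p>Heart attack risk.</p></div>')
print(indexable_text(page))  # Vioxx Heart attack risk.
```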

The proper approach in my opinion, and the one that provides the most control, is a full-text search engine: either the one that comes with MySQL or the one that comes with PostgreSQL. I cannot speak for the Microsoft people, but I'm sure there is an equivalent solution. This lets you select exactly what is indexed and what isn't, and gives you some control over how results are ranked. Sometimes this requires a bit of ingenuity, but it's all possible.

I have implemented:
Weighted fields: Title counts 5x as much as the body.
Stemming: e.g. suicide = suicides = suicidal
Copy highlighting: The relevant text gets highlighted in your search results.

... among other things.
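All three features can be sketched with SQLite's FTS5 full-text extension, swapped in here for MySQL/PostgreSQL only because it runs from Python's standard library with no server; it is an illustration, not how Lawyer Seek is built. It assumes your SQLite build includes FTS5 (most current builds do). bm25() takes one weight per column, so 5.0 for the title and 1.0 for the body gives title matches five times the weight; the porter tokenizer does the stemming; highlight() wraps matched terms. The sample pages are made up.

```python
import sqlite3

def build_index(pages):
    """Index (title, body) pairs in an in-memory SQLite FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # 'porter unicode61': fold case with unicode61, then apply the Porter
    # stemmer, so suicide, suicides and suicidal all index as one stem.
    conn.execute(
        "CREATE VIRTUAL TABLE pages USING fts5(title, body, tokenize='porter unicode61')"
    )
    conn.executemany("INSERT INTO pages (title, body) VALUES (?, ?)", pages)
    return conn

def search(conn, query, title_weight=5.0, body_weight=1.0):
    """Return (title, highlighted body) rows, best match first."""
    rows = conn.execute(
        """
        SELECT title,
               highlight(pages, 1, '<b>', '</b>'),   -- column 1 = body
               bm25(pages, ?, ?) AS score            -- one weight per column
        FROM pages
        WHERE pages MATCH ?
        ORDER BY score                               -- bm25(): lower is better
        """,
        (title_weight, body_weight, query),
    ).fetchall()
    return [(title, snippet) for title, snippet, _ in rows]

# Made-up sample pages, for illustration only.
pages = [
    ("Vioxx heart attack claims", "Free consultation for patients."),
    ("Pharmaceutical injury overview",
     "Some patients suffered a heart attack after taking Vioxx."),
    ("Paxil", "Reports link Paxil to suicides in teenagers."),
    ("Ortho Evra patch", "Blood clot risks of the patch."),
    ("Zoloft", "Questions about Zoloft side effects."),
    ("Contact", "Call our office for a free review."),
]
conn = build_index(pages)
for title, snippet in search(conn, "suicidal"):
    print(title, "|", snippet)  # Paxil | Reports link Paxil to <b>suicides</b> in teenagers.
```

Raw visitor input can contain characters that FTS5's query syntax treats specially (a dashed model number, for instance), so a production version would quote or clean the query before passing it to MATCH.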

You can see the fruits of this labor on Lawyer Seek.

Example queries:
http://www.lawyerseek.com/Site-Search.html?action=show_list&query=vioxx+heart+attack&x=0&y=0
First result -> http://www.lawyerseek.com/Practice/Pharmaceutical-Injury-C1/Vioxx-P57/

http://www.lawyerseek.com/Site-Search.html?action=show_list&query=ortho+evra+lawyer&x=0&y=0
First result -> http://www.lawyerseek.com/Practice/Pharmaceutical-Injury-C1/Ortho-Evra-Patch-P48/

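People like things that work. If your search engine doesn't work, it will yield false negatives: visitors search for something you do have, get no results, and conclude you don't have it. That's worse than not having a search engine at all, in my opinion.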

Thursday, March 16, 2006

Old School Table Bloat

I'm old school. I still use tables, I admit it, but every once in a while the bloat gets out of control. Take the navigation on a new site I'm designing. There is a navigation strip on the right side with links to all the sections of the site. Altogether there are 120 of them. Each link used to be in a table; I replaced it with divs and some CSS.

This is what it looked like before:

<div class="tbgs" onclick='_g("http://www.lawyerseek.com/Practice/Pharmaceutical-Injury-C1/Vioxx-P57/");'><b class="tbgsb"> | </b><a href="http://www.lawyerseek.com/Practice/Pharmaceutical-Injury-C1/Vioxx-P57/" class="jbx" title="Information regarding Vioxx litigation"><b>Vioxx</b></a></div>


This is what it looked like afterwards:

<div class="tbgs" onclick='_g("/Practice/Pharmaceutical-Injury-C1/Vioxx-P57/");'><b class="tbgsb"> | </b><a href="/Practice/Pharmaceutical-Injury-C1/Vioxx-P57/" class="jbx" title="Information regarding Vioxx litigation"><b>Vioxx</b></a></div>


That doesn't look like much until you repeat it 130 times in a navigation bar. Removing "http://www.lawyerseek.com" from every link saved about 6 KB and reduced load time.
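The 6 KB figure is easy to check. "http://www.lawyerseek.com" is 25 characters and appears twice in each link (once in the onclick handler, once in the href), so each link sheds 50 bytes; over 130 links that's 6,500 bytes, roughly 6.3 KB. A small Python check using the "before" snippet from above:

```python
from urllib.parse import urljoin

PREFIX = "http://www.lawyerseek.com"

# The "before" navigation item, exactly as shown above.
before = ('<div class="tbgs" onclick=\'_g("http://www.lawyerseek.com/Practice/'
          'Pharmaceutical-Injury-C1/Vioxx-P57/");\'><b class="tbgsb"> | </b>'
          '<a href="http://www.lawyerseek.com/Practice/Pharmaceutical-Injury-C1/'
          'Vioxx-P57/" class="jbx" title="Information regarding Vioxx litigation">'
          '<b>Vioxx</b></a></div>')
after = before.replace(PREFIX, "")

saved_per_link = len(before) - len(after)    # 2 occurrences x 25 characters
print(saved_per_link, saved_per_link * 130)  # 50 6500

# A root-relative link still resolves to the same absolute URL on this host.
print(urljoin(PREFIX + "/Site-Search.html",
              "/Practice/Pharmaceutical-Injury-C1/Vioxx-P57/"))
```

The trade-off is that root-relative links only work while the pages are served from that host, which is the normal case for site navigation.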

Interestingly enough, this is the same principle I follow when programming: I always look at the loops first, because anything inside a loop gets multiplied. If your PHP is iterating 130 times, you may also want to remove the tabs and spaces in the HTML you output between ?> and <?php inside the loop.

Moral of the story: Tables are bloaty, but one can achieve most of the benefits by replacing tables where they're the most troublesome.
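The loop point applies to whatever markup a template emits on each pass. The original is PHP; this is a Python analogue with hypothetical templates, showing that indentation left inside a loop is paid for on every iteration. Browsers generally ignore this whitespace between block elements, so stripping it changes the byte count, not the rendering.

```python
# Hypothetical templates: the same nav item with and without template indentation.
PRETTY = '\t\t<div class="tbgs">\n\t\t\t<a href="{url}" class="jbx"><b>{name}</b></a>\n\t\t</div>\n'
COMPACT = '<div class="tbgs"><a href="{url}" class="jbx"><b>{name}</b></a></div>'

def render_nav(links, template):
    """Render one nav item per (name, url) pair; the template repeats each pass."""
    return "".join(template.format(name=name, url=url) for name, url in links)

# 130 made-up links, matching the size of the navigation bar above.
links = [(f"Item {n}", f"/Practice/P{n}/") for n in range(130)]
extra = len(render_nav(links, PRETTY)) - len(render_nav(links, COMPACT))
print(extra)  # 1300: 10 whitespace characters x 130 iterations
```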

In terms of marketing, load times can actually have a real effect. People looking for a lawyer might just like the site better when it takes fewer seconds to load.