dmoz.org, open web page index and MozDex, open search engine

In addition to Wikipedia, a number of similar projects also deserve mention. Older than both Wikipedia and Nupedia is the Open Directory Project at dmoz.org, which is a directory of Internet links.1 The first, and still the most popular, directory of this kind is a closed one that has become an icon of the Internet: Yahoo!.2

In 1998 it looked as if automated search engines had come to the end of the road. The then most popular search engine, AltaVista, couldn't cope with the explosive increase in web pages, and its search results became more and more useless. The sheer number of web pages made it hard to find what you were looking for, and the situation was made even worse by various advertising companies who had learnt how to abuse the keyword lists of web pages to get to the top of the search results. The situation resembled the one we now have with e-mail: no matter how carefully one chose the search words, the top of the search results contained pornography and - well, mostly pornography, because Viagra wasn't known then.3

That meant that the future seemed to depend more and more on directories made by human effort, such as Yahoo!. This was the niche that Chris Tolles and Rich Skrenta came upon. To them it was clear that gathering the fast-growing Web into any sort of directory cried out for an Open Source approach. So, in June 1998 they founded a project to do just that, and called it GnuHoo - it's a small world, isn't it - or open Yahoo!. Since it wasn't an official Gnu project, they did as Richard Stallman requested and changed the name to NewHoo. Later on, Netscape, which was one of Yahoo!'s competitors, dropped its own directory project, bought NewHoo to be the basis of its own portal, and renamed it the Open Directory. It finally found a home at dmoz.org.

Dmoz, or NewHoo, was a success from the start. In the first month alone it had scored some 31,000 links, which 400 volunteer editors had organized into 3,900 categories. Only a week after that there were 1,200 editors and 40,000 links!4

In the same year, 1998, AltaVista lost its position as the leading automated search engine and was slowly forgotten as the lead was taken over by a newcomer, Google, which thanks to its highly developed PageRank algorithm could once again bring some sense to the order of the search results. That also tipped the balance in the competition between automated search engines and edited directories in favour of the automated ones. However, it is worth noting that Google and many other search engines use dmoz as one source of information when building their own databases. So, whenever we use Google, we are in many ways using Open Source. First, the Google servers are based on Linux and Open Source code; and second, the Google search engine uses an Open Source source of information.5
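To give a feel for the idea behind PageRank, here is a minimal sketch of the commonly published formulation: a page is important if important pages link to it. This is not Google's code, which as noted is secret; the function name, the damping value of 0.85, the iteration count and the three-page example web are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing each page's score along its links.

    links maps a page name to the list of pages it links to.
    Illustrative sketch only; names and parameters are hypothetical.
    """
    # Collect every page, including ones that are only linked to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the score spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small base score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links shares its score with everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# A tiny made-up web: both "a" and "b" link to "c", and "c" links back to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page "c" comes out on top because two pages link to it, and "a" comes second because its single incoming link is from the highly ranked "c". Stuffing a page with keywords does nothing to change that, which is why link-based ranking made the search results useful again.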

Though the edited directories lost the fight to Google, the Open Source community didn't give up. In April 2004 a new contender joined the search engine competition: MozDex.6

MozDex is a perfectly open search engine. Not only is it based on Open Source code, but the intention is also for it to be open in the presentation of search results. How do we know that the search result Google lists first is really the best? Can we be sure there isn't some Google employee who has fiddled with the database or that they haven't sold the number one spot to whoever pays most? Even though we do trust Google, we cannot be completely certain. MozDex aims to provide search results that offer all users the opportunity to check why the links which come highest on the list really are the most relevant. Next to each search result, there is an (explain) link, which allows you to see on what basis the given search result has scored higher than the other pages in the database.
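The idea behind an (explain) link can be sketched in a few lines: instead of returning only a score, the scoring function also returns the parts the score was built from. The example below is a deliberately simple toy and not the formula MozDex uses; the function name, the weights and the sample page are assumptions made for illustration.

```python
def score_with_explanation(query, title, body, title_weight=3.0, body_weight=1.0):
    """Score a page for a query and report how the score was reached.

    Returns (total_score, explanation), where explanation holds one
    (term, title_hits, body_hits, contribution) tuple per query term.
    Hypothetical toy scoring, not any real search engine's formula.
    """
    title_words = title.lower().split()
    body_words = body.lower().split()

    total = 0.0
    explanation = []
    for term in query.lower().split():
        title_hits = title_words.count(term)
        body_hits = body_words.count(term)
        # A hit in the title counts for more than a hit in the body.
        contribution = title_weight * title_hits + body_weight * body_hits
        explanation.append((term, title_hits, body_hits, contribution))
        total += contribution
    return total, explanation


score, why = score_with_explanation(
    "open directory",
    "Open Directory Project",
    "An open link directory edited by volunteers",
)
print("score:", score)
for term, title_hits, body_hits, contribution in why:
    print(f"  {term}: {title_hits} in title, {body_hits} in body -> {contribution}")
```

Because the breakdown is returned alongside the score, anyone can check that a page ranks where it does for the stated reasons, and not because someone adjusted the number by hand.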

In the words of the programmer of the MozDex search engine, Doug Cutting, "The number of Web search engines is decreasing. Today's oligopoly could soon become a monopoly, with a single company controlling nearly all Web search for its commercial gain. That would not be good for users of the Internet."

At the time of writing, MozDex is still an experiment and its database doesn't yet contain all the web pages on the Internet. But April 9, 2004 will go down in history as the day Open Source joined the search engine competition. Google won the first round, but will Open Source make a comeback? That remains to be seen.

  1. http://dmoz.org/
  2. http://www.yahoo.com/
  3. Viagra didn't hit the market until 1998 (http://en.wikipedia.org/wiki/Viagra) and at least in the spam that fills my inbox, Viagra ads are the largest single group.
  4. Wide Open News, 12.6.1999: "License to search". http://web.archive.org/web/20011108043741/www.wideopen.com/story/224-2.html
  5. While Google servers are based on Linux and Open Source code, the Google code itself is top secret.
  6. http://www.mozdex.com/
