Wikipedia 162, Britannica 123

From Good Morning, Silicon Valley

A study published in the journal Nature Wednesday found that in a random sample of 42 science entries, the collaborative encyclopedia averaged four inaccuracies to Britannica’s three. “Only eight serious errors, such as misinterpretations of important concepts, were detected in the pairs of articles reviewed, four from each encyclopedia,” reported Nature. “But reviewers also found many factual errors, omissions or misleading statements: 162 and 123 in Wikipedia and Britannica, respectively.”

Not bad for a reference work whose open nature allows for inaccuracy, opinion and outright vandalism… Certainly, it’s a testament to the innovative power of Wikipedia. “People will find it shocking to see how many errors there are in Britannica,” Michael Twidale, an information scientist at the University of Illinois at Urbana-Champaign, told Nature. “Print encyclopedias are often set up as the gold standards of information quality against which the failings of faster or cheaper resources can be compared. These findings remind us that we have an 18-carat standard, not a 24-carat one.” Editors at Britannica wouldn’t comment on the flaws in their work, but had no trouble sounding off about those in Wikipedia. “We have nothing against Wikipedia,” said Tom Panelas, director of corporate communications at the company’s headquarters in Chicago. “But it is not the case that errors creep in on an occasional basis or that a couple of articles are poorly written. There are lots of articles in that condition. They need a good editor.”
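
Incidentally, the “four to three” comparison is just the reported error counts spread over the 42 sampled entries. A back-of-the-envelope check, using only the figures quoted above:

```python
# Rough check of the per-entry averages behind the "four to three" claim.
# Figures from the Nature excerpt above: 42 paired entries,
# 162 errors found in Wikipedia, 123 in Britannica.
entries = 42
errors = {"Wikipedia": 162, "Britannica": 123}

for source, count in errors.items():
    print(f"{source}: {count / entries:.2f} errors per entry")

# Wikipedia: 3.86 errors per entry  (reported as "four")
# Britannica: 2.93 errors per entry (reported as "three")
```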

The Net and the future of newspapers

Typically thoughtful post by Scott Rosenberg on the way the Internet is affecting newspapers. Excerpt:

The newspapers I grew up loving and that I worked for during the first half of my career represent a model that we’ve taken for granted because it’s had such longevity. But there’s nothing god-given or force-of-nature-like to the shape of their product or business; it’s simply an artifact of history that you could roll together a bundle of disparate information — news reports, stock prices, sports scores, display ads, reviews, classified ads, crossword puzzles and so on — sell it to readers, and make money.

Today that bundle has already fallen apart on the content side: there’s simply no reason for newspapers to publish stock prices, for instance; it’s a practice that will simply disappear over the next few years — it’s sheer tree slaughter. On the business side, it is beginning to fall apart, too. It just makes way more sense to do classified advertising online. And it’s cheaper, too, thanks to Craigslist, the little community (I am proud to have been a subscriber to Craig Newmark’s original mailing list on the Well back in 1994 or 1995 or whenever it was) that turned into a big deal.

Wikilaw launched

Main page here. Its goal is “to build the largest open-content legal resource in the world”. It claims there are “roughly 1,000,000 lawyers in the United States”. Pardon me while I lie down in a darkened room. It’s the thought of all those lawyers laid end to end.

I love the story about Sam Johnson and James Boswell walking together down a street behind another chap. The great Doctor pulled Boswell aside and whispered, “I don’t wish to speak ill of any other person, but I believe that man is an attorney”.

The Alexa story

John Battelle, author of an excellent book on search, has a hyperbolic post on his blog. It begins like this…

Every so often an idea comes along that has the potential to change the game. When it does, you find yourself saying – “Sheesh, of course that was going to happen. Why didn’t I predict it?” Well, I didn’t predict this happening, but here it is, happening anyway.

In short, Alexa, an Amazon-owned search company started by Bruce Gilliat and Brewster Kahle (and the spider that fuels the Internet Archive), is going to offer its index up to anyone who wants it. Alexa has about 5 billion documents in its index – about 100 terabytes of data. It’s best known for its toolbar-based traffic and site stats, which are much debated and, regardless, much used across the web.

OK, step back, and think about that. Anyone can use Alexa’s index, to build anything. But wait, there’s more. Much more…

It’s all done with web services. And it might indeed be significant because it could enable small but ingenious players to get into the search market.
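
To make that concrete, here is a minimal sketch of what building on someone else’s index via web services might look like. The endpoint, parameters and response format below are invented purely for illustration; they are not Alexa’s actual API.

```python
# Hypothetical sketch only: a small player querying a hosted search index
# exposed as a web service. The URL, parameters and response shape are
# invented for illustration; they do not describe Alexa's real interface.
import json
import urllib.parse
import urllib.request

def search(query, max_results=10):
    """Send a keyword query to a (hypothetical) hosted index service."""
    params = urllib.parse.urlencode({"q": query, "n": max_results})
    url = f"https://index.example.com/search?{params}"  # placeholder endpoint
    with urllib.request.urlopen(url) as response:
        # Assume a JSON body of the form {"hits": [{"url": ..., "title": ...}]}
        return json.load(response)

# A niche search site could re-rank or filter these results with its own
# logic, without having to crawl and store the web itself.
for hit in search("open-content legal resources").get("hits", []):
    print(hit["url"], "-", hit["title"])
```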

Wikipedia and QA

I’ve been following the arguments about the quality of Wikipedia entries and came on this thoughtful post by Ethan Zuckerman. Excerpt:

When I use Wikipedia to research technical topics, I generally have a positive experience, frequently finding information I would be unlikely to find in any other context, generally resolving my technical questions – “How does the GSM cellphone standard work?” with a single search. When I use Wikipedia to obtain information that I could find in a conventional encyclopedia, I often have a terrible experience, encountering articles that are unsatisfying at best and useless at worst. Generally, these experiences result from a search where I already know a little about a topic and am looking for additional, specific information, usually when I’m researching a city or a nation to provide context for a blog entry. My current operating hypothesis? Wikipedia is a fantastic reference work for stuff that doesn’t exist in other reference works, and a lousy knock-off of existing works when they do exist.

Old media and the Net

The most interesting question is not whether Friends Reunited will save ITV, but whether ITV will destroy Friends Reunited. That depends on the extent to which Allen and his management team leave their acquisition alone.

Television people are constitutionally incapable of dealing with the web because they have been socially and professionally conditioned in the world of ‘push’ media with its attendant control freakery and inbuilt assumptions about the passivity and stupidity of audiences. Very little of their experience or skill is useful in a ‘pull’ medium like the web, where the consumer is active, fickle and informed, and history to date suggests that if they are put in charge of internet operations they screw up.

My guess is that Allen & Co will not be able to resist the temptation to meddle with their new toy…

The Flickr phenomenon

This morning’s Observer column

Virtually every Tom, Dick and Harry has a digital camera. And if he doesn’t, there’s probably one in his mobile phone. Which raises an interesting question: what are people doing with all these cameras? The answer: snapping everything that moves, and much that doesn’t.

But then what? At this point, options begin to narrow. You can take the storage card into Jessops, push it into a slot and pay to have your photos printed. You can upload them to your computer and view them on screen in tasteful little slideshows, perhaps to the accompaniment of a track from your music library.

You can buy an inkjet printer, pay through the nose for paper and ink cartridges, and print them out. Or you can upload them to a printing service like Ofoto or Fotango, have them deduct money from your credit card and send back nice prints on proper photographic paper.

Alternatively you can put them on Flickr (www.flickr.com). If you don’t know about Flickr, it’s time you did…


The Web: bigger than we know. Bigger than we can know?

From Search Engine Watch

A new survey has made an attempt to measure how much information exists outside of the search engines’ reach. The company behind the survey is also offering up a solution for those who want to tap into this “hidden” material.

The study, conducted by search company BrightPlanet, estimates that the inaccessible part of the web is about 500 times larger than what search engines already provide access to. To put that another way, Google currently claims to have indexed or know about 1 billion web pages, making it the largest crawler-based search engine, based on reported numbers. Using Google as a benchmark, that means BrightPlanet would estimate there are about 500 billion pages of information available on the web, and only 1/500 of that information can be reached via traditional search engines.

Hmmm… That was written in 2000. By the time it stopped bragging about the number of pages it had indexed, Google was claiming over 8 billion. Let me see, that’s 8 billion times 500, er, 4,000 billion pages. Pardon me while I go and lie down in a darkened room. I wonder if Tim Berners-Lee realised what kind of monster he was unleashing when he dreamed up the Web.
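
For what it’s worth, the sum in that aside, taking BrightPlanet’s factor of 500 and Google’s self-reported index size at face value:

```python
# Back-of-the-envelope version of the estimate above (assumptions, not facts):
# Google's self-reported index size and BrightPlanet's claimed 500x ratio.
indexed_pages = 8_000_000_000      # ~8 billion pages Google claimed to have indexed
hidden_ratio = 500                 # BrightPlanet: hidden web ~500x the indexed web
estimated_hidden = indexed_pages * hidden_ratio
print(f"{estimated_hidden:,} pages")   # 4,000,000,000,000 (4,000 billion)
```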
