How the Wayback Machine works.

The Internet Archive made headlines back in November with the release of its Wayback Machine, a Web interface to the Archive’s five-year, 100-terabyte collection of Web pages. The archive is the result of an effort by the Internet Archive’s director, Brewster Kahle, to capture the ephemeral pages of the Web and store them in a publicly accessible library. In addition to the millions of other Web pages you can find in the Wayback Machine, it has direct pointers to some of the pioneer sites from the early days of the Web, including the NCSA What’s New page, The Trojan Room Coffee Pot, and Feed magazine.
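As a rough illustration of how individual captures are addressed: each archived page is reached by placing a 14-digit timestamp (year, month, day, hours, minutes, seconds) and the original URL after the Wayback prefix, and the service serves the nearest available capture. This sketch assumes today's `https://web.archive.org/web/` address scheme, which may differ from the interface as launched; `wayback_url` is a hypothetical helper, not part of any Internet Archive library.

```python
from datetime import datetime

# Assumed present-day prefix for Wayback Machine addresses.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(page_url, when=None):
    """Build a Wayback Machine address for page_url (hypothetical helper).

    With a datetime, the 14-digit timestamp asks for the capture nearest
    to that moment; without one, '*' asks for the list of all captures.
    """
    if when is None:
        return f"{WAYBACK_PREFIX}*/{page_url}"
    # datetime supports strftime codes directly inside an f-string.
    return f"{WAYBACK_PREFIX}{when:%Y%m%d%H%M%S}/{page_url}"


print(wayback_url("http://www.example.com/", datetime(2001, 11, 1)))
print(wayback_url("http://www.example.com/"))
```

The timestamp need not match a real capture exactly; it only says roughly when you want to look.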

There was a nice profile of Susan Sontag in last Saturday’s Guardian, which quotes a magnificent sneering attack on her by Scott McLemee in his Washington Post review of her latest book.

“Her manner now is virtually indistinguishable from that of George Steiner in his lugubrious moments as Last Intellectual, striking that solemn pose as embodiment of high seriousness – perched atop the Nintendo ruins of western civilization”.

“Software that can detect when people are lying in their e-mails sounds a bit far-fetched, but its manufacturers declare it is true.” Oh yeah! The FT is reporting that “SAS Institute, which makes fraud-detection systems for banks and phone companies, will on Monday announce a product that can sift through e-mails and other electronic text to catch elusive nuances such as tone.”

Useful piece reassessing online advertising.

“Until now most ads were evaluated by the number of click-throughs, but the Interactive Advertising Bureau (I.A.B.) has thought of ways to change that. It announced the creation of the Ad Campaign Measurement and Audit Guidelines to help bring uniformity to online advertising. Consultants from PricewaterhouseCoopers developed the guidelines for the I.A.B., and encourage the measurement of Internet branding through a combination of ad impressions, clicks, page impressions, total visits and unique visits…”

More trouble for copy-protected CDs. The Register reports that Philips, the co-inventor of the CD format, is demanding that crippled CDs be clearly labelled as such. According to the report, Philips is insisting that CDs including anti-copying technology should bear what is effectively a plague warning. In Philips’ view they should clearly inform users that they are copy-protected, and they shouldn’t use the “Compact Disc” logo because they are not, in Philips’ considered view, proper compact discs at all.

So Microsoft is finally conceding that the security holes in its software constitute a real problem. According to this FT.com article, Bill Gates has twigged that the company’s fetish for feature bloat has led to inattention to security holes. Quotes:

“Bill Gates, the company’s chairman, ordered the month-long production stoppage earlier this week as part of an effort to overhaul the way Microsoft creates its software. He has decided that it is time to change the processes that the company’s 7,000 software engineers follow to create Microsoft’s products.

Mr Gates believes they have paid too much attention to the special features that distinguish each new generation of software. They have paid too little heed to something far more fundamental: computer security. “They consistently put features ahead of security,” says Bruce Schneier, chief technology officer of Counterpane Internet Security, voicing a widespread complaint.

Mr Gates is now promising a fundamental change. For a month, Microsoft engineers will sift through existing software to try to find security loopholes that had been missed before. They will all undertake new training to help them catch the glitches before they arise.”

Better late than never.

My Observer column was ‘bumped’ by a last-minute advertisement — a full page taken by British Airways. (Well, given the advertising recession and the fact that the group’s revenues are running £30 million lower than they were this time last year and the management is seeking 35 redundancies from the full-time staff, perhaps the priorities are understandable.) If the piece had been published, this is what it would have said…

One of the more interesting services provided by Google, the search engine, is its ‘Zeitgeist’ feature. This maintains a record of the most popular inquiries over given periods of time and provides a fascinating window onto what Jung called the Collective Unconscious. The top 20 list of inquiries for 2001 includes some of the usual suspects — for example, World Trade Center, Harry Potter, anthrax, Osama bin Laden and Lord of the Rings. But the most interesting thing is that none of these contemporary preoccupations occupies the top two spots on the Zeitgeist list. It turns out that the most popular search term in Google last year was ‘Nostradamus’, closely followed by ‘CNN’ in second place.
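Google doesn't say how Zeitgeist is compiled, but the basic idea of a most-popular-queries list can be sketched as a frequency count over a query log restricted to a time window. This is a minimal sketch assuming a log of (date, query) pairs and case-insensitive matching; the `zeitgeist` function and the sample log are invented for illustration and are not Google data.

```python
from collections import Counter
from datetime import date


def zeitgeist(query_log, start, end, top_n=20):
    """Return the top_n most frequent queries dated within [start, end].

    Queries are lower-cased and stripped so that 'CNN' and 'cnn ' count
    as the same inquiry (an assumption, not Google's documented rule).
    """
    counts = Counter(
        query.strip().lower()
        for day, query in query_log
        if start <= day <= end
    )
    # most_common returns (query, count) pairs, highest count first.
    return counts.most_common(top_n)


# Hypothetical sample log; the entries are made up.
log = [
    (date(2001, 3, 1), "Nostradamus"),
    (date(2001, 9, 12), "nostradamus"),
    (date(2001, 9, 12), "CNN"),
    (date(2001, 10, 5), "Harry Potter"),
    (date(2001, 11, 2), "cnn"),
    (date(2001, 12, 20), "Nostradamus"),
    (date(2002, 1, 3), "anthrax"),  # outside the 2001 window
]

print(zeitgeist(log, date(2001, 1, 1), date(2001, 12, 31), top_n=3))
# → [('nostradamus', 3), ('cnn', 2), ('harry potter', 1)]
```

Folding case is one of several judgement calls a real service would have to make about what counts as "the same" inquiry.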

We will gloss over the obsession with Nostradamus which, like the Peace of God, passeth all understanding. But why should ‘CNN’ be the second most popular inquiry? Why don’t people just type ‘www.cnn.com’ (or even ‘cnn.com’) into the address box on their browsers? I’ve asked this question before and received numerous answers, many containing uncomplimentary observations about the laziness or collective IQ of the population of the United States. Entertaining though they are, these explanations miss the point — which is that the phenomenon tells us more about Google than about its users.

How come? Well, it suggests that users have concluded that the chances of Google finding a URL in one go are higher than the chances of them guessing it correctly. And this in turn implies popular recognition of what is by now conventional wisdom in the technical community — namely that in its relatively short existence Google has developed into an astonishingly powerful service. It is currently claiming to have indexed over two billion Web pages, now owns and indexes a huge archive of Usenet groups and provides specialised searches for images and pdf (Adobe portable document format) files. And despite all of this Google is still incredibly fast — not just at finding static documents, but also at indexing new content.

Google’s technological supremacy has some interesting side effects. One of them is to take the steam out of the frenzy to register domain names. Two years ago, companies were frantic to register names that were functional or relevant to their business activities — and being driven frantic by the discovery that virtually every conceivable name had already been snapped up. (The story of how the Halifax’s online banking operation ‘IF’ managed to acquire ‘www.if.com’ for a huge sum at the very last minute would make a nice feature film.) One has the suspicion that some companies even changed their names in order to match domain names that they had been able to register. Why else would the Post Office have hit on ‘Consignia’, for example? Or the former Andersen Consulting on ‘Accenture’?

What drove the frenzy was the fear that a company without a memorable domain name would effectively be invisible in cyberspace. But if Google can find you no matter how obscure your URL is, then there are fewer grounds for panicking. Of course it’s still nicer to be able to put ‘www.if.com’ on business cards rather than ‘www.halifax-bank-uk-online.com’, but at least you have the consolation of knowing that customers will still be able to find you.

The flip side of Google’s increasing power is increased responsibility. The service has become the most indispensable tool for most Internet users, and its rankings of web pages have become de facto arbiters of significance in cyberspace. Yet Google operates in an unregulated space, and wields unprecedented influence within it. We know from bitter experience (cf. Microsoft) that absolute power corrupts absolutely. Who is going to make sure that Google stays honest? Quis custodiet and all that. (Oh, and Google found 4420 pages on that phrase in 0.17 seconds.)

–ends–

Just watched a short item on Channel 4 about a chap who has invented an underwater bicycle and complains that nobody in the UK will take him seriously.