Debugging Vista and Office

John Markoff has an interesting piece in the New York Times about the frenzied efforts within Microsoft to get Vista ready to ship. The article includes some striking statistics: it quotes the Gartner Group as claiming that Windows runs on 845 million computers worldwide and Office on more than 450 million.

Markoff claims that

it was the vast scale of the Windows testing program that saved the software development projects. Over the summer, the company began an extraordinary bug-tracking effort, abetted by volunteers and corporate partners who ran free copies of both Windows and Office designed to send data detailing each crash back to Microsoft computers.

The Office package, for example, has been tested by more than 3.5 million users; last month alone, more than 700,000 PC’s were running the software, generating more than 46 million separate work sessions. At Microsoft, 53,000 employee computers are running test versions.

Vista has also been tested extensively. More than half a million computer users have installed Vista test software, and 450,000 of the systems have sent crash data back to Microsoft.

One interesting question is how Microsoft manages to cope with the torrent of data that comes back from all these test versions. Markoff says that the test data from the second beta release of Vista alone generated 5.5 petabytes of information, “the equivalent of the storage capacity of 690,000 home PC’s”.

Later… James Miller points out that the last calculation (which is the work of the New York Times, not me) implies that the average home PC has a hard disk of 8GB capacity, which seems implausible. If we assume that the average PC now has 80GB of hard disk space, then 5.5 petabytes equates to about 68,750 PCs. Looks like a decimal point went missing somewhere.
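Miller's correction is easy to verify. Here's a minimal back-of-the-envelope check in Python (decimal units assumed throughout, i.e. 1PB = 10^15 bytes; the figures come from the article):

```python
# Back-of-the-envelope check of the NYT's "690,000 home PCs" claim.
# Decimal units assumed: 1 PB = 10**15 bytes, 1 GB = 10**9 bytes.

bytes_total = 5.5 * 10**15  # 5.5 petabytes of Vista beta test data

# What the Times's figure implies about the average home PC:
per_pc_gb = bytes_total / 690_000 / 10**9
print(f"Implied capacity per PC: {per_pc_gb:.1f} GB")  # ~8.0 GB, implausibly small

# Miller's more realistic 80GB-per-PC assumption:
pcs_at_80gb = bytes_total / (80 * 10**9)
print(f"PCs at 80 GB each: {pcs_at_80gb:,.0f}")  # ~68,750
```

Either way, the Times's figure and the plausible one differ by almost exactly a factor of ten, which is consistent with the missing decimal point.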

Ofcom’s turbulent future

This morning’s Observer column

So Ed Richards has inherited the earth, and all the media beasts therein. On the face of it, the new chief executive of Ofcom has an imposing remit. He is charged by the 2003 Communications Act (of which, as the Prime Minister’s chief policy wonk on such matters, he was undoubtedly the prime author) ‘to further the interests of citizens in relation to communications matters’, and ‘to further the interests of consumers in relevant markets, where appropriate by promoting competition’. This means he is the regulator of the electromagnetic spectrum, telecoms companies, internet service providers – and TV and radio broadcasters (except the BBC).

…Looks impressive, doesn’t it? My advice to Richards is to enjoy it while it lasts. His Ofcom empire is built on sand, and the tide is coming in…

Later… Bill Thompson wrote a nice pseudo-retrospective essay about Ofcom’s dim future.

Random thoughts

Lovely piece by Steven Levy about the randomness of the iPod shuffle algorithm.

My first iPod loved Steely Dan. So do I. But not as much as my iPod did. By 2003, among the 3,000 or so songs in my iTunes library, I had about 50 Steely Dan tracks. Yet every time I shuffled my music collection “randomly” to mix the tunes, it seemed that the Dan was weirdly over-represented. Only two or three songs after Rikki Don’t Lose That Number, I’d hear Kid Charlemagne. Then, 20 minutes later, there would be Pretzel Logic. Where was the logic in this? I didn’t keep track of every song that played every time I shuffled my tunes, but after a while I would keep a sharp ear out for what I came to call the LTBSD (Length of Time Before Steely Dan) Factor. The LTBSD Factor was always perplexingly short…

This is one of those maddening articles in which someone writes about a topic that one had thought of covering, but didn’t. I use the shuffle facility on my iPod a lot, and often wondered if it was giving truly random results. But I didn’t take the logical next step and do some digging. Levy did, which is what makes him such a good journalist.

It turns out that this is an excerpt from his forthcoming book about the iPod phenomenon (Ebury Press, November 2, according to the Guardian). If it’s anything like as good as his book on the history of the Apple Mac, it’ll be worth queueing for.
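As a footnote to Levy's puzzle: a perplexingly short LTBSD Factor is exactly what a genuinely uniform shuffle predicts, because random sequences cluster far more than intuition expects. Here's an illustrative simulation in Python; the library size and Steely Dan track count come from the quoted passage, and everything else is my assumption:

```python
import random

# Simulate Levy's library: 3,000 songs, 50 of them by Steely Dan
# (figures from the quoted passage; trial count is arbitrary).
LIBRARY_SIZE = 3_000
DAN_TRACKS = 50
TRIALS = 10_000

short_gaps = 0
for _ in range(TRIALS):
    playlist = [False] * (LIBRARY_SIZE - DAN_TRACKS) + [True] * DAN_TRACKS
    random.shuffle(playlist)  # a genuinely uniform shuffle
    positions = [i for i, is_dan in enumerate(playlist) if is_dan]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    # Count trials where at least one pair of Dan tracks plays
    # within three songs of each other.
    if any(gap <= 3 for gap in gaps):
        short_gaps += 1

print(f"Shuffles with Dan tracks <= 3 songs apart: {short_gaps / TRIALS:.0%}")
```

On a typical run the figure comes out above ninety per cent: in almost every shuffle, somewhere in the playlist two Steely Dan tracks land within three songs of each other. Clustering is what true randomness looks like.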

Algorithmic ‘integrity’

From Rough Type: Nicholas Carr’s Blog

Last week, CNET’s Elinor Mills reported on how a web search for “Martin Luther King” returns, as its first result on Google and as its second result on Windows Live Search, a web site (martinlutherking.org) operated by a white supremacist organization named Stormfront. The site, titled “Martin Luther King Jr.: A True Historical Examination,” refers to King as “The Beast” and says he was “just a sexual degenerate, an America-hating Communist, and a criminal betrayer of even the interests of his own people.” The site also features an essay on “Jews & Civil Rights” by former Ku Klux Klan official David Duke.

What’s remarkable, though, is not that a search algorithm might be gamed by extremists but that the owners of the algorithm might themselves defend the offensive result – and reject any attempt to override it as an assault on the “integrity” of their system….

Carr goes on to quote Google’s response to the CNET story:

At Google, a Web site’s ranking is determined by computer algorithms using thousands of factors to calculate a page’s relevance to any given query, a company representative said. The company can’t tweak the results because of that automation and the need to maintain the integrity of the results, she said. “In this particular example, the page is relevant to the query and many people have linked to it, giving it more PageRank than some of the other pages. These two factors contribute to its ranking.”
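For readers who haven't met it, PageRank (the factor the spokeswoman cites) is essentially recursive link-counting: a page's score is fed by the scores of the pages that link to it. Here's a toy power-iteration sketch in Python; the graph and the conventional 0.85 damping factor are illustrative, and this is the textbook algorithm rather than Google's production system, which (as the quote says) folds in thousands of other factors:

```python
# Toy power-iteration sketch of the PageRank idea. The graph and
# damping factor are illustrative assumptions, not Google's system.

links = {          # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],    # many inbound links boost "c"
}

DAMPING = 0.85
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):  # iterate until ranks settle
    new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = DAMPING * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")   # "c", with the most inbound links, wins
```

The relevant point is that nothing in this calculation knows, or can know, what a page actually says; “more PageRank” simply means more, and better-ranked, inbound links.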

Microsoft’s response was even more robust:

The results on Microsoft’s search engine are “not an endorsement, in any way, of the viewpoints held by the owners of that content,” said Justin Osmer, senior product manager for Windows Live Search. “The ranking of our results is done in an automated manner through our algorithm which can sometimes lead to unexpected results,” he said. “We always work to maintain the integrity of our results to ensure that they are not editorialized.”

To which Carr tartly responds:

By “editorialized” he seems to mean “subjected to the exercise of human judgment.” And human judgment, it seems, is an unfit substitute for the mindless, automated calculations of an algorithm. We are not worthy to question the machine we have made. It is so pure that even its corruption is a sign of its integrity.

Googleplex working overtime?

From Good Morning Silicon Valley

What kind of a sweatshop are they running over there at Google? Just this week, the elves in the trenches have made Google Gadgets available for addition to any site, launched its Literacy Project in conjunction with LitCam and UNESCO, were apparently caught teaming up with Apple on a possible hookup of Google Maps with iPhoto, added another new batch of imagery to Google Earth, and now have released Code Search, a tool for programmers to dig through publicly available source code. (From the Unintended Consequences Department, the search engine also lets you find things like serial number generating algorithms.) That’s just this week, and it still has a day to go. Gee, I hope they’re paying those folks enough.

But, later, there’s this…

The L.A. Times reports company execs have launched an initiative called “Features, Not Products,” telling engineers to stop launching so many services and focus on making the existing ones work together better. Co-founder Sergey Brin said it occurred to him this summer as he scanned the 50 or so products available across the company’s Web sites that users were probably getting overwhelmed. “It’s worse than that,” he said. “It’s that I was getting lost in the sheer volume of the products that we were releasing.” Simplicity was among the things that made Google so popular, and its success led it to snap up hundreds of smart, ambitious software engineers. “The result occurred precisely because we told these incredible engineering teams to run as fast as possible to solve new problems,” said Chief Executive Eric Schmidt. “But then that created this other problem.” Analyst Rob Enderle puts it another way: “They created a bunch of crap that they have no idea what to do with. What a huge waste of resources.” Schmidt says the plan is to make Google products easier to use by packaging services, citing plans to combine the company’s spreadsheet, calendar and word-processing programs into one suite of Web-based applications. “That is a big change in the way we run the company,” Schmidt said, describing Google’s previous attitude as, “Just get this stuff built and get it out — don’t worry about the integration.”