Bing making headway?

The NYT thinks that it might be making an impact.

SAN FRANCISCO — In late May, Microsoft unveiled Bing, its new Internet search engine, in front of an audience of skeptics: technology executives and other digerati who had gathered near San Diego for an industry conference.

To that crowd, Microsoft’s efforts to take on Google and Yahoo in the search business had become something of a laughingstock, and for good reason. Microsoft’s repeated efforts to build a credible search engine had fallen flat, and the company’s market share was near its low.

Six weeks later, Bing has earned Microsoft something the company’s search efforts have lacked: respect.

As a result, analysts say, the once-dubious prospect that Microsoft could shake up the dynamics of the search business, which is worth $12 billion in the United States alone, has become just a bit more likely…

I hope this hunch is correct. The world needs some serious competition for Google.

Google takes on Netscape’s mission

This morning’s Observer column.

The intriguing thing about the Google announcement is not that it is developing an OS, but that it is switching tack. For nearly two years the company has been developing a Linux-based OS for mobile phones under the Android label. Most of us who have used Android assumed it was only a matter of time before a version tailored for Netbooks was released.

But that is not what Google announced. There wasn’t much technical detail in the company’s blog post, but the one thing that is clear is that the new OS will be – in its words – “a natural extension of Google Chrome”. It is, they go on to say, “our attempt to rethink what operating systems should be”.

If true, we have reached a significant milestone because what the Google guys propose amounts to turning the world upside down…

Saving Texts From Oblivion

Interesting essay by Oxford University Press’s Tim Barton.

At a focus group in Oxford University Press’s offices in New York last month, we heard that in a recent essay assignment for a Columbia University classics class, 70 percent of the undergraduates had cited a book published in 1900, even though it had not been on any reading list and had long been overlooked in the world of classics scholarship. Why so many of the students had suddenly discovered a 109-year-old work and dragged it out of obscurity in preference to the excellent modern works on their reading lists is simple: The full text of the 1900 work is online, available on Google Book Search; the modern works are not.

It’s a very thoughtful, non-doctrinaire piece. “If it’s not online, it’s invisible”, he writes.

While increasing numbers of long-out-of-date, public-domain books are now fully and freely available to anyone with a browser, the vast majority of the scholarship published in book form over the last 80 years is today largely overlooked by students, who limit their research to what can be discovered on the Internet.

On the Google Books ‘agreement’, he writes:

It has taken many months for the import of the settlement to become clear. It is exceedingly complex, and its design — the result of two years of negotiations, including not just the parties but libraries as well — is, not surprisingly, imperfect. It can and should be improved. But after long months of grappling with it, what has become clear to us is that it is a remarkable and remarkably ambitious achievement.

It provides a means whereby those lost books of the last century can be brought back to life and made searchable, discoverable, and citable. That aim aligns seamlessly with the aims of a university press. It is good for readers, authors, and publishers — and, yes, for Google. If it succeeds, readers will gain access to an unprecedented amount of previously lost material, publishers will get to disseminate their work — and earn a return from their past investments — and authors will find new readers (and royalties). If it fails, the majority of lost books will be unlikely ever to see the light of day, which would constitute an enormous setback for scholarly communication and education.

The settlement is a step forward in solving the problem of “orphan works,” titles that are in copyright but whose copyright holders are elusive, meaning that no rightsholder can be found to grant permission for a title’s use. For such books, a professor cannot include a chapter in a course pack for students; a publisher cannot include an excerpt in an anthology; and no one can offer a print or an electronic copy for sale. Making those books available again is a clear public good. Google’s having exclusive rights to use them, as enshrined in the current settlement, however, is not.

If the parties to the settlement cannot themselves solve this major problem, then at a minimum Congress should pass orphan-works legislation that gives others the same rights as Google — an essential step if Google is not to gain an unfair advantage. Despite significant advocacy, Congress has failed to legislate on this issue for 20 years; we at Oxford hope the specter of Google having exclusive rights to use orphan works will spur heightened public debate and Congress to immediate action.

Google: waving, not drowning

This morning’s Observer column.

From the outset, Google clearly had plans for Ajax. The evidence was in the steady accretion of Gmail features like instant messaging, audio – and then video – chat, and so on. But until the end of last month we were still unsure about where all this was headed.

Now we know. It’s called Google Wave. It’s described as “a real-time communication platform which combines aspects of email, instant messaging, wikis, web chat, social networking and project management to build one elegant, in-browser communication client”. Translation: it’s a sophisticated set of tools enabling people to work collaboratively across the internet. And ‘real-time’ means exactly that: in most cases what you type appears – as you type it – on other people’s screens…

Google Wave: the gist

Here’s a useful outline of the main features of Google Wave. In essence it’s “a real-time communication platform” which “combines aspects of email, instant messaging, wikis, web chat, social networking, and project management to build one elegant, in-browser communication client”.

Main features:

  • Real-time: In most instances, you can see what someone else is typing, character-by-character.
  • Embeddability: Waves can be embedded on any blog or website.
  • Applications and Extensions: Just like a Facebook application or an iGoogle gadget, developers can build their own apps within waves. They can be anything from bots to complex real-time games.
  • Wiki functionality: Anything written within a Google Wave can be edited by anyone else, because all conversations within the platform are shared. Thus, you can correct information, append information, or add your own commentary within a developing conversation.
  • Open source: The Google Wave code will be open source, to foster innovation and adoption amongst developers.
  • Playback: You can play back any part of the wave to see what was said.
  • Natural language: Google Wave can autocorrect your spelling, even going as far as knowing the difference between similar words, like “been” and “bean.” It can also auto-translate on-the-fly.
  • Drag-and-drop file sharing: No attachments; just drag your file and drop it inside Google Wave and everyone will have access.

    While these are only a few of the many features of Google Wave, it’s easy to see why people are extremely excited.

    Google proposes giving librarians a say in price of access to orphan works

    From today’s NYTimes.

    SAN FRANCISCO — In a move that could blunt some of the criticism of Google for its settlement of a lawsuit over its book-scanning project, the company signed an agreement with the University of Michigan that would give some libraries a degree of oversight over the prices Google could charge for its vast digital library.

    Google has faced an onslaught of opposition over the far-reaching settlement with authors and publishers. Complaints include the exclusive rights the agreement gives Google to publish online and to profit from millions of so-called orphan books, out-of-print books that are protected by copyright but whose rights holders cannot be found.

    The Justice Department has also begun an inquiry into whether the settlement, which is subject to approval by a court, would violate antitrust laws.

    Google used the opportunity of the University of Michigan agreement to rebut some criticism.

    “I think that it’s pretty short-sighted and contradictory,” said Sergey Brin, a Google co-founder and its president of technology. Mr. Brin said the settlement would allow Google to offer widespread access to millions of books that are largely hidden in the stacks of university libraries.

    “We are increasing choices,” Mr. Brin said. “There was no option prior to this to get these sorts of books online.”

    Under Google’s plan for the collection, public libraries will get free access to the full texts for their patrons at one computer, and universities will be able to buy subscriptions to make the service generally available, with rates based on their student enrollment.

    The new agreement, which Google hopes other libraries will endorse, lets the University of Michigan object if it thinks the prices Google charges libraries for access to its digital collection are too high, a major concern of some librarians. Any pricing dispute would be resolved through arbitration.

    WolframAlpha: correction

    Hmmm… Seems that I was wrong. WolframAlpha isn’t really a competitor to Google, or indeed a search engine in the normal sense of the term. Or so the NYT maintains.

    WolframAlpha, a powerful new service that can answer a broad range of queries, has become one of the most anticipated Web products of the year. But its creator, Stephen Wolfram, wants to make something clear: Despite the online chatter comparing it to Google, his service is not intended to dethrone the king of search engines.

    “I am not keen on the hype,” said Mr. Wolfram, a well-known scientist and entrepreneur and the founder of Wolfram Research, a company in Champaign, Ill., that has been quietly developing WolframAlpha.

    Mr. Wolfram’s service does not search through Web pages, and it will not help with movie times or camera shopping. Instead it computes the answers to queries using enormous collections of data the company has amassed. It can quickly spit out facts like the average body mass index of a 40-year-old male, whether the Eiffel Tower is taller than Seattle’s Space Needle, and whether it is high tide in Miami right now.

    WolframAlpha, which is expected to be available to the public at wolframalpha.com in the next week, is not a finished product. It is an early working version of a project that has been years in the making and will continue to evolve over years, if not decades. As such, there is much it cannot answer now.

    Wolfram Alpha vs Google

    At last, some data. David Talbot got a login id from Wolfram and ran some comparative tests. For example:

    SEARCH TERM: 10 pounds kilograms

    WOLFRAM ALPHA: The site informed me that it interpreted my search term as an effort to multiply "10 pounds" by "1 kilogram" and gave me this result: 4.536 kg² (kilograms squared) or 22.05 lb² (pounds squared).

    GOOGLE: Google gave me links to various metric conversion sites.
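For the curious, the arithmetic behind that odd answer is easy to reproduce. The sketch below is mine, not anything from WolframAlpha or Mr Talbot: the function names are made up for illustration, and the only outside fact it relies on is the standard definition of the pound as 0.45359237 kg. It shows both the conversion the searcher presumably wanted and the multiplication the site actually performed.

```python
# Illustrative sketch only: function names are hypothetical, not a real API.
# 1 pound is defined as exactly 0.45359237 kilograms.
KG_PER_LB = 0.45359237


def pounds_to_kilograms(pounds: float) -> float:
    """The conversion the searcher presumably wanted."""
    return pounds * KG_PER_LB


def pounds_times_kilograms(pounds: float, kilograms: float) -> tuple:
    """The product the site computed, expressed in kg^2 and in lb^2."""
    in_kg_squared = (pounds * KG_PER_LB) * kilograms
    in_lb_squared = pounds * (kilograms / KG_PER_LB)
    return in_kg_squared, in_lb_squared


wanted = pounds_to_kilograms(10)
kg2, lb2 = pounds_times_kilograms(10, 1)
print(f"10 lb is about {wanted:.3f} kg")
print(f"10 lb x 1 kg is about {kg2:.3f} kg^2, or {lb2:.2f} lb^2")
```

The two figures in the second line agree with the 4.536 and 22.05 quoted above, so the sums were right; it was the reading of the question that went astray.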

    Tentative conclusion: the semantic web is still a long way off. The problem of search is only about 5% solved. Google accounts for 3% of that. Mr Talbot’s experiments suggest that Wolfram isn’t going to move it much beyond 6%. Still, it’s progress. And Google could do with some competition.

    How Google does it

    If you came on US Patent #7508978 you might stifle a yawn. Certainly you’d never suspect that it might be a design for radically changing our communications environment. Here’s what the Abstract says:

    Detection of grooves in scanned images

    A system and method locate a central groove in a document such as a book, magazine, or catalog. In one implementation, scores are generated for points in a three-dimensional image that defines a surface of the document. The scores quantify a likelihood that a particular point is in the groove. The groove is then detected based on the scores. For example, lines may be fitted through the points and a value calculated for the lines based on the scores. The line corresponding to the highest calculated value may be selected as the line that defines the groove.

    Eh? And yet it turns out that this is the basis for Google’s amazingly efficient book-scanning technology.
    In a lovely blog post, Maureen Clements explains how:

    Turns out, Google created some seriously nifty infrared camera technology that detects the three-dimensional shape and angle of book pages when the book is placed in the scanner. This information is transmitted to the OCR software, which adjusts for the distortions and allows the OCR software to read text more accurately. No more broken bindings, no more inefficient glass plates. Google has finally figured out a way to digitize books en masse. For all those who’ve pondered “How’d They Do That?” you finally have an answer.
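The steps in the patent abstract (score each point, fit lines through the points, pick the line with the highest value) can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not Google's method: the abstract doesn't say how scores are computed, so I've assumed that a point scores higher the further it sits below the highest point in its row, and I've assumed that candidate lines run from some column at the top of the page to some column at the bottom. All the names are invented for illustration.

```python
from typing import List, Tuple

HeightMap = List[List[float]]  # height[y][x]: page surface height at each point


def groove_scores(height: HeightMap) -> List[List[float]]:
    """Score each point by how far it lies below its row's highest point.

    Assumption: deeper points are more likely to be in the groove.
    """
    return [[max(row) - h for h in row] for row in height]


def line_value(scores: List[List[float]], x_top: int, x_bottom: int) -> float:
    """Sum the scores along a straight line from (x_top, first row)
    to (x_bottom, last row)."""
    rows = len(scores)
    total = 0.0
    for y in range(rows):
        t = y / (rows - 1) if rows > 1 else 0.0
        x = int(x_top + t * (x_bottom - x_top) + 0.5)  # nearest column
        total += scores[y][x]
    return total


def detect_groove(height: HeightMap) -> Tuple[int, int]:
    """Try every candidate line and return the (x_top, x_bottom) pair
    whose summed score is highest."""
    scores = groove_scores(height)
    cols = len(scores[0])
    candidates = [(a, b) for a in range(cols) for b in range(cols)]
    return max(candidates, key=lambda ab: line_value(scores, ab[0], ab[1]))


# A tiny made-up "scan": the dip drifts from column 2 to column 3.
page = [
    [5, 5, 1, 5, 5],
    [5, 5, 1, 5, 5],
    [5, 5, 5, 1, 5],
    [5, 5, 5, 1, 5],
]
print(detect_groove(page))
```

Trying every pair of end columns is the crudest possible way of fitting lines, but it keeps the three steps of the abstract visible. A real scanner would presumably work from the infrared depth data described above, with something far cleverer than brute force.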

    LATER: How the Internet Archive scans books. As you can see from the movie, it’s pretty labour-intensive, despite the robotics.

    Amazon, Google and Juvenal: Quis custodiet…

    Jeff Bezos’s mantra from the moment he founded Amazon was “get big fast”. We’re beginning to see just how big it’s getting.

    Last week an investment analyst estimated that Amazon now ‘facilitates’ (and takes a cut from) a third of all e-commerce transactions in the US.

    Then the gay and lesbian community discovered how powerful Amazon’s database can be when what the company later described as an “embarrassing and ham-fisted cataloging error” effectively banned 57,310 listings of so-called ‘adult’ books and DVDs by making them invisible. This saga was well covered on the Web. See, for example, Clay Shirky’s admirable apologia (for being seduced by righteousness), Bill Thompson’s BBC column and Rory Cellan-Jones’s early blog post on the subject.

    Looming over all this, of course, is a Really Big Question. Companies like Amazon and Google have acquired enormous power. Both can effectively render significant chunks of our culture invisible at the click of a mouse. But they are public corporations, answerable only to their shareholders — if at all. (Actually, in Google’s case, the two-tier shareholding structure means that the company’s leaders are not accountable even to their shareholders.) So, as the Roman poet Juvenal famously observed: Quis custodiet ipsos custodes? (Who will guard the guards themselves?) To date, we’ve avoided the question, arguing that if companies step out of line then in a competitive market they will pay the penalty for messing us around. So, if Google was deliberately skewing search results (so the argument runs) then the market would detect that and people would go to other search engines. I suspect we’ve moved beyond that comforting point. So the question remains: who will keep these online behemoths honest?