Archive for the 'Search' Category

When ignorance is bliss

Sunday, August 3rd, 2008

This morning’s Observer column

Sometimes, ignorance is bliss. We saw two examples of this last week. The first came when a new search engine - Cuil (www.cuil.com) - was unveiled. The launch was an old-style PR operation. Some influential bloggers and mainstream reporters had been briefed in advance, and whispers were circulating in cyberspace that this would be Something Big. Cuil would be the ‘Google Killer’ everyone had been waiting for.

Evidence for this hypothesis was freely cited. The venture was the brainchild of ‘former Google employees’: nudge, nudge. At least one of them had been at Stanford, the university that nurtured the founders of both Yahoo and Google: wink, wink. It had indexed no fewer than 121 billion web pages, compared with Google’s measly 40 billion: Wow! Cuil had already received $33m in venture funding! Cue trumpets.

So many people were taken in by this that when cuil.com finally opened for business the site was swamped…

How big is the web?

Saturday, July 26th, 2008

Nobody really knows, but here is an interesting post on the Official Google Blog…

We’ve known it for a long time: the web is big. The first Google index in 1998 already had 26 million pages, and by 2000 the Google index reached the one billion mark. Over the last eight years, we’ve seen a lot of big numbers about how much content is really out there. Recently, even our search engineers stopped in awe about just how big the web is these days — when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!

How do we find all those pages? We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links. In fact, we found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other. Even after removing those exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.
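The crawl-and-deduplicate loop described above can be sketched in a few lines of Python. This is a toy illustration, not Google's pipeline, and it rests on several assumptions. The "web" is a hypothetical in-memory dictionary (`TOY_WEB`). URLs are canonicalized only by lower-casing the scheme and host and dropping fragments. "Exact duplicates" are detected by comparing SHA-256 hashes of page content.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit, urlunsplit

# Hypothetical toy web: canonical URL -> (page content, outgoing links).
TOY_WEB = {
    "http://a.example/": ("home", ["http://b.example/", "http://a.example/index.html"]),
    "http://a.example/index.html": ("home", ["http://b.example/"]),  # copy of the home page
    "http://b.example/": ("about b", ["http://c.example/", "HTTP://A.EXAMPLE/"]),
    "http://c.example/": ("about c", []),
}

def normalize(url):
    """Lower-case scheme and host, default an empty path to '/', drop any #fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

def crawl(seeds, web):
    """Breadth-first crawl from seed pages; return (all URLs visited, unique pages)."""
    seen_urls = set()
    seen_content = {}  # content hash -> first URL found with that content
    unique_pages = []
    queue = deque(normalize(u) for u in seeds)
    while queue:
        url = queue.popleft()
        if url in seen_urls or url not in web:
            continue
        seen_urls.add(url)
        content, links = web[url]
        digest = hashlib.sha256(content.encode()).hexdigest()
        if digest not in seen_content:  # exact-duplicate check on page content
            seen_content[digest] = url
            unique_pages.append(url)
        queue.extend(normalize(link) for link in links)
    return seen_urls, unique_pages

seen, unique = crawl(["http://a.example/"], TOY_WEB)
print(len(seen), len(unique))  # 4 3
```

On this toy graph the crawler visits four distinct URLs but finds only three unique pages, because `index.html` is a copy of the home page: the gap between "links" and "unique pages" in miniature.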

So how many unique pages does the web really contain? We don’t know; we don’t have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite — for example, web calendars may have a “next day” link, and we could follow that link forever, each time finding a “new” page. We’re not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what’s a useful page, and there is no exact answer…
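The calendar example shows why a crawler needs an explicit budget. In this minimal sketch, the calendar site, `links_on` and `crawl_with_budget` are all hypothetical. Every page links to the next day, so the queue never empties and only the `max_pages` cap ends the crawl.

```python
from collections import deque
from datetime import date, timedelta

BASE = "http://calendar.example/"  # hypothetical web calendar

def links_on(url):
    """Every day's page links to the following day, so the link graph never ends."""
    day = date.fromisoformat(url[len(BASE):])
    return [BASE + (day + timedelta(days=1)).isoformat()]

def crawl_with_budget(seed, max_pages):
    """Breadth-first crawl that stops after fetching max_pages distinct URLs."""
    seen = set()
    queue = deque([seed])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        queue.extend(links_on(url))
    return seen

pages = crawl_with_budget(BASE + "2008-07-26", max_pages=1000)
print(len(pages))  # 1000: only the budget stops the crawl; the queue is never empty
```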

First European Privacy Seal awarded

Wednesday, July 16th, 2008

Here’s an interesting development — a search engine that really takes privacy seriously.

The first European privacy seal was presented today to search engine ixquick.com by the European Data Protection Supervisor Peter Hustinx on the occasion of the 30th anniversary of data protection legislation in Schleswig-Holstein.

According to the citation:

Ixquick is a meta-search engine which forwards search requests of its users to several search engines, gathers and combines their results and presents the results to the requesting users. Privacy is ensured by using several data-minimization techniques: personal data like IP addresses are deleted within 48 hours, after which they are no longer needed to prevent possible abuse of the servers. The remaining (non-personal) data are deleted within 14 days. Ixquick serves as a proxy, i.e. IP addresses of users are not disclosed to other search engines.
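The two-stage retention policy in the citation (IP addresses gone after 48 hours, everything else after 14 days) can be expressed as a periodic purge. The record layout below, including the `engines_queried` field, is a hypothetical illustration, not Ixquick's actual log format.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional

IP_RETENTION = timedelta(hours=48)  # personal data: IP addresses
LOG_RETENTION = timedelta(days=14)  # all remaining (non-personal) data

@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    ip_address: Optional[str]  # personal data, kept only to guard against server abuse
    engines_queried: int       # hypothetical non-personal field

def apply_retention(log: List[LogEntry], now: datetime) -> List[LogEntry]:
    """Return the log as it should look after a purge run at time `now`."""
    kept = []
    for entry in log:
        age = now - entry.timestamp
        if age > LOG_RETENTION:
            continue  # drop the whole record once it is older than 14 days
        if age > IP_RETENTION and entry.ip_address is not None:
            entry = replace(entry, ip_address=None)  # strip the IP after 48 hours
        kept.append(entry)
    return kept

now = datetime(2008, 7, 16, 12, 0)
log = [
    LogEntry(now - timedelta(hours=1), "192.0.2.1", 3),
    LogEntry(now - timedelta(days=3), "192.0.2.2", 3),
    LogEntry(now - timedelta(days=15), "192.0.2.3", 3),
]
for entry in apply_retention(log, now):
    print(entry.ip_address)  # prints 192.0.2.1, then None
```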

Hmmm… Bet that won’t appeal to the British Home Office.

Thanks to Gerard for the link.

Flash pages to be searchable

Thursday, July 3rd, 2008

From Technology Review

The Web would be useless without search engines. But as good as Google and Yahoo are at finding online information, much of it remains hidden, or difficult to rank in search results. On Tuesday, however, Adobe took a major step toward opening up tens of millions of pages to Google and Yahoo. The company has provided the search engines with a specialized version of its Flash animation player that reveals information about text and links in Flash files. It’s a move that could be a boon to advertisers, in particular, who have traditionally had to choose between building a site that’s aesthetically pleasing and one that can be ranked in a Web search.

The new software is required only to index Flash files, not to play them, says Justin Everett-Church, senior product manager for Adobe Flash Player. Web surfers don’t need to download a new Flash player, and content providers don’t have to change the way they write applications. “For end users, they’re going to see a lot more results and a lot better results,” says Everett-Church. “The perfect result may have been out there but trapped in a SWF [Shockwave Flash file]. But now they can find it.”

Q: Where has Obama spent $3.5 million so far this year? A: Google ads

Thursday, May 29th, 2008

From ClickZ

Barack Obama’s campaign spent at least $3.47 million on online advertising related purchases between January and April. The biggest recipient of the Democratic Presidential hopeful’s online ad dollars was Google.

The search giant scored over 82 percent of money spent on online media buys for the Illinois Senator’s campaign this year through April, according to information compiled from Federal Election Commission filings. More than $2.8 million was paid to Google, as listed by Obama for America in its itemized FEC reports.

After spending about $640,000 in January on online advertising, the campaign pumped its online ad budget up to over $1.9 million in February. Expenditures tapered to about $888,000 the following month. Filings show spending of only around $234,000 in April. However, previous monthly reports suggest more April online ad payments will be reported in the future; Google didn’t even appear in April spending data supplied by the campaign…

How to find John Kelly

Monday, April 21st, 2008

John Kelly has been studying the search engine queries that bring people to his (excellent) blog:

The majority of keyword searches involve some variation on “John Kelly blog”, but they’re not the ones that remind us how the fetishes, pathologies and strange obsessions of humankind are catalogued every day on the world wide web.

For example, after writing about my family’s trip to Prague - a trip that I feel moved to point out was 100% prostitute-free - someone from the United Arab Emirates found my blog by Googling “hooker sex apartments near wenceslas square”. I just love that construction: “hooker sex apartments”. It sounds like something an estate agent would put on a brochure: “The property is located in a desirable area, close to schools, shopping and hooker sex apartments.”

If you blog about the British tabloid press, as I sometimes do, you will have occasion to use the words “penis” and “breast”. And that will guarantee more than a few searches along the lines of “penis grab off” (some kind of martial arts move, evidently) and “how to grab a woman’s breast without getting in trouble”…

Turkey flights

Sunday, February 10th, 2008

This morning’s Observer column

It’s the metaphors and similes that get me. It’s a shotgun marriage, declared one commentator, ‘with Google holding the gun’. Putting Microsoft and Yahoo together, said another, was like trying to produce an eagle from an alliance of two turkeys.

This is unfair. Microsoft isn’t a turkey, but a profitable, boring mastodon that entertains fantasies about being able to fly. Yahoo, for its part, is an ageing hippy who invented hang-gliding but aspired to fly 747s and then discovered that he wasn’t very good at it. The mastodon hopes that by employing the hippy it will learn to hang-glide. The hippy’s feelings about the whole deal are plain for all to see…

Update: The NYT (and lots of other sources) claim that the Yahoo board has decided to reject the Microsoft bid, on the grounds that it undervalues the company. Ho!

If this is true then what’s likely to happen is that (a) some big Yahoo shareholders will revolt and (b) Microsoft will wage a proxy war with the aim of replacing the Yahoo board at the next AGM. This one will run and, er, ruin.

Google’s loss is the Digger’s Gain

Friday, February 1st, 2008

I always thought the MySpace/Google deal was a work of genius — for Rupert Murdoch. It’s beginning to look as though I was right.

The stock market may be fretting over Google’s disappointing earnings, but somewhere Rupert Murdoch is smiling.

One of the weaknesses that Google’s management highlighted in its conference call was advertising on social networks. The company said its traffic acquisition cost, the money it pays to sites on which it places ads, rose in the fourth quarter because of required minimum payments it must make to certain sites.

“We have found that social networking inventory is not monetizing as well as we would like,” said George Reyes, Google’s chief financial officer, implying that the sites on which the minimum payments are due were social networks. By far, the largest social network on which Google sells ads is MySpace, which is owned by Mr. Murdoch’s News Corp. In 2006, Google agreed to a three-year deal to sell ads on MySpace, committing to pay a minimum of $900 million.

People involved in that deal said that Google never assumed that it would earn its $900 million back from that deal, but it appears to be losing even more than it had expected.

Social Search

Friday, February 1st, 2008

From Technology Review

Now a company called Delver, which presented at Demo earlier this week, is working on a search engine that uses social-network data to return personalized results from the larger Web.

Liad Agmon, CEO of Delver, says that the site connects information about a user’s social network with Web search results, “so you are searching the Web through the prism of your social graph.” He explains that a person begins a search at Delver by typing in her name. Delver then crawls social-networking websites for widely available data about the user, such as a public LinkedIn profile, and builds a network of associated institutions and individuals based on that information. When the user enters a search query, results related to, produced by, or tagged by members of her social network are given priority. Lower down are results from people implicitly connected to the user, such as those relating to friends of friends, or people who attended the same college as the user. Finally, there may be some general results from the Web at the bottom. The consequence, says Agmon, is that each user gets a different set of results from a given query, and a set quite different from those delivered by Google…
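Agmon's description amounts to a tiered sort: direct contacts first, implicit connections next, the general Web last. Here is a minimal sketch assuming a hypothetical adjacency-list social graph and results that carry an author or tagger name. It models only friends of friends as "implicit" (the same-college signal is left out), and it is not Delver's actual ranking algorithm.

```python
def social_tiers(graph, user):
    """Split people into the user's direct contacts and friends of friends.
    `graph` is a hypothetical adjacency list: name -> list of contacts."""
    direct = set(graph.get(user, ()))
    implicit = set()
    for friend in direct:
        implicit.update(graph.get(friend, ()))
    implicit -= direct | {user}  # friends of friends only, not the user or direct contacts
    return direct, implicit

def social_rank(results, direct, implicit):
    """Sort (url, author) pairs: tier 0 = direct network, 1 = implicit, 2 = general Web."""
    def tier(result):
        _, author = result
        if author in direct:
            return 0
        if author in implicit:
            return 1
        return 2
    return sorted(results, key=tier)  # stable: original order kept within each tier

graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}
direct, implicit = social_tiers(graph, "alice")
results = [("u1", "dave"), ("u2", "carol"), ("u3", "bob"), ("u4", "erin")]
print([url for url, _ in social_rank(results, direct, implicit)])  # ['u3', 'u2', 'u1', 'u4']
```

Because Python's `sorted` is stable, results within a tier keep the order the underlying Web search gave them, which is one plausible reading of "given priority".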

Xerox Enters Search Market

Monday, June 25th, 2007

From TechCrunch

Xerox announced its entry into the search market this week with FactSpotter, document search software that is claimed to go beyond conventional keyword search.

FactSpotter is text mining software built around a linguistic engine that allows users to make queries in everyday language. FactSpotter looks for the keywords contained in a query along with the context those words have.

According to Xerox, FactSpotter is capable of combing through almost any document regardless of the language, location, format or type; taking advantage of the way humans think, speak and ask questions; and discriminating among the results, highlighting just a handful of relevant answers instead of returning thousands of unrelated responses…

Sounds interesting. But…

FactSpotter will not be coming to a browser near anyone anytime soon. Xerox plans to launch FactSpotter next year as part of the paid Xerox Litigation Service platform and has no plans for a wider or public release.