An algorithmic approach to truth?

Apropos our research project’s recent symposium on virality, and in particular the relative speeds of online dissemination of truths and untruths, this paper from Google researchers is interesting. At the moment, Google ranks search results using a proprietary algorithm (or, more likely, set of algorithms) which performs some kind of ‘peer review’ of web pages. The essence of it seems to be that pages that are linked to extensively are ranked more highly than pages with fewer inbound links. This has obvious drawbacks in some cases, particularly when conspiracist thinking is involved. A web page or site which proposes a sensationalist interpretation of a major newsworthy event, for example, may be extensively quoted across the Internet, even though it might be full of misinformation or falsehoods.
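To make the drawback concrete, here is a toy sketch of ranking purely by inbound links. It is emphatically not Google's algorithm, which is proprietary and far more elaborate; the function name `rank_by_inbound_links` and the page names are invented for illustration.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by how many distinct pages link to them (most-linked first).

    `links` is an iterable of (source_page, target_page) pairs.
    Hypothetical helper for illustration only, not a real ranking system.
    """
    # Count each (source, target) pair once and ignore pages linking to themselves.
    distinct = {(src, dst) for src, dst in links if src != dst}
    inbound = Counter(dst for _, dst in distinct)
    # Sort by inbound count (descending), then by name so ties are stable.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))


# Invented example: a sensational page attracts more links than a sober one.
links = [
    ("blog-a", "conspiracy-page"),
    ("blog-b", "conspiracy-page"),
    ("forum-c", "conspiracy-page"),
    ("blog-a", "sober-report"),
]
ranking = rank_by_inbound_links(links)
```

Nothing in this calculation looks at what the pages actually say, which is why a widely quoted but inaccurate page can come out on top.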

The Google researchers have been exploring a method of evaluating web pages on the basis of factual accuracy. “A source that has few false facts is considered to be trustworthy”, they write. “The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases.” They propose a way to compute a “trustworthiness score”, which they call Knowledge-Based Trust (KBT), using fairly abstruse probabilistic modelling.
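The basic intuition, stripped of the probabilistic machinery, can be sketched as the share of a source's checkable facts that agree with a reference knowledge base. This is a deliberately naive stand-in and not the model the paper describes; the triple format, the `trust_score` function and the sample data are all assumptions made for illustration.

```python
def trust_score(extracted_facts, knowledge_base):
    """Fraction of a source's checkable facts that match the knowledge base.

    `extracted_facts` is a list of (subject, predicate, value) triples.
    `knowledge_base` maps (subject, predicate) to the accepted value.
    Returns None when the source makes no checkable claims.
    Naive illustration only: it ignores extraction errors and uncertainty.
    """
    # Only facts the knowledge base knows about can be judged at all.
    checkable = [(s, p, v) for s, p, v in extracted_facts
                 if (s, p) in knowledge_base]
    if not checkable:
        return None
    correct = sum(1 for s, p, v in checkable if knowledge_base[(s, p)] == v)
    return correct / len(checkable)


# Invented sample data.
knowledge_base = {
    ("Paris", "capital_of"): "France",
    ("Water", "boils_at_celsius"): "100",
}
facts_from_page = [
    ("Paris", "capital_of", "France"),    # agrees with the knowledge base
    ("Water", "boils_at_celsius", "90"),  # contradicts it
    ("Moon", "made_of", "cheese"),        # not checkable, so ignored
]
score = trust_score(facts_from_page, knowledge_base)
```

Even this toy version shows where the difficulty lies: the score is only as good as the knowledge base it is checked against, and a page that makes no checkable claims gets no score at all.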

The paper reports that they tested the model on a test database and concluded that it enabled them to compute “the true trustworthiness levels of the sources”. They then ran the model on a database of 2.8B facts extracted from the web, and thereby estimated the trustworthiness of 119M webpages. They claim that “manual evaluation of a subset of the results confirms the effectiveness of the method”.

If this finding turns out to be replicable, then it’s an interesting result. The idea that ‘truth’ might be computable will keep philosophers amused and occupied for ages. The idea of a ‘fact’ is itself a contested notion in many fields, because treating something as a fact involves believing a whole set of ‘touchstone theories’. (Believing the reading on a voltmeter, for example, means believing a set of theories which link the movement of the needle on the dial to the underlying electrical phenomenon that is being measured.) And of course the Google approach would not be applicable to many of the pages on the Web, because they don’t make factual assertions or claims. It might, however, be useful in studying online sources which discuss or advocate conspiracy theories.

Even so, it won’t be without its problems. In an interesting article in Friday’s Financial Times, Robert Shrimsley points out that the Google approach is essentially using “fidelity to proved facts as a proxy for trust[worthiness]”. This works fine with single facts, he thinks, but runs into trouble with more complex networks of factual information.

And what about propositions that were originally regarded as ‘facts’ but were later invalidated? “In 1976,” Shrimsley writes,

“the so-called Birmingham Six were officially guilty of bombings that killed 21 people. Fifteen years later their convictions were quashed and they were officially innocent. This took place in a pre-internet world but campaigns to overturn established truths take time and do not always start on sober, respected news sites. The trust score could make it harder for such campaigns to bubble up.”

And of course we’re still left with the question of what is established truth anyway.

Memex 1.1

John Naughton's online diary