Who’s Messing with Wikipedia?

This diagram summarises the editing activity on the Wikipedia page about global warming. I produced it by running the entry through an intriguing new web service described by Technology Review.

Despite warnings from many high-school teachers and college professors, Wikipedia is one of the most-visited websites in the world (not to mention the biggest encyclopedia ever created). But even as Wikipedia’s popularity has grown, so has the debate over its trustworthiness. One of the most serious concerns remains the fact that its articles are written and edited by a hidden army of people with unknown interests and biases.

Ed Chi, a senior research scientist for augmented social cognition at the Palo Alto Research Center (PARC), and his colleagues have now created a tool, called WikiDashboard, that aims to reveal much of the normally hidden back-and-forth behind Wikipedia’s most controversial pages in order to help readers judge for themselves how suspect its contents might be.

Wikipedia already has procedures in place designed to alert readers to potential problems with an entry. For example, one of Wikipedia’s volunteer editors can review an article and tag it as ‘controversial’ or warn that it ‘needs sources.’ But in practice, Chi says, relatively few articles actually receive these tags. WikiDashboard instead offers a snapshot of the edits and re-edits, as well as the arguments and counterarguments that went into building each of Wikipedia’s many million pages.

This is a great idea.

Call it a micki

Doc Searls has been thinking about what’s missing in wiki technology.

Wikis are flat. All topics are at the same level. This is fine for an encyclopedia, but lousy for, say, projects. Joint efforts such as ProjectVRM are not flat. They have topics and subtopics. These change and move around, and this is where an outliner like MORE is so handy. With a few keystrokes you can move topics up and down levels, back and forth between higher-level headings… You can hoist any single topic up and work on that as if it were a top level. You can clone a topic or a piece of text and edit it in two places at once. I could go on, but trust me: it freaking rocked. There was no faster way to think or type. Hell, I’m typing this in one of its descendants: an OPML editor, also written by Dave Winer.

Anyway, just wanted to say, here in the midst of an unrelated local conversation, that wiki that works like MORE remains on the top of my software wish list for the world. Trust me: it would make the world a much more sensible place. And make both individual and group work a helluva lot easier.

Source.
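The outline operations Searls describes (moving topics between levels, hoisting a topic to work on it as a top level) amount to simple tree manipulations. A minimal sketch, assuming nothing about any real outliner or wiki; all names here are invented for illustration:

```python
# A toy outline: topics nested under topics, as in MORE.
# Everything here is illustrative; no real outliner API is assumed.

class Node:
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

    def render(self, depth=0):
        """Return the outline as indented lines."""
        lines = ["  " * depth + self.title]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines

def promote(grandparent, parent, node):
    """Move a subtopic up one level, placing it just after its old parent."""
    parent.children.remove(node)
    grandparent.children.insert(grandparent.children.index(parent) + 1, node)

def hoist(node):
    """Work on a single topic as if it were the top level."""
    return Node(node.title, node.children)

outline = Node("ProjectVRM", [
    Node("Topics", [Node("Identity"), Node("Data portability")]),
    Node("Meetings"),
])

# Promote "Data portability" from a subtopic to a top-level topic.
promote(outline, outline.children[0], outline.children[0].children[1])
print("\n".join(outline.render()))
```

The hard part of a wiki that works like MORE is not this data model but layering shared editing and conflict handling on top of it, which is presumably what remains on the wish list.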

Dave Winer is interested. I’d put money on the proposition that something useful will come from this.

MORE was the most useful piece of software I’ve ever used. It ran on all the early Macs I owned. For years after OS X came out I retained the Mac Classic emulator for just one purpose — so that I could run MORE. I stopped only after Q discovered OmniOutliner, which is pretty good — and still the best tool for thinking on my machine. But it only works on my stuff: a wiki tool which would bring that kind of functionality to collaborative documents would be a killer web app.

The sting in the long tail

This morning’s Observer column.

'Scorpions', says Wikipedia, 'are eight-legged venomous arachnids. They have a long body with an extended tail with a sting.' Staff of the Internet Watch Foundation (IWF), the self-appointed monitor of 'child sexual abuse content hosted worldwide' and of 'criminally obscene and incitement to racial hatred content hosted in the UK', may well find themselves in rueful agreement about the sting. Except that what they've discovered is that Wikipedia also has one.

Pause for a review of recent events…

Managing spikes

Fascinating post about current traffic patterns on the Net.

Lately, I see more sudden eyeballs and what used to be an established trend seems to fall into a more chaotic pattern that is the aggregate of different spike signatures around a smooth curve. This graph is from two consecutive days where we have a beautiful comparison of a relatively uneventful day followed by long-exposure spike (nytimes.com) compounded by a short-exposure spike (digg.com):

The disturbing part is that this occurs even on larger sites now due to the sheer magnitude of eyeballs looking at today’s already popular sites. Long story short, this makes planning a real bitch.

And the interesting thing is perspective on what is large… People think Digg is popular — it is. The New York Times is too, as is CNN and most other major news networks — if they link to your site, you can expect to see a dramatic and very sudden increase in traffic. And this is just in the United States (and some other English-speaking countries)… there are others… and they’re kinda big.

What isn’t entirely obvious in the above graphs? These spikes happen inside 60 seconds. The idea of provisioning more servers (virtual or not) is unrealistic. Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope, and that would assume a zero-second response time. This means it is about time to adjust what our systems architecture should support. The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling. At least eight times in the past month, we’ve experienced from 100% to 1000% sudden increases in traffic across many of our clients.
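The arithmetic behind the quoted 70% rule is easy to check, and shows why spikes of 100 to 1,000 per cent break it. A sketch using only the figures in the post:

```python
# The capacity arithmetic behind the quoted "70% rule":
# running at 70% utilization leaves room for a 40% surge (0.7 * 1.4 = 0.98).

def max_safe_utilization(spike_increase_pct):
    """Highest steady-state utilization that still absorbs a sudden
    traffic increase of the given percentage without hitting 100%."""
    return 1.0 / (1.0 + spike_increase_pct / 100.0)

for pct in (40, 100, 1000):
    print(f"{pct:4d}% spike -> run at no more than "
          f"{max_safe_utilization(pct):.0%} utilization")
```

At a 1,000 per cent spike, steady-state utilization would have to stay under roughly 9 per cent, i.e. most of the fleet idle most of the time, which is why the author argues the old rule is unravelling rather than merely needing more headroom.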

Dr Internet

This morning’s Observer column.

A detailed academic study some years ago estimated that 4.5 per cent of all internet searches were health-related, which at the time translated into 16.7 million health-related queries a day. Again, I’m sure that number has gone up.

All of which suggests that people worry a lot about their health and see the web as a great way of becoming better informed. The medical profession is, to put it mildly, not over the moon. The more literate practitioners shake their heads and quote Mark Twain’s adage: ‘Be careful about reading health books. You may die of a misprint.’ But others are more righteous and wax indignant about what they see as the errors and misinformation peddled by many sites that purport to deal with health issues…
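The two figures quoted in the column imply a total search volume that is easy to back out; a back-of-envelope check using only those numbers:

```python
# Back-of-envelope check on the column's figures.
health_share = 0.045        # 4.5 per cent of all searches are health-related
health_per_day = 16.7e6     # 16.7 million health-related queries a day

total_per_day = health_per_day / health_share
print(f"implied total searches per day: {total_per_day:,.0f}")  # ~371 million
```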

LATER: I had a moving email from a correspondent who lives on the other side of the world. I’m quoting it verbatim except that I’ve anonymised it.

In 2005, my second son arrived in a hurry. We were living in Southern Japan at the time.
Following what had been another wonderful pregnancy, my wife and I were not at all prepared for the shock of his arrival and his condition.

It turned out our son was born with congenital cytomegalovirus. Once I learned the name of what it was that had ravaged his body, I obviously turned to the Internet, including PubMed. Alright, so I am familiar with research, indices, journals et cetera, I was at the time an Associate Prof., and I am fairly well read. So perhaps I am not your average punter, but nevertheless within 24hrs I had read almost all there was on the research and treatment of cCMV.

The good folks at the National hospital, had similarly gone off to look this one up. But their research was almost entirely based on what was in-print in Japanese.

We had both come to similar, but not identical conclusions. One, and only one treatment was available, a chemotherapy over the course of 6 weeks may save his sight and his hearing. The Registrar wanted to begin immediately. I said No.
I had read about the dangers of the chemo in seriously compromised infants, and through the internet had managed to reach doctors at Mayo, U. of Alabama, Melbourne Sick Kids, Sydney Royal, and Great Ormond Street, never mind almost 100 parents of kids with cCMV, through a list-serv.

The overwhelming advice, albeit guarded, and with lots of back-out clauses, essentially said, ‘Wait, let the infant recover from the trauma of birth, treat some of the minor conditions, and in a week or so’s time – then start the chemo. If you start it now – he will die.’

It was a very hard call – going against the doctor’s advice in Japan. But I did. To say they were not happy is a bit of an understatement. I subsequently moved my son a few days later to a newer prefectural hospital, and another NICU team.
He began the chemo course at 10 days old, in a much stronger condition, and he got through it. He can see, and despite being told he was going to be severely deaf, he can hear.

While I could have rung around using old fashioned telephones, there is no way I could have been as informed, and armed with knowledge without the Internet.

I have no doubt it saved his life.

My son is now 3, and is truly the happiest child you could care to meet.

STILL LATER: Jeff Jarvis picked up on the column and added:

In my book, I argue that – as with other apparent problems in industries – there is opportunity here. Doctors should act as curators, selecting the best information for their patients and making sure they are better informed.

Cyberchondria

Wow! Microsoft Research has just published a research study on what happens when people seek health information on the Web. Abstract:

The World Wide Web provides an abundant source of medical information. This information can assist people who are not healthcare professionals to better understand health and disease, and to provide them with feasible explanations for symptoms. However, the Web has the potential to increase the anxieties of people who have little or no medical training, especially when Web search is employed as a diagnostic procedure. We use the term cyberchondria to refer to the unfounded escalation of concerns about common symptomatology, based on the review of search results and literature on the Web. We performed a large-scale, longitudinal, log-based study of how people search for medical information online, supported by a large-scale survey of 515 individuals’ health-related search experiences. We focused on the extent to which common, likely innocuous symptoms can escalate into the review of content on serious, rare conditions that are linked to the common symptoms. Our results show that Web search engines have the potential to escalate medical concerns. We show that escalation is influenced by the amount and distribution of medical content viewed by users, the presence of escalatory terminology in pages visited, and a user’s predisposition to escalate versus to seek more reasonable explanations for ailments. We also demonstrate the persistence of post-session anxiety following escalations and the effect that such anxieties can have on interrupting users’ activities across multiple sessions. Our findings underscore the potential costs and challenges of cyberchondria and suggest actionable design implications that hold opportunity for improving the search and navigation experience for people turning to the Web to interpret common symptoms.

Dr Google

This is interesting — Google Flu Trends…

We have found a close relationship between how many people search for flu-related topics and how many people actually have flu symptoms. Of course, not every person who searches for “flu” is actually sick, but a pattern emerges when all the flu-related search queries from each state and region are added together. We compared our query counts with data from a surveillance system managed by the U.S. Centers for Disease Control and Prevention (CDC) and discovered that some search queries tend to be popular exactly when flu season is happening. By counting how often we see these search queries, we can estimate how much flu is circulating in various regions of the United States.
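What Google describes — counting flu-related queries per region and comparing them with CDC surveillance counts — is at bottom a correlation exercise. A toy sketch with invented weekly numbers for one hypothetical region (none of these figures come from Google or the CDC):

```python
# Toy illustration of the Flu Trends idea: weekly flu-related query
# share tracking official case counts. All numbers are invented.

query_share = [0.2, 0.3, 0.5, 0.9, 1.4, 1.1, 0.6]   # % of all searches
cdc_cases   = [110, 160, 260, 470, 720, 580, 320]    # reported cases

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(query_share, cdc_cases)
print(f"correlation between query share and cases: r = {r:.2f}")
```

The real system does more than report a correlation — it uses the query signal to estimate flu activity ahead of the official figures — but a consistently high correlation like this is the foundation the estimates rest on.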

There’s a nice animation on the site showing how official health data lags Google searches.

The NYT has a report on this today.

Excerpt:

Tests of the new Web tool from Google.org, the company’s philanthropic unit, suggest that it may be able to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention.

In early February, for example, the C.D.C. reported that flu cases had recently spiked in the mid-Atlantic states. But Google says its search data show a spike in queries about flu symptoms two weeks before that report was released. Its new service at google.org/flutrends analyzes those searches as they come in, creating graphs and maps of the country that, ideally, will show where the flu is spreading.

The C.D.C. reports are slower because they rely on data collected and compiled from thousands of health care providers, labs and other sources. Some public health experts say the Google data could help accelerate the response of doctors, hospitals and public health officials to a nasty flu season, reducing the spread of the disease and, potentially, saving lives.

“The earlier the warning, the earlier prevention and control measures can be put in place, and this could prevent cases of influenza,” said Dr. Lyn Finelli, lead for surveillance at the influenza division of the C.D.C. From 5 to 20 percent of the nation’s population contracts the flu each year, she said, leading to roughly 36,000 deaths on average.

WebPolitics 2.0

This morning’s Observer column.

A few days ago we had the extraordinary spectacle of a Republican presidential candidate complaining that his rival had more money to spend on TV advertising than he had. To those of us who grew up in an era when conservatives always had more money and controlled the dominant communications media, this was truly extraordinary. It summoned up memories of Adlai Stevenson, George McGovern, Michael Foot and Neil Kinnock running doomed, underfunded campaigns against opponents who had cash to burn and the best PR expertise money could buy…

MORE: Fascinating video interview with Jascha Franklin-Hodge, cofounder of Blue State Digital, which built Obama’s online social-networking tools, in which he describes how the president-elect’s social-networking strategy made for a well-oiled Election Day effort, and how it can be used in government.

Wikipedia offline

Wikipedia produces a downloadable version of the encyclopedia aimed at schools, with content relevant to the national curriculum. Great idea, and one that could have some serious applications in developing countries where schools have difficulty getting a workable internet connection. The blurb describes it as

a free, hand-checked, non-commercial selection from Wikipedia, targeted around the UK National Curriculum and useful for much of the English speaking world. It has about 5500 articles (as much as can be fitted on a DVD with good size images) and is about the size of a twenty volume encyclopaedia (34,000 images and 20 million words). Articles were chosen from a list ranked by importance and quality generated by project members. This list of articles was then manually sorted for relevance to children, and adult topics were removed. Compared to the 2007 version some six hundred articles were removed and two thousand more relevant articles (of now adequate quality) were added. SOS Children volunteers then checked and tidied up the contents, first by selecting historical versions of articles free from vandalism and then by removing unsuitable sections. External links and references are also not included since it was infeasible to check all of these.

The project is a joint venture with SOS Children’s Villages.

Thanks to BoingBoing for the link.