Excavating AI

Fabulous essay by Kate Crawford and Trevor Paglen, uncovering the politics and biases embedded in the guge image databases that have been used for training machine learning software. Here’s how it begins:

You open up a database of pictures used to train artificial intelligence systems. At first, things seem straightforward. You’re met with thousands of images: apples and oranges, birds, dogs, horses, mountains, clouds, houses, and street signs. But as you probe further into the dataset, people begin to appear: cheerleaders, scuba divers, welders, Boy Scouts, fire walkers, and flower girls. Things get strange: A photograph of a woman smiling in a bikini is labeled a “slattern, slut, slovenly woman, trollop.” A young man drinking beer is categorized as an “alcoholic, alky, dipsomaniac, boozer, lush, soaker, souse.” A child wearing sunglasses is classified as a “failure, loser, non-starter, unsuccessful person.” You’re looking at the “person” category in a dataset called ImageNet, one of the most widely used training sets for machine learning.

Something is wrong with this picture.

Where did these images come from? Why were the people in the photos labeled this way? What sorts of politics are at work when pictures are paired with labels, and what are the implications when they are used to train technical systems?

In short, how did we get here?

The authors begin with a deceptively simple question: What work do images do in AI systems? What are computers meant to recognize in an image and what is misrecognised or even completely invisible? They examine the methods used for introducing images into computer systems and look at “how taxonomies order the foundational concepts that will become intelligible to a computer system”. Then they turn to the question of labeling: “how do humans tell computers which words will relate to a given image? And what is at stake in the way AI systems use these labels to classify humans, including by race, gender, emotions, ability, sexuality, and personality?” And finally, they turn to examine the purposes that computer vision is meant to serve in our society and interrogate the judgments, choices, and consequences of providing computers with these capacities.

This is a really insightful and sobering essay, based on extensive research.

Some time ago Crawford and Paglen created an experimental website — ImageNet Roulette — which enabled anyone to upload their photograph and then pulled up from the ImageNet database how the person would be classified based on their photograph. The site is now offline, but the Guardian journalist Julia Carrie Wong wrote an interesting article about it recently in the course of which she investigated how it would classify/describe her from her Guardian byline photo. Here’s what she found.

Interesting ne c’est pas? Remember, this is the technology underpinning facial recognition.

Do read the whole thing.

Quote of the Day

”For all the progress made, it seems like almost all important questions in AI remain unanswered. Many have not even been properly asked yet.”

Francois Chollet

Posted in AI

Quote of the Day

“It’s absurd to believe that you can become world leader in ethical AI before becoming world leader in AI first”

Ulrike Franke, policy fellow at the European Council on Foreign Relations.

Google’s big move into ethics-theatre backfires.

This morning’s Observer column:

Given that the tech giants, which have been ethics-free zones from their foundations, owe their spectacular growth partly to the fact that they have, to date, been entirely untroubled either by legal regulation or scruples about exploiting taxation loopholes, this Damascene conversion is surely something to be welcomed, is it not? Ethics, after all, is concerned with the moral principles that affect how individuals make decisions and how they lead their lives.

That charitable thought is unlikely to survive even a cursory inspection of what is actually going on here. In an admirable dissection of the fourth of Google’s “principles” (“Be accountable to people”), for example, Prof David Watts reveals that, like almost all of these principles, it has the epistemological status of pocket lint or those exhortations to be kind to others one finds on evangelical websites. Does it mean accountable to “people” in general? Or just to Google’s people? Or to someone else’s people (like an independent regulator)? Answer comes there none from the code.

Warming to his task, Prof Watts continues: “If Google’s AI algorithms mistakenly conclude I am a terrorist and then pass this information on to national security agencies who use the information to arrest me, hold me incommunicado and interrogate me, will Google be accountable for its negligence or for contributing to my false imprisonment? How will it be accountable? If I am unhappy with Google’s version of accountability, to whom do I appeal for justice?”

Quite so. But then Google goes and doubles down on absurdity with its prestigious “advisory council” that “will consider some of Google’s most complex challenges that arise under our AI Principles, such as facial recognition and fairness in machine learning, providing diverse perspectives to inform our work”…

Read on

After I’d written the column, Google announced that it was dissolving its ethics advisory council. So we had to add this:

Postscript: Since this column was written, Google has announced that it is disbanding its ethics advisory council – the likely explanation is that the body collapsed under the weight of its own manifest absurdity.

That still leaves the cynical absurdity of Google’s AI ‘principles’ to be addressed, though.

Media credulity and AI hype

This morning’s Observer column:

Artificial intelligence (AI) is a term that is now widely used (and abused), loosely defined and mostly misunderstood. Much the same might be said of, say, quantum physics. But there is one important difference, for whereas quantum phenomena are not likely to have much of a direct impact on the lives of most people, one particular manifestation of AI – machine-learning – is already having a measurable impact on most of us.

The tech giants that own and control the technology have plans to exponentially increase that impact and to that end have crafted a distinctive narrative. Crudely summarised, it goes like this: “While there may be odd glitches and the occasional regrettable downside on the way to a glorious future, on balance AI will be good for humanity. Oh – and by the way – its progress is unstoppable, so don’t worry your silly little heads fretting about it because we take ethics very seriously.”

Critical analysis of this narrative suggests that the formula for creating it involves mixing one part fact with three parts self-serving corporate cant and one part tech-fantasy emitted by geeks who regularly inhale their own exhaust…

Read on

How our view of AI is skewed by industry hype

The Reuters Institute In Oxford has just published a really valuable study of how AI is covered in mainstream media, based on an analysis of eight months of reporting on AI in six mainstream UK news outlets.

The study’s basic conclusion is that UK media coverage of artificial intelligence is dominated by industry products, announcements and research. Coverage frequently amplifies self-interested assertions of AI’s value and potential, while positioning the technology primarily as a private commercial concern and undercutting the role of public action in addressing AI.

Key findings:

  • Nearly 60% of articles were focused on new industry products, announcements and initiatives that include AI, from smart phones or running shoes, to sex robots or brain preservation. Outlets also regularly covered industry promotional events, start-ups, buyouts, investments, and conferences.

  • One third (33%) of articles were based on industry sources – mostly CEOs or other senior executives – six times as many as those from government and nearly twice as many as those from academia.

  • 12% of articles referenced the technology entrepreneur, Elon Musk.

  • AI products are often portrayed as a relevant and competent solution to a range of public problems, from cancer and renewable energy, to coffee delivery. Journalists or commentators rarely question whether AI-containing technologies are the best solutions to such problems or acknowledge ongoing debates concerning AI’s potential effects.

  • Media coverage of AI is being politicised: right-leaning news outlets highlight issues of economics and geopolitics; left-leaning news outlets highlight issues of ethics, including discrimination, algorithmic bias and privacy.

The report’s lead author, J. Scott Brennen, observed that

“by amplifying industry’s self-interested claims about AI, media coverage presents AI as a solution to a range of problems that will disrupt nearly all areas of our lives, often without acknowledging ongoing debates concerning AI’s potential effects. In this way, coverage also positions AI mostly as a private commercial concern and undercuts the role and potential of public action in addressing this emerging public issue.”

That sounds just about right to me. This is a terrific piece of work.

Posted in AI

How companies are addressing machine learning

From an O’Reilly newsletter:

In a recent O’Reilly survey, we found that the skills gap remains one of the key challenges holding back the adoption of machine learning. The demand for data skills (“the sexiest job of the 21st century”) hasn’t dissipated—LinkedIn recently found that demand for data scientists in the US is “off the charts,” and our survey indicated that the demand for data scientists and data engineers is strong not just in the US but globally.

With the average shelf life of a skill today at less than five years and the cost to replace an employee estimated at between six and nine months of the position’s salary, there’s increasing pressure on tech leaders to retain and upskill rather than replace their employees in order to keep data projects (such as machine learning implementations) on track. We’re also seeing more training programs aimed at executives and decision makers, who need to understand how these new ML technologies can impact their current operations and products.

Beyond investments in narrowing the skills gap, companies are beginning to put processes in place for their data science projects, for example creating analytics centers of excellence that centralize capabilities and share best practices. Some companies are also actively maintaining a portfolio of use cases and opportunities for ML.

Note the average shelf-life of a skill and then ponder why the UK government is not boosting the Open University.