Sunday 26 January, 2020

What the Clearview AI story means

This morning’s Observer column:

Ultimately, the lesson of Clearview is that when a digital technology is developed, it rapidly becomes commodified. Once upon a time, this stuff was the province of big corporations. Now it can be exploited by small fry. And on a shoestring budget. One of the co-founders paid for server costs and basic expenses. Mr Ton-That lived on credit-card debt. And everyone worked from home. “Democracy dies in darkness” goes the motto of the Washington Post. “Privacy dies in a hacker’s bedroom” might now be more appropriate.

Read on

UPDATE A lawsuit — seeking class-action status — was filed this week in Illinois against Clearview AI, a New York-based startup that has scraped social media networks for people’s photos and created one of the biggest facial recognition databases in the world.


Privacy is a public good

Shoshana Zuboff in full voice:

“The belief that privacy is private has left us careening toward a future that we did not choose, because it failed to reckon with the profound distinction between a society that insists upon sovereign individual rights and one that lives by the social relations of the one-way mirror. The lesson is that privacy is public — it is a collective good that is logically and morally inseparable from the values of human autonomy and self-determination upon which privacy depends and without which a democratic society is unimaginable.”

Great op-ed piece.


The winding path


Why the media shouldn’t underestimate Joe Biden

Simple: Trump’s crowd don’t. They think he’s the real threat. (Which explains the behaviour that led to Trump’s impeachment.) David Brooks has some sharp insights into why the chattering classes are off target about this.

It’s the 947th consecutive sign that we in the coastal chattering classes have not cured our insularity problem. It’s the 947th case in which we see that every second you spend on Twitter detracts from your knowledge of American politics, and that the only cure to this insularity disease is constant travel and interviewing, close attention to state and local data and raw abject humility about the fact that the attitudes and academic degrees that you think make you clever are actually the attitudes and academic degrees that separate you from the real texture of American life.

Also, the long and wide-ranging [NYT interview](https://www.nytimes.com/interactive/2020/01/17/opinion/joe-biden-nytimes-interview.html) with him is full of interesting stuff — like that he thinks that Section 230 of the Communications Decency Act (that’s the get-out-of-gaol card for the tech companies) should be revoked. I particularly enjoyed this observation by Brooks: “Jeremy Corbyn in Britain and Bernie Sanders here are a doctoral student’s idea of a working-class candidate, not an actual working person’s idea of one.”


Linkblog


We shape our tools, and afterwards…

In his provocative LARB piece on the intrinsic conservatism of machine learning, Cory Doctorow pointed me to “Instant Recall”, Molly Sauter’s lovely essay, about how the Web has given us “a suite of products and services to programmatically induce reminiscence.”

Apps like Timehop, which presents time-traveled posts from across your social media profiles, or Facebook’s “On This Day” Memories, are attempts to automate and algorithmically define reminiscence, turning the act of remembering into a salable, scalable, consumable, trackable product suite. As the work of memory keeping is offshored, Instagram by Instagram, to social media companies and cloud storage, we are giving up the work of remembering ourselves for the convenience of being reminded.
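
To make “programmatically induce reminiscence” concrete, here is a minimal Python sketch — not Timehop’s or Facebook’s actual code, just an illustration with made-up posts — of how an “on this day” feature might pick which memories to serve up.

```python
from datetime import date, datetime

# Purely illustrative post store: (timestamp, text) pairs, not any real API.
posts = [
    (datetime(2015, 1, 26, 9, 30), "First day at the new job!"),
    (datetime(2018, 1, 26, 20, 15), "Snowed in again."),
    (datetime(2019, 7, 4, 12, 0), "Fireworks by the river."),
]

def on_this_day(posts, today=None):
    """Return earlier posts whose month and day match today's date."""
    today = today or date.today()
    return [
        (ts, text)
        for ts, text in posts
        if (ts.month, ts.day) == (today.month, today.day) and ts.year < today.year
    ]

for ts, text in on_this_day(posts, today=date(2020, 1, 26)):
    print(f"On this day in {ts.year}: {text}")
```

The point of the sketch is how little “remembering” is involved: the selection is a date match over whatever happens to have been posted, with no judgement about what is worth recalling.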

What’s going on, Sauter says, is that we are being algorithmically fed virtual ‘madeleines’ (those buttery cakes that, when dipped in hot tea, were the catalyst for the memories that make up Proust’s À la recherche du temps perdu). She contrasts this with psychologist Dan McAdams’s contention that remembering is a generative, creative process that is essential for a happy life. What’s important, McAdams argues, is

the creation and maintenance of life narratives, dynamically evolving situated performances that integrate lives in time, providing “an understandable frame for disparate ideas, character, happenings, and other events that were previously set apart.” These stories are subject to constant additive revision, as through living we continually add more material and revise the material available to us, rethinking and rewriting memories as we age. The process of remembering memories rewrites them, revises them, and this ability to re-envision ourselves is a central part of the creation of seemingly stable life narratives that allow for growth and change.

Sauter argues that if we were, somehow, to lose this ability “to both serendipitously and intentionally encounter and creatively engage with our memories, perhaps we would then also lose that re-visionary ability, leaving us narratively stranded amidst our unchanging, unconnected memories”.

It’s a great essay, well worth reading in full. What I like most about it is the way it reminds one of the deeper ways in which digital technology is changing us. “We shape our tools”, as one of Marshall McLuhan’s buddies put it, “and afterwards they shape us”.

In a way, Mark Twain was right when he said that “the older I get, the more clearly I remember things that never happened”.

US immigration uses Google Translate to scan people’s social media for ‘bad’ posts

This is not a good idea. And it contravenes Google’s own advice — which is that anyone using its translation technology should add a disclaimer that translated text may not be accurate.

According to a report from ProPublica, USCIS uses these tools to help evaluate whether refugees should be allowed into the US. In so doing, agency personnel are putting their trust in an untrustworthy algorithm to make entry decisions that may have profound consequences for the health and welfare of those seeking admission to the country.

“The translation of these social media posts can mean life or death for refugees seeking to reunite with their family members,” said Betsy Fisher, director of strategy for the International Refugee Assistance Project (IRAP), in an email to The Register. “It is dangerous to rely on inadequate technology to inform these unreasonable procedures ostensibly used to vet refugees.”

To demonstrate the inaccuracy of Google Translate, ProPublica asked Mustafa Menai, who teaches Urdu at the University of Pennsylvania, to translate a Twitter post written in Urdu. By Menai’s estimation, an accurate English translation would be, “I have been spanked a lot and have also gathered a lot of love (from my parents).”

Google Translate’s rendering of the post is, “The beating is too big and the love is too windy.”
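
For what Google’s advice would look like in practice, here is a minimal sketch of calling its Cloud Translation API from Python and attaching the sort of accuracy disclaimer the company recommends. It assumes the google-cloud-translate package and valid credentials; the disclaimer wording is mine, not Google’s official text.

```python
# Sketch only: requires `pip install google-cloud-translate` and
# GOOGLE_APPLICATION_CREDENTIALS pointing at a service-account key.
from google.cloud import translate_v2 as translate

# Illustrative disclaimer -- Google advises flagging machine translations as
# potentially inaccurate; this exact wording is mine.
DISCLAIMER = "Machine translation -- may not be accurate."

def translate_post(text, target="en"):
    client = translate.Client()
    result = client.translate(text, target_language=target)
    # The v2 client returns a dict with 'translatedText' and
    # 'detectedSourceLanguage'.
    return f"{result['translatedText']}\n[{DISCLAIMER}]"

# e.g. print(translate_post(some_urdu_tweet_text))
```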

The moral: Translate is wonderful; but don’t bet your life on it.

Source: The Register

Excavating AI

Fabulous essay by Kate Crawford and Trevor Paglen, uncovering the politics and biases embedded in the huge image databases that have been used for training machine learning software. Here’s how it begins:

You open up a database of pictures used to train artificial intelligence systems. At first, things seem straightforward. You’re met with thousands of images: apples and oranges, birds, dogs, horses, mountains, clouds, houses, and street signs. But as you probe further into the dataset, people begin to appear: cheerleaders, scuba divers, welders, Boy Scouts, fire walkers, and flower girls. Things get strange: A photograph of a woman smiling in a bikini is labeled a “slattern, slut, slovenly woman, trollop.” A young man drinking beer is categorized as an “alcoholic, alky, dipsomaniac, boozer, lush, soaker, souse.” A child wearing sunglasses is classified as a “failure, loser, non-starter, unsuccessful person.” You’re looking at the “person” category in a dataset called ImageNet, one of the most widely used training sets for machine learning.

Something is wrong with this picture.

Where did these images come from? Why were the people in the photos labeled this way? What sorts of politics are at work when pictures are paired with labels, and what are the implications when they are used to train technical systems?

In short, how did we get here?

The authors begin with a deceptively simple question: What work do images do in AI systems? What are computers meant to recognize in an image and what is misrecognised or even completely invisible? They examine the methods used for introducing images into computer systems and look at “how taxonomies order the foundational concepts that will become intelligible to a computer system”. Then they turn to the question of labeling: “how do humans tell computers which words will relate to a given image? And what is at stake in the way AI systems use these labels to classify humans, including by race, gender, emotions, ability, sexuality, and personality?” And finally, they turn to examine the purposes that computer vision is meant to serve in our society and interrogate the judgments, choices, and consequences of providing computers with these capacities.
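
One concrete detail helps here: ImageNet’s categories are WordNet noun synsets, and label strings like “failure, loser, non-starter, unsuccessful person” are simply a synset’s lemma names. A small sketch with NLTK’s WordNet interface — assuming nltk is installed and the wordnet corpus downloaded — shows where such label sets come from.

```python
# Sketch: pip install nltk, then run nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

# Each noun synset groups the lemma names that ImageNet reuses as a
# category's comma-separated label string.
for synset in wn.synsets("failure", pos=wn.NOUN):
    print(synset.name(), "->", ", ".join(synset.lemma_names()))
```

Which is Crawford and Paglen’s point about taxonomy: the judgemental labels aren’t glitches added by annotators, they are baked into the lexical hierarchy the dataset was built on.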

This is a really insightful and sobering essay, based on extensive research.

Some time ago Crawford and Paglen created an experimental website — ImageNet Roulette — which enabled anyone to upload a photograph and see how the ImageNet database would classify the person in it. The site is now offline, but the Guardian journalist Julia Carrie Wong wrote an interesting article about it recently, in the course of which she investigated how it would classify/describe her from her Guardian byline photo. Here’s what she found.

Interesting, n’est-ce pas? Remember, this is the technology underpinning facial recognition.

Do read the whole thing.