Bias in machine learning

Nice example from Daphne Koller:

Another notion of bias, one that is highly relevant to my work, is cases in which an algorithm is latching onto something that is meaningless and could potentially give you very poor results. For example, imagine that you’re trying to predict fractures from X-ray images in data from multiple hospitals. If you’re not careful, the algorithm will learn to recognize which hospital generated the image. Some X-ray machines have different characteristics in the image they produce than other machines, and some hospitals have a much larger percentage of fractures than others. And so, you could actually learn to predict fractures pretty well on the data set that you were given simply by recognizing which hospital did the scan, without actually ever looking at the bone. The algorithm is doing something that appears to be good but is actually doing it for the wrong reasons. The causes are the same in the sense that these are all about how the algorithm latches onto things that it shouldn’t latch onto in making its prediction.

To recognize and address these situations, you have to make sure that you test the algorithm in a regime that is similar to how it will be used in the real world. So, if your machine-learning algorithm is one that is trained on the data from a given set of hospitals, and you will only use it in those same set of hospitals, then latching onto which hospital did the scan could well be a reasonable approach. It’s effectively letting the algorithm incorporate prior knowledge about the patient population in different hospitals. The problem really arises if you’re going to use that algorithm in the context of another hospital that wasn’t in your data set to begin with. Then, you’re asking the algorithm to use these biases that it learned on the hospitals that it trained on, on a hospital where the biases might be completely wrong.
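The failure mode Koller describes can be sketched in a few lines of code. The simulation below is entirely hypothetical (the feature names, hospital offsets, and fracture rates are invented for illustration): a classifier is given a weak genuine "bone" signal plus a machine artifact that identifies the hospital. Because each training hospital's artifact correlates with its fracture base rate, the model leans on the artifact, looks accurate on the hospitals it was trained on, and falls apart at a new hospital whose machine happens to resemble one of the training machines.

```python
# Hypothetical sketch of "shortcut learning": a model that exploits a
# hospital-identifying artifact generalizes badly to an unseen hospital.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_hospital(n, fracture_rate, machine_offset):
    """Simulated per-image features for one hospital.
    Column 0: a weak true 'bone' signal; column 1: a machine artifact
    that effectively identifies which hospital produced the scan."""
    y = (rng.random(n) < fracture_rate).astype(int)
    bone = y + rng.normal(0, 2.0, n)            # noisy genuine signal
    artifact = machine_offset + rng.normal(0, 0.1, n)
    return np.column_stack([bone, artifact]), y

# Two training hospitals: the high-fracture-rate hospital also has a
# distinctive machine artifact, so the artifact predicts the label.
Xa, ya = make_hospital(2000, fracture_rate=0.8, machine_offset=+1.0)
Xb, yb = make_hospital(2000, fracture_rate=0.2, machine_offset=-1.0)
clf = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluation on fresh data from the SAME hospitals looks fine...
Xt, yt = make_hospital(1000, 0.8, +1.0)
Xu, yu = make_hospital(1000, 0.2, -1.0)
acc_in = clf.score(np.vstack([Xt, Xu]), np.concatenate([yt, yu]))

# ...but a new hospital with a low fracture rate, whose machine looks
# like hospital A's, exposes the shortcut.
Xn, yn = make_hospital(2000, fracture_rate=0.2, machine_offset=+1.0)
acc_out = clf.score(Xn, yn)

print(f"same hospitals: {acc_in:.2f}, new hospital: {acc_out:.2f}")
```

This is exactly why Koller's prescription matters: evaluating on held-out patients from the training hospitals would never reveal the problem; only testing on a hospital outside the training set does.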

Facebook’s strategic obfuscation

Facebook’s Carolyn Everson, vice president of global marketing solutions, was interviewed by Peter Kafka at the 2019 Code Media conference in Los Angeles yesterday. Vox had a nice report of the interview. This section is particularly interesting:

When pressed on Facebook’s refusal to fact-check political ads, Everson tried to defend the company’s stance by referencing the rules that govern how broadcasters must handle political advertisements. In the US, the Federal Communications Commission has extensive guidelines for television and radio broadcasters around political advertising that bar broadcasters from censoring ads or from taking down ones that make false claims. Those guidelines don’t apply to online platforms, including Facebook, but the company has consistently tried to hide behind them.

“We have no ability, legally, to tell a political candidate that they are not allowed to run their ad,” Everson said.

That’s complete baloney. Facebook is not bound by any regulations governing TV ads. It can shut down anyone or anything it likes or dislikes.

After the interview, a Facebook spokeswoman walked back the comments and said that Everson misspoke when she said Facebook was legally barred from refusing to run political ads.

An audience member also asked Everson why Facebook has decided to allow right-wing website Breitbart to be listed in its new News tab, which is ostensibly an indication that Breitbart offers trusted news, despite being a known source of propaganda. “We’re treating them as a news source; I wouldn’t use the term ‘trusted news,’” Everson said, pointing out that Facebook will also include “far-left” publications.

Which of course raises interesting questions about Facebook’s standards for determining the “integrity” of the news sources it includes in its tab, which the company extolled when it launched the feature in October.

Linkblog

Networked totalitarianism, contd.

From today’s New York Times:

“Ying shou jin shou” — “Round up everyone who should be rounded up.”

The echo of “1984,” “Brave New World” or “Fahrenheit 451” is unmistakable. But this is not dystopian fiction. It’s a real bureaucratic directive prepared by the Chinese leadership, drawing on a series of secret speeches by Xi Jinping, China’s authoritarian leader, on dealing ruthlessly with Muslims who show “symptoms” of religious radicalism.

There’s nothing theoretical about it: Based on these diktats, hundreds of thousands of Uighurs, Kazakhs and other Muslims in the western Xinjiang region have been rounded up in internment camps to undergo months or years of indoctrination intended to mold them into secular and loyal followers of the Communist Party.

And further to the revelations discussed in yesterday’s post,

Should students ask whether their missing parents had committed a crime, they are to be told no, “it is just that their thinking has been infected by unhealthy thoughts. Freedom is only possible when this ‘virus’ in their thinking is eradicated and they are in good health.”

The Times comments:

That someone from within the unforgiving, secretive Chinese leadership would take the enormous risk of leaking 403 pages of internal documents to a Western newspaper is in itself amazing, especially since the documents include an 11-page report summarizing the party’s investigation into the activities of Wang Yongzhi, an official who was supposed to manage a district where Uighur militants had staged a violent attack but who eventually developed misgivings about the mass detention facilities he had built. “He refused,” said the report, “to round up everyone who should be rounded up.” After September 2017, Mr. Wang disappeared from public view.

Quote of the Day

“Economic growth, democracy, and CO2 have always been intertwined. Growth and democracy barely existed until coal fuelled the industrial revolution. Can democracy survive without carbon? We are not going to find out. No electorate will vote to decimate its own lifestyle. We can’t blame bad politicians or corporates. It’s us: we will always choose growth over climate.”

  • Simon Kuper, “The Myth of Green Growth”, Financial Times, October 26/27, 2019.