Nice example from Daphne Koller:
Another notion of bias, one that is highly relevant to my work, involves cases in which an algorithm latches onto something that is meaningless and could potentially give you very poor results. For example, imagine that you’re trying to predict fractures from X-ray images in data from multiple hospitals. If you’re not careful, the algorithm will learn to recognize which hospital generated the image. Some X-ray machines produce images with different characteristics than others, and some hospitals have a much larger percentage of fractures than others. And so, you could actually learn to predict fractures pretty well on the data set you were given simply by recognizing which hospital did the scan, without ever actually looking at the bone. The algorithm appears to be doing something good, but it is doing it for the wrong reasons. The causes are the same in the sense that these are all cases in which the algorithm latches onto things that it shouldn’t latch onto in making its prediction.
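To make the shortcut concrete, here is a toy sketch (the hospitals, fracture rates, and the hospital-identity-only feature are all invented for illustration, not taken from the interview): give a classifier nothing but the identity of the hospital that did the scan, with prevalence that differs sharply across three hypothetical hospitals, and its accuracy still looks respectable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy setup: three hypothetical hospitals with very different fracture rates.
rng = np.random.default_rng(0)
fracture_rates = np.array([0.05, 0.30, 0.80])   # invented prevalence per hospital
hospital = rng.integers(0, 3, size=6000)        # which hospital did each scan
y = (rng.random(6000) < fracture_rates[hospital]).astype(int)

# The only feature the model sees is the hospital identity -- no image, no bone.
X = np.eye(3)[hospital]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("accuracy from hospital identity alone: %.2f" % model.score(X_test, y_test))
# Roughly 0.8 here, versus about 0.6 for always guessing "no fracture":
# the model "predicts fractures pretty well" without ever looking at a bone.
```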
To recognize and address these situations, you have to make sure that you test the algorithm in a regime that is similar to how it will be used in the real world. So, if your machine-learning algorithm is trained on data from a given set of hospitals, and you will only use it in that same set of hospitals, then latching onto which hospital did the scan could well be a reasonable approach: it effectively lets the algorithm incorporate prior knowledge about the patient population in different hospitals. The problem arises if you’re going to use that algorithm in the context of another hospital that wasn’t in your data set to begin with. Then you’re asking the algorithm to apply the biases it learned from the hospitals it trained on to a hospital where those biases might be completely wrong.
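One standard way to test in the deployment regime is group-wise cross-validation, for example scikit-learn’s LeaveOneGroupOut with the hospital as the group, so that each fold is evaluated on a hospital the model never trained on. A minimal sketch, reusing the same invented numbers as the toy example above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

# Same toy data as above: the only feature is which hospital did the scan.
rng = np.random.default_rng(0)
fracture_rates = np.array([0.05, 0.30, 0.80])
hospital = rng.integers(0, 3, size=6000)
y = (rng.random(6000) < fracture_rates[hospital]).astype(int)
X = np.eye(3)[hospital]

# Random split: every hospital appears in both train and test, so the
# hospital shortcut keeps "working" and the score looks flattering.
random_split = cross_val_score(
    LogisticRegression(), X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)

# Leave-one-hospital-out: each fold tests on a hospital the model never saw,
# which is the regime a deployment at a new hospital would actually face.
unseen_hospital = cross_val_score(
    LogisticRegression(), X, y,
    groups=hospital, cv=LeaveOneGroupOut(),
)

print("random-split accuracy:    %.2f" % random_split.mean())
print("unseen-hospital accuracy: %.2f" % unseen_hospital.mean())
# The shortcut carries no information about a hospital that wasn't in the
# training data, so the second number drops sharply.
```

The same idea generalizes beyond this toy: whatever the real features are, split by hospital (or site, scanner, cohort) rather than by individual scan when the model is meant to be used at hospitals outside the training set.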