Deep-fat data frying

This morning’s Observer column:

The tech craze du jour is machine learning (ML). Billions of dollars of venture capital are being poured into it. All the big tech companies are deep into it. Every computer science student doing a PhD on it is assured of lucrative employment after graduation at his or her pick of technology companies. One of the most popular courses at Stanford is CS229: Machine Learning. Newspapers and magazines extol the wonders of the technology. ML is the magic sauce that enables Amazon to know what you might want to buy next, and Netflix to guess which films might interest you, given your recent viewing history.

To non-geeks, ML is impenetrable, and therefore intimidating…

Read on

So what was Google smoking when it bought Boston Dynamics?

This morning’s Observer column:

The question on everyone’s mind as Google hoovered up robotics companies was: what the hell was a search company doing getting involved in this business? Now we know: it didn’t have a clue.

Last week, Bloomberg revealed that Google was putting Boston Dynamics up for sale. The official reason for unloading it is that senior executives in Alphabet, Google’s holding company, had concluded (correctly) that Boston Dynamics was years away from producing a marketable product and so was deemed disposable. Two possible buyers have been named so far – Toyota and Amazon. Both make sense for the obvious reason that they are already heavy users of robots and it’s clear that Amazon in particular would dearly love to get rid of humans in its warehouses at the earliest possible opportunity…

Read on

The next Brain Drain

The Economist has an interesting article on how major universities are now having trouble holding on to their machine-learning and AI academics. As the industrial frenzy about these technologies mounts, this is perfectly understandable, though it’s now getting to absurd proportions. The Economist claims, for example, that some postgraduate students are being lured away – by salaries “similar to those fetched by professional athletes” – even before they complete their doctorates. And Uber lured “40 of the 140 staff of the National Robotics Engineering Centre at Carnegie Mellon University, and set up a unit to work on self-driving cars”.

All of which is predictable: we’ve seen it happen before, for example, with researchers who have data-analytics skillsets. But it raises several questions.

The first is whether this brain drain will, in the end, turn out to be self-defeating. After all, the graduate students of today are the professors of tomorrow. And since most of the research and development done in companies tends to be applied, who will do the ‘pure’ research on which major advances in many fields depend?

Secondly, and related to that, since most industrial R&D is done behind patent and other intellectual-property firewalls, what happens to the free exchange of ideas on which intellectual progress ultimately depends? In that context, for example, it’s interesting to see the way in which Google’s ownership of Deepmind seems to be beginning to constrain the freedom of expression of its admirable co-founder, Demis Hassabis.

Thirdly, since these technologies appear to have staggering potential for increasing algorithmic power and perhaps even changing the relationship between humanity and its machines, the brain drain from academia – with its commitment to open enquiry, sensitivity to ethical issues, and so on – to the commercial sector (which traditionally has very little interest in any of these things) is worrying.

Levelling the playing field

This morning’s Observer column:

Whenever regulators gather to discuss market failures, the cliché “level playing field” eventually surfaces. When regulators finally get around to thinking about what happens in the online world, especially in the area of personal data, then they will have to come to terms with the fact that the playing field is not just tilted in favour of the online giants, but is as vertical as that rockface in Yosemite that two Americans have finally managed to free climb.

The mechanism for rotating the playing field is our old friend, the terms and conditions agreement, usually called the “end user licence agreement” (EULA) in cyberspace. This invariably consists of three coats of prime legal verbiage distributed over 32 pages, which basically comes down to this: “If you want to do business with us, then you will do it entirely on our terms; click here to agree, otherwise go screw yourself. Oh, and by the way, all of your personal data revealed in your interactions with us belongs to us.”

The strange thing is that this formula applies regardless of whether you are actually trying to purchase something from the author of the EULA or merely trying to avail yourself of its “free” services.

When the history of this period comes to be written, our great-grandchildren will marvel at the fact that billions of apparently sane individuals passively accepted this grotesquely asymmetrical deal. (They may also wonder why our governments have shown so little interest in the matter.)…

Read on

Big Data and intriguing correlations

Yesterday I gave a talk about so-called ‘Big Data’ to a group of senior executives. At one stage I used the famous Walmart pop-tart discovery as an example of how organisations sometimes discover things they didn’t know by mining their data. But now comes an equally intriguing data-mined discovery — from Alibaba:

Earlier this summer, a group of data crunchers looking at underwear sales at Alibaba came across a curious trend: women who bought larger bra sizes also tended to spend more (link in Chinese). Dividing intimate-apparel shoppers into four categories of spending power, analysts at the e-commerce giant found that 65% of women of cup size B fell into the “low” spend category, while those of a size C or higher mostly fit into the “middle” or higher group.

[Chart: Alibaba data on bra size vs. spending power]

The explanation might be fairly straightforward: it could be that the data merely demonstrate that younger women have less spending power, for instance. But Alibaba is deep into this data-mining stuff. The report claims that last year the company set up a Big Data unit with 800 employees. It also quotes a Gartner factoid that currently fewer than 5% of e-commerce companies are using data analytics.
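
For readers curious about what such an analysis actually looks like, here is a minimal sketch in Python (pandas) of the kind of cross-tabulation that would surface a pattern like this. The column names and the tiny sample of rows are invented for illustration; nothing about Alibaba’s real pipeline is public.

```python
import pandas as pd

# Hypothetical order data: one row per intimate-apparel purchase.
# Column names and values are made up purely for illustration.
orders = pd.DataFrame({
    "cup_size":   ["A", "B", "B", "C", "C", "D", "B", "C", "D", "A"],
    "spend_tier": ["low", "low", "low", "middle", "high", "high",
                   "middle", "middle", "high", "low"],
})

# Cross-tabulate cup size against spending tier, normalising each row
# so the cells read as "share of buyers of this cup size in each tier".
share = pd.crosstab(orders["cup_size"], orders["spend_tier"],
                    normalize="index")

print(share.round(2))
```

A table like that shows a correlation, nothing more; the likelier underlying explanation (age and spending power, say) still has to come from outside the data.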

Facebook, ethics and us, its hapless (and hypocritical?) users

This morning’s Observer column about the Facebook ’emotional contagion’ experiment.

The arguments about whether the experiment was unethical reveal the extent to which big data is changing our regulatory landscape. Many of the activities that large-scale data analytics now make possible are undoubtedly “legal” simply because our laws are so far behind the curve. Our data-protection regimes protect specific types of personal information, but data analytics enables corporations and governments to build up very revealing information “mosaics” about individuals by assembling large numbers of the digital traces that we all leave in cyberspace. And none of those traces has legal protection at the moment.

Besides, the idea that corporations might behave ethically is as absurd as the proposition that cats should respect the rights of small mammals. Cats do what cats do: kill other creatures. Corporations do what corporations do: maximise revenues and shareholder value and stay within the law. Facebook may be on the extreme end of corporate sociopathy, but really it’s just the exception that proves the rule.

danah boyd has a typically insightful blog post about this.

She points out that there are all kinds of undiscussed contradictions in this stuff. Most, if not all, of the media business (off- and online) involves trying to influence people’s emotions, yet we rarely talk about this. But when an online company does it, and explains why, there’s a row.

Facebook actively alters the content you see. Most people focus on the practice of marketing, but most of what Facebook’s algorithms do involve curating content to provide you with what they think you want to see. Facebook algorithmically determines which of your friends’ posts you see. They don’t do this for marketing reasons. They do this because they want you to want to come back to the site day after day. They want you to be happy. They don’t want you to be overwhelmed. Their everyday algorithms are meant to manipulate your emotions. What factors go into this? We don’t know.

But…

Facebook is not alone in algorithmically predicting what content you wish to see. Any recommendation system or curatorial system is prioritizing some content over others. But let’s compare what we glean from this study with standard practice. Most sites, from major news media to social media, have some algorithm that shows you the content that people click on the most. This is what drives media entities to produce listicals, flashy headlines, and car crash news stories. What do you think garners more traffic – a detailed analysis of what’s happening in Syria or 29 pictures of the cutest members of the animal kingdom? Part of what media learned long ago is that fear and salacious gossip sell papers. 4chan taught us that grotesque imagery and cute kittens work too. What this means online is that stories about child abductions, dangerous islands filled with snakes, and celebrity sex tape scandals are often the most clicked on, retweeted, favorited, etc. So an entire industry has emerged to produce crappy click bait content under the banner of “news.”

Guess what? When people are surrounded by fear-mongering news media, they get anxious. They fear the wrong things. Moral panics emerge. And yet, we as a society believe that it’s totally acceptable for news media – and its click bait brethren – to manipulate people’s emotions through the headlines they produce and the content they cover. And we generally accept that algorithmic curators are perfectly well within their right to prioritize that heavily clicked content over others, regardless of the psychological toll on individuals or the society. What makes their practice different? (Other than the fact that the media wouldn’t hold itself accountable for its own manipulative practices…)

Somehow, shrugging our shoulders and saying that we promoted content because it was popular is acceptable because those actors don’t voice that their intention is to manipulate your emotions so that you keep viewing their reporting and advertisements. And it’s also acceptable to manipulate people for advertising because that’s just business. But when researchers admit that they’re trying to learn if they can manipulate people’s emotions, they’re shunned. What this suggests is that the practice is acceptable, but admitting the intention and being transparent about the process is not.
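
Stripped of everything else, the curation logic boyd describes — surface whatever gets clicked on most — is almost trivially simple. A toy sketch, using an invented list of stories and click counts (borrowing her own Syria-versus-cute-animals contrast), makes the point:

```python
# Toy illustration of click-driven ranking: nothing Facebook- or
# newsroom-specific, just "sort by clicks and show the top of the list".
stories = [
    {"headline": "Detailed analysis of what's happening in Syria", "clicks": 1_200},
    {"headline": "29 cutest members of the animal kingdom",        "clicks": 48_000},
    {"headline": "Celebrity scandal you won't believe",            "clicks": 31_500},
]

# Rank by raw click count, most-clicked first.
for story in sorted(stories, key=lambda s: s["clicks"], reverse=True):
    print(f'{story["clicks"]:>7,}  {story["headline"]}')
```

Everything contentious about such a system lives outside this loop: in what gets counted, what gets produced in order to be counted, and who bears the psychological cost.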

Our Kafkaesque world

This morning’s Observer column.

When searching for an adjective to describe our comprehensively surveilled networked world – the one bookended by the NSA at one end and by Google, Facebook, Yahoo and co at the other – “Orwellian” is the word that people generally reach for.

But “Kafkaesque” seems more appropriate. The term is conventionally defined as “having a nightmarishly complex, bizarre, or illogical quality”, but Frederick Karl, Franz Kafka’s most assiduous biographer, regarded that as missing the point. “What’s Kafkaesque,” he once told the New York Times, “is when you enter a surreal world in which all your control patterns, all your plans, the whole way in which you have configured your own behaviour, begins to fall to pieces, when you find yourself against a force that does not lend itself to the way you perceive the world.”

A vivid description of this was provided recently by Janet Vertesi, a sociologist at Princeton University. She gave a talk at a conference describing her experience of trying to keep her pregnancy secret from marketers…

Read on

Big Data and the Hype Cycle

This morning’s Observer column.

As the “big data” bandwagon gathers steam, it’s appropriate to ask where it currently sits on the hype cycle. The answer depends on which domain of application we’re talking about. If it’s the application of large-scale data analytics for commercial purposes, then many of the big corporations, especially the internet giants, are already into phase four. The same holds if the domain consists of the data-intensive sciences such as genomics, astrophysics and particle physics: the torrents of data being generated in these fields lie far beyond the processing capabilities of mere humans.

But the big data evangelists have wider horizons than science and business: they see the technology as a tool for increasing our understanding of society and human behaviour and for improving public policy-making. After all, if your shtick is “evidence-based policy-making”, then the more evidence you have, the better. And since big data can provide tons of evidence, what’s not to like?

So where on the hype cycle do societal applications of big data technology currently sit? The answer is phase one, the rapid ascent to the peak of inflated expectations, that period when people believe every positive rumour they hear and are deaf to sceptics and critics…

Read on