Wednesday 19 August, 2020

Quote of the Day

“He is the man who sits in the outer office of the White House hoping to hear the President sneeze”.

  • H.L. Mencken, writing about the Vice President, 29 January 1956.

Musical alternative to the morning’s radio news

Anne-Sophie Mutter, Daniel Barenboim, Yo-Yo Ma:
Beethoven: Triple Concerto in C Major, Op. 56 (second movement)

5 minutes and 22 seconds of pure bliss.

Link

Note: I’ve decided that the embedded links that I’ve been providing up to now create more problems for some readers than they’re worth. So henceforth each musical interlude will just have a simple URL link. Often, simplest is best.


A European at Stanford

Terrific New Yorker profile of the Dutch politician (and former MEP) Marietje Schaake and what she found when she entered the belly of the Silicon Valley beast.

In conversation and lectures, Schaake often describes herself as an alien, as if she were an anthropologist from a distant world studying the local rites of Silicon Valley. Last fall, not long after she’d settled in, she noticed one particularly strange custom: at parties and campus lectures, she would be introduced to people and told their net worth. “It would be, like, ‘Oh, this is John. He’s worth x millions of dollars. He started this company,’ ” she said. “Money is presented as a qualification of success, which seems to be measured in dollars.” Sometimes people would meet her and launch directly into pitching her their companies. “I think people figure, if you’re connected with Stanford, you must have some interest in venture capital and startups. They don’t bother to find out who you are before starting the sales pitch.”

These experiences spoke to a pervasive blurring between the corporate and the academic, which she saw almost everywhere at Stanford. The university is deeply embedded in the corporate life of Silicon Valley and has been directly enriched by many of the companies that Schaake would like to see regulated more heavily and broken apart; H.A.I., according to one of its directors, receives roughly thirteen per cent of its pledged gifts from tech firms, and a majority of its funding from individuals and companies. The names of wealthy donors on buildings and institutes, the department chairs endowed by corporations, the enormous profits from high tuition prices—none of this happened at her alma mater, the University of Amsterdam, where tuition is highly subsidized and public funding supports the operating expenses of the university. (The University of Amsterdam, of course, is not internationally known as an incubator of startups and a hotbed of innovation.) Beyond Stanford, the contrasts seemed just as stark. Roughly sixty per cent of housing in Amsterdam is publicly subsidized. The main street running through Palo Alto, by contrast, is lined with dozens of old R.V.s, vans, and trailers, in which many semi-homeless service workers live. The public middle school in Menlo Park, where Schaake now resides, has students who are homeless, although the area’s average home value is almost $2.5 million.

When I first heard that she was going to Stanford, I feared for her sanity. Having read this, I think she’ll be ok. Her bullshit detector is still in good working order.


Scream if you want to go faster: Johnson in Cummingsland

This is my long read of the day. Terrific essay by Rachel Coldicutt.

The emergence of a patchwork of UK innovation initiatives over the last few months is notable. Rather than fiddling with increments of investment, there is a commitment to large-scale, world-leading innovation and enthusiasm for the potential of data.

But there is also a culture of opacity and bluster, a repeated lack of effectiveness, and a tendency to do secret deals with preferred suppliers. Taken together with the lack of a public strategy, this has led to a lot of speculation, a fair few conspiracy theories, and a great deal of concern about the social impact of collecting, keeping, and centralising data.

But it seems very possible that there is actually no big plan — conspiratorial or otherwise. In going through speeches and policy documents, I have found no vision for society —save the occasional murmur of “Levelling Up” — and plenty of evidence of a fixation with the mechanics of government.

This is a technocratic revolution, not a political one, driven by a desire to obliterate bureaucracy, centralise power, and increase improvisation.

And this obsession with process has led to a complete disregard for outcomes.

The thing about Cummings — and the data-analytics crowd generally — is that they know nothing of how society actually works, and subscribe to a crippled epistemology which leads them to think that the more data you have, the more perfect your knowledge of the world.

Actually, most of them don’t even realise they have an epistemology.


Furloughed Brits got paid not to work—but two-thirds of them worked anyway

From Quartz:

Economists at the universities of Oxford, Zurich, and Cambridge looked into the UK furlough program, which supports one-third of the country’s workforce, accounting for more than 9 million jobs furloughed by mid-June 2020. Under the scheme, the UK government pays workers up to 80% of their salary for a limited period of time, allowing companies to retain them without paying them—though companies were allowed to top up the government money.

Until July 1st, the plan also specifically prohibited workers from working for their employers when on the scheme. But the researchers, who surveyed over 4,000 people in two waves in April and May 2020, discovered a striking fact: Only 37% of furloughed workers reported doing no work at all for their employers during that time.

In some sectors, the imperative to work definitely came from employers. In the sector termed “computer and mathematical,” 44% of those surveyed said they had been asked to work despite being furloughed.

But it also seems that many employees chose to work because they wanted to. Two-thirds of all workers said they had done some work despite being on furlough, even though only 20% were actually asked to. Perhaps unsurprisingly, those on higher salaries, those able to work from home, and those with the most flexible contracts were most likely to do some work.
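
The arithmetic of the scheme is simple enough to write down. A minimal sketch, assuming the scheme’s widely reported cap of £2,500 a month (the cap isn’t mentioned in the excerpt above):

```python
def furlough_pay(monthly_salary: float) -> float:
    """Furlough grant: 80% of salary, capped at £2,500 a month (assumed)."""
    return min(0.8 * monthly_salary, 2500.0)

# A worker on £2,000/month receives £1,600 (the full 80%); a worker on
# £4,000/month hits the cap and receives £2,500, only 62.5% of usual pay.
print(furlough_pay(2000))  # 1600.0
print(furlough_pay(4000))  # 2500.0
```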


One in five college students don’t plan to go back this fall

As the coronavirus pandemic pushes more and more universities to switch to remote learning — at least to start — 22% of college students across all four years are planning not to enroll this fall, according to a new College Reaction/Axios poll.

Among other things, the report claims that 20% of Harvard undergraduates have decided to defer for a year. Harvard (a hedge fund with a nice university attached) can ride out that kind of dropout, but many poorer institutions will struggle.

Source: Axios


Summer books #8

Puligny-Montrachet: Journal of a Village in Burgundy by Simon Loftus, Daunt Books, 2019.

If you like France, or wine, or (like me) both, then you’ll enjoy this eccentric but utterly charming social history of the village (well, pair of villages) from which some of the country’s finest dry white wine comes. Loftus is a very good social historian, and his account of what are, in most respects, unglamorous villages is both affectionate and unsentimental. Some good friends of mine, when driving home to Holland from Provence, always used to have an overnight stop in Puligny, from which they would depart the following morning with a car boot full of the most wonderful wine. My fond hope is that, when the plague recedes a bit, we might one day do the same.


This blog is also available as a daily email. If you think this might suit you better, why not subscribe? One email a day, delivered to your inbox at 7am UK time. It’s free, and there’s a one-click unsubscribe if you decide that your inbox is full enough already!


The Feynman trap

Gary Smith, writing in Wired:

Nobel laureate Richard Feynman once asked his Caltech students to calculate the probability that, if he walked outside the classroom, the first car in the parking lot would have a specific license plate, say 6ZNA74. Assuming every number and letter are equally likely and determined independently, the students estimated the probability to be less than 1 in 17 million. When the students finished their calculations, Feynman revealed that the correct probability was 1: He had seen this license plate on his way into class. Something extremely unlikely is not unlikely at all if it has already happened.

The Feynman trap—ransacking data for patterns without any preconceived idea of what one is looking for—is the Achilles heel of studies based on data mining. Finding something unusual or surprising after it has already occurred is neither unusual nor surprising. Patterns are sure to be found, and are likely to be misleading, absurd, or worse.
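
The students’ number is easy to reproduce. Assuming the plate format implied by 6ZNA74 (a digit, three letters, then two digits), a quick sketch:

```python
# Plate format inferred from 6ZNA74: one digit, three letters, two digits.
digits, letters = 10, 26
plates = digits * letters**3 * digits**2   # 17,576,000 possible plates
print(f"one plate in {plates:,}")          # one plate in 17,576,000
print(f"probability = {1 / plates:.1e}")   # 5.7e-08, under 1 in 17 million
```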

Lots of other examples.

The moral? “Good research begins with a clear idea of what one is looking for and expects to find. Data mining just looks for patterns and inevitably finds some.”
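
That moral can be demonstrated in a dozen lines: mine enough meaningless variables and some of them will correlate impressively with anything you choose. A toy simulation, all pure noise:

```python
import random

random.seed(42)
n_obs, n_vars = 30, 200

# One "outcome" and 200 candidate "predictors", every value random noise.
outcome = [random.gauss(0, 1) for _ in range(n_obs)]
candidates = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

best = max(abs(pearson_r(c, outcome)) for c in candidates)
print(f"strongest 'pattern' found in pure noise: r = {best:.2f}")
# With 200 tries on 30 observations, r of about 0.5 is routine:
# "statistically significant" and entirely meaningless.
```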

How companies are addressing machine learning

From an O’Reilly newsletter:

In a recent O’Reilly survey, we found that the skills gap remains one of the key challenges holding back the adoption of machine learning. The demand for data skills (“the sexiest job of the 21st century”) hasn’t dissipated—LinkedIn recently found that demand for data scientists in the US is “off the charts,” and our survey indicated that the demand for data scientists and data engineers is strong not just in the US but globally.

With the average shelf life of a skill today at less than five years and the cost to replace an employee estimated at between six and nine months of the position’s salary, there’s increasing pressure on tech leaders to retain and upskill rather than replace their employees in order to keep data projects (such as machine learning implementations) on track. We’re also seeing more training programs aimed at executives and decision makers, who need to understand how these new ML technologies can impact their current operations and products.

Beyond investments in narrowing the skills gap, companies are beginning to put processes in place for their data science projects, for example creating analytics centers of excellence that centralize capabilities and share best practices. Some companies are also actively maintaining a portfolio of use cases and opportunities for ML.
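
The retain-versus-upskill argument in that passage is just arithmetic. A minimal sketch using the survey’s six-to-nine-month estimate (the salary figure is invented):

```python
def replacement_cost(annual_salary: float, months: float) -> float:
    """Cost of replacing an employee, per the six-to-nine-month rule of thumb."""
    return annual_salary * months / 12

# For a hypothetical £60,000 data engineer, replacement costs
# £30,000-£45,000, so any upskilling budget below that wins.
for months in (6, 9):
    print(f"{months} months of salary: £{replacement_cost(60_000, months):,.0f}")
```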

Note the average shelf-life of a skill and then ponder why the UK government is not boosting the Open University.

Deep-fat data frying

This morning’s Observer column:

The tech craze du jour is machine learning (ML). Billions of dollars of venture capital are being poured into it. All the big tech companies are deep into it. Every computer science student doing a PhD on it is assured of lucrative employment after graduation at his or her pick of technology companies. One of the most popular courses at Stanford is CS229: Machine Learning. Newspapers and magazines extol the wonders of the technology. ML is the magic sauce that enables Amazon to know what you might want to buy next, and Netflix to guess which films might interest you, given your recent viewing history.

To non-geeks, ML is impenetrable, and therefore intimidating…

Read on
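
Since the column’s point is that ML needn’t be intimidating, here’s a toy sketch of the Netflix-style trick it mentions: find the user most like you and suggest what they watched. Everything below (names, films, viewing histories) is invented:

```python
import math

films = ["Heat", "Ronin", "Amelie", "Collateral", "Delicatessen"]
history = {                      # 1 = watched, 0 = not watched
    "alice": [1, 1, 0, 1, 0],
    "bob":   [1, 1, 0, 0, 0],
    "carol": [0, 0, 1, 0, 1],
}

def cosine(u, v):
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def recommend(user):
    # The essence of collaborative filtering: find the most similar
    # other user, then suggest what they watched and you haven't.
    nearest = max((u for u in history if u != user),
                  key=lambda u: cosine(history[user], history[u]))
    return [f for f, mine, theirs in zip(films, history[user], history[nearest])
            if theirs and not mine]

print(recommend("bob"))  # ['Collateral'], learned from alice's history
```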

So what was Google smoking when it bought Boston Dynamics?

This morning’s Observer column:

The question on everyone’s mind as Google hoovered up robotics companies was: what the hell was a search company doing getting involved in this business? Now we know: it didn’t have a clue.

Last week, Bloomberg revealed that Google was putting Boston Dynamics up for sale. The official reason for unloading it is that senior executives in Alphabet, Google’s holding company, had concluded (correctly) that Boston Dynamics was years away from producing a marketable product and so was deemed disposable. Two possible buyers have been named so far – Toyota and Amazon. Both make sense for the obvious reason that they are already heavy users of robots and it’s clear that Amazon in particular would dearly love to get rid of humans in its warehouses at the earliest possible opportunity…

Read on

The next Brain Drain

The Economist has an interesting article on how major universities are now having trouble holding on to their machine-learning and AI academics. As the industrial frenzy about these technologies mounts, this is perfectly understandable, though it’s now getting to absurd proportions. The Economist claims, for example, that some postgraduate students are being lured away – by salaries “similar to those fetched by professional athletes” – even before they complete their doctorates. And Uber lured “40 of the 140 staff of the National Robotics Engineering Centre at Carnegie Mellon University, and set up a unit to work on self-driving cars”.

All of which is predictable: we’ve seen it happen before, for example, with researchers who have data-analytics skillsets. But it raises several questions.

The first is whether this brain drain will, in the end, turn out to be self-defeating. After all, the graduate students of today are the professors of tomorrow. And since most of the research and development done in companies tends to be applied, who will do the ‘pure’ research on which major advances in many fields depend?

Secondly, and related to that, since most industrial R&D is done behind patent and other intellectual-property firewalls, what happens to the free exchange of ideas on which intellectual progress ultimately depends? In that context, for example, it’s interesting to see the way in which Google’s ownership of DeepMind seems to be beginning to constrain the freedom of expression of its admirable co-founder, Demis Hassabis.

Thirdly, since these technologies appear to have staggering potential for increasing algorithmic power and perhaps even changing the relationship between humanity and its machines, the brain drain from academia – with its commitment to open enquiry, sensitivity to ethical issues, and so on – to the commercial sector (which traditionally has very little interest in any of these things) is worrying.

Levelling the playing field

This morning’s Observer column:

Whenever regulators gather to discuss market failures, the cliche “level playing field” eventually surfaces. When regulators finally get around to thinking about what happens in the online world, especially in the area of personal data, then they will have to come to terms with the fact that the playing field is not just tilted in favour of the online giants, but is as vertical as that rockface in Yosemite that two Americans have finally managed to free climb.

The mechanism for rotating the playing field is our old friend, the terms and conditions agreement, usually called the “end user licence agreement” (EULA) in cyberspace. This invariably consists of three coats of prime legal verbiage distributed over 32 pages, which basically comes down to this: “If you want to do business with us, then you will do it entirely on our terms; click here to agree, otherwise go screw yourself. Oh, and by the way, all of your personal data revealed in your interactions with us belongs to us.”

The strange thing is that this formula applies regardless of whether you are actually trying to purchase something from the author of the EULA or merely trying to avail yourself of its “free” services.

When the history of this period comes to be written, our great-grandchildren will marvel at the fact that billions of apparently sane individuals passively accepted this grotesquely asymmetrical deal. (They may also wonder why our governments have shown so little interest in the matter.)…

Read on

Big Data and intriguing correlations

Yesterday I gave a talk about so-called ‘Big Data’ to a group of senior executives. At one stage I used the famous Walmart pop-tart discovery as an example of how organisations sometimes discover things they didn’t know by mining their data. But now comes an equally intriguing data-mined discovery — from Alibaba:

Earlier this summer, a group of data crunchers looking at underwear sales at Alibaba came across a curious trend: women who bought larger bra sizes also tended to spend more (link in Chinese). Dividing intimate-apparel shoppers into four categories of spending power, analysts at the e-commerce giant found that 65% of women of cup size B fell into the “low” spend category, while those of a size C or higher mostly fit into the “middle” or higher group.

[Chart: Alibaba’s data on bra size and spending power]

The explanation might be fairly straightforward: it could be that the data merely demonstrate that younger women have less spending power, for instance. But Alibaba is deep into this data-mining stuff. The report claims that last year the company set up a Big Data unit with 800 employees. It also quotes a Gartner factoid that currently fewer than 5% of e-commerce companies are using data analytics.
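
For the curious, the underwear analysis amounts to a few lines of code once the orders sit in a table. A hypothetical sketch with invented data (the column names and figures are mine, not Alibaba’s):

```python
import pandas as pd

# Invented orders; the real analysis presumably ran over millions of them.
orders = pd.DataFrame({
    "cup_size":  ["B", "B", "C", "D", "B", "C", "D", "B"],
    "avg_spend": [120, 90, 310, 450, 80, 280, 520, 150],  # invented figures
})

# Bucket shoppers into spending tiers, then cross-tabulate against size.
orders["tier"] = pd.cut(orders["avg_spend"],
                        bins=[0, 150, 350, float("inf")],
                        labels=["low", "middle", "high"])
print(pd.crosstab(orders["cup_size"], orders["tier"], normalize="index"))
```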

Facebook, ethics and us, its hapless (and hypocritical?) users

This morning’s Observer column about the Facebook ’emotional contagion’ experiment.

The arguments about whether the experiment was unethical reveal the extent to which big data is changing our regulatory landscape. Many of the activities that large-scale data analytics now make possible are undoubtedly “legal” simply because our laws are so far behind the curve. Our data-protection regimes protect specific types of personal information, but data analytics enables corporations and governments to build up very revealing information “mosaics” about individuals by assembling large numbers of the digital traces that we all leave in cyberspace. And none of those traces has legal protection at the moment.

Besides, the idea that corporations might behave ethically is as absurd as the proposition that cats should respect the rights of small mammals. Cats do what cats do: kill other creatures. Corporations do what corporations do: maximise revenues and shareholder value and stay within the law. Facebook may be on the extreme end of corporate sociopathy, but really it’s just the exception that proves the rule.

danah boyd has a typically insightful blog post about this.

She points out that there are all kinds of undiscussed contradictions in this stuff. Most, if not all, of the media business (off- and online) involves trying to influence people’s emotions, but we rarely talk about this. Yet when an online company does it, and explains why, there’s a row.

Facebook actively alters the content you see. Most people focus on the practice of marketing, but most of what Facebook’s algorithms do involve curating content to provide you with what they think you want to see. Facebook algorithmically determines which of your friends’ posts you see. They don’t do this for marketing reasons. They do this because they want you to want to come back to the site day after day. They want you to be happy. They don’t want you to be overwhelmed. Their everyday algorithms are meant to manipulate your emotions. What factors go into this? We don’t know.

But…

Facebook is not alone in algorithmically predicting what content you wish to see. Any recommendation system or curatorial system is prioritizing some content over others. But let’s compare what we glean from this study with standard practice. Most sites, from major news media to social media, have some algorithm that shows you the content that people click on the most. This is what drives media entities to produce listicles, flashy headlines, and car crash news stories. What do you think garners more traffic – a detailed analysis of what’s happening in Syria or 29 pictures of the cutest members of the animal kingdom? Part of what media learned long ago is that fear and salacious gossip sell papers. 4chan taught us that grotesque imagery and cute kittens work too. What this means online is that stories about child abductions, dangerous islands filled with snakes, and celebrity sex tape scandals are often the most clicked on, retweeted, favorited, etc. So an entire industry has emerged to produce crappy click bait content under the banner of “news.”

Guess what? When people are surrounded by fear-mongering news media, they get anxious. They fear the wrong things. Moral panics emerge. And yet, we as a society believe that it’s totally acceptable for news media – and its click bait brethren – to manipulate people’s emotions through the headlines they produce and the content they cover. And we generally accept that algorithmic curators are perfectly well within their right to prioritize that heavily clicked content over others, regardless of the psychological toll on individuals or the society. What makes their practice different? (Other than the fact that the media wouldn’t hold itself accountable for its own manipulative practices…)

Somehow, shrugging our shoulders and saying that we promoted content because it was popular is acceptable because those actors don’t voice that their intention is to manipulate your emotions so that you keep viewing their reporting and advertisements. And it’s also acceptable to manipulate people for advertising because that’s just business. But when researchers admit that they’re trying to learn if they can manipulate people’s emotions, they’re shunned. What this suggests is that the practice is acceptable, but admitting the intention and being transparent about the process is not.
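
boyd’s point about algorithmic curation is easy to make concrete: any feed that promotes “the content that people click on the most” is, underneath, a sort by an engagement score. A deliberately crude sketch (the headlines echo her examples; the weights are invented):

```python
posts = [
    {"headline": "A detailed analysis of what's happening in Syria", "clicks": 310, "shares": 12},
    {"headline": "29 pictures of the cutest members of the animal kingdom", "clicks": 9800, "shares": 840},
    {"headline": "Celebrity scandal you won't believe", "clicks": 7600, "shares": 1200},
]

def engagement(post, click_weight=1.0, share_weight=5.0):
    # Invented weights; real curators tune these against retention metrics.
    return click_weight * post["clicks"] + share_weight * post["shares"]

for post in sorted(posts, key=engagement, reverse=True):
    print(f"{engagement(post):>8.0f}  {post['headline']}")
# The Syria analysis lands last, not because anyone decided it should,
# but because the objective function never asked about its value.
```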