DK_en 1x01 - You're looking gay, today

Can AI really guess whether you are gay? So far it's a bullshit claim; nonetheless, some are making it. They claim to be warning us, but with their bad science and ethics, they lend credibility to the very practices they claim to fear.


Episode first aired on September 14, 2017. Listen to the audio on Spreaker.com

There's been quite a buzz online, so today's title is, inevitably:

You're Looking Gay, Today

MUSIC: Village People

You may remember Michal Kosinski, a Polish student who went to Cambridge, where
he stumbled upon a database of online personality profiles. It was somebody
else's research, collected through the MyPersonality Facebook app.

Kosinski, though, had the intuition to match each participant's Facebook likes
against their computed personality profile. Then, having discovered that some
correlation existed, he made the big jump: ditch the personality profiles and
use Facebook likes directly as a proxy for them. A bold move, in research
terms. A bit of a gamble, in layman's terms. If you ask me, a festering turd
disguised as research.

As things go, that pile of shit won its author a teaching position at
Stanford. School of Business, yeah, but still effing Stanford.

That was then and this is now. You've surely come across one of these headlines:

As has already been pointed out, the claim is bullshit. Note that I don't mean
the headlines: for all that can be said about the bad-science press, this is not
the case here. These titles accurately report the main claim of the research
they cover. In this instance the press is accurately, even if not critically
enough, reporting a piece of news. It's just too bad the piece of news in
question is more overblown claim and wishful thinking than actual research.
Maybe get brighter referees next time, guys.

Let's start at the beginning, the paper's abstract. I quote:

We show that faces contain much more information about sexual orientation than
can be perceived and interpreted by the human brain.

End quote.
Now, this statement is false. There is nothing in the research (or at least in
the paper) that supports this specific claim:

  • the algorithm does not identify gay people (even though the paper
    repeatedly claims it does)
  • what the algorithm does is pick the more gay-looking of two specific
    pictures, one of an out gay person, the other of a straight person (more on
    what that setup measures in the sketch after this list)
  • the sexual orientation of the people in the pictures was self-professed; you
    draw your own conclusions, eh?
  • gay-looking is what the neural network has been trained to spot, by being
    fed several thousand photos of out gay people, both male and female; the
    photos were scraped from a dating site which the authors will not divulge
    "for fear of ill-intentioned people replicating their research". Now, how
    thoughtful of them
  • the pictures were not taken under controlled conditions: they were scraped
    from a dating site, under the unwarranted assumption that they provide a
    truthful, objective representation of the subject's appearance. Instead,
    conventions may play a greater role than sexual orientation; in other
    words, on a dating site you want to come across unequivocally as what you
    claim to be, whatever your sexual orientation, and what you claim to be is
    not necessarily you, but your idea of the most marketable version of
    yourself. So we can reasonably expect dating site pictures to be far more
    stereotyped than real-life ones or than pictures taken under controlled
    conditions, as befits proper research

Of course, the research tells us nothing about this issue.

  • granted, there is a definite possibility that this research tells us
    something about profile picture choice on dating sites rather than about
    sexual orientation; but that is not what the research claims
  • citing availability issues, all faces were Caucasian, which is what
    Americans who've probably never been to the Caucasus call white people to
    avoid sounding race-oriented.

Last, but not least, the research pitted a trained algorithm against untrained
humans, specifically Amazon Mechanical Turk workers.
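
Since that paired setup is easy to misread, here's a minimal sketch of what it
actually measures, with made-up toy scores standing in for the network's output
(nothing below comes from the paper's code). The share of pairs a scorer ranks
correctly is its AUC, a ranking metric; by itself it says nothing about how a
detector would fare when pointed at single photos.

```python
import random

def paired_accuracy(scores_gay, scores_straight):
    """Fraction of (gay, straight) photo pairs where the 'gay-looking'
    score is higher for the gay person's photo. Counting ties as half,
    this is exactly the scorer's AUC -- a ranking metric, not the
    accuracy of a detector run on individual photos."""
    pairs = [(g, s) for g in scores_gay for s in scores_straight]
    wins = sum(1 for g, s in pairs if g > s)
    ties = sum(1 for g, s in pairs if g == s)
    return (wins + 0.5 * ties) / len(pairs)

# Toy scores, purely illustrative: two overlapping distributions.
random.seed(0)
scores_gay = [random.gauss(0.6, 0.2) for _ in range(500)]
scores_straight = [random.gauss(0.4, 0.2) for _ in range(500)]

print(f"paired accuracy (AUC): {paired_accuracy(scores_gay, scores_straight):.2f}")
```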

There's also a bunch of other highly problematic issues, of course, from the
laughable choice to cast sexual orientation as a binary, to the complete
disregard for any possible role of social constructs, but all this has been
covered in the best possible way by Greggor Mattson in a post titled "Artificial Intelligence Discovers Gayface", which you really should read.

So I won't spoil other methodological surprises, because I want to focus on
something else.

MUSIC: GLORIA GAYNOR - I WILL SURVIVE

Roughly the last two pages of the paper (plus several more pages of notes, made
available separately) are devoted to stressing that the research is
well-intentioned, and actually means to warn us that intolerant regimes might
use similar methods for repressive purposes.

Well, besides being a perfect example of a preemptive excuse, the claim
comes from the same Kosinski who advised Faception, a startup that intended to
use face recognition technology for nothing less than spotting terrorists.

Anyway, if the stated intention were genuine, there really was no need to
scrape a dating website, apply dubious methodology and present statistically
weak evidence. It would have been far more effective to collect data under
controlled laboratory conditions and see what kind of conclusions could be
reached.

From such solid evidence, it would have been much easier to say:

you know what, similar if less precise data can be obtained from surveillance
cameras or even publicly accessible websites, so maybe we could all ratchet up
our privacy settings a bit

But it wouldn't have been big data enough, and it would not have made the news.

Over the weekend, two gay rights organizations asked Stanford and the media to
reject the research as unfounded. Kosinski, curiously the second author after a
student, expressed surprise.

He also added that rejecting their research on ideological grounds risks
hurting the very people one is trying to help.

Once again, this is bullshit, uttered in the best possible
passive-aggressive language available.

For one thing, it's difficult to understand how surprise can strike somebody
who spends the last two pages of a research paper, plus a dozen more in a
separate document, building a preemptive defense.

And then, the point is not whether to "publish and risk aiding the bad guys" or
not. The point is that this research has huge methodological flaws and, if this
were not enough, makes wild claims that are simply not proven anywhere.

Just imagine a paper titled "An AI Can Cook Better Than an Armless
Eight-Year-Old". What this research does is just that: it compares apples and
oranges or, more specifically, a trained neural net (look ma, Deep Learning!)
with untrained humans.

Oh, regarding the training set: even if we buy the idea that dating site
profile pics are good data, a sample that is 50% gay, 50% straight and 100%
white is completely unrealistic.

Given all this, the algorithm still manages a false positive rate of about 20%.
Quite an algorithm.

And we're not even touching issues like assuming sexuality is binary or, God
forbid, whether this is an ethically sound approach to a deeply problematic
issue.

MUSIC: ELTON JOHN - CROCODILE ROCK

Well, I have issues with stuff like this posing as research.

As we said, the algorithm is 81% accurate. Sounds like a lot? It isn't. It's
actually very little, and here lies the rub: the sample was 50% straight and
50% gay. Under those conditions, 81% ain't bad. A quick calculation on 1,000
people, rounding the 81% down to 80% per class for simplicity:

  • of the 500 straight people, 400 are correctly labeled as such, and 100
    incorrectly as gay
  • of the 500 gay people, 400 are correctly labeled as such, and 100
    incorrectly as straight.

Really, it doesn't look too bad.

But. In the real world, gay people are not 50% of the population. Percentages
vary, but most research hovers around a 4% figure.

Let's redo the math for 1,000 people, 96% straight, 4% gay:

  • of the 960 straight people, 768 are labeled as straight, and 192 as gay
  • of the 40 gay people, 32 are labeled as such, while 8 are incorrectly
    labeled as straight.
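
To make the base-rate trap concrete, here's a minimal sketch redoing both
calculations; the 80% per-class rates are our round numbers from above, not the
paper's exact figures.

```python
def confusion_counts(n, gay_share, sensitivity, specificity):
    """Expected outcomes when screening a population of n people.
    sensitivity: share of gay people correctly flagged as gay;
    specificity: share of straight people correctly left alone."""
    gay = n * gay_share
    straight = n - gay
    true_pos = gay * sensitivity           # gay, flagged as gay
    true_neg = straight * specificity      # straight, left alone
    false_pos = straight - true_neg        # straight, wrongly flagged
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision

# 1,000 people, 80% accuracy on both classes, at two prevalences.
for share in (0.50, 0.04):
    tp, fp, prec = confusion_counts(1000, share, 0.80, 0.80)
    print(f"gay share {share:.0%}: flagged as gay = {tp:.0f} correctly "
          f"+ {fp:.0f} wrongly -> precision {prec:.0%}")
```

At 4% prevalence, 224 people get flagged and only 32 of them are actually gay:
a precision of roughly 14%. Keep that number in mind for what follows.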

Now, let's put ourselves in the shoes of a highly repressive government, of the
kind Kosinski fears aiding by publishing his results: Iran, China, or Texas, or
Alabama, take your pick. What does such a government see? It sees an algorithm
that wipes out almost all gays, at the cost of some collateral damage.

MUSIC: COMMUNARDS - TELL ME WHY (?)

Now, we need to take one last step. The root problem here has nothing to do
with gays. Not because I am personally for freedom in orifice choice, but
because each of us is the gay, the black, the terrorist, the insubordinate, the
undesirable of somebody else.

What this paper does, and this cannot be excused, either scientifically or
ethically, is promote the idea that discriminating against people by algorithm,
even with the limits shown here, is after all a feasible approach.

This paper should not be titled, as it is, "Deep Neural Networks Are More
Accurate Than Humans At Detecting Sexual Orientation From Facial Images".

This is a bullshit title, and it claims something the authors apparently deeply
long for, but of which there is no trace whatsoever in the paper.

The title should have been something like:

" A Neural Network Is Slightly Better Than Untrained Humans At Spotting
Something Possibly Correlated with Sexual Orientation On An Unrealistic
Dataset, But Only An Idiot Could Go With So Many False Positives, So Let's All
Just Go Have A Drink Instead"

This title would have been accurate, scientifically as well as ethically.

But such a title would not be sexy, would not make the news, and quite possibly
would not get published.

This is not only junk science at a level unheard of since the good old times of
phrenology and physiognomy. This is fucking marketing.

Once more, the media have been played by a smart researcher with the ethical
stature of a non-stick frying pan. Somebody who, while claiming authorship for
actually writing the paper, curiously prefers to appear as a co-author after a
student, yet still answers the interviews and gets most of the limelight.

Remember this name, Michal Kosinski. I'm sure we'll hear more from him.
Unfortunately.

Oh, and another thing.
Last March, PLOS Computational Biology published "Ten Simple Rules for
Responsible Big Data Research". Rule number 5 said: consider the strengths and
limitations of your data; big does not automatically mean better.

'nuff said.