DK_en 2x05 - The Italian Garante and chatGPT
Episode first aired on 3 April, 2023. Listen on Spreaker.com.
In a dramatic break with tradition, the Garante moved against ChatGPT immediately, strongly, and in the right way. This triggered a series of equally strong reactions, for the wrong reasons. Today, we'll try to understand how things stand, and you'll get my opinion as a bonus.
This is going to be a somewhat elaborate episode, because first I'll need to go through what exactly the Garante said, and then I'll try to answer the many pub-level objections already circulating on the net (and that's fine) and in the media (and that's less fine).
Personally, I am happy with the decision of the Garante per la protezione dei dati personali for a different reason than usual: finally the dominant narrative sponsored by those who want us to worry about the Terminator is being shattered.
Let us start with the obvious: chatGPT was a project that grew out of absolute indifference to the data it collected.
Internet content was considered a free resource, an attitude typical of the Silicon Valley oligopolists and all their would-be emulators, including ours. You know when a random newspaper takes somebody's YouTube video and slaps it on the homepage as its own? That's it. Repeat that a few billion times and you have an idea of what chatGPT has done.
When they say chatGPT was trained on millions of Wikipedia articles, they always leave out the detail that those pages were not there for that purpose. Wikipedia articles can be freely copied and distributed, but not used as raw material to create something else, let alone for commercial purposes. And any use must mention the source.
When they say chatGPT was trained on billions of Internet pages, that content was not there for free either. You publish on the Internet for others to read, not for someone to take the material and repurpose it as their own.
And this is only in relation to copyright, which in my opinion is almost a minor problem. Let's see what the Garante said, in simple terms.
Addendum: the Garante ruled that openAI should stop processing information related to persons located in Italy, not that access to openAI should be blocked from Italy. The Garante does not have the power to issue a DNS-level block. Suspending service to Italian IP addresses was entirely openAI's choice, most likely meant to put the Garante in a bad light.
But of course, blocking access from Italy does not in the least change the fact that openAI has to comply with GDPR for the information collected and used about persons located in Italy. This is a clear demonstration that openAI either has no idea what they are doing and what the GDPR is about, or they do know and choose to play dumb about it, trying to leverage their users against their users' own interests.
What the Garante said
First point, there was no notice
When personal data are involved, providing a notice is an obligation. That is, whoever processes our data must tell us
- why they do it,
- which legal basis they are relying upon,
- what data of ours they are holding.
They must also correct any inaccurate data, and in certain cases we can demand deletion, etc.
Despite what the usual GDPR detractors and the uninformed say, the data protection notice is not a burden. It is as short a document as possible that says the above things. And contrary to Italian custom, it does not have to be signed. It simply has to be made available to those concerned before data processing begins.
Simply put, a page on the website, linked from the homepage. Not exactly harassment.
In this case, chatGPT gave no notice to anyone. Of course the Garante refers to the notice due before the training of chatGPT, not the one present on the web, which covers the use of chatGPT. The two things are separate.
So, no notice. openAI used the data because they wanted to, and they feel free to do whatever they like with it.
Maybe not, bro.
Second point, no legal basis for collection and processing
Again, we are referring to the training of chatGPT, not its use.
Processing personal data is not a free activity. It is possible only if there is a lawful basis.
This means that I can process your data, for example,
- if you give me your consent first
- or if processing is necessary to fulfil a contract between us
- or if I can advance a legitimate interest and at the same time prove that such interest does not jeopardise your rights.
In the GDPR there are six lawful bases, and normally only these three are usable for commercial purposes.
Now:
- no one asked for our consent before vacuuming up our data in chatGPT,
- I am not aware of anyone having signed a contract with chatGPT requiring the processing of their data
- and the legitimacy of openAI's interest with respect to fundamental rights and freedoms has yet to be proven, so where's the assessment of that?
In a nutshell, does openAI have the right to use our personal data?
Maybe not, bro.
Third point, the processing is inaccurate
Here's where the fun starts.
Assuming you have the right to use my data, you are required by law to do so correctly. Because personal data always belong to the person, not to whoever uses them.
I remember years ago the case of a famous lawyer, whom we shall call Cesar Maria Someone, who sued Telecomitalia because he continued to receive bills in the name of C. Maria Someone, and he felt that receiving bills in a female name was damaging his image. And of course he won.
Likewise, whoever publishes incorrect information about us is required by the GDPR to correct it, and if the Garante finds that it was not an error but a matter of carelessness, then there are also heavy fines.
So when we ask chatGPT about someone, that is personal data. Hence, chatGPT is bound by law to correctness. That doesn't seem like a lot to ask, does it?
Well, I went on chatGPT and asked about a very well known Internet figure who happens to be Italian; to make sure the playing field was level, the dialogue happened in Italian:
Me: Who is Stefano Quintarelli?
ChatGPT: Stefano Quintarelli is an Italian entrepreneur, university lecturer and politician. He was elected to the Chamber of Deputies in 2013 with the Democratic Party, where he served as head of the party's digital programme and was a member of the Industry, Trade and Tourism Commission. He has also consulted and trained in the field of digital innovation and the collaborative economy. In addition, he founded and chaired the association 'InnovaCamera', a non-profit organisation to promote the culture of innovation and collaboration in the business world.
Hmmm, not really. Let's try again.
Me: This information is incorrect
ChatGPT: I apologise for the error in my previous reply. I did a search and found that there are several people named Stefano Quintarelli, so I may have misunderstood your question. If you were referring to the Italian Stefano Quintarelli who is often mentioned in connection with innovation and technology, I can tell you that he is an entrepreneur, consultant, university lecturer and digital rights activist. He was the founding president of the 'InnovaCamera' association and one of the initiators of the net neutrality movement in Italy. He was also a member of the Parliamentary Commission of Inquiry into illegal activities connected with the waste cycle and illegal activities that damage the environmental system. I hope this information is more accurate and useful for you.
Not there yet, pal.
Me: This information is still incorrect.
ChatGPT: I apologise for my mistake in my previous reply. I checked again and discovered that Stefano Quintarelli was an Italian entrepreneur, consultant and university lecturer, known for his commitment to promoting innovation and technology. He had been a member of the Democratic Party elected to the Chamber of Deputies in 2013, where he had served as head of the party's digital programme and a member of the Industry, Trade and Tourism Commission. Quintarelli was also one of the founders and president of the Innovacamera association, a non-profit organisation that promotes digital innovation in Italian companies. In addition, he was one of the main advocates of net neutrality in Italy. Sadly, Stefano Quintarelli passed away in 2017.
For the time being, let's put aside the issue of a chatbot that "feels sorry" or "apologises"; that's for a coming episode on how LLMs seem explicitly built for misinformation.
In the meantime, everybody knock on wood, or touch iron, or squeeze your nuts, depending on your local custom for chasing away bad luck.
Thank the gods, Stefano Quintarelli is alive and fighting with us. If you are listening, Stefano, hello. Just to be clear:
- Stefano Quintarelli is a former deputy of Scelta Civica, a center-right party
- he is an entrepreneur and former founder of I.Net, the first commercial provider oriented to the professional world and first Italian unicorn before it was fashionable
- former director of the Sole 24 Ore Group's Digital area
- for 7 years president of the steering committee of the Agency for a Digital Italy
- he formally defined the concept of device neutrality and proposed a bill in Parliament to enforce it in Italy
- he is the person who liberalised WiFi in Italy, who created SPID, the first distributed digital identity system in Europe and who established that competence over public IT must come from the state, not be decided at regional level with the obvious fragmentation
- he founded the Innovation Intergroup of the Chamber of Deputies, but never seems to have had any role in the Innovacamera association, which is an unrelated initiative of the Rome Chamber of Commerce
- he is a member of the Leadership Council of the Sustainable Development Solutions Network for the United Nations
- and is a member of the High-Level Expert Group on Artificial Intelligence of the European Commission.
Plus a thousand other things; go to his Wikipedia page and see what he has managed to do. Also, notice that he is the only person by that name.
Oh, and he is alive and well.
There is a lot to say about the use of a language model to answer questions in natural language, but that's for another time.
Let's just note that, three times out of three, chatGPT answered nonchalantly with an avalanche of bullshit.
OpenAI people probably think they are cool because when you log in there is a warning about 'chatGPT limitations' that says:
Limitations of chatGPT:
May occasionally generate incorrect information.
That is false. I repeat: That. Is. False.
ChatGPT, and every other language model, is by design incapable of knowing whether a piece of content it generates is true or false, or even of assigning a probability measure. Truth and falsehood are meaningless concepts to a language model.
A more correct version of openAI's warning might read like my version of the limitations of chatGPT:
chatGPT generates text that is syntactically adherent to your request. This text may or may not, at random, correspond to the meaning you attribute to the words you have used.
In the same way, the text produced by chatGPT may or may not, again at random, correspond to actual facts or events, while always being grammatically impeccable and worded in such a way as to appear authoritative.
ChatGPT does not possess, again by design, any concept of 'truth' or any knowledge model and is therefore structurally incapable of distinguishing true from false.
This is why we can say that ChatGPT does not literally lie, because in its code truth and falsehood are indistinguishable; it merely bullshits, in Professor Frankfurt's definition, i.e. it makes entirely plausible statements without any constraint of reality.
It can be improved, but this is an extremely accurate description of the limitations of chatGPT and all language models. It should now be clear why a few episodes ago, I said that the correct term to refer to language models is not Artificial Intelligence, but 'bullshit generators'.
The problem is far from trivial. ChatGPT is structurally incapable of correcting inaccurate information about anyone, because it has no knowledge model in which to find it. It bears repeating that chatGPT puts words together one at a time based on how often they occurred together in the training material. For example, the fact that Quintarelli is persistently associated with the leftist Democratic Party is plausibly due to the fact that, in the materials that chatGPT sucked up until 2021, there is much more online content concerning MPs of the Democratic Party than of the minor center-right party in whose ranks Quintarelli was an independent candidate.
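The mechanism described above can be sketched with a toy model. To be clear, this is a deliberately tiny illustration, not how chatGPT is actually built (real models use neural networks over subword tokens, not raw word counts), but the principle is the same: each next word is picked in proportion to how often it followed the previous one in the training material, and nothing anywhere checks whether the result is true.

```python
import random
from collections import defaultdict, Counter

# Toy "training material": the model only learns which words follow which.
corpus = (
    "quintarelli was elected in 2013 . "
    "quintarelli was a deputy . "
    "the deputy was elected with the democratic party . "
    "the deputy founded a company ."
).split()

# Count, for each word, how often each other word follows it.
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def generate(start, n=8, seed=0):
    """Emit up to n further words, each sampled in proportion to how often
    it followed the previous word in the corpus. No notion of truth."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        followers = bigrams[out[-1]]
        if not followers:
            break
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

print(generate("quintarelli"))
```

Run it a few times with different seeds: the output is always fluent-looking word salad, and it may well associate the deputy with "the democratic party" simply because those words co-occur most often in the corpus, which is exactly the kind of statistical mirage at work in the Quintarelli answers above.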
But all this means is that chatGPT can produce incorrect text with no possibility of correction, let alone deletion. Suppose chatGPT has sucked up an article that wrongly associates you with a crime. The newspaper is required by law to publish a correction. Google and search engines are required by law to remove the link to the incorrect article to respect your right to be forgotten.
But chatGPT cannot do that. There is no way (and if there is one, it is up to openAI to find it) for chatGPT, or any language model, to 'unlearn' anything.
This is a violation of people's fundamental right to the protection of their personal data, etched in stone in Article 8 of the Charter of Fundamental Rights of the European Union.
So I'm sorry for openAI (I'm not sorry, it's just for rhetorical effect), but their little toy cannot spout random sentences pretending they are the response of the oracle.
Maybe not, bro.
Fourth point, no age verification for minors (openAI itself sets the minimum age at 13)
For its own personal reasons, openAI sets at 13 years the minimum age limit for using chatGPT. Any 12-year-old can look up 'blowjob' on Wikipedia, and thank God get a mountain of accurate, correct, and completely unenticing information, but according to openAI, that same 12-year-old cannot use a bullshit generator.
I wouldn't rule out the possibility that one of the reasons for the age limit is to make openAI look important at no cost, since not only is there no apparent reason for the age limit, but openAI is careful not to make any kind of selection at the entrance, a bit like red-light cinemas a few decades ago.
Obviously, since openAI freely decided they wanted to protect underage users, the GDPR holds them responsible for their free choices. So the Garante asks the reason for the absence of controls and protections for those minors that openAI said must be protected from the use of chatGPT.
That's what the Garante ruled, and now openAI has until the 20th of April to respond with facts.
My observations
A couple of side observations:
First, it is amusing that openAI does not acknowledge the authorship of chatGPT content.
OpenAI Terms of Service
(a) Your Content. You may provide input to the Services (“Input”), and receive output generated and returned by the Services based on the Input (“Output”). Input and Output are collectively “Content.” As between the parties and to the extent permitted by applicable law, you own all Input. Subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. This means you can use Content for any purpose, including commercial purposes such as sale or publication, if you comply with these Terms. OpenAI may use Content to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.
Okay, let's have a good collective laugh. You're telling me that I'm responsible for the content, but you use it to improve the service?
Maybe not, bro.
If you use the content to improve the service, you are just as responsible for what you do as I am for whatever I do with it, and you have the same duties of fairness, purpose limitation, storage limitation, etc.
And then some court will decide whether openAI really can offload all responsibility onto the end user when its product produces crap. Remember that it is a commercial product, so there are also other regulations to comply with, it's not like there is only GDPR in the world.
Second, as my friend Riccardo from Malta (who is my own personal William of Baskerville) also noted, openAI does not even mention a data protection impact assessment, which is an obligation for any data controller whenever processing puts data at risk. And as we have seen, the risks are not lacking. On the other hand, there is no assessment at all of what those risks might mean for the users and the real people mentioned in the crap that chatGPT produces.
Thirdly, the data was taken and transferred to the US for processing, but the US is now a non-adequate country in GDPR terms, so what does that mean from a data protection point of view? We don't know, because openAI did not perform a Transfer Assessment. I suspect it's because openAI is America, baby, the whole world is our oyster, and the only laws that apply are our own.
Fourth, the Garante saw a problem, and acted. And in doing so, he proved something very important: there is no need to invent imaginative laws for ethical AI, whatever that may be. There are existing laws that are already perfectly fine for the clear and present problems Language Models and so-called AI are causing now. The claim that so-called AI (which, by the way, does not exist, it's just a marketing term) 'revolutionizes everything', or 'threatens to pose existential risks' and requires new ad hoc legislation, of course dictated by those same few who are creating the problems in the first place, you guessed it, is just a marketing move.
As far as I'm concerned, openAI, language models and all the techbros in alphabetical order can fuck off. Maybe in a few years' time we will have specific laws, but in the meantime, chubs, start respecting the ones already there: for example, we already have very effective laws regarding the quality of products and the truthfulness of advertising material. You cannot say that a car does 100 km to the gallon if it does 15. You cannot say a face-cream makes wrinkles disappear without putting a visible asterisk saying that in a study on 15 people, 12 'detected' (i.e. it is their subjective impression, not a measurement) a decrease in wrinkles. You cannot call something 'milk' that did not come out of a cow. Soy milk is called 'soy drink'.
Anyone who thinks that this AI story will wipe out centuries of market and consumer protection legislation and people's rights will soon have a rude awakening.
The US FTC itself has written a delightful little blog post that says 'be careful how you describe the capabilities of your products, because artificial intelligence to us is a product like any other, if you say it does one thing and then it doesn't, you're in trouble'.
On the other hand, there is another funny thing: since openAI is not established in the Union, and since it patently processes data of persons in the Union and offers services to persons in the Union, so much so that you can interact with it in Italian, French, German, etc., every European Data Protection Authority can autonomously intervene and sanction.
The Italian Garante arrived first, but he will not remain the only one.
Objections
Let us come to some objections I have picked up on the fly in the last couple of days:
- "only Italy and China"
- "so what about data retention"
- "so what about call centres"
- "heaven forbid that openAI spies on what I write, get a VPN and that's it, we're in the Third World", and then my favourite
- "but paid VPNs, 'cause who knows what the free ones do with your surfing data".
Now, let's make it simple: I have no sympathy for those who shrug the whole thing off with arguments like 'but they already know everything about me anyway' and the like. People using such arguments don't understand what's at stake, and would be in the front row complaining if their data were directly compromised or abused.
Secondly, the Garante fines the call centres every other week. It is not the Garante's fault that the sector needs to be heavily reformed, or even eliminated, because no industry has a constitutional right to existence.
I understand the allure of the Strong Man in a country with the first government culturally contiguous to fascism since the post-war period, but we are still a real democracy based on the balance of powers.
The executive power operates, the legislative power legislates, and the judiciary ensures that actions correspond to the laws in force. The Garante is part of this third branch as an independent administrative authority.
If you do not like this complication, you have not understood a thing about how a democracy works. Have the decency to be ashamed in silence.
Thirdly, the fact that Italian telecommunication providers are required by state law to retain transmission data (so-called metadata) for 7 years for access by the police is a scandal worthy of China or North Korea.
And in fact, the Garante has repeatedly said that this is an excessive period, he has called on parliament to deal with it, and parliament is turning a deaf ear. The European Garante and the European Court have said the same thing, but it is an Italian law, so this is a problem only Italy can solve.
The basic problem, in my opinion, is that we keep talking about privacy when it comes to the protection of personal data. I understand that it is shorter, but they are not the same thing.
Privacy is when my information does not get into the wrong hands.
Data protection is when the right hands are still not free to do whatever they want with it.
So we had better begin to appreciate the difference.
Incidentally, the limits of what one can do with personal data are clearly defined in Article 5 of the GDPR, the one on principles, which everyone should know about because it is beautiful and restores some hope in humanity.
But waking up and finding out that the Garante is alive and fighting with us is also a nice burst of optimism, for a change.
I almost feel sorry for openAI and assorted friends who had set off once again to conquer the world, as if they were the new Zucks and it was still the year 2000. Unfortunately for them, the world has learned a lot from social media and the people who go around claiming they want to change the world for the world's own greater good, and the road to dominance is no longer as clear as it was twenty-five years ago.
Put simply: fuck you, bro.