Ben Daubney

Thoughts on theft, and AI's social contract


This post is part of a series for #WeblogPoMo2025. Read the introduction here.


The Australian government runs a website designed to connect "First Nations peoples with Australian Government programs through stories, news and announcements."

On your first visit, you are met with a disclaimer:

The Australian Government acknowledges the Traditional Owners of Country throughout Australia and acknowledges their continuing connection to land, waters and community. We pay our respects to the people, the cultures and the Elders past and present.

The wording appears in the footer of every page too.

Even from the other side of the world it's very clear that Australia has had a public and heartfelt reckoning with its past over the last decade. Messages like this appear in album sleeves and during the credits of movies and TV shows. A country grown wealthy by seizing and exploiting land which had been tended as a common good by the local population is openly declaring a mea culpa. Even an uninformed read of the Australian Human Rights Commission's website, which outlines its efforts in 'making sure that every Australian – Indigenous and non-Indigenous – has the opportunities and choices they need to lead full and healthy lives', suggests real effort to acknowledge, apologise, and work towards a common good.

It's difficult work. Painful work. I'm sure there is plenty of controversy, and that it doesn't go far enough. These sorts of initiatives rarely do.

But it offers hope. A big modern democracy knows how it became affluent and wants to do the right thing.

Land was stolen. It was terrible. There is now an acknowledgement and there is effort to do right.


My extremely crude understanding of AI models is as follows:

Take "how are you today?" as an example. If I listen to a million conversations, read a million books and watch a million videos, I'll get a good sense of how people normally answer that question, and I can build a predictive mathematical model from it:

If: "how are you today?", then most likely: "I am well, thank you".

All generative AI models work on that basic premise. If I go to ChatGPT and ask what I should have for dinner, it'll perform a calculation based on everything it has ever ingested and processed to come out with a likely answer. If I ask it politely, it'll assume formality and probably suggest something more complex. If I use slang or incomplete phrasing, it'll suggest a takeaway.

Take input, suggest most likely output.
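To make that concrete, here's a minimal sketch in Python - a toy of my own invention, nothing like a real model's internals - of that counting idea: tally the replies observed in some training conversations, then answer a prompt with whichever reply was most common.

```python
from collections import Counter, defaultdict

# Toy "training data": (prompt, reply) pairs overheard in conversations.
# A real model ingests billions of documents; this is the same idea in miniature.
conversations = [
    ("how are you today?", "I am well, thank you"),
    ("how are you today?", "I am well, thank you"),
    ("how are you today?", "not bad, you?"),
    ("what should I have for dinner?", "maybe a takeaway?"),
]

# Count how often each reply has followed each prompt.
reply_counts = defaultdict(Counter)
for prompt, reply in conversations:
    reply_counts[prompt][reply] += 1

def most_likely_reply(prompt: str) -> str:
    """Take input, suggest most likely output."""
    seen = reply_counts.get(prompt)
    if not seen:
        # No experience of the prompt means no basis for a response.
        return "(no idea - never encountered this before)"
    return seen.most_common(1)[0][0]

print(most_likely_reply("how are you today?"))    # -> I am well, thank you
print(most_likely_reply("why is the sky blue?"))  # -> (no idea - ...)
```

Real models predict one word-fragment at a time across vast vocabularies rather than whole canned replies, but the principle - probability learned from other people's words - is the same.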


Building an AI model therefore requires reference data: conversations, things it can read or look at. Generative AI cannot suggest the most probable response to "how are you today?" if it has never encountered the basics of that sentence before. What does "how" mean in this situation? Who is the "you"? Why "today"? Does the inclusion of a question mark denote something?

It can only provide a response when it understands the context based on experience. The more experience a model is exposed to, the better and more human-like its response.

But computers have no experience.

Humans do. Humans write about it, record it, sing and paint and draw and express it. They take time and effort and energy to do so.

We know that those recorded experiences are valuable. We have copyright laws. We have collectively agreed that if a person creates a book or a painting or an idea then they own it: they are free to sell it, sell access to it, or give it freely.

We own the experiences we turn into record. They are ours.

And they were stolen.

Stolen without permission. Stolen without regret. Stolen over and over again.

The content of 191,000 books was stolen without the writers' knowledge or consent.

30 billion social media photos were stolen without the knowledge or consent of either the photographers or the subjects.

Hundreds of millions of internet users likely had their personally identifiable information scraped from the web to train AI without their knowledge or consent.

Getty Images states that over 12 million copyrighted images were stolen to train AI models without their - or the photographers' - knowledge or consent.

If you have ever put words on the internet - a Facebook post, a messageboard post, a TripAdvisor review, a pseudonymous blog about your heartache and loss, a fuck-you comment on a news article, anything at all - your words have almost certainly been found by one of these big technology companies, copied, and taken away for their own use.

Stolen.

If you make a living from writing words or creating images, your work has almost certainly been stolen without regard for the consequences it might have for you.


And, jeez, the utter, shameful hubris of this theft.

I return over and over to this article, in which Microsoft's head of AI, Mustafa Suleyman, states:

"I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding."

This is resolutely not true.

Remember when YouTube was a free-for-all for posting copyrighted content, and how it now has ads every three minutes so that the corporate copyright owners get their piece of silver? Have you noticed that anywhere people put significant time and energy into creating communities or content now has some sort of paywall, no matter how soft? Wasn't it always the case that art was put online next to banner ads?

What sort of weasel words are these when Suleyman speaks of a 'social contract'? At an extremely generous reading, it's true that the web has always been about connections: hyperlinks between one place and another, sociability-through-linkage. But there's reciprocity involved - I want to talk about the politics of Australia's land grab, and I do so by quoting and linking back to the source.

There's no 'social contract' when corporations steal vast sets of our data - knowingly and illegally - to make AI models for their own gain.


Let's be generous again for a moment and entertain the idea that maybe there is a social contract here.

Our data was stolen to train big AI models. Whether through guilt or sheer grandstanding, almost every major player - OpenAI, DeepSeek, Meta's Llama, etc - offers a model that can be used for free, and many can be downloaded and run by almost anyone. You could say, if you're an AI apologist, that these companies are "giving back".

And there's an argument to be made that the conversational approach to computing that AI enables means that information is now more easily accessible.

You can type "what is considered to be the single best simpsons episode by most people" and get a cogent, quick, understandable and fully explained response back. You don't need to know what IMDb or Wikipedia is, or how to format a query in a Google-friendly way. Simple ask, simple response.

And realistically, what data sources could a good model use without tearing through loads of novels and reference works and social media posts? Isn't it that or nothing?

Isn't this all for the common good?


When Dutch explorers first reached Australia in 1606, it was a land tended and lived on by the commons.

"This is for everyone."

Others came, saw the value in what was there, took it for their own gain, and marvelled at how the country they had created was - in their opinion - better for everyone. Only centuries later did they acknowledge that maybe this wasn't the case.

Australia has matured. The Internet has not.

We are due our time too. And we deserve more than just acknowledgement.


#AI #WeblogPoMo2025 #main #technology