If you're in Canada, the Government of Canada has an open consultation on copyright policy as it relates to AI tools.
If you're a Canadian artist especially, I highly encourage you to weigh in. The deadline has been extended to January 15th, 2024.
Here are my answers, if you would like to copy or reference anything.
Technical Evidence
<...>In your area of knowledge or organization, what is the involvement of humans in the development of AI systems?<...>
As a software engineer by profession and an artist by hobby, I interact with AI from many angles.
The name "AI" itself is a marketing term, ascribing a sense of "intelligence" and humanity to a tool in a way that is not accurate to how it functions. These tools are "models": they ingest data, mutate it according to an opaque and labyrinthine set of rules, and produce an output. Through a process of "training", these opaque mutations are continuously tweaked -- consuming a large amount of electricity and computing power in the process -- until the outputs satisfy the engineer's specifications.
The consequences of this fundamental nature are twofold:
- "AI" is not "intelligent" in the sense that it can intuit its own conclusions, understand the principles of a field of study, or develop its own knowledge. A human can understand math by applying the principles of addition, subtraction, and so on; an AI would attempt to understand math by memorizing every single number and combination of numbers, because it can only see correlations in its training data and attempt to reproduce them. This is why, until very recently, even the most advanced AI products would experience rounding errors in basic addition and subtraction when working with very large numbers.
- Humans do not confidently understand how models end up producing the outputs they do. The steps between input and output are deliberately arcane and opaque, which is how models can produce unexpected outputs that surprise end users. It is also why every AI model is a time bomb of undiscovered edge cases and failures-in-waiting, such as how ChatGPT was tricked into revealing sensitive personal information by being asked to repeat a word forever: https://www.vice.com/en/article/88xe75/chatgpt-can-reveal-personal-information-from-real-people-google-researchers-show
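To make the first point concrete, here is a toy sketch of the distinction I mean between applying a principle and memorizing correlations. This is illustrative only -- it is not how any real model is implemented, and the names and data are invented for the example:

```python
# Toy illustration: a rule vs. a lookup table of "training data".

def add_by_principle(a: int, b: int) -> int:
    """Applying the rule of addition works for any inputs,
    including pairs never seen before."""
    return a + b

# A memorization-based approach only "knows" the pairs it has seen.
# (Invented example data, standing in for training data.)
training_data = {(1, 1): 2, (2, 2): 4, (3, 5): 8}

def add_by_memorization(a: int, b: int):
    """Returns None for any pair outside the training data."""
    return training_data.get((a, b))

print(add_by_principle(123456789, 987654321))    # 1111111110
print(add_by_memorization(123456789, 987654321)) # None: never seen this pair
```

The rule generalizes to numbers of any size; the lookup table can only interpolate between examples it has already consumed, which is the failure mode described above.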
In my job, AI models are used to find correlations between attributes, which is a genuine strength of this type of tool. However, by and large, the development of AI tools has a culture of obfuscating and preempting genuine ethical concerns. Especially in the field of generative AI, developers are openly excited about consuming copyrighted material in order to undercut the labour of the very artists and writers without whom they would have no training data. There is a sense of trying to shape the legal conversation around a fictitious danger of an "AI overlord" in order to distract from the mundane danger of corporate greed and disrespect for labour.
Text and Data Mining
<...>What would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry?<...>
Above all, a legal framework around AI must emphasize that training data is "opt-in" only.
Developers of generative AI tools operate in a "forgiveness before permission" culture, hoping to make it too troublesome and too time-consuming to withdraw consent for online content to be used as training data. This is because without a huge volume of data ingested for free, these tools simply would not work, or they would produce laughable results that do not justify their huge investment.
However, creators generally intuitively understand that such tools are consuming their work in an effort to supplant and undercut them, not to help them, which is why they were not asked. This is the hole that must be plugged in any policy around AI and copyright.
Authorship and Ownership of Works Generated by AI
<...>Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?<...>
The US Copyright Office's ruling that AI-generated work cannot be copyrighted due to lack of human authorship is sound.
Works that are completely AI-generated cannot be meaningfully copyrighted, because they are the output of a tool. Users of AI products often find that they cannot reproduce a previous output after the product they were using is updated. This indicates that such works are not the result of authorship, but are akin to taking and using the work of another.
Infringement and Liability regarding AI
<...>What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?<...>
It can be difficult to determine whether an AI product used copyrighted training data "after the fact", as these tools are excellent at laundering their input. Action around infringement and liability must be taken at the point where data is ingested into the tool to begin with.