When I was looking for more information about what recourse I might have against the Software Heritage Archive, which has been putting code that deadnames me and other people into its archive, I encountered this announcement of theirs:
They pivoted to AI! They were blockchain bros in 2018 and they pivoted to AI in 2023! Of course they did!
Now you don't have to be a European who changed their name to have an action you can take against them. Everyone who has put code on GitHub has a claim against them.
Their press release says "ethical" several times in hopes that it becomes true, but they have taken all the code they could possibly scrape from GitHub except sometimes leaving out GPL code. Apparently what makes it "ethical" is that you can ask for an opt-out, and they will try to remember to get around to removing the code you specified from later versions. Though of course we've seen how the Archive operates -- they'll most likely say "oh, we removed your code, but not this identical copy of it that we also have", or "we endeavor in the future to be able to remove your code".
Here are some calls to action:
- If you've ever put code on GitHub, check Am I In The Stack? (edit: more direct link), which will say what they've scraped from your GitHub namespace in particular.
- If you enjoy futility, send them an opt-out request.
- If you have a HuggingFace login or you can stomach getting one, log in to HuggingFace and report the dataset for copyright infringement. The report option is under the three inconspicuous vertical dots on the right sidebar.
- Send a takedown notice to takedown@softwareheritage.org, demanding that they take your code out of all versions of their AI dataset, as well as their archive, because they have almost certainly violated your license -- even if it's open source, especially if it's open source -- and therefore they no longer have any right to your code.
No matter what they say, they are not following your license unless your license is equivalent to the public domain.