arborelia
@arborelia

When I was looking for more information about what recourse I might have against the Software Heritage Archive, which has been putting code that deadnames me and other people into its archive, I encountered this announcement of theirs:

They pivoted to AI! They were blockchain bros in 2018 and they pivoted to AI in 2023! Of course they did!

Now you don't have to be a European who changed their name to have an action you can take against them. Everyone who has put code on GitHub has a claim against them.

Their press release says "ethical" several times in hopes that it becomes true, but they have taken all the code they could possibly scrape from GitHub except sometimes leaving out GPL code. Apparently what makes it "ethical" is that you can ask for an opt-out, and they will try to remember to get around to removing the code you specified from later versions. Though of course we've seen how the Archive operates -- they'll most likely say "oh, we removed your code, but not this identical copy of it that we also have", or "we endeavor in the future to be able to remove your code".

Here are some calls to action:

  • If you've ever put code on GitHub, check Am I In The Stack? (edit: more direct link), which will say what they've scraped from your GitHub namespace in particular.
  • If you enjoy futility, send them an opt-out request.
  • If you have a HuggingFace login or you can stomach getting one, log in to HuggingFace and report the dataset for copyright infringement. It's on the three inconspicuous vertical dots on the right sidebar.
  • Send a takedown notice to takedown@softwareheritage.org, demanding that they take your code out of all versions of their AI model dataset, as well as their archive, because they have almost certainly violated your license -- even if it's open source, especially if it's open source -- and therefore they no longer have any right to your code.

No matter what they say, they are not following your license unless your license is equivalent to the public domain.


in reply to @arborelia's post:

and scraping everyone's code to train up an LLM thingie is somehow relevant to the purported "mission"? how does this help with the ostensible mission of preservation? ugh, I'd think all the corporate blather about "missions" was a complete joke if it didn't seem like executives actually pretend to take their sense of mission seriously, when the mood strikes them anyway ~Chara

This is almost certainly why they based their whole archive on a very stupid immutable data structure and said it's like a blockchain. Because that was the thing you needed to do to get money from rich people in the late 2010s

The only repo of mine that they grabbed contains five text files of probably-out-of-copyright poetry, but I sent in a request for removal anyway because screw ‘em, I didn’t say they could use it.

oh cool the stack manages to include: two open sourced projects that require reproducing the license along with whatever you do with it, and one project with no license, which means by default the license is "all rights reserved"

i'm furious and will be following the outlined steps