arborelia
@arborelia

I have had some partial success sending the Software Heritage Archive a takedown notice. They have at least removed one repository of mine from their website. I don't know yet if it's removed from their dataset, or will be.

This repository (which happens to have been my 2020 Advent of Code solutions) was unquestionably mine, not theirs to use in any way, because I never put any kind of license on it.

I actually had no idea that the scope of their archive had grown so large that it included ephemera like this, but it makes sense that now they want as much code as they can possibly hoard so they can feed it to a HuggingFace AI dataset.

I sent this message to dpo@inria.fr and takedown@softwareheritage.org:

The Software Heritage Archive contains an infringing copy of my code:

https://archive.softwareheritage.org/browse/origin/?origin_url=https://github.com/arborelia/advent2020

The copyright on this code belongs strictly to me. GitHub merely has permission to host it under their terms of service. It is not available for distribution under any terms. It is not licensed for use for any purpose.

You must cease and desist using, copying, and distributing this code. You must remove this code and all copies of it from your archive, including where it appears in data exports and derived datasets such as "The Stack", within 30 days.

If you've put a repository on GitHub, if you didn't put a license on it, and if you see that they've made a copy of it on https://archive.softwareheritage.org/, I encourage you to do the same. (You could also check Am I in The Stack?, except it went down.)
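To check several repositories, a minimal sketch like the one below builds the archive's browse link for each. It assumes the URL pattern from the link in my message above holds for other repositories. The function name `swh_browse_url` is made up for illustration and is not part of any Software Heritage tooling.

```python
from urllib.parse import quote

# Base of the browse link quoted in the takedown message above.
BROWSE_BASE = "https://archive.softwareheritage.org/browse/origin/"


def swh_browse_url(origin_url: str) -> str:
    """Build a Software Heritage browse link for a repository URL.

    Assumption: the archive accepts the same `origin_url` query
    parameter for any repository, as in the link above.
    """
    # Keep ':' and '/' unescaped so the link looks like the one in the post.
    return f"{BROWSE_BASE}?origin_url={quote(origin_url, safe=':/')}"


if __name__ == "__main__":
    print(swh_browse_url("https://github.com/arborelia/advent2020"))
```

Opening the printed link in a browser shows whether the archive has a record for that repository.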


StrawberryDaquiri
@StrawberryDaquiri

in reply to @arborelia's post:

Happy to hear they got at least one repo down.

Am I in The Stack?

My paranoid self believes that clicking a button to check if I'm in their stack also adds me to the queue of next accounts to scrape from.
Because of course I'd only check if I want in, right?

As a side note, I believe SourceHut is the only major source they're not copying from, since SH's ToS forbid it. Their public forge addition list shows it as "suspended," so I like to think they're scared to touch it or got IP-banned.

I noticed that the issues in the opt-out repo for "The Stack" have been totally ignored: some have been open for a year, and the repos people asked to opt out are still in the latest version. Gross.