xkeeper
@xkeeper

from what i've heard, it's a ban handed down directly from the man himself. that answers that question! maybe it was because i told someone to fuck off for accusing me of being the same group that "doxxed" nc. lol

i wrote some crap on twitter about it, but the details:

  • i have access to the formerly-affiliated-with-rhdn discord server's staff channel, and was given that access shortly after the site was made read only, to help coordinate possible transition efforts.
  • there is no evidence of the supposed "doxxing". the closest thing to that was the staff trying to find nightcrawler's email address, because the site had been down for an extended period and he was nowhere to be found. (remember: the discord was created in 2017. nightcrawler did not ever actually join it until basically when he shut the site down!)
  • the costs of the site do not make sense. i host tcrf.net, a fairly large wiki, imo. our total storage is about 350GB (media) + 65GB (OS+files+database), our bandwidth is under 7 TB/month. the rhdn archive was 12 GB. our base costs do not exceed $200/month, while their S3 bucket costs $200+/mo -- and that's just the images/files, not the actual web hosting, and it's supposedly much cheaper than what it was before!! these costs do not make ANY sense.
  • the person who hosts that s3 bucket? and pays for it? not nightcrawler, and also not welcome on the forums -- yep, they're banned too!
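
for a rough sense of scale on the cost point above, here is a minimal sketch that turns the quoted figures into dollars per stored GB. it assumes both bills are a flat $200/month (the post gives "do not exceed $200" for tcrf and "$200+" for the bucket, so the real gap is at least this large), and the function and constant names are made up for illustration.

```python
# Figures quoted in the post above. Both bills are treated as a flat
# $200/month, which is an assumption: tcrf's is an upper bound and the
# bucket's is a lower bound, so the true ratio can only be larger.
TCRF_STORAGE_GB = 350 + 65  # media + OS/files/database
RHDN_ARCHIVE_GB = 12
MONTHLY_USD = 200.0


def usd_per_gb(monthly_usd: float, stored_gb: float) -> float:
    """Monthly cost divided by the amount of data stored."""
    return monthly_usd / stored_gb


tcrf = usd_per_gb(MONTHLY_USD, TCRF_STORAGE_GB)
rhdn = usd_per_gb(MONTHLY_USD, RHDN_ARCHIVE_GB)
print(f"tcrf: ${tcrf:.2f}/GB, rhdn bucket: ${rhdn:.2f}/GB, ratio: {rhdn / tcrf:.0f}x")
```

this only compares storage footprints; it says nothing about bandwidth, which is a separate line item on an S3 bill.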

nightcrawler is a fuckin coward and him comparing himself or his treatment to Near is beyond tasteless, it is infuriating.

edit:
i should mention that the person who is currently paying for the s3 bucket has made it clear to nightcrawler that they are turning that off on the 20th (two weeks from when they sent the notice). so, that'll be exciting!


xkeeper
@xkeeper

according to the formerly-associated-with-rhdn discord's staff chat, nightcrawler is just outright ip banning them and deleting their accounts now, so seems nightcrawler is in full meltdown mode.

sure hope someone archived the forum before today, and ideally before he made a huge swath of it patreon-only a while ago, but. oh well.

not much you can do when someone's intent on burning down their library.



in reply to @xkeeper's post:

the details are really stupid and i'm piecing this together from a lot of puzzle pieces over the last few months (but last week especially):

  1. the files were uploaded to s3 to offload costs to a community/staff member, pending "figuring out a better solution/migrating the site".
  2. at some point, for the creation of the "archive" that nightcrawler uploaded, he scraped the contents of the s3 bucket. yes, he literally. downloaded his own files. again. and then uploaded them all to the ia.
  3. ??????????
  4. profit

the contents of the archive are more or less the contents of the bucket. it's just that the rest of the site -- database, code, etc -- isn't part of the bucket, just the images and files.

everybody loses!

this is why the site notice on data crystal recently changed to say that the domain will start redirecting, and i might just update it to start doing that now, considering. christ.

well, it was good while it lasted

JFC! fwiw, the MySQL dump that was part of the IA upload looks more or less intact to me, except it was missing user data, which also meant it was missing full contributor data (e.g. credits for translations and hacks). I wrote a lil' scraper last weekend and scraped the contributor data as JSON, so we should now have a complete record of the data.

Are you 100% sure? Because of the way the site works, contributor profiles are broken.

For example, my user profile only links stuff that used my patches as a base, but does not link to my own patches.

I just looked at a bunch of examples and built up something to scrape as much from the user pages as possible, so I'm very much not 100% sure.

My understanding is that for all of the community pages, there are two sections: 1) "Releases" is hacks/translations/etc. released directly by that person/group and 2) "Contributions" is for crediting people for individual roles on releases.

The Releases are actually in the MySQL dump because they're 1:1 with a hack/translation/etc, but they're just foreign keys without the actual user data, so getting that user profile information would fill in the gaps there.

Contributor data wasn't in the MySQL dump, so I tried to grab as much as possible. Those have links to the releases, so we can match them by the ID for the release.

I definitely could have missed something, though. I see on your page you're credited with way more translations than credits listed, for example.
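
The matching step described above could look roughly like the sketch below. It is not the actual scraper: the field names (`release_id`, `user`, `role`), the sample rows, and the `attach_credits` helper are all hypothetical, and the real dump and JSON will have their own layout.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for rows from the MySQL dump, keyed by release ID.
releases = {
    101: {"title": "Example Translation"},
    102: {"title": "Example Hack"},
}

# Hypothetical stand-in for the scraped contributor JSON.
scraped = json.loads("""
[
  {"release_id": 101, "user": "alice", "role": "Translation"},
  {"release_id": 101, "user": "bob", "role": "Hacking"},
  {"release_id": 999, "user": "carol", "role": "Graphics"}
]
""")


def attach_credits(releases, contributions):
    """Group contributions by release ID; collect the ones with no matching release."""
    credits = defaultdict(list)
    unmatched = []
    for entry in contributions:
        release_id = entry["release_id"]
        if release_id in releases:
            credits[release_id].append((entry["user"], entry["role"]))
        else:
            unmatched.append(entry)
    return dict(credits), unmatched


credits, unmatched = attach_credits(releases, scraped)
print(credits)
print(f"{len(unmatched)} contribution(s) point at a release missing from the dump")
```

Keeping the unmatched entries around, rather than dropping them, is one way to spot gaps like the mismatch in credit counts mentioned above.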

there have been many different parallel attempts to scrape the site (i think knowing you did it brings my running tally to at least 4 competent ones and twice as many incompetent ones)

it's funny, because nightcrawler reportedly hates people scraping his site, and it's why he put captchas and cloudflare everywhere. so what does he do? release an incomplete archive that requires people to scrape together a complete one on their own, by scraping the site

it's impressive how every decision he makes is the worst one

You know what really gets me? NC had set up Cloudflare to explicitly block IA's IPs for YEARS!!! The last Wayback captures are from like 2018. I tried a bunch of tricks to get past it and archive the pages in full, but I couldn't get past the Cloudflare captcha.

I still don't get the pricing. I've been trying to piece this together, and 11 GB of data, which might uncompress to something like 30 GB, doesn't cost that much to host. Was it some weird choice of hosting? Daily backups and stuff? Monthly hosting for 50 GB is $1.50 if the calculator is to be believed.

However, S3 also charges for data egress. Up to 10 TB/month is $0.09 per GB. To arrive at 200 USD it needs to be a bit over 2 TB per month. This doesn't really make sense to me. However, this might be a quirk of AWS pricing. Even DigitalOcean doesn't charge that much: it gives you S3-compatible file storage for $5/month, the first TB out per month is free, and then it's $0.01/GB.

Does this mean that it's hosted on AWS and the RHDN files are being downloaded in the volume of 2 TB/month?
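
A quick back-of-the-envelope check of that egress figure, as a minimal sketch. It uses only the numbers quoted above ($0.09 per GB, a $200 bill, 11 GB of data) and assumes the whole bill is egress, which it may not be. The function names are made up for illustration.

```python
EGRESS_USD_PER_GB = 0.09  # rate quoted above for the first 10 TB/month
MONTHLY_BILL_USD = 200.0  # assumed to be entirely egress, for simplicity
ARCHIVE_GB = 11           # size of the data quoted above


def egress_gb_for_bill(bill_usd: float, rate_usd_per_gb: float = EGRESS_USD_PER_GB) -> float:
    """How many GB of outbound traffic it takes to reach a given bill."""
    return bill_usd / rate_usd_per_gb


def full_copies(egress_gb: float, archive_gb: float) -> float:
    """How many complete downloads of the archive that traffic corresponds to."""
    return egress_gb / archive_gb


egress_gb = egress_gb_for_bill(MONTHLY_BILL_USD)
print(f"{egress_gb:.0f} GB/month of egress")
print(f"about {full_copies(egress_gb, ARCHIVE_GB):.0f} full copies of the data per month")
```

Under those assumptions the bill works out to a couple of hundred complete downloads of the data every month, or the equivalent in ordinary page views.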

no the data really is about maybe 14 gb total, it's largely already compressed archives.

i have no idea how the pricing works out, like i said, nothing i've come up with even gets in the same ballpark. it's the same as the dude who hosted the unfiction site / forum claiming a static webpage and phpbb2 forum were somehow costing over a thousand dollars a month.

also consider that homie has all but encouraged people to constantly scrape and download the site by making its future uncertain and constantly hinting at imminent shutdown/failure

It actually kinda makes sense to me now, because AWS S3 costs $0.09 per gigabyte of egress, and just by poking through the images I see that they're all hosted there; and I don't see them being behind any caching solution like CloudFront/Cloudflare. There's literally no X-Cache and shit in the headers.

So just standard browsing around by random users from the internet will run up the bill, unfortunately.

Like. Right now, the main page. https://s3-external-1.amazonaws.com/romhacking-hacks/hacks/gba/images/titles/8027titlescreen.png 255 KB animated PNG. No caching at all.
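
The header check described above can be sketched like this. The list of headers is a handful that CDNs and caches commonly add, not an exhaustive one, and the helper names and sample header values are made up for illustration. `fetch_headers` is defined but not called here, so nothing hits the network unless you run it yourself.

```python
from urllib.request import Request, urlopen

# Response headers commonly added by caches/CDNs (e.g. CloudFront, Cloudflare).
# Their absence suggests, but does not prove, that nothing sits in front of the origin.
CACHE_HINT_HEADERS = ("x-cache", "cf-cache-status", "x-amz-cf-id", "age", "via")


def cache_hints(headers: dict) -> dict:
    """Return whichever cache-related headers are present (case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in CACHE_HINT_HEADERS if name in lowered}


def fetch_headers(url: str, timeout: float = 10.0) -> dict:
    """HEAD request for a URL; returns the response headers as a dict."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


# Made-up sample headers: one response straight from a bucket, one via a CDN.
direct = {"Content-Type": "image/png", "Server": "AmazonS3"}
via_cdn = {"Content-Type": "image/png", "X-Cache": "Hit from cloudfront", "Age": "120"}

print(cache_hints(direct))
print(cache_hints(via_cdn))
```

To try it against a real file, something like `cache_hints(fetch_headers(url))` with the image URL above would do; an empty result matches the "no X-Cache in the headers" observation.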

:despair:

i'm not sure where they get their data, b/c for example tcrf does not have any sort of trackers (other than image references for the mw / cc badges) and yet it still provides information, so... yeah.

like i'm sure there's some way to tell, but we explicitly do not farm out to google analytics or any other service at all

the person who hosts that s3 bucket? and pays for it? not nightcrawler, and also not welcome on the forums -- yep, they're banned too!

This isn't just stupidity, it's advanced stupidity

It's like having your car paid for by the guy that you just beat up and put in a hospital: something that can only end in your car being repossessed.

in reply to @xkeeper's post:

It's probably better to give up on him completely and disregard him for now, because nothing in his behaviour will change. I think the better way is to figure out how to grab as much data as possible and to put a new website up with staff who are accountable. Probably the biggest thing that will be lost is user accounts and the tying of uploads to accounts, but in general that should be manageable / accepted as a sacrifice.

Is there anything going on right now that's coordinated w/r/t this?