• he/him

I occasionally write long posts but you should assume I'm talking out of my ass until proven otherwise. I do like writing shit sometimes.

50/50 chance suit pictures end up here or on the Art Directory account. Good luck.

Be 18+ or be gone, you kids act fuckin' weird.

pfp by wackyanimal

I tag all of my posts complaining about stuff #complaining, feel free to muffle that if you'd like a more positive cohost experience.

Art and suit stuff: @PlumPanAD

"DMs":
Feel free to message as long as you have something to talk about!
So we all know you should just pirate stuff. Pay people for what they do, but also if you care about something you must Maintain Your Own Copy and not trust whoever you bought it from to do so. I phrased that carefully there: not "download it", but maintain a copy. You have a chunk of data and it needs to be taken care of; it needs to be the same when you request it as when you stored it, and ideally it needs to be able to survive catastrophic failure.

Doing this with a lot of data fucking sucks.

(Below has turned into an extended ramble about maintaining data locally. I did very little research to fact check the dumb shit I say so please take all of this with a grain of salt. If you see any crossed out sections, it's shit where I was wrong and got called out. I somewhat avoided giving solutions to the problems posed, but don't trust me with your data.)


So, let's assume you have a fairly modest collection that comes in under 1TB worth of data. This is the best possible scenario: an amount of data that will fit not only on lots of hard drives but even on flash storage. Great! Now what?

Well, the first thing to think about is backups. Easy enough, get an external hard drive and just copy the data over, right? Well, that works fine the first time, but what about the second time? Do you want to delete things on the backup drive that you deleted locally, like a TV show you didn't want to watch again? Worse still, do you keep the drive plugged in all the time so the backup can run automatically? Or do you remember to pull the drive out and plug it in once a month to run the backup?

The scariest scenario, though, is that a file gets corrupted locally and you overwrite the good backup with that corrupted local copy. That would be no good. The only way to really fight bit rot at a local level is a more modern filesystem designed to help prevent it by verifying data as it's read: ZFS, btrfs, or if you absolutely must stay in Windows land, maybe ReFS. But at this point, we start going down the deep dark hole.
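
If you can't or won't run one of those filesystems, the poor man's version is keeping your own checksum manifest and verifying it before every backup run. Here's a minimal sketch in Python; the manifest filename and the build/verify split are my own invention, not a recommendation of any particular tool:

```python
# bitrot_check.py -- minimal sketch of DIY checksum verification.
# Run "build" once after storing data, "verify" before every backup.
import hashlib
import json
import sys
from pathlib import Path

MANIFEST = "checksums.json"  # made-up name

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build(root: Path) -> None:
    manifest = {str(p.relative_to(root)): sha256(p)
                for p in root.rglob("*")
                if p.is_file() and p.name != MANIFEST}
    (root / MANIFEST).write_text(json.dumps(manifest, indent=2))

def verify(root: Path) -> None:
    manifest = json.loads((root / MANIFEST).read_text())
    for rel, digest in manifest.items():
        if sha256(root / rel) != digest:
            # Either bit rot or a legit edit -- decide BEFORE backing it up.
            print(f"CHANGED OR CORRUPT: {rel}")

if __name__ == "__main__":
    cmd, root = sys.argv[1], Path(sys.argv[2])
    build(root) if cmd == "build" else verify(root)
```

Anything it flags either rotted or was edited on purpose, and you get to figure out which before the bad copy clobbers the good one on the backup drive.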


Do you want to run any of those filesystems locally? Even here on cohost you're probably still using Windows (bing!), and it's a lot easier to access data if it's on the network anyway... so we start looking at a NAS.

So the software part of a NAS is pretty easy, actually. If you go to the store and buy a NAS it will be ready to present a share on your network however you want it and it might even have an app for your hateful locked down devices! If you decide to make your own, there's plenty of robust and heavily documented software stacks (really entire OSes) like truenas and unraid that can juggle plenty of disks and help you run other services like multimedia library software or a torrent client. Hardware, on the other hand, presents a problem.

You see, any NAS you go into the store and buy will be hilariously expensive. Yeah, you can get something for $100-200, but it either holds two disks max or has a single undersized hard drive built in. The bigger names are happy to charge you well over $500 for something that holds 4 disks with some tiny embedded hardware inside to run the services. Worse still, most of these devices don't do much to help with bit rot, the whole reason we ended up in this hole to begin with. Hell, QNAP even has a whole page talking about why they use ext4 over btrfs (along with some good ol' fashioned FUD), despite the fact that ext4 is probably the wrong solution for most people. I'd be willing to bet most people getting a NAS like that care more about data corruption than pure performance. This means, unfortunately, looking at the DIY route.

And honestly, having done a tiny bit of research, just plugging a couple of USB mechanical drives into an rpi doesn't seem like an awful idea. They can run ZFS, and they have enough memory to handle a couple of TBs of storage if set up right, and it's not a bad setup if you don't mind, well, setting it up. You can even spend a bit more than the current cost of a pi to get a USB JBOD and have a bunch (5? 8???) of disks all attached by a single USB port, leaving another port free to perhaps plug in a separate drive to move your backup at least a little bit "off site", even if that's over in a closet. It's far from an ideal setup but... shit, I don't think it's worse than running LVM raid with ext4 and calling it good lol.
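
For what it's worth, getting a mirror going really is only a couple of commands. A sketch, wrapped in Python to match the other snippets in this post; the pool name and device paths are placeholders, and using /dev/disk/by-id paths instead of /dev/sdX matters on USB, since devices re-enumerate:

```python
# mkpool.py -- sketch of a two-disk ZFS mirror on a pi.
# Assumes OpenZFS is installed; the device paths are placeholders.
import subprocess

DISKS = ["/dev/disk/by-id/usb-DRIVE_A",   # placeholder
         "/dev/disk/by-id/usb-DRIVE_B"]   # placeholder

subprocess.run(["zpool", "create",
                "-o", "ashift=12",         # 4K sectors, right for modern drives
                "-O", "compression=lz4",   # basically free
                "tank", "mirror", *DISKS], check=True)

# The periodic scrub is what actually catches bit rot; cron this monthly.
subprocess.run(["zpool", "scrub", "tank"], check=True)
```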

So, what if you have more than a TB or so of data to maintain? Personally I put the threshold around what a single reasonably priced drive can hold, so right now that's somewhere around 12TB, give or take. Under that and you only need to buy a few drives to keep copies; above that and you start needing multiple drives just to hold one copy. 10TB+ is a LOT of data, but if you keep a LOT of media around, you can eat that up pretty quick.

Things just get more hellish from here.


If you want to maintain more than a cheap hard drive's worth of data, you need an array. This brings us to one of the fundamental values of RAID: Redundant Array of Inexpensive Disks. Now, that was originally in the context of 1988, when "Inexpensive" meant under $10/MB, and the end result was creating an array of 100MB drives with, theoretically, more performance and reliability than a gigantic quad actuator IBM mainframe disk. Very fun read, by the way (waybackmachine).
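
The math that motivated the R is worth eyeballing: assuming independent failures, an array with no redundancy is only as reliable as one disk divided by the disk count. A back-of-envelope sketch, with a deliberately made-up MTTF figure:

```python
# MTTF of a redundancy-free array: roughly disk MTTF / disk count,
# assuming independent failures. 30,000 hours is illustrative only.
disk_mttf_hours = 30_000

for n in (1, 10, 100):
    mttf = disk_mttf_hours / n
    print(f"{n:>3} disks: ~{mttf:>7.0f} hours (~{mttf / 24:.0f} days)")
# 100 disks: ~300 hours, call it two weeks. Hence the "Redundant".
```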

Nowadays huge monolithic disks do not exist, at least not rotating ones. Instead we have varying degrees of unreliable mechanical storage being pushed to frankly quite insane lengths in order to store the data they do. It's been almost 10 years since the first 10TB hard drive came out, and we've yet to see the recently promised 30TB drives. There's only 3 companies left making full size, max capacity hard drives, and one of them (toshiba) kinda just got handed the keys to the market when two of the largest ones (wd, hgst) merged. This is all to say: hard drives are stagnant, and they don't drop in price nor grow in size like they used to. In 2012 I got a very good deal on a used 2TB drive, around $85. 10 years before that, the same money would've been a very good deal on roughly 100GB or so of disk. Now, almost 12 years after the fact, it might get you a used 10TB. And that's on a stinky used disk from ebay; new disks are an absolute joke price-wise.

Quick aside here: I'm going to shit all over the worst article I've ever seen at Anandtech. This is their version of "just get RTX", but no one cares except me.

To be fair, a lot of this is regurgitating what the hard drive companies say. I'm here to tell you: if you're just storing data in bulk locally and the most taxing IO you ever do is seeding torrents and having a few different people watching movies at once, none of this crap matters. Rated time between failures, rated workloads, load/unload cycles... all hogwash. You should treat any mechanical hard drive the same: expect it to fail and plan to deal with it when it does. Yeah, those Toshiba X300s that only have 600k MTBF hours are probably built a bit cheaper, but the part about ruling out their use for NASes? Bullshit. All that crap you hear on youtube about how the expensive drives are rated for more vibration handling, so you should only use WD Red Plus up to 8 bays and WD Red Pro for 24 bays? Bullshit. Warranties are a nice gesture, but if you get a replacement it's going to be a refurbished drive, and chances are you'll want to run your NAS on the same disks longer than 5 years anyway. The only thing that matters is avoiding SMR drives (an exercise left for the reader); otherwise treat every disk like what it is: a hateful little box of chaos that wants to flip bits, corrupt your data, and eventually just explode. A timebomb.
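
To put numbers on why those ratings are noise at home scale: an MTBF rating converts (approximately) to an annualized failure rate, and the gap between a "cheap" and a "premium" rating is a couple of percent per drive per year. A quick sketch using the standard back-of-envelope conversion:

```python
# Rough annualized failure rate (AFR) from a vendor MTBF rating,
# for a drive running 24/7. AFR ~= hours_per_year / MTBF (approximation).
HOURS_PER_YEAR = 8766  # 365.25 days

for label, mtbf in [("600k-hour rating", 600_000),
                    ("1M-hour rating", 1_000_000)]:
    afr = HOURS_PER_YEAR / mtbf
    print(f"{label}: ~{afr:.2%}/yr per drive, "
          f"~{afr * 8:.1%} expected failures/yr across 8 bays")
# ~1.46% vs ~0.88%. Either way: plan for a dead drive.
```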

And if you're doing that, you might as well buy cheap and buy often.


So the problem we now have is that we want to run a bunch of dumb, dirty disks in a ZFS array, effectively modern reliable RAID, and we need to get them attached to the computer somehow. Once you get above about 4 or 5 disks, this becomes a bit of a problem. USB JBODs exist (more on that later), but they only go so far. Finally, we end up at the back of the cave, the path everyone who wishes to store too much data ends up at: old enterprise gear.

For drives: used SAS drives come in under $10/TB. They'll probably be at least as reliable as those really suspect whitelabel drives on amazon¹, likely much more. Actually, let me rephrase that a bit: do not spend more than $10/TB. If you are willing to tolerate smaller drives, you can even get down to or below $5/TB for 4TB drives right now, but those potentially have a lot of hours on them. For reference, the "wow super cheap" new drives in the above article were $12/TB and everything else was above $16/TB. Buy often too: make sure you have one or two spare drives, which you know are not DOA, ready to go. At some point a drive WILL fail, you WILL need to replace it; have it on hand. Maybe do a test replacement before you put data on your array, and write down the whole process too.

For enclosures: you're thinking about that old ATX case that has 10 3.5" bays inside, right? Think about this for a moment: if you wire up 8+ drives in there and one fails, what does the process of replacing it look like? Fishing the drive out from a pile of cables without unplugging any of the other drives, while your array is already degraded? You really want a hot swap enclosure. If you're willing to brave the hells of USB you could start stacking USB JBODs, but they can be surprisingly expensive for what they are and there's a good chance they won't play nice with SAS. This brings us to whole ass rack mount disk shelves. There's tons of them out there (at least if you live in the US, a small perk in exchange for our fucked up healthcare) and I'm not going to get into specific models, but they can (and should) be had for $100 or less, shipped, with caddies, controllers, and PSUs; that'll likely be a 12 or 24 bay unit, though they come in all shapes and sizes, including 2.5" bay units (useless unless you're using flash) and 40+ bay tray shelves (cool but actually useless). These are big hunks of metal, but aside from the small subset of people who want this much storage at home they're utterly worthless. Sometimes you might get lucky and find one full of drives, but it's difficult to find one with drives of a useful capacity: anything 4TB or higher is rare, at least at sane ($10/TB) prices. If it's full of drives, it should be priced at about what the drives would cost alone, and expect a couple of them to arrive DOA.

None of those big rackmount things use USB; they tend to have an SFF connector on the back to connect to a SAS HBA. Again, there's already plenty written on the internet about this. $30-40 gets you a 12Gb/s HBA, and if you're lucky your shelf's controller(s) can do 12Gb/s too. Don't worry about the individual disks' speed rating. HBAs get hot; if you're not using an actual rackmount server then put a fan on it. HBAs can have all different kinds of connectors and so can the disk shelf; don't get the wrong one. They can also have internal connectors, if you ignored me and shoved all of your drives inside an old case anyway.

The problem with all this? Same as with all rackmount stuff: heat and noise. Those dirt cheap SAS drives are almost always 7200rpm and probably draw over 10w each, which adds up fast. You're not running SAS on a pi without some hilariously expensive custom board thing, so you need a whole PC of some sort alongside this. It doesn't need to be super fast, but it does need to have some IO for the HBA and the network card you'll buy later. The thought might be to get a big rackmount server to match the big rackmount shelf, but those tend to eat gobs of power, new or old, plus make an awful racket, and are an utter waste for most NAS use. Some shelves have fan control, but none of them will run quiet enough that you'd want to live in the same room as them. The solution to THAT problem is getting into hardware modification, along with trying not to cook your drives. You like soldering and working inside power supplies?
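
To put the "adds up fast" in numbers, here's a rough 24/7 power cost sketch; both the per-drive wattage and the electricity price are assumptions to swap for your own:

```python
# Rough yearly power cost for a shelf of spinning 7200rpm drives.
watts_per_drive = 10      # assumed rough draw per drive
price_per_kwh   = 0.15    # assumed USD rate; use your own

for drives in (12, 24):
    kwh_per_year = drives * watts_per_drive * 24 * 365 / 1000
    print(f"{drives} drives: ~{kwh_per_year:.0f} kWh/yr, "
          f"~${kwh_per_year * price_per_kwh:.0f}/yr, before the host PC and fans")
```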

And the worst part? Anecdote, and "common knowledge", both say that the single best way to keep mechanical hard drives working long term is to not heat cycle them. That means never turning them off, never spinning them down, leaving them running 24/7. It seems to work, but it sucks. Any power and noise problem you have will never go away. Oh, and plan to have all of that on a battery backup, unless you never ever ever have power issues. Disk shelves hate cheap UPS power too.

At least it helps you seed.


And now we're back at the hole we came in through: backups. RAID isn't a backup, nor is ZFS. When you get past a single reasonable size hard drive worth of data, you're just not backing it up. Not cheaply. LTO? Even if you can afford it, LTO-9 is 18TB uncompressed per tape. Not worth it. More hard drives? Sure, if you trust heat cycling them in and out of cold storage for backups. Off site? Find a friend willing to hold an entire second shelf you built, and let you upload TBs of encrypted backups there every month? Rent actual rack space in a real ass datacenter? Pay for a cloud service that will charge you out the ass should you ever need to pull those backups down? It all sucks.

In my opinion, and this is one of the very few things I'd go as far as to call a fact in this whole rant, the only reliable way to back up data is to share it. Why host an encrypted backup on an entire second array at someone else's house when you could just give them a copy of the same TV shows and movies you're trying to maintain? Yeah, they could just be throwing it on an NTFS formatted disk that will one day click once, cough, and let chkdsk proceed to clobber every single thing there, but as long as there's one good copy then it can be rebuilt. The point here is that torrents are probably the best way we have now to back up mass amounts of data.

And they're still not perfect. Maintaining torrent libraries over time sucks. Wanting your files organized and named one way while the source of the torrents does another thing sucks. The problems can be solved, but it's a ton of work, and it's tedious. Plus, lots of things are primarily available through private trackers that don't use magnet links. If the tracker goes down permanently, as is likely to happen some day, the swarm is gone. Someone will stand up a new tracker, but all of those torrents need to be migrated over. It's still not a resilient solution, but I do think it's the least bad option for non-private data. Thankfully, most people are not generating tens of TBs worth of personal data that they wish to maintain.

Video creators out there, well I hope you make enough money to afford a lot of disks. Sorry.


What's the point of all this? I don't know. I spent many years just putting my data on a single external hard drive at a time, leaving the hard drive on and hoping nothing bad happened. I've had bit rot, filesystem clobbers, incorrect backups, and even just not being able to find files because I'm disorganized. I've done some crazy shit to save data and I've just straight up lost shit I really would have rather not lost. If you're always trying to have more data than you can maintain, which most do, you're going to lose some of it. Very, very few people, even among us nerds, will actually do backups or worry too hard about how we maintain our data. It's just a LOT of work and it's easier to throw it on a drive and hope you get lucky. Since I've gone ahead and written all of this, I'll try and put some actual useful information here now.

First, use ZFS. I'm going to be looking into this myself for funsies, but I believe it would actually still be useful to run ZFS on a single detachable disk. I don't know how it will deal with the hard drive eventually spitting out bad data (copies=2 or even 3 should in theory help, while sacrificing a big chunk of capacity), but it should at least know it's happening and not give me bad data. You don't need to boot on it or whatever, but for a drive designed to just hold data, it's probably worth using ZFS. On windows? Sorry, you're just fucked. Or maybe you could add the pool in WSL, fuck if I know. Btrfs? Might be ok; I hear everything from "wow it has features even ZFS doesn't have!" to "I don't trust btrfs at all!!!" so go do your own research.
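
Something like this is what I mean; sketched with Python again, with placeholder pool/dataset names and a placeholder device path. copies=2 stores every block twice, so usable space gets cut roughly in half in exchange for surviving some bad sectors:

```python
# Sketch: single-disk ZFS "archive drive" with duplicated blocks.
# Not a tested recipe; names and path are placeholders.
import subprocess

DISK = "/dev/disk/by-id/usb-ARCHIVE_DRIVE"  # placeholder

subprocess.run(["zpool", "create", "-o", "ashift=12", "archive", DISK],
               check=True)
subprocess.run(["zfs", "create",
                "-o", "copies=2",          # every block stored twice
                "-o", "compression=lz4",
                "archive/data"], check=True)

# Even on one disk, a scrub at least *tells* you when the drive lies:
subprocess.run(["zpool", "scrub", "archive"], check=True)
```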

Second: share the data you can. Torrents are great and I love them, but also just send copies to friends. Upload stuff to different sites. Hell, go find a fellow nerd that has similar tastes to you and see if they'll be willing to set up their own big storage fuckbox in exchange for it being filled up with all the interesting fun sharable stuff you grab.

Third, and I'm real bad about this, but do enough organization to at least have all of the data you can't replace in the same spot, and maintain that the best. You do art? Save every single source file you ever make and back that up. Personal photos? For sure keep copies. Even stuff like personal writing. Especially anything that's small, that takes up MBs or GBs instead of TBs: find a way to actually duplicate and verify that data. If it's not sharable, look into encryption too. There's a LOT of work in this part, way more than I'm letting on, but anything done on a computer can be automated. You can, with a lot of effort, plug in a disk and wait for the computer to do the work. If losing data would impact you badly enough, it's an effort you should put in. Just be sure to document the hell out of what you did, because if you do it right you won't touch it for years.
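
On the "look into encryption" bit, just to show how little code it takes (this is an illustration, not a security recommendation): a sketch using the Python cryptography package, where Fernet is one reasonable choice among several, and the filenames are made up. Losing the key means losing the data, so the key needs its own backup, away from the files:

```python
# Sketch: symmetric encryption for the non-sharable stuff.
# Requires: pip install cryptography
from pathlib import Path

from cryptography.fernet import Fernet

key = Fernet.generate_key()
Path("backup.key").write_bytes(key)   # store this somewhere ELSE, safely

f = Fernet(key)
plain = Path("journal.txt").read_bytes()              # placeholder filename
Path("journal.txt.enc").write_bytes(f.encrypt(plain))

# Round-trip check: verify you can decrypt before trusting the copy.
assert f.decrypt(Path("journal.txt.enc").read_bytes()) == plain
```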

Finally, I'm avoiding talking about "cloud" backups because I personally do not trust them. They're too expensive for storing gobs of public data, and there are many ways they could decide not to give you your private data back. If you don't trust a streaming service not to decide you can't watch your favorite show anymore, you shouldn't trust a cloud provider not to lock you out of your account over an internal error on their side, or to just accidentally nuke your data. Maintaining your own data copies was the whole point, and while sharing is cool, I don't need to explain why you shouldn't trust a large business.


  1. Oh yeah you bet I saw those things. $90 for a 12TB is low enough that I will give one a shot at some point but the potential of them having had their SMART data modified (!) plus being SATA is enough to keep me from diving in headfirst. Honestly it's so sketchy that I'm morbidly curious. ZFS only lets you use so many drives in parity, I don't know if raidz3 is safe enough for those fuckers.


in reply to @plumpan's post:

Been dealing with a lot of this in a theoretical sense.

The way I've been dealing with it is to just have multiple copies of everything. One big backup spinning disk drive, my programs/os m.2, and another m.2 and ssd with similar things.

I don't do anything special to keep these files up to date or ensure they're safe; I've just backed them up enough times to multiple locations to where I think they'd be safe. If anything, I don't care all that much about losing data.