So I have a few tasks that need doing which, realistically, take very little computer to do. One of them is archiving a couple of Twitch streams with yt-dlp.
What do you do when you only want to use a tiny bit of computer? You use a Pi, because they use very little power: my Pi 4 uses maybe a half dozen watts or so. I like using less power, partially because my electricity bills cost too damn much but also because using less power is Important and it's something everyone should care about. It's slow, but jobs like this run overnight and it doesn't matter if it takes longer to run.
But Pis cause me nothing but problems.
I had this whole setup working really well some time ago, the kind of thing that ran for years without me touching it. It was long enough ago that youtube-dl was being used for the actual downloads, and because downloading and verifying which files were already downloaded was so slow, I actually had a Python script that would grab the full video list, break that up into groups and thread them out to keep the CPU busy. This worked well until I decided to update the OS one day and everything broke, as you do. So I had to rewrite the whole setup.
It turns out since I set all of this up, yt-dlp works way better for everything now, and there's a -N flag that just threads out the actual HLS chunk downloads for you. Also the video list download and checking runs WAY faster too. So all I need is one command now! Hooray!
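For reference, that one command could look roughly like the sketch below, wrapped in a small Python function so it can be dropped into a scheduled job. `--concurrent-fragments` is the long form of -N. The `--download-archive` and `--paths` options, the function name, the URL and the paths are all assumptions for illustration, not the actual setup.

```python
import shlex


def build_ytdlp_command(url, out_dir, archive_file, fragments=4):
    """Build the argv list for a single yt-dlp run (hypothetical helper)."""
    return [
        "yt-dlp",
        "--concurrent-fragments", str(fragments),  # long form of -N
        "--download-archive", archive_file,        # assumed: skip IDs already recorded here
        "--paths", out_dir,                        # assumed: where finished files land
        url,
    ]


if __name__ == "__main__":
    cmd = build_ytdlp_command(
        "https://www.twitch.tv/CHANNEL/videos",  # placeholder channel URL
        "/mnt/nas/vods",                         # placeholder NAS mount point
        "/mnt/nas/vods/archive.txt",             # placeholder archive file
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the fragment count as a parameter makes it easy to try different -N values without editing the command by hand.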
The problem is, the Pi's network stack just, fucking sucks? This thing cannot download a damn VOD to save its life. Like it can but, if I crank -N up too high (think 12), I don't max out the CPU or upset the CDN, I start getting file access errors on the NAS because it can't sort itself out. Mind you, I can run these same commands from a normal computer via a 1G or 10G link to the NAS just fine, the NAS doesn't care at all. The Pi just keeps dropping connections or something, I have no idea.
So I set all of this up as a VM on my TrueNAS box for a point of comparison. With two Haswell-era cores available and a direct network connection to the storage it runs much faster, as you'd expect. And it runs without errors, as you'd expect(?). The only downside here is that the NAS will use 20-25 W more than it would when the Pi is running the same job. That's the power usage of the disks and PC combined, so waaaay more than the Pi, and all in CPU.
The only theory I had was "well, the higher network usage with -N knocks over the network stack", so I ran a test at -N 4. This results in download speeds that never break about 20-25 MB/s. I ran it against a whole set of VODs and it ran 11 hours straight with no errors at all. All chunks downloaded fine, no errors in the remuxing, looks great! Then I check the files: every single one is corrupt. 100% failure rate, AND silent failures. Useless!
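Silent failures like this can at least be made loud after the fact. A minimal sketch, assuming ffmpeg is installed and on the PATH: decode each file all the way through with `ffmpeg -v error -i FILE -f null -` and flag anything that reports errors. The function names are hypothetical, and this only catches problems ffmpeg itself notices while decoding, which may or may not match the corruption seen here.

```python
import subprocess


def decode_errors(path, run=subprocess.run):
    """Fully decode `path` with ffmpeg and return the error lines it reports."""
    result = run(
        # -v error: only print errors; -f null -: decode everything, write nothing
        ["ffmpeg", "-v", "error", "-i", str(path), "-f", "null", "-"],
        capture_output=True,
        text=True,
    )
    lines = [line for line in result.stderr.splitlines() if line.strip()]
    if result.returncode != 0 and not lines:
        # ffmpeg failed without saying why; still treat the file as suspect
        lines = [f"ffmpeg exited with status {result.returncode}"]
    return lines


def find_corrupt(paths, run=subprocess.run):
    """Map each suspect file to its error lines; clean files are left out."""
    report = {}
    for path in paths:
        errors = decode_errors(path, run)
        if errors:
            report[str(path)] = errors
    return report
```

Calling `find_corrupt(["a.mp4", "b.mp4"])` returns an empty dict when nothing complained. A full decode is slow, especially on a Pi, so a check like this fits better as a separate overnight pass than as part of the download itself.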
So now I'm at a point where I'm not sure if I want to really dig into how networking and network mounts work on Linux to debug this or just, run it from the VM. Mind you, every system I'm using for this is Linux, it's specifically the Pi causing the issues here. And that's a running theme when I use Pis for things. Just about everything I try them for, there are more bugs than I'd otherwise have. It all works, but only like 90%, there's always extra issues, extra steps. Little annoyances because it's not a Little Guy, it's a Little Arm. And it's upsetting because the total time per video isn't 4-5x longer on the Pi, so it's absolutely more power efficient overall. Not a lot of power difference but it's free since I've already got the hardware.
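The power argument is just energy = power x time. A rough sketch of the break-even point, assuming a constant 6 W for the Pi (the "half dozen watts" above), a constant 20-25 W of extra draw on the NAS, and nothing else changing; the function names are made up for illustration.

```python
def energy_wh(watts, hours):
    """Energy used by a job drawing `watts` for `hours`, in watt-hours."""
    return watts * hours


def break_even_slowdown(pi_watts, vm_extra_watts):
    """How many times longer the Pi can take before it uses more energy than the VM."""
    return vm_extra_watts / pi_watts


if __name__ == "__main__":
    pi_watts = 6.0  # assumed: "maybe a half dozen watts"
    for vm_extra_watts in (20.0, 25.0):  # extra NAS draw while the VM works
        ratio = break_even_slowdown(pi_watts, vm_extra_watts)
        print(f"{vm_extra_watts:.0f} W extra: Pi wins if it is under {ratio:.1f}x slower")
```

Under those assumed numbers the Pi stays ahead as long as it is less than roughly 3.3x to 4.2x slower than the VM.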
Maybe I'd be better served looking at ways to artificially limit CPU usage instead.
A small FAQ
Q: Why didn't you set this up on a VM to begin with?
A: The Pi predates the NAS, and the NAS until recently ran TrueNAS CORE, which is BSD-based, and its VM options kinda sucked. I did not want to learn how Jails worked.
Q: Are you actually interested in info on how to make the Pi work?
A: Sure! But I'm less willing to troubleshoot at this point unless you've got a very good idea as to where the problem is and a potential solution to try. Ideally one you've tested yourself.
Q: What OS is the Pi running?
A: Ubuntu 22.04. Previously it was 18.04.
Q: Why on earth would you download Twitch VODs?
A: Do you trust Twitch to not nuke people's VODs en masse for no reason? Plus I actually have the storage for it; some people who would archive them if they could, can't.