cathoderaydude
@cathoderaydude

I probably have some part of this wrong, but I can find almost no complete info about it online; I'm sure someone will have corrections, and I'll add them if you leave a comment.

I've been wondering for years: how is it possible that an ISO can be written to a USB drive and booted on a PC, just like that? First three guesses don't count, mine were all wrong.


I came up with:

  1. It's some special BIOS feature just for this; no, it isn't, not least because it works on machines way, WAY too old for it to have possibly made sense.

  2. It's leveraging the USB CDROM support in many BIOSes; no, it isn't - there's no way to tell a flash drive to pretend to be a CDROM.

  3. The flashing tools (BalenaEtcher and Rufus primarily) are injecting a chainloader stub; no, they aren't... well, okay, Rufus is, sometimes, but not for most of the cases you're likely to run into. And more importantly, a lot of people "burn" ISOs to USB using dd (example below), which does absolutely no processing of the image whatsoever.
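
(For reference, "using dd" means something like this - the device name is a placeholder, and dd will cheerfully overwrite whatever disk you point it at:)

  dd if=distro.iso of=/dev/sdX bs=4M status=progress conv=fsync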

This has bugged me for years and I've always meant to sit down and figure out just what's going on. What's so special about Rufus? Why do you have to use that, or a bespoke Microsoft tool, to write Windows ISOs? And why are there some ISOs that Just Don't Work Anyway? And how could dd ever work for this?

The answer is the only one that could possibly make sense: ISOs aren't ISOs anymore, and haven't been for a long time. Except that they are.

how it was, once

The .iso format is, in theory, a raw copy of an ISO9660 data track. It can't contain any other CD format*, which is why, if you attempt to image an old Windows game (for instance) that has CD audio, then burn the resulting iso back out to a disc, you won't get any audio. It'll work if you use a bin/cue, because that format supports multiple tracks, as well as content that isn't ISO9660.

  • "What about UDF?" UDF follows the ISO format in enough ways that it'll still fit in an ISO file, and you can use "UDF" interchangeably with "ISO9660" throughout this post, although in practice you usually will find actual ISO9660 in OS images.

For that reason and others, .iso is imo a terrible format that never should have existed. I'm not sure why it does; offhand, I'm guessing it's a leftover de facto filetype from authoring tools that just stuck around, but it's frustrating, because its shortcomings would have been obvious from the moment it was made.

Mind you, bin/cue is also a terrible format - I can't imagine any use for treating the tracks of a CD as separate files, and from most people's perspectives it just means "there's always one useful file and then one useless piece of detritus that I have to tote around for some reason." Why was this not just a single file containing a header followed by each of the .bins concatenated together? It apparently originated with CDRWIN, some guy's shareware tool from the dawn of home CD burning, so... that's probably why.
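
For illustration, here's roughly what a minimal cuesheet for a mixed-mode disc looks like (the filename and timestamps are made up) - one tiny text header describing a data track and an audio track inside a single .bin:

  FILE "game.bin" BINARY
    TRACK 01 MODE1/2352
      INDEX 01 00:00:00
    TRACK 02 AUDIO
      INDEX 00 23:41:52
      INDEX 01 23:43:52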

Anyway, what matters is that ISO9660 is the data format of a standard CDROM, and it's not bootable by PCs. Prior to 1995, you could not put a CDROM into a PC and start up from it; you had to use a floppy disk with a CD driver as a bootstrap.

Then, in 1995, the El Torito standard ("Bootable CD-ROM Specification") was released. It defined a standardized location on a CDROM for storing a "boot catalog", which in turn pointed to sectors on the disc containing a hard disk or floppy disk image that the BIOS's existing boot code could emulate and then boot from.

The floppy disk mode essentially just automated the process people were already using - when you boot from a Windows 98 SE CDROM, you are literally booting from an image of a floppy disk with a CDROM driver, which then mounts the CD and runs setup. It's almost the same disk as the 98 "EBD"; you can even extract that image and write it to a floppy, and it'll work. The same goes for e.g. Red Hat Linux 5; they provide a boot floppy for systems without El Torito, and it's the exact same image on the CD.
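
(If you want to try that extraction yourself, one tool for it is geteltorito, shipped in the genisoimage package on a lot of distros; the filenames here are examples:)

  geteltorito -o bootfloppy.img win98se.iso
  dd if=bootfloppy.img of=/dev/fd0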

On the other hand, if you've ever booted a CD and seen a brief banner that said "no emulation", that meant that the disc was authored in "hard disk" mode, where the BIOS presents the CD as if it's an HDD. This is preferred nowadays.

El Torito was a huge improvement to OS installation media, and it's still around; every BIOS supports it to this day, and Ubuntu 22 uses a "no emulation" El Torito image to boot from CD. It works very well, apparently. But what's strange about all this is that it doesn't... appear... to have been... necessary??

ISO9660 was bootable from day one. The original 1988 spec contained a "Boot System Identifier" that identified which machine a disc was intended for, and the first sixteen sectors of the disc were reserved as a boot area whose contents were left undefined. The actual ISO9660 header begins at sector 16; everything before that is ignored.

(Notably, Apple's hybrid CDs, which stored both Mac HFS and ISO9660 filesystems, leveraged the system identifier field to identify the HFS partition, but I don't think they actually used the boot sector for anything - as far as I know they just identified the disc as having HFS, then booted from it just like a hard drive. This may be wrong, I had trouble finding a clear spec.)

ISO9660 sectors are each 2KB, and an entire PC MBR is only 512 bytes, so 16 sectors is way more than enough space to either stuff in an MBR, or a pointer to somewhere else where a more substantial boot record can fit. They could have just picked a sector, then extended the BIOS to read that and execute whatever's in it - why didn't they do that? I have no idea. It feels like it wouldn't even need BIOS support; disk controllers could have done it with option ROM code.

At any rate, they didn't. CD/DVD booting (can you boot from a bluray?) still uses El Torito, not that most people care, since almost nobody has an ODD anymore and we've all switched to USB drives for OS installation. Those are much simpler, because they're just hard drives as far as the BIOS is concerned. If you ask to boot from one, it does exactly what it would do with a hard drive.

But of course, that's not true; we haven't all switched. There are plenty of reasons people may still need to use an optical drive (they're still sold on Amazon, and some business machines still include them), so OS vendors have to continue to support installation from optical media. What they don't want to do is provide two different versions of that install media for download, because that's simply a pain in the ass.

how this is now solved, usually

The critical thing here is that the first 16 sectors of an ISO9660 image are completely undefined. You can stuff anything you want in there, like an MBR or a GPT partition table and boot code, and that's exactly what almost all modern ISOs do.

If you download ubuntu-22.04.4-desktop-amd64.iso and look at it in a hex editor, you'll find an MBR (Ubuntu 22 is UEFI-only, but GPT disks always include a "protective MBR" stub), followed by a GPT (look for "EFI PART") and then, alllll the way down at 0x8000, the start of the ISO9660 volume descriptors - the "CD001" signature sits one byte in, at 0x8001.
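
You don't even need a GUI hex editor; xxd will show you all three structures (offsets as just described):

  xxd -l 16 ubuntu-22.04.4-desktop-amd64.iso             # MBR boot code
  xxd -s 0x1fe -l 2 ubuntu-22.04.4-desktop-amd64.iso     # 55 aa, the MBR signature
  xxd -s 0x200 -l 8 ubuntu-22.04.4-desktop-amd64.iso     # "EFI PART", the GPT header
  xxd -s 0x8000 -l 8 ubuntu-22.04.4-desktop-amd64.iso    # 01 "CD001", the first ISO9660 volume descriptor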

If you were to decode the GPT, you'd find that it shows at least two partitions: The first one is at 32KB, which happens to be sector 16; that's the ISO9660 filesystem. After that, you'll find a small partition, only a few megs, tagged with the GUID for an "EFI System Partition" (ESP.) If you were to mount this, you'd find a FAT32 FS with nothing in it except a folder called EFI, containing a folder called BOOT, containing a copy of GRUB.
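
(If you'd rather not decode the GPT by hand, you can loop-mount the image; the partition numbering below is what I'd expect from the layout just described, so treat it as an assumption:)

  sudo losetup -fP --show ubuntu-22.04.4-desktop-amd64.iso   # prints e.g. /dev/loop0
  sudo fdisk -l /dev/loop0                                   # lists the GPT partitions
  sudo mount /dev/loop0p2 /mnt                               # the little ESP
  ls /mnt/EFI/BOOT                                           # GRUB, and nothing else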

If you write this image to a CDROM and boot it, the BIOS/UEFI will skip over the MBR and GPT and read the ISO9660 header. It checks for an El Torito boot record, follows it to the boot catalog, and executes the floppy or HDD boot image the catalog points to, which in turn launches GRUB. GRUB loads a CDROM driver, finds the CD, finds the Linux boot image, and starts booting it.

If you write that same image to a USB drive, then put it in a BIOS machine, it'll check for an MBR at 0x00 and run its boot code (I'm not 100% sure what this looks like), while a UEFI machine will check for a GPT at 0x200, scan the partitions it describes for a valid EFI System Partition, find the ESP, and execute GRUB. After that, everything happens the same way.

I don't know how Linux determines what medium it was booted from, but however it does, it proceeds to mount that FS... as iso9660. This kinda surprised me, though it shouldn't have. I always assumed that only the CDs were 9660 and that the USB drives were somehow ext4, because they won't read in Windows. Also because Linux (probably??) wouldn't work very well from a plain 9660 filesystem - but that doesn't matter, because it doesn't really run from the CD filesystem.

The live boot process creates a ramdisk, copies a minimal OS image to it, then mounts the rest of the OS from a "squashfs" file. SquashFS is its own thing, and Linux is very happy to run from it; it does copy-on-write into the ramdisk, I think, so you can install packages and whatnot.
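
(You can poke at this yourself; the casper/ path is Ubuntu's convention, and other distros keep their squashfs elsewhere:)

  sudo mount -o loop ubuntu-22.04.4-desktop-amd64.iso /mnt   # mounts as iso9660, read-only
  ls /mnt/casper                                             # the squashfs images live here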

So that's it, that's the trick: ISOs just happen to have room for two distinct but compatible disk headers describing the same data.

Is this approach "a hack"? Kinda. Enough of one that it doesn't quite have a name - there's no formal spec for how these files are structured. People call them "ISOhybrid" files because the utility used to make them is called isohybrid, and that's all the concrete info I really have. I can't find even an informal spec, just man pages and wiki entries explaining how to use the tool, which is so "modern Linux" I can't even formulate a joke about it.
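
(The tool itself, from the syslinux package, is about as unceremonious as its documentation. As I understand it, it patches the image in place, and the --uefi flag adds the GPT/ESP entries alongside the MBR; the filename is an example:)

  isohybrid --uefi mydistro.iso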

To be clear though, it's "not a hack" in the sense that... there's boot code in the ISO boot code area, so it's spec-compliant, right? But that code isn't being used quite the way the drafters intended, and for that reason, it's very fortuitous that the writers of 9660 did it exactly this way.

Had they said "well, sector 0 should be a disc identifier, and the platform boot code should start at sector 1", then using dd to write an ISO to USB would not work; you would have to memorize a particular incantation (involving either skip= or seek=, and no, you aren't going to remember which one it is, you'll have to look it up every time) to slice off the first few KB of an ISO in order to write it out to USB.
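
(For the record, it would have been skip=, which discards blocks from the input; seek= skips ahead on the output. In that alternate universe, the incantation might have looked like:)

  dd if=distro.iso of=/dev/sdX bs=2048 skip=1   # lop off the hypothetical identifier sector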

And of course, had 9660 not left this generous hunk of space at the beginning of every disc, or had they deleted it from the standard once it became apparent that almost nobody was going to use it as intended (I've heard that DEC did, IDK if there was anyone else) then the whole practice wouldn't be possible at all, because there'd be nowhere to store a bootloader in an ISO, so things would still have to be distributed in two different formats.

In practice, however, this is pretty damn reliable. I have never seen it fail, and I can't think why it would. It doesn't impose many limitations that I can think of; ISO9660 is a pretty cromulent FS for a highly limited, readonly device. Accordingly, there doesn't seem to be any push to move away from this approach. But it's still not exactly ideal.

problems and remaining weirdness

It's great that this works, it's very convenient. It means you don't have to distribute two versions of every OS (which used to be common and is still done sometimes.)

It's unfortunate that they're just called .iso though, because that's... wrong in a pragmatic sense. Like, yes, by the letter of the law these are ISOs, but come on, they should have their own extension. I get why they don't: because Linux distributors don't want to have to explain to every user "if you want to burn it to a real CD, you have to change the extension to .iso so your burning app will see it in the file picker." That wouldn't quite defeat the purpose of the whole endeavor, but it'd come close.

It just fundamentally bugs me, though, that so many people are toting around these things they think are "just ISOs" when they really aren't. When you burn one of these to a CD, you're putting data on there that is not used by a CDROM, ever. When you put it on a USB drive, you are writing out a filesystem that is really fucking weird to put on a USB drive.

To wit: I assumed you couldn't read live Linux drives from Windows because they were ext4, but no: it's because you put ISO9660 on a flash drive. Windows has code for reading that FS, but it's perhaps understandably hardcoded to only check for it on CDROM drives. They should fix that! But like! It's not surprising that they haven't!

The other thing about this is that it's led to people being confidently wrong about how ISOs work, and that's going to have knock-on effects for a long, long time. In short, I think this is like the Y2K problem: It got fixed so quietly and effectively that most people don't think there really was a problem at all. Yeah, you can just write ISOs to USB drives - it Just Works, why wouldn't it, it's always worked for me!

Google anything about "hybrid ISO" and all you're going to get is results about mixed-mode data/audio discs, ancient Mac discs, and other things that aren't relevant at all - or you'll get stackexchange and reddit posts from people trying to figure out how to write out Linux images. Because, of course, that's what these are used for, overwhelmingly. All the people on reddit who say "you can just write ISO to USB with dd" are revealing that the only thing they've ever written to USB is Linux.

This makes sense and isn't surprising, of course, but ISOhybrid is largely a Linux practice. I'm sure lots of FOSS stuff in particular uses it (Haiku probably does), but not everyone: BalenaEtcher's "this ISO doesn't appear to contain a partition table" dialog was the result of a bug report about VMware ESXi images "not creating bootable drives," and that was because, sure enough, they just distribute ordinary El Torito images. I believe memtest also had this problem for a while but updated at some point. And, of course, anyone who's tried to write a Windows ISO in Etcher has gotten the massive warning message telling you specifically to use Rufus.

Unsurprisingly, Microsoft has NIH syndrome about ISOhybrid: Windows ISOs are plain El Torito images. So what's so special about Rufus that makes it work with them? Surprisingly, very little. What makes it work is its unspecialness, actually.

Etcher seems dumb enough: it's nothing more than glorified dd for Windows. dd isn't really what you're supposed to use to handle CD images, but that's okay, because Etcher is overwhelmingly "the thing for writing Linux images." If the file isn't actually one of these special gimmicked Linux images, what you're asking Etcher to do is write a Literally Incompatible Image to your disk - so it's no wonder it doesn't boot; why would it? ISOs aren't for hard drives. It's fortunate that they added the GPT check to Etcher; it's the one sneaky admission that what you're doing here isn't exactly "correct," and without it an awful lot of people would be confused a lot more often.

Rufus works for Windows because it's dumber than Balena: it copies the contents of the image to the disk. Like, the files. It literally does nothing more than make a FAT32 partition, then copy the contents of the Windows ISO to it, file by file. That's all.

If you don't believe me, try it: wipe a flash drive, put a FAT32 partition on it, and just copy the contents of a Windows 10 ISO to it. It'll boot, because UEFI mandates support for FAT32, and is hardcoded to look for specific files in /EFI/BOOT on any connected mass storage device - files you can see right in the ISO. Windows 10 is happy to install from FAT32.

(Note: At some point, Windows 10 ISOs contained a file larger than 4GB that broke this process; I just tested, and the latest Win10 22H2 works just fine this way. Rufus has methods of dealing with that scenario if it comes up - I read about them somewhere - and they involve splitting up the WIM archive; there's an official process for it.)
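
From Linux, the whole experiment is roughly this (device and file names are placeholders, and the mkfs step wipes the partition):

  sudo mkfs.vfat -F 32 /dev/sdX1              # one freshly created partition
  sudo mkdir -p /mnt/usb /mnt/iso
  sudo mount /dev/sdX1 /mnt/usb
  sudo mount -o loop Win10_22H2.iso /mnt/iso
  sudo cp -r /mnt/iso/* /mnt/usb/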

Rufus does offer the option to format the drive as NTFS, for a variety of reasons including support for Windows releases with files larger than 4GB. It's not a custom bootloader however; it's an NTFS driver that runs inside UEFI, which then looks for an NTFS partition with /EFI/BOOT and boots from it. Everything else proceeds as normal after that.

You can do this with Linux ISOs too, by the way: open up the latest Ubuntu image and you'll see /EFI/BOOT. Copy the contents to a FAT32 drive, and it'll boot. GRUB and the kernel don't care where the initramfs or squashfs are or what FS they're on, as long as it's an FS they understand.

Notably, the UEFI spec does not mandate that this be FAT32. It MUST support FAT32, but MAY support other FSes. The common wisdom is that FAT32 is the only thing you can rely on, which often comes across as "UEFI only supports FAT32" but nothing says there couldn't be a machine out there that natively boots from NTFS or ext4.

Naturally, on BIOS machines you still need an MBR. I believe in that case Rufus just extracts the El Torito image from the ISO and writes it to sector 0 of the drive, though I'm not 100% on that. Apparently there are a million other little edge cases Rufus fixes up as well, all dealing with BIOS/MBR. It doesn't bug me too much that this is the only tool that seems to care about that target; BIOS systems are never going to be 100% reliable at booting from USB anyway, especially older ones, and in time they'll become less common, and good riddance. At least we have one solid tool that'll keep working with them forever, since they aren't a moving target.

So in conclusion: ISOs aren't ISOs anymore, except they are, except we shouldn't be calling them that, except we need to, except almost nobody cares and we should just fucking standardize this new thing. It seems to work really well and I don't see any real issues with it except that most people don't know it exists.



in reply to @cathoderaydude's post:

As I understand it, a CDROM can have exactly one ISO data track, it has to be the first track on the disc, and an .iso is supposed to be a dump of just that track.

If only! You can have arbitrary numbers of ISO data tracks, and they can go anywhere on the disc. "Enhanced CD" discs usually place the data track after all the audio, and as you might imagine if you try to mount it as just a plain ISO, Windows gets very confused about the LBAs not starting at 150.

Speaking of starting at 150, that's because the ISO format (and almost every CD disc image format) completely skips over everything before the data: the leadin that contains the table of contents, plus the mandatory two-second pregap, which is 150 sectors long - hence ISOs and every other disc image starting at 150 and not 0. It's great! Love it here.

(That's also why you need a table of contents like a standalone text cuesheet, or DiscJuggler's single-file .cdi that embeds the TOC with the disc data, etc.)

Nope, an ISO represents a single data track. It can't represent more than one track.

Way back in the day when people used to split up and compress game rips to share on dialup, I remember PC Engine CD games that had a random set of .isos and .mp3s, because there were several games on the system that had more than one data track (and the data track is always track 2, not 1). CD-ROM kind of just lets you do whatever you like.

PowerISO detects PC/classic Mac hybrid discs at least, I haven’t seen much more than that. It also only shows the PC half IIRC.

Recently I was trying to boot Red Hat 7.3 on an HP i2000 (Itanium Merced) and ended up with this invocation to make a bootable CD out of the boot image which I believe was originally intended for an LS120 disk:

xorriso -as mkisofs -iso-level 3 -r -V "RHBOOT" -J -joliet-long -append_partition 2 0xef boot.img -partition_cyl_align all -o RHBOOT.iso . (very similar to https://wiki.debian.org/RepackBootableISO#arm64_release_9.4.0)

Why do I think it’s for an LS120? The size, and some later docs: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/installation_guide/ch-ia64-intro

Fun little bit of history!

Yeah, the gnashing of teeth mostly comes from how (AFAIK) there isn't a unified liveusb-with-persistent-storage layout beyond the bounds of any particular distro. Some did FAT32 shenanigans, some made an ext4 partition after the end of the ISO on first boot, and some relied on USB flashers to create a partition and put it into the GRUB config.

This has addressed so many tiny mysteries from my days of disc burning and USB stick creation from, like, 1998-2005. So many things that were β€œit might work or it might not” or vague explanations of β€œto make a boot CD you need a floppy disk image” or whatever.

Thanks!

Tangentially related knowledge from hackintoshing: Modern installers for macOS come in "Offline" (has everything) and "Online" (you gotta have a connection to download the install files to the machine) varieties.

The Offline variety comes with the predictable "ESP that kicks you into an installer on an HFS+ partition" format, but for the Online installer, you can just make the disk one huge FAT32 partition with an EFI folder and a com.apple.recovery.boot folder.

This just ends up mounting a DMG file, from my understanding, so not that different from the Ubuntu installer in the end. But it feels fascinating that you can run a macOS installer from just a normal fat32 disk.

Rufus works for Windows because it's dumber than Balena: it copies the contents of the image to the disk. Like, the files. It literally does nothing more than make a FAT32 partition, then copy the contents of the Windows ISO to it, file by file. That's all.

This used to work with NTFS on BIOS, too. An MBR partition could contain a bootloader, and if it was marked as bootable, the BIOS would boot it. Crucially, NTFS puts a short program there that finds the Windows bootloader on that partition and loads it. CD images contained that bootloader, so if you created an NTFS partition on a flash drive, marked it bootable, and copied the contents of an ISO onto it, it would boot!

This used to work with NTFS partitions created on Linux too, until fairly recently, when people decided that distributing a binary of Microsoft's bootloader someone found somewhere as GPL code is probably haram.

You can / could (?) still create a FAT32 Windows boot USB using the media creation tool - it will super-compress the WIM into an ESD (encrypted?) image. That seems to work well enough, and I think they also allow split images too.

It went the ESD route when you'd tell it to make a combo x86/x64 image. If you use Rufus's NTFS driver, naturally it breaks secure boot. I was pretty chuffed at finding a way around that.

I'm not sure that actually happened, in short. The Ubuntu ISO, for instance, shows up in Linux mount as iso9660; if it was UDF it would say udf, afaik. I had been under the impression that DVDs required UDF, but this does not appear to be true. I haven't looked into it in depth, but I suspect the features UDF grants just aren't that important on read-only media compared to the increased complexity and decreased compatibility - but that's a pure guess.

Oh and the other part of the answer is that, as far as I know, UDF doesn't affect it at all because the disc header structure is identical for backwards compatibility purposes. Apparently you can make a dual FS disc that reads as both ISO and UDF, so the boot sector has to be there still and that means you can still do this trick with it.

thanks very much for this very thorough write-up! it's a topic we'd been meaning to dig deeper into, too, because of all the hidden complexity.

as regards the presumed need to load a CD-ROM driver, remember that motherboard firmware used to be smaller, ROM chips used to be more expensive, and firmware vendors likely weren't in a hurry to take on the additional cost of writing drivers for CD drives. the strategy of loading the driver off the disc puts the cost on... well, whoever writes the burning tool.

you also mentioned that Linux-y stuff doesn't always load properly on Windows despite it being nominally a standard filesystem. there are several extensions to the filesystem itself, which offer various kinds of metadata; it's likely that one of them is in use and messing Windows up.

even spicier is what linux image builders had to do (still have to do?) to get things to boot on the mac implementation of EFI while also still functioning properly on UEFI. i think there's a tool called "bless" involved somehow