cathoderaydude
@cathoderaydude

problem: you have a windows PC with interesting software on it, but it has a 500GB disk, of which only 18GB is used. how do you produce a disk image from this that doesn't suck ass to work with?


you can dd it, but nobody wants you to upload a 500GB image to internet archive. you can zip it, sure, but if the drive experienced fragmentation, etc. then there may be junk data that makes the image much bigger than it has to be. 18GB is bad enough; if the zip ends up being 48GB due to random junk sectors, that's super un-ideal. even worse, you still need 500GB of free space to extract it before you can so much as look at the contents. ugh.

okay, Just Use Ntfsclone. mmm, well, problems there too. by default, ntfsclone... just dds the partition, apparently? it's not actually clear why this mode exists, nobody seems to have an explanation. most of the tool's magic is in the confusingly named --save-image flag, which enables its secret sauce. they just call this "the special image format."

on its face this does exactly what you want. if the drive has 18GB of actual files on it, "ntfsclone --save-image -o [destination].ntfsclone /dev/[source]" will produce an 18GB image with no wasted space even if there's junk data on the disk.

new problem: nothing can read it. the "special image format" is completely unique to ntfsclone and nobody has bothered to implement support for it. you can't mount it as a loopback device, 7zip won't open it, etc. The only possible thing you can do with one of these is write it back to a block device with ntfsclone. ugh.
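
for reference, here is the full extent of what you can do with one of these (a sketch; device and file names are just examples):

    # save: used clusters only, into the "special image format"
    ntfsclone --save-image -o machine.ntfsclone /dev/sda1
    # restore: literally the only other operation the format supports
    ntfsclone --restore-image --overwrite /dev/sdb1 machine.ntfsclone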

i've been trying to figure out what to do about this. copying the files themselves isn't an option because they never have installers; installed programs are useless without the OS and its incredibly complex profile and registry data structures, which can't be easily identified or extracted. so you need to copy the OS in a replicable, bootable state, but nobody wants an image file that's completely uninspectable in this year of our lord.

so here's my new idea (pulled together into one script sketch after the list):

  • use ntfsresize to shrink the filesystem to the size of its contents plus a small margin; 20GB, say
    • suppose the partition in question is sda1.
    • ntfsresize --info /dev/sda1 - this will tell you how much space is in use
    • ntfsresize -s 20G /dev/sda1 - shrinks the filesystem
  • use fdisk to shrink the partition
    • fdisk /dev/sda
    • delete the OS partition. it needs to be the last partition on the disk. if it isn't, you're fucked, give up.
    • create new partition and specify the size as +20G
    • set the type to 7 for NTFS
    • write
  • use dd to save the MBR to a file
    • dd if=/dev/sda of=machine.mbr.img bs=512 count=63 - this isn't a magic number: 63 sectors is the classic pre-partition gap, and the partition table itself lives in the very first sector anyway. if the first partition starts at sector 2048 (modern alignment), use count=2048 to grab everything in front of it.
  • use dd to save the partition itself to another file
    • dd if=/dev/sda1 of=/destination/machine.part1.img bs=4M status=progress
  • zip the result
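
the whole thing as one rough script, assuming the disk is /dev/sda, the OS partition is /dev/sda1, and 20G is enough headroom (adjust to taste):

    # how much space is actually in use?
    ntfsresize --info /dev/sda1
    # shrink the filesystem down to 20GB
    ntfsresize -s 20G /dev/sda1
    # shrink the partition to match: in fdisk (interactive), delete sda1,
    # recreate it at the same start sector with size +20G, set type 7, write
    fdisk /dev/sda
    # save the MBR and the pre-partition gap
    dd if=/dev/sda of=machine.mbr.img bs=512 count=63
    # save the shrunken partition
    dd if=/dev/sda1 of=machine.part1.img bs=4M status=progress
    # compress the pair
    zip machine.zip machine.mbr.img machine.part1.img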

now you have an image that's as small as possible, and which you can open in 7zip. to restore it:

  • dd the MBR to your destination disk
    • dd if=machine.mbr.img of=/dev/sdb
  • refresh the partition table
    • blockdev --rereadpt /dev/sdb
  • dd the disk image to the partition
    • dd if=machine.part1.img of=/dev/sdb1 bs=4M status=progress
  • boot into Windows from the newly imaged disk
  • resize the partition to fill the disk using Disk Management

I've tested this and it seems to work. This was on Windows 8, but should work fine with Vista or newer. XP lacks the ability to resize partitions as I recall; I was unable to get ntfsresize to play ball, for some reason it gets mad about the backup file map not matching, I don't know how to fix that and neither does anyone else it seems. So XP/2000 are still wildcards, but for anything newer this should work.



in reply to @cathoderaydude's post:

that's a great suggestion which i know nothing about, haha. without looking it up myself either, i'm thinking CHD is just gonna be "dd with gzip plus some dipping mustards", and not FS-aware, so it'll get got by junk data; but perhaps it's more sophisticated!

I'm looking it up and since its primary purpose is for mame, I don't think anyone's made a tool to just... mount the compressed image. Which would make it very useful. I'm assuming that it can be used exactly this way given that mame only wants the compressed file.

I just the other day made a post about how optical disks are also fucked in their own way, and someone linked me to a byuu/Near article that explained all of the nuance and proposed a better way of actually putting all of the requisite data, including really important stuff that does NOT get committed currently, into a single file.

Which could, not mentioned in the article, then be stuffed into a CHD to save space.
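
for reference, chdman (the tool that ships with MAME) can wrap a raw hard disk image; a hedged sketch, with machine.img standing in for whatever raw image you have:

    # compress a raw disk image into a CHD
    chdman createhd -i machine.img -o machine.chd
    # and pull the raw image back out later
    chdman extracthd -i machine.chd -o machine.img

you still can't mount the .chd directly, though, which is the whole complaint.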

TBH my take is that it's not up to the archival format to solve this problem. I sorta wish that "img with internal gz" was a common format that I didn't have to do mild hijinx to work with, but other than that, "bit for bit image" is really the ideal. After decades of bizarre formats flying around thanks to proprietary software (norton ghost... shudder) the fact that we've universally settled on a raw sequential byte blob is actually fantastic and I don't want to fuck with that.

The real solution is in methods to clean up the original data. Garbage in means garbage out; if you want to get a clean, perfect image of a data structure in the most generic and universal way possible, then you have to massage that data structure in its native tongue before you image it. That's exactly what I'm trying to do here.

If the NTFS partition is 500GB but contains 18GB of data, then it can be expressed inside 18GB. The ntfsresize step has the function of grabbing all the real data and yanking it down below that 18GB barrier. This cleans up our source data; we can now tell dd "just copy up to 18GB and stop" and be sure that we got a clean copy. The image can't possibly be any smaller than that, so we've won, this is the most ideal possible outcome.

The only thing that really needs to be done is to automate all this, and... well, the result is Clonezilla. It does almost exactly what I'm talking about, minus the part where it saves the images in a simple binary format. But even if it was doing that, it would still mean producing an "image" consisting of a folder with a shitload of metadata files in it, and that still blows. It would be nice to find enough overlap between techniques etc. to generalize all these steps and produce a unified image format that can save and replay them as needed, but, well, I'll take "a good solid workflow" over "a one step workflow"

VHD is a bit complicated, because there is a format of VHD that literally is just a raw image: the original Connectix Virtual PC VHD file format. it is 100% a dd image, except with a footer past the LBA namespace of the virtual drive in question, containing very basic metadata. To my understanding, it is compatible with all existing VHD tooling, and will double-click mount under Windows without trouble.

But most current "VHD"s are either the sparse/snapshottable ("vhd2"?) format, which is based on VHD; or are VHDX, which is completely different and far more involved.
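
If you do want that flavor, qemu-img can wrap a raw dd image in it without much ceremony (a sketch; "vpc" is qemu's name for the old Connectix/Virtual PC format, and the filenames are examples):

    # wrap a raw image in a fixed (non-sparse) VHD footer
    qemu-img convert -f raw -O vpc -o subformat=fixed machine.img machine.vhd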

hmmm. if the MBR only has one partition (i don't recall windows disk layout well enough to know if this is the case), then idea:

at that point you should have a single file that you could (theoretically) just dd to a disk and boot. untested though, and i'm just making wild guesses as to the capabilities of a few things here

i've had four beers so i'll put the "i might not have followed your reasoning" flair badge on this reply but: generally you can just throw an MBR at linux fdisk, delete a partition and recreate it to match the new FS size, and it'll work. that's actually a necessary step when using ntfsresize, it doesn't touch the MBR, you have to do that manually.

I think “ntfsclone --save-image to an intermediary blob” is probably the winning option, as it doesn't require you to mount and modify the source file system. However, you could also use SDelete to zero-fill free space on the disk, and then use regular dd + some solid compression.
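
Roughly like this, as a sketch (device and file names are examples; sdelete runs on the Windows side first, the rest from a Linux boot):

    # on Windows: zero out free space so unused sectors compress to nothing
    sdelete -z C:
    # then from Linux: image the whole disk and compress on the fly
    dd if=/dev/sda bs=4M status=progress | gzip -9 > machine.img.gz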

So I wanted to ask "why not just Clonezilla device-to-device onto a loop device backed by a file?" but it turns out Clonezilla doesn't support restoring/cloning to a loop device, which is really stupid.

But you can work around that with VMs.

  1. Run a virtual machine of your choice.
  2. Create a virtual hard drive at least the same size as the one you're copying
  3. Boot Clonezilla and do a device-to-device copy to that virtual hard drive
  4. Shut down the VM and convert the hard drive image to a raw image with qemu-img.
  5. zip it

This is stupid, but should work.
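
For step 4, a hedged example, assuming the VM disk ended up as a qcow2 called machine.qcow2:

    # flatten the VM disk back into a raw, dd-able image
    qemu-img convert -f qcow2 -O raw machine.qcow2 machine.img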

I like this idea, but I do not know what tooling exists to make a qcow2 of an existing physical disk except maybe qemu-img itself. there are also so many options to give to qemu-img that it isn't entirely clear, until you've spent 10 minutes trawling manual pages, which one means "take ntfs partition; trim unused/unallocated/deleted space"

also, guestmount, for mounting, is a bit weird and fuse-based, and there is absolutely no way that I know of to work with it from Windows - you're either using wsl in a hacky shared-drive setup or booting Linux proper.
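
for the booting-Linux-proper route, the incantation I have in mind is roughly this (hedged; needs libguestfs, and the image/partition names are assumptions):

    # mount the first partition inside a qcow2 image, read-only, via FUSE
    guestmount -a machine.qcow2 -m /dev/sda1 --ro /mnt/image
    # detach when done
    guestunmount /mnt/image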

My understanding of ntfsclone is that if you don’t pass the --save-image option, it will copy all blocks of the source filesystem that are used to the target file, and will “skip over” unused blocks.
That means that any blocks that were unused in the original filesystem will read as zero in the target file; if the target file resides on a filesystem that supports “sparse files”, these unused blocks will not use space on the filesystem. You will get a “500GB” file (as reported by ls) that only consumes 18GB of physical space, and you can then convert it to a vhd/qcow2/vdi image that will ignore the zero blocks and be 18GB or smaller.
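
You can see this behaviour directly (a sketch; the device name is an example):

    # clone used clusters only; unused blocks become holes in the output file
    ntfsclone -o backup.img /dev/sda1
    # apparent size vs. space actually allocated on disk
    ls -lh backup.img
    du -h backup.img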

So I think this should work:

  • Create a sparse file the same logical size as the original disk:
    dd if=/dev/zero of=sparse.img bs=1M count=0 seek=500000
    (basically pass count=0 and seek=the size you want divided by bs)
    This should complete very quickly and give you a file that ls will say is 488G but takes no disk space.
  • Mount the image to a loopX device:
    sudo losetup /dev/loopX sparse.img
  • Copy the partition table from the original disk to the loop device
    I’m actually not completely sure how to do it; I guess if it’s an MBR partition table, copying the first 512 bytes with dd should work
  • Activate the partition table of the loop device:
    sudo partx -a /dev/loopX
    This will create /dev/loopXp1, p2 etc for each partition defined in the partition table
  • clone each partition of the original disk to the corresponding /dev/loopXpY device
    So basically ntfsclone --overwrite /dev/loopXp1 /dev/sda1 (you need --overwrite rather than -o because the target device already exists)
    If the disk has non-NTFS partitions you can dd them instead (or use an equivalent cloning tool if there is one — EDIT if I’m understanding the man page correctly, e2image -a should do it for ext2/3/4 filesystems)
  • detach the loopX device:
    sudo losetup -d /dev/loopX

Done! You should now have a raw image of the disk containing only the useful bits.
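
And if you want the compact single-file version mentioned above, something like (sketch):

    # convert the sparse raw image to qcow2; zero/hole blocks are simply not stored
    qemu-img convert -f raw -O qcow2 sparse.img machine.qcow2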

I have two solutions, differing mainly in the image type presented.

  • WIM. This is the image format Windows is distributed in. Windows has tooling for it since Windows 10, and Linux uses wimlib-imagex, which is 99.8% compatible for all NTFS image-capture/apply/mount tasks, save for some weird Windows install-media idiosyncrasies which largely don't apply here. It should be able to losslessly capture any NTFS or FAT32 medium and present the files compactly and ready to reapply to a device, or mount (sketch after this list).
  • VHDX. It has a very well-adapted sparse format that allows tooling with filesystem awareness (such as windows/hyperv, but iirc imaging tools do this too) to trim off un/de-allocated blocks. I'm genuinely surprised ntfsclone doesn't use this, as it is an accessible, non-proprietary format.
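
A wimlib sketch for the first option (hedged; device and file names are examples, and wimlib talks to NTFS volumes directly via libntfs-3g):

    # capture the NTFS volume into a compressed WIM
    wimlib-imagex capture /dev/sda1 machine.wim "machine-c-drive"
    # apply it back onto a freshly formatted NTFS partition
    wimlib-imagex apply machine.wim 1 /dev/sdb1
    # or mount it read-only to poke around
    wimlib-imagex mount machine.wim 1 /mnt/wim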

ntfsclone (and partclone/clonezilla, while we're at it), despite being Free Software™, is cursed with its proprietary file format.