
catball
@catball

you can add the flag --rsyncable when making gzip and zstd archives, and they'll rsync way faster while generally coming out less than 1% larger than if you hadn't

(it puts little synchronization checkpoints in the file by resetting the compressor at points derived from the input, so an edit only changes the compressed output near the edit and rsync can match everything else, iirc)

e.g. here's some flags i like:

tar -I"zstd -T0 -19 --rsyncable" -cvf stuff.tar.zst file1.txt file2.txt ./directory/to/files/

(note that using --long and --rsyncable at the same time might reduce the benefit to rsync speed since the rolling hash sync points added by zstd are based on the compression window size)
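
to sanity-check the size claim yourself (a quick sketch; data, plain.zst, and sync.zst are made-up names):

zstd -19 -c data > plain.zst
zstd -19 --rsyncable -c data > sync.zst
ls -l plain.zst sync.zst

on compressible input the rsyncable copy usually comes out well under 1% larger.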

edit: the above example works with gnu tar, but -I means something different with bsd/macos tar. will make another example in a bit
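
(one portable sketch that sidesteps -I entirely, assuming zstd is on your PATH, is to pipe tar into zstd yourself:

tar -cvf - file1.txt file2.txt ./directory/to/files/ | zstd -T0 -19 --rsyncable -o stuff.tar.zst

this works with gnu and bsd/macos tar alike, since the compressor never goes through tar's option parsing)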


lexyeevee
@lexyeevee

When you synchronize a compressed file between two computers, this option allows rsync to transfer only the parts of the archive that changed instead of the entire archive. Normally, after a change is made to any file in the archive, the compression algorithm generates a new version of the archive that no longer matches the previous version beyond the point of the change. In that case, rsync transfers the entire new version of the archive to the remote computer. With this option, rsync can transfer only the changed regions, along with a small amount of metadata that is required to update the archive structure in the area that was changed.
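
A quick local sketch of that behavior (all filenames made up; it assumes a gzip with --rsyncable support, e.g. GNU gzip 1.7+, and uses --no-whole-file because rsync otherwise skips its delta algorithm for local copies):

seq 1 2000000 > data
gzip -c --rsyncable data > basis.gz   # the copy the remote side already has
printf 'XXXX' | dd of=data bs=1 seek=1000000 conv=notrunc   # change 4 bytes mid-file
gzip -c --rsyncable data > new.gz     # recompress after the edit
rsync --no-whole-file --stats new.gz basis.gz

The stats should report a small amount of "Literal data" next to a large "Matched data"; without --rsyncable, nearly the whole file is transferred as literal.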


tef
@tef

doing cat a b > big; gzip big and gzip a; gzip b; cat a.gz b.gz > big.gz will both produce valid gzip files, and uncompress to the same data.
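
a quick way to see that, with throwaway names:

printf 'hello ' > a
printf 'world\n' > b
cat a b > big; gzip big
gzip a; gzip b; cat a.gz b.gz > parts.gz
gzip -dc big.gz parts.gz   # prints "hello world" twice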

the intended use case is things like log files: you can gzip them individually, then concat them together to make one big gzipped file without decompressing and recompressing the individual parts.

this is also how gzip --rsyncable works

by breaking up a stream into smaller chunks and gzipping them individually, you create a compressed file that doesn't change much when an individual chunk is updated.

the real trick is that the chunks are variable width.

we do this with a crc32 and a sliding window, ending a chunk when the checksum reaches some particular value. even when you edit the file, or move parts around, enough of the file should still end up in the same chunk as before, and those chunks will be unchanged in the compressed file.
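
here's a toy illustration of those boundaries (a sketch only, not any real implementation: a plain rolling sum instead of a crc32, a 16-byte window, and a boundary whenever the sum divides by 64; real tools use much larger windows and targets, and somefile is any small test file):

od -An -v -tu1 somefile | xargs -n1 | awk -v W=16 -v M=64 '
  { pos++
    if (pos > W) sum -= buf[pos % W]   # drop the byte leaving the window
    buf[pos % W] = $1; sum += $1       # slide the new byte in
    if (pos >= W && sum % M == 0) print "chunk boundary at byte", pos
  }'

edit a byte in the middle of somefile and rerun: boundaries before the edit are untouched, and boundaries after it shift in offset but still land on the same content, which is exactly why later chunks compress to identical bytes.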

this allows rsync to do its thing and skip over unchanged parts of a file to speed up replication

(and iirc, rsync itself uses a sliding window and a rolling checksum to find matching blocks at any offset, too!)
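
a sketch of that matching in action (made-up filenames; --no-whole-file forces the delta algorithm for a local copy):

seq 1 200000 > base
( printf 'one new line at the top\n'; cat base ) > shifted
rsync --no-whole-file --stats shifted base

even though every byte after the first line moved, the rolling checksum re-finds the old blocks at their new offsets, so "Matched data" covers nearly the whole file.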




in reply to @lexyeevee's post:

It follows perfectly from how rsync works, and it also applies to .zip files naturally, because the files within a .zip are compressed independently of one another.

I'd be surprised if the savings loss were even 1%. The usual handwave for why .tar.gz et al. compress the whole serialized archive in one stream is the shared compression window, when in reality that window is unlikely to be all that beneficial once it spans separate files.
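
A rough way to eyeball that (directory name borrowed from the example up-thread; sizes obviously depend on the data):

tar -czf whole.tar.gz ./directory/to/files/
zip -qr separate.zip ./directory/to/files/
ls -l whole.tar.gz separate.zip

zip compresses each member independently, so if the whole-archive window mattered much, whole.tar.gz would come out dramatically smaller. For most file mixes it doesn't, though a pile of tiny near-identical files is the classic exception.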
