The Toon Boom Harmony TVG file format

in which i attempt to reverse engineer a proprietary undocumented binary format ;-;
i will now share my suffering in gruesome detail

as you know, people often use wicked sorcery to make objects move as though they are alive, in a process called “anmiaiton.” however, sometimes they will also use “computer technology” for this purpose, such as “Toon Boom Harmony,” a relatively popular 2D animation software. I find Toon Boom Harmony is notable for its very nice vector drawing tools, and especially its vector pencil tool.

The Harmony pencil tool creates a kind of stroke that other common vector graphics formats such as SVG cannot represent: a Bézier spline with variable width. The thickness data is itself another Bézier spline, making this a Bézier-Bézier offset curve.

An example stroke showing the center line (orange) and thickness data (red).

This way, you can just adjust your lines freely without worrying about messing up your line thickness, and vice versa. I think the people at Toon Boom are aware that this is pretty neat, because they make you pay extra for this feature.

Now, sometimes, you might wanna take your Toon Boom Harmony project and export it just a little bit. Just get that data out so you can use it somewhere else.

You can render it to a raster image, but that’s no fun. You lose all the benefits granted to you by the vector format. Toon Boom Harmony also lets you export to PDF, but PDF, frankly, sucks, and it also does not preserve the strokes created by the pencil tool, because such data cannot exist in PDF (I think they get converted to outlines).

So maybe what you really want is to be able to read the files the drawings are stored in themselves. And by you I mean me because most people probably really don’t care.

While the Toon Boom project files are in semi-human-readable-ish XML, the drawing files are not. They’re in a proprietary binary format called “TVG” which I assume stands for “Toon Boom Vector Graphics” or something. I had a look around, and this format is not documented. It seemed nobody had tried to reverse engineer it either. So I decided I might as well have a go!!

The Toon Boom Harmony license agreement forbids that you “6.1.5. modify, reverse engineer, decompile, disassemble, or create derivative works from the SOFTWARE or its proprietary source code,” so I decided not to do that because i would probably go to jail forever. Also, frankly, reversing stripped & optimized code sucks. Probably. I imagine.   not that., i would have ever, done anything ofthe sort   ,

So, I’m not a lawyer, but I imagine it would be legal to treat the software as a black box and simply examine the file format itself using files I created, since those files are not part of the software.

Let’s open it up!!

ReHex showing a TVG file.


that sure is lots of binary data! There are several interesting things of note here already:

  • The file magic is probably OTVGfull
  • For some reason, TVG files contain a “certificate.”

I, writing this right now, have future knowledge: this certificate is tied to your software license and is the same in every file you create. It’s identifying information, so I blurred it out.

I don’t know… why… you would do this? Putting, like, a certificate of authenticity™ in every single file a user creates? It’s certainly a very strange choice.

Continuing on,

A TVG file with several of its 4-byte tags and length markers highlighted.

TVG seems to use a common pattern found in binary file formats: 4-byte tags, followed by a length, and then followed by the data. I think it’s so nice of them to use string tags like this instead of, like, numeric enums. I don’t know what CREA and ENDT are supposed to mean, but tUAA, tCAA, tLAA, and tOAA (off-screen) are very likely the data for the four layers in a toon boom drawing (underlay, color, line, and overlay).

Every tag is also accompanied by another 4-byte tag that either says UNCO or ZLIB. Given that ZLib is a compression library, this is probably the data encoding. Since we can read the UNCO data just fine right here, this tag probably means “unencoded.”
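To make the pattern concrete, here is a minimal Python sketch of a walker for this kind of tag/length/data stream. The exact field order and byte order are assumptions on my part (tag, then encoding tag, then a little-endian 32-bit length, then the data); the hex dump alone does not confirm them, and `walk_chunks` is just a name made up for illustration. It also does not handle the file header or the certificate.

```python
import struct

def walk_chunks(buf: bytes, offset: int = 0):
    """Yield (tag, encoding, payload) triples from a tag/length/data stream.

    ASSUMED layout (a guess, not confirmed by the format):
      4 bytes  ASCII tag, e.g. "TVCI"
      4 bytes  ASCII encoding tag, "UNCO" or "ZLIB"
      4 bytes  payload length, little-endian
      N bytes  payload
    """
    while offset + 12 <= len(buf):
        tag = buf[offset:offset + 4].decode("ascii", errors="replace")
        encoding = buf[offset + 4:offset + 8].decode("ascii", errors="replace")
        (length,) = struct.unpack_from("<I", buf, offset + 8)
        start = offset + 12
        end = start + length
        if end > len(buf):
            raise ValueError(f"chunk {tag!r} runs past the end of the buffer")
        yield tag, encoding, buf[start:end]
        offset = end
```

If the real field order turns out to be different, only the three offsets inside the loop need to change.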

There’s also another piece of identifying information here in the TVCI tag (toon boom vector… creator… information…? maybe?): the hostname of my computer and the software name. The hostname is again a very weird thing to include. Beware of sharing toon boom projects, I suppose…

Scrolling past the huge zlib blob, you can find a few more things at the end:

The end of this TVG file.

There’s another zlib blob in the TPAL tag, which is probably the color palette. This is followed by TTOC (toon boom… table… of contents…?), which seems to contain the byte offset of every listed tag, and finally some kind of cryptographic signature.

It seems all the interesting stuff is in the ZLib-compressed data… guess it’s time to unzlib some stuff! I started with the palette data since that seemed easier.

Basic Palette Data: Reversing Is So Easy

I would like to note it took me an unreasonably long time to decode the ZLib data, because I thought I could just shove those bytes into ZLib and it would work. It did not work. It took me several hours to find out (including reading the ZLib specification… because I wasn’t sure this was ZLib data at all!) that the first 4 bytes of the data are not, in fact, part of the ZLib data. They are just another length value. Specifically, the length of the uncompressed data. Only the bytes after that can be shoved into ZLib and decompressed properly. So that was a thing.
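In Python, the fix looks roughly like this. It is a sketch, not a tested TVG reader: `decode_zlib_payload` is a hypothetical helper name, and reading the length prefix as little-endian is an assumption.

```python
import struct
import zlib

def decode_zlib_payload(payload: bytes) -> bytes:
    """Decompress a ZLIB-tagged blob that carries a 4-byte length prefix."""
    # The first 4 bytes are the length of the *uncompressed* data.
    # ASSUMPTION: little-endian byte order.
    (expected_len,) = struct.unpack_from("<I", payload, 0)
    # Only the bytes after the prefix are an actual zlib stream.
    data = zlib.decompress(payload[4:])
    if len(data) != expected_len:
        raise ValueError(
            f"length prefix says {expected_len} bytes, got {len(data)}"
        )
    return data
```

Checking the decompressed size against the prefix is optional, but it is a cheap way to notice that the byte order guess (or the offset) is wrong.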

Anyway, un-zlibbing the palette data already reveals several immediately readable things:

The palette data.

You can see some text like “Black” (the name of the color) and “2022-05-18” (the name of the project I stole this TVG from). Since every second byte in the text is zero, I am going to assume that this is UTF-16 LE.
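A quick way to check the UTF-16 LE guess is to decode a few bytes by hand. The bytes below are typed out for illustration (the letters of “Black” with a zero byte after each one), not copied from the actual dump.

```python
# Each ASCII letter followed by a zero byte is what UTF-16 LE looks like
# for plain English text.
raw = bytes.fromhex("42 00 6c 00 61 00 63 00 6b 00")
name = raw.decode("utf-16-le")
print(name)
```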

Again there are 4-byte tags followed by length and data, so after annotating a little bit…

The palette data, annotated.

you can see that the color is actually just made from two tags: TCSC and TCID. TCSC contains 00 00 00 FF, which is just the color’s actual value in RGBA format (in this case, black). And TCID seems to contain identifying information like the name of the color.

Toon Boom projects actually very conveniently have the palette data also available in text form inside .plt files.

The one for this project reads:

ToonBoomAnimationInc PaletteFile 2
Solid    Black                      0x0a46da1a56b5abe6   0   0   0 255
Solid    White                      0x0a46da1a56b5abe9 255 255 255 255
Solid    Red                        0x0a46da1a56b5abec 255   0   0 255
Solid    Green                      0x0a46da1a56b5abef   0 255   0 255
Solid    Blue                       0x0a46da1a56b5abf2   0   0 255 255
Solid    "Vectorized Line"          0x0000000000000003   0   0   0 255

That very suspicious hex number there can also be found in the data (marked green). This is probably some sort of internal color ID.
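Since the .plt file is plain text, it is easy to read for cross-checking against the binary data. Here is a small sketch of a parser that only understands rows shaped like the ones shown above (a kind, a possibly quoted name, a hex ID, and four color components). `PaletteColor` and `parse_plt` are made-up names, and any row that doesn't have exactly seven fields is skipped, because I don't know what other row types exist.

```python
import shlex
from typing import NamedTuple, Tuple

class PaletteColor(NamedTuple):
    kind: str                        # e.g. "Solid"
    name: str                        # e.g. "Black" or "Vectorized Line"
    color_id: int                    # the 64-bit hex ID
    rgba: Tuple[int, int, int, int]  # red, green, blue, alpha (0-255)

def parse_plt(text: str):
    """Parse the rows of a .plt palette file into PaletteColor tuples."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("ToonBoomAnimationInc PaletteFile"):
        raise ValueError("not a palette file")
    colors = []
    for line in lines[1:]:
        # shlex handles quoted names such as "Vectorized Line".
        fields = shlex.split(line)
        if len(fields) != 7:
            continue  # unknown row shape: skip rather than guess
        kind, name, color_id, r, g, b, a = fields
        colors.append(
            PaletteColor(kind, name, int(color_id, 16),
                         (int(r), int(g), int(b), int(a)))
        )
    return colors
```

The `color_id` values can then be searched for in the decompressed TPAL data to find where each color entry lives.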

The only thing left to figure out is the 10 bytes at the beginning. If I open the TPAL data for a drawing that uses two colors, the first of those bytes changes to a 2, so that’s probably the number of colors.

And finally, for the 79 00 00 00 00 00… well, I have no idea. This seems to just be some sort of header before every color entry. It doesn’t look important, though.

Well, that was easy! Surely the layer data in tLAA will be no different!

Continued In: What The Fuck Is This Number Format

in reply to @blep's post:

99 chances out of 100 says the personal identifying information is so that if a big animation studio pirates their software they can prove in a lawsuit just how many files were created using the pirated license (+++damages!)