echo-parallax
@echo-parallax

There are a couple of WebM video files that appear to animate their own width and height. On Discord (as of this writing), these videos also change the size of their enclosing element. So this video of a piston, for instance, appears to squish the entire message pane!

Four frames of piston.webm as viewed on a Discord server. As the piston contracts, the messages below it scroll up into view, where I'm saying "OMG / WHY CAN THE WEBM VIDEO FORMAT DO THIS"

It turns out it's well-defined!

  • There are a couple of good use cases for videos with changing sizes, mostly in streaming contexts
  • WebM includes the necessary subset of the Matroska media container format to support this
  • The HTML specification has a paragraph for what happens when a video changes its size
  • Chromium and FFmpeg FATE include at least one test case for this (phew!)

I can see Discord potentially modifying their CSS to prevent this, though.

(Disclaimer: video formats aren't my typical forte, so while this was surprising to me, maybe it's well known in the video format space! Also, the motion from some of these resizing can be unpleasant, so this post avoids including any animations of them — there are good use cases in streaming video, but in other contexts keeping a fixed width and height using CSS or by rejecting files with changing sizes may be a good way to proceed.)

In which I try to figure out where the file stores its frame sizes

Loading piston.webm into VLC, we can see that something unusual is going on — the video codec information appears to change when the piston changes size.

Two frames of the piston video in VLC with the Codec Information window open. When the piston is extended, the window shows that Stream 0 has resolution 160x320. When the piston contracts, Stream 0 changes to 160x160. Stream 0's format also changes in the intermediate frames.

(VLC really doesn't like this file — the VLC window size changes when the video does, which probably doesn't help)

This comment suggested viewing the file in FFmpeg's ffplay utility with debug logging turned on. This shows that ffplay has a dedicated message for a video changing size, format, serial, or filter, and also shows the full list of sizes the video changes to. It also shows that it gets the size of the stream before it decodes the frames — that'll be relevant later on!

>ffplay -loglevel debug piston.webm
Input #0, matroska,webm, from 'piston.webm':
  Metadata:
    ENCODER         : Lavf59.16.100
  Duration: 00:00:06.24, start: 0.000000, bitrate: 53 kb/s
  Stream #0:0, 1, 1/1000: Video: vp8, 1 reference frame, yuv420p(tv, progressive), 160x320, 0/1, SAR 1:1 DAR 1:2, 25 fps, 25 tbr, 1k tbn
... (lines omitted)
Video frame changed from size:0x0 format:none serial:-1 to size:160x320 format:yuv420p serial:1
... (lines omitted; on first size change)
Video frame changed from size:160x320 format:yuv420p serial:1 to size:160x280 format:yuv420p serial:1
... (lines omitted; on second size change)
Video frame changed from size:160x280 format:yuv420p serial:1 to size:160x240 format:yuv420p serial:1
... (and so on)
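If you want the full list of sizes without scrolling through the debug output by hand, a small script can pull them out of a saved log. This is only a sketch: it assumes the log has been captured to a string, and the regular expression is my own guess based on the three "Video frame changed" lines quoted above, not anything ffplay documents.

```python
import re

# Pattern modelled on the "Video frame changed" lines quoted above (an assumption,
# not a documented ffplay format).
FRAME_CHANGED = re.compile(
    r"Video frame changed from size:(\d+)x(\d+) .*? to size:(\d+)x(\d+)"
)


def size_changes(log_text):
    """Return a list of ((old_w, old_h), (new_w, new_h)) tuples found in the log."""
    changes = []
    for match in FRAME_CHANGED.finditer(log_text):
        old_w, old_h, new_w, new_h = map(int, match.groups())
        changes.append(((old_w, old_h), (new_w, new_h)))
    return changes


# The three log lines from the ffplay output above.
sample_log = """\
Video frame changed from size:0x0 format:none serial:-1 to size:160x320 format:yuv420p serial:1
Video frame changed from size:160x320 format:yuv420p serial:1 to size:160x280 format:yuv420p serial:1
Video frame changed from size:160x280 format:yuv420p serial:1 to size:160x240 format:yuv420p serial:1
"""

for old, new in size_changes(sample_log):
    print(old, "->", new)
```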

WebM is a subset of the Matroska container format, so we can look at the Matroska element specification to see which Matroska elements could change the frame size. There's only one — Track, with its PixelWidth and PixelHeight members.

To check this, we can use Matroska's MKVToolNix to look at the structure of the file. It turns out that Track isn't what we're looking for — this file contains only two Tracks, one for video and one for audio, so it isn't changing resolutions that way.

A screenshot of the MKVToolNix GUI, showing piston.webm's two tracks. Track 0 is a video track with dimensions 160x320, and Track 1 is an audio track.

One unusual thing about this file is that it has five Clusters, more than a typical file. Each Cluster contains multiple Blocks (audio or video frames) and codec settings. I think these correspond to each of the five resolutions the file goes through — heights of 320, 280, 240, 200, and 160, then back up to 320. There are also nine Cues, so I'm guessing there's one Cue for each change of resolution/video codec setting as the video plays, and these index into the Clusters.

Matroska Cues, Blocks, and Clusters don't contain resolution information, though. But Matroska's a container format — maybe the VP8 frames themselves contain the resolution information?

A screenshot of the MKVToolNix GUI, showing that piston.webm contains 5 Clusters and 9 Cues

And indeed, VP8's key frames contain resolution information! Here's the relevant part of the RFC 6386 specification. Width and height are each 14-bit ints, stored in little-endian 16-bit fields:

   For key frames, the frame tag is followed by a further 7 bytes of
   uncompressed data, as follows:

   ---- Begin code block --------------------------------------

   Start code byte 0     0x9d
   Start code byte 1     0x01
   Start code byte 2     0x2a

   16 bits      :     (2 bits Horizontal Scale << 14) | Width (14 bits)
   16 bits      :     (2 bits Vertical Scale << 14) | Height (14 bits)

   ---- End code block ----------------------------------------

So, we can search for 9d 01 2a to find where there might be VP8 key frames — and that turns out to be where piston.webm stores the size changes!

At 0x11B3: 9D 01 2A A0 00 40 01 // A0 00 == 160, 40 01 == 320
At 0x28E0: 9D 01 2A A0 00 18 01 // A0 00 == 160, 18 01 == 280
At 0x3478: 9D 01 2A A0 00 F0 00 // A0 00 == 160, F0 00 == 240

The first instance is at an odd byte offset, which is a bit unusual, but it seems to match up.
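The search above can be sketched in a few lines of Python. This is a naive byte scan rather than a real Matroska or VP8 parser, so it could report false positives if the bytes 9d 01 2a happen to show up inside compressed data. The function name and the padding bytes in the sample are made up for illustration; the two headers in the sample are the first two from the hex dump above.

```python
import struct

START_CODE = b"\x9d\x01\x2a"


def find_keyframe_sizes(data):
    """Yield (offset, width, height) for each candidate VP8 key frame start code."""
    pos = data.find(START_CODE)
    while pos != -1:
        header = data[pos + 3 : pos + 7]
        if len(header) == 4:
            # Two little-endian 16-bit fields; the low 14 bits of each are the size.
            raw_w, raw_h = struct.unpack("<HH", header)
            yield pos, raw_w & 0x3FFF, raw_h & 0x3FFF
        pos = data.find(START_CODE, pos + 1)


# Two headers copied from the hex dump above, with made-up padding around them.
sample = bytes.fromhex("00 00 9D 01 2A A0 00 40 01 FF 9D 01 2A A0 00 18 01")
for offset, width, height in find_keyframe_sizes(sample):
    print(f"At 0x{offset:X}: {width}x{height}")
```

To try it on a real file, read the whole thing with open("piston.webm", "rb") and pass the bytes to find_keyframe_sizes.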

What are VP8's Horizontal and Vertical Scale bits for? I was curious about this. It turns out VP8 has some upsampling modes, maybe for rectangular pixels? Here's the section from the RFC:

   The scaling specifications for each dimension are encoded as follows.
         +-------+--------------------------------------+
         | Value | Scaling                              |
         +-------+--------------------------------------+
         | 0     | No upscaling (the most common case). |
         |       |                                      |
         | 1     | Upscale by 5/4.                      |
         |       |                                      |
         | 2     | Upscale by 5/3.                      |
         |       |                                      |
         | 3     | Upscale by 2.                        |
         +-------+--------------------------------------+

Since everything used scaling value 0, we didn't have to worry about ignoring the top 2 bits to get a 14-bit integer.
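For a file that did use the scale bits, each 16-bit field would need splitting before reading the size. Here's a minimal sketch of that split; the function name is mine, the factors come straight from the RFC table above, and the second example value is hypothetical (nothing in piston.webm has nonzero scale bits).

```python
from fractions import Fraction

# Upscaling factors from the RFC 6386 table quoted above.
UPSCALE = {0: Fraction(1), 1: Fraction(5, 4), 2: Fraction(5, 3), 3: Fraction(2)}


def split_dimension(raw):
    """Split a 16-bit header field into (scale_bits, size_in_pixels)."""
    scale_bits = raw >> 14   # top 2 bits
    size = raw & 0x3FFF      # low 14 bits
    return scale_bits, size


# Height field from piston.webm (bytes 40 01, little-endian).
print(split_dimension(0x0140))

# Hypothetical field with the same size but scale bits set to 3.
scale_bits, size = split_dimension(0xC140)
print(scale_bits, size, UPSCALE[scale_bits])
```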

Why it works

The Matroska file format website talks about one major use case for videos that can change size: streaming! Matroska's use cases include streaming and broadcasting — even really long streams that may include many different video programs, each with their own formats, aspect ratios (e.g. "stream an old 4:3 video for a while, then stream a 2.39:1 video for a while" without adding letterboxing) — and thus also resolutions.

Alternatively, imagine someone starts screensharing a windowed app, then resizes the app's window. Instead of stopping and restarting the stream, the screensharing program might encode a change of resolution in the stream!

Normally, these size and format changes occur rarely and only on livestreams, but these videos show that they can occur every frame and in local files. (There's at least one game I know of that resizes its own window as a special effect; I wonder how well Discord handles that!)

The HTML specification also mentions that videos can change their size at https://html.spec.whatwg.org/#dimUpdate:

Whenever the intrinsic width or intrinsic height of the video changes (including, for example, because the selected video track was changed), if the element's readyState attribute is not HAVE_NOTHING, the user agent must queue a media element task given the media element to fire an event named resize at the media element.

And the size of the <video> element can change every frame:

Otherwise (the video element has a video channel and is potentially playing)
The video element represents the frame of video at the continuously increasing "current" position. …

which changes how the element is rendered:

In the absence of style rules to the contrary, video content should be rendered inside the element's playback area such that the video content is shown centered in the playback area at the largest possible size that fits completely within it, with the video content's aspect ratio being preserved.

FFmpeg can create these videos by concatenating a series of VP8 videos of different sizes in a Matroska container — that's how the most widely used script for videos with animated sizes (side note: which is a lot, visually) seems to work! After creating a .webm file of a different size for each frame, it ultimately creates an FFmpeg concat file and then runs

ffmpeg -f concat -safe 0 -i path/to/concatfile -i path/to/audiofile path/to/outputfile
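For a rough idea of what that concat file looks like, here's a sketch that builds one and assembles the same command as an argument list. FFmpeg's concat demuxer reads a text file with one file '...' line per input. The frame and audio file names here are hypothetical, this isn't the linked script's actual code, and paths containing single quotes would need escaping that this sketch skips.

```python
def concat_file_text(paths):
    """Build the text of an FFmpeg concat list: one "file '...'" line per input."""
    return "".join(f"file '{path}'\n" for path in paths)


def ffmpeg_command(concat_path, audio_path, output_path):
    """Return the command quoted above as an argument list."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", concat_path, "-i", audio_path, output_path]


# Hypothetical per-frame files, one per size.
frames = [f"frame_{height}.webm" for height in (320, 280, 240)]
print(concat_file_text(frames), end="")
print(" ".join(ffmpeg_command("concat.txt", "audio.ogg", "out.webm")))
```

The argument list could be handed to subprocess.run once the concat text has been written to disk; I've left that out so the sketch runs without FFmpeg installed.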

These videos don't seem to grow beyond their initial size, and that's because Discord sets the <video> element's max-width and max-height CSS properties (probably based on the size reported by the Matroska/WebM container, scaled down to fit a maximum bounding box, if I had to guess).

The piston video viewed in Discord. It has the CSS styles max-width:150 and max-height:300 set. That's proportional to the video's width and height of 160 and 320.
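Here's a sketch of the kind of calculation I'm guessing at: scale the container-reported size down to fit a bounding box while keeping the aspect ratio. The 400 by 300 box is purely an assumption, picked because it reproduces the observed 150 and 300; I don't know what Discord actually uses.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit a box, preserving aspect ratio."""
    # Never scale up: cap the factor at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


# 160x320 is the size reported by the container; the 400x300 box is a guess.
print(fit_within(160, 320, 400, 300))
```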

Self-resizing videos don't seem to be supported on the Android device I tested, which seems like a valid choice. (Sites like YouTube either re-encode the video or set CSS properties to keep the width and height fixed as the video plays; the pixels squash and stretch instead of the bounding box.)

Acknowledgements

Thanks to Kat on Discord for posting the WebM file I looked at in this post! My reaction is pictured in the third frame of this post's first image. (I'm not sure where this file originally came from; it looks like it probably wasn't created with the tool linked here.)

