

unascribed
@unascribed

If you're anything like me, you've wondered what would happen if you used a compression algorithm for something it was not designed for. A lot of the time, the results are boring — you just get really large output.

However, image and audio codecs are special. Nowadays, psychoacoustic and psychovisual codecs are used — that is, codecs that are tuned specifically for human perception, and throw away information that humans will not notice. A great example of this is that most audio codecs will simply discard all audio frequencies above 20kHz. But the characteristics of the visual system and the auditory system are very different.
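To make that mismatch a little more concrete, here is a minimal Python sketch. It is not the code behind the demo. It assumes an 8-bit grayscale image stored as a list of rows, and it uses a plain moving average as a crude stand-in for a lossy audio codec dropping high frequencies. The helper names (`image_to_samples`, `crude_lowpass`, `samples_to_image`, `round_trip`) are made up for illustration.

```python
def image_to_samples(pixels):
    """Flatten rows of 0..255 pixel values into 'audio' samples in [-1, 1]."""
    return [(p - 127.5) / 127.5 for row in pixels for p in row]


def crude_lowpass(samples, window=4):
    """Trailing moving average. ASSUMPTION: a toy stand-in for a lossy codec
    discarding high frequencies, not a real psychoacoustic model."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def samples_to_image(samples, width):
    """Map samples back to 0..255 and cut them into rows of `width` pixels."""
    flat = [min(255, max(0, round(s * 127.5 + 127.5))) for s in samples]
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def round_trip(pixels, window=4):
    """Image -> samples -> toy 'codec' -> image."""
    width = len(pixels[0])
    return samples_to_image(crude_lowpass(image_to_samples(pixels), window), width)


# A flat gray image has no high-frequency content to lose.
flat = [[200] * 8 for _ in range(4)]
# One-pixel stripes are the highest frequency the pixel grid can hold.
stripes = [[0, 255] * 4 for _ in range(4)]

print(round_trip(flat)[-1])
print(round_trip(stripes)[-1])
```

The flat image survives the round trip, while the sharp stripes are averaged toward mid-gray. A real codec makes far more elaborate decisions, but they are still tuned for ears rather than eyes.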

To explore this idea, I've created a dedicated page on my site with an interactive demo. It shows a sample image converted to audio, compressed with a lossy codec, and then converted back again, with a range of codecs and parameters to choose from. That page has an extended introduction; I'll reproduce some of it here:



in reply to @unascribed's post:

this is super cool. i've been doing audio databending on images for a while now but somehow i've never thought to try using a compressed format as part of the process.

I've been aware of Audacity databending for a while — it's interesting the kinds of things audio processing filters do when applied to general data. As a codec nerd, my immediate thought is to use a lossy codec in addition. :P

I'm working on a revision to the site with more pixel formats and codecs — the results from low-bitrate Speex are actually pretty interesting:

FFmpeg doesn't support codec2, so it's out of scope for now as I can't plug it into my encoding harness.

Someone mentioned this on fedi as well, but I forgot to update the comment here. The system I use to get an FFmpeg build doesn't support codec2 unless you turn on an option that builds almost every feature, which I don't want. So it's a bit of a rock/hard-place, as I also wouldn't want to build FFmpeg manually all the time.

The person that mentioned this on fedi did some test encodes using my script snippet, and found that even at max bitrates codec2 emits completely unrecognizable smears very similar to low-bitrate Speex.

I tried compressing text data as JPEG and MP3 once. IIRC, you could make out some words in the JPEG-compressed text but the MP3-compressed one was totally garbled. Your approach is more interesting.

takes me back to fucking around with images in audacity a whole decade ago, a bunch of people i knew online were doing that and it was a great time, i still have some of those saved as wallpapers. i've wanted to mess around with it again but been short of novel ideas, so this could certainly be something to play with

Maybe I missed it but what order are the pixels in? I was thinking whether something like a Hilbert curve where a following pixel is always a neighbor of the previous one* would change anything. This is probably less straightforward to implement with planar formats and chroma subsampling, though.

*edit: while still true, locality is preserved only in 1D->2D conversion, not the other way, makes sense ig
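For anyone who wants to try the Hilbert-curve ordering, here is a minimal Python sketch. It assumes a single square plane whose side `n` is a power of two, so planar formats and chroma subsampling are out of scope. It uses the well-known iterative distance-to-coordinate conversion, and the names `hilbert_d2xy`, `hilbert_order`, `flatten`, and `unflatten` are made up for illustration, not taken from the site.

```python
def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid.
    ASSUMPTION: n is a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """All (x, y) coordinates of an n x n grid in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def flatten(image):
    """Read a square image (list of rows) out as a 1D sequence along the curve."""
    n = len(image)
    return [image[y][x] for x, y in hilbert_order(n)]


def unflatten(samples, n):
    """Inverse of flatten: write a 1D sequence back onto an n x n grid."""
    image = [[0] * n for _ in range(n)]
    for value, (x, y) in zip(samples, hilbert_order(n)):
        image[y][x] = value
    return image


n = 8
order = hilbert_order(n)
index = {xy: d for d, xy in enumerate(order)}
# Largest 1D gap between two horizontally adjacent pixels.
worst = max(abs(index[(x, y)] - index[(x + 1, y)])
            for y in range(n) for x in range(n - 1))
print("largest 1D gap between 2D neighbours:", worst)
```

Consecutive samples always come from neighbouring pixels, but the printed gap shows the caveat from the edit: two pixels that touch in 2D can still sit far apart in the 1D stream.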