#RLE


I figured it was time to get back to some Spectrum Next programming, and since NextSync has been on my mind lately, I thought it would be fun to update it.

The last time I was working on something for the Next it was a video player, but I hit some issues which required a new core and operating system version, neither of which was quite ready at the time.

The new versions have been out for a while now, and one thing that got broken by the update was NextSync. A kind person going by the nick SevenFFF figured out what was wrong and made a fixed version, so getting back to development should be relatively painless. Developing without NextSync is rather awkward =)

The longer-term goal for NextSync is to convert it from C-with-some-assembly to pure assembly.

The first thing was to simplify. NextSync is too flashy for its own good, so I ditched the custom font, the custom text prints, and the real-time progress view.

Then I started pondering an old idea of mine: compressing on the server and decompressing on the Next. The compression and decompression would have to be fast enough not to slow things down too much, or there would be no point in doing it. In practice, we want to reduce the number of packets sent over the network, because the handshaking between packets is what takes up a lot of the time.

I figured I'd implement a very simple RLE compression scheme - if things worked out, it would be relatively straightforward to add other compression schemes later. Here's where I fell into the Python optimization rabbit hole.

Compressing with RLE comes down to finding spans of 3 or more bytes with the same value, and then splitting the data into "run" blocks (do N copies of X) and "skip" blocks (copy N bytes from the input). I took a page from the NextFli compressors, and the exact scheme is as follows:

  1. Skip: [len][bytes]
  2. Run: [len][byte]
  3. Repeat

The length is encoded as follows: if the length is less than 0xff, encode it as one byte; if it is 0xff or more, encode it as 0xff followed by a 16-bit length value.
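As a minimal sketch of that length encoding (the function name is mine, and I'm assuming the 16-bit value is stored little-endian, which is the natural byte order for the Z80):

```python
def encode_len(n):
    # Lengths below 0xff fit in one byte; longer lengths are escaped
    # with 0xff followed by a 16-bit little-endian value.
    if n < 0xff:
        return bytes([n])
    return bytes([0xff, n & 0xff, (n >> 8) & 0xff])
```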

After implementing the compressor and decompressor in Python I ran into a problem: the compressor only managed about 0.5 megabytes per second, which, while fine for most Spectrum Next related files (which tend to be 100k or less), was still really annoying when compressing a 150 megabyte file took 250 seconds.

I wrote an external exe in C to do the compression and ran it from the Python script; compressing that file took 0.78 seconds. That includes loading the file into memory in the script, spawning the process, loading the file into memory in the exe, compressing it, writing the compressed data out, and loading the compressed data back into the script. Yay for disk caches!
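The glue on the Python side can be as simple as handing the data over through temporary files and a subprocess call; a rough sketch, where the exe name and argument order are placeholders of my own:

```python
import os
import subprocess
import tempfile

def compress_external(data):
    # Write the input to a temp file, run the external compressor on it,
    # and read the result back. Exe name and arguments are hypothetical.
    with tempfile.NamedTemporaryFile(delete=False) as src:
        src.write(data)
    dst = src.name + ".rle"
    subprocess.run(["./rlepack", src.name, dst], check=True)
    with open(dst, "rb") as f:
        result = f.read()
    os.remove(src.name)
    os.remove(dst)
    return result
```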

While quite fast, using an external executable isn't portable, and I know people run the NextSync server on Macs and Linux boxes, so I went back to looking at the script.

I found out Python has a built-in profiler, which let me spot some interesting things: calling len() on a list actually takes some time even though the list doesn't change, so storing the value in a temporary variable is faster. And appending to a list was a clear bottleneck.
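For reference, running the built-in profiler over a single call is about this much code; decompress and the test file are placeholders here:

```python
import cProfile
import pstats

# Profile one decompression run and print the ten most expensive calls.
cProfile.run("decompress(open('test.bin', 'rb').read())", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```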

Since the decompressor was simpler and faster (taking about a minute to decompress the 150 meg file) I concentrated on optimizing that. If I found something, I might be able to apply the same optimizations to the compressor.

Through googling I found some suggestions. Apparently appending to a list is slow because of the memory allocations, so a common tip is to pre-allocate the list as a huge pile of [None] entries.

This made the decompressor slower, probably because the output was no longer a simple append but needed an extra variable to keep track of where the data goes.
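Roughly what that attempt looks like, as a sketch (the size estimate is a placeholder):

```python
# Pre-allocate the output list so it never grows, at the cost of
# maintaining the write position by hand instead of just appending.
out = [None] * expected_size
pos = 0
for b in data:
    out[pos] = b
    pos += 1
```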

Another suggestion was to use a bytearray instead of a list, since all my data was bytes and a bytearray's internal representation is a literal array of bytes.

This made the decompressor slower. I don't know why, but if I had to guess, lists are so central to Python that they've probably seen more optimization than bytearray has.

Yet another suggestion was to use memoryview to reduce copying.

This made the decompressor slower. I don't even care to figure out the reasons why at this point.

In the end I ditched all of the above attempts and just made the decompressor more pythonic: instead of playing with single indexes, I used slices a lot. The decompression time went from 60 seconds to about 12.
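A minimal sketch of what the slice-heavy decompressor looks like, assuming skip and run blocks strictly alternate as in the scheme above (decode_len mirrors the encode_len sketch from earlier, and the function names are mine):

```python
def decode_len(data, pos):
    # Mirror of encode_len: 0xff escapes a 16-bit little-endian length.
    n = data[pos]
    if n < 0xff:
        return n, pos + 1
    return data[pos + 1] | (data[pos + 2] << 8), pos + 3

def rle_decompress(data):
    out = []
    pos = 0
    while pos < len(data):
        # Skip block: copy a whole slice of literal bytes in one go.
        n, pos = decode_len(data, pos)
        out.append(data[pos:pos + n])
        pos += n
        if pos >= len(data):
            break
        # Run block: repeat a single byte n times with one multiply.
        n, pos = decode_len(data, pos)
        out.append(data[pos:pos + 1] * n)
        pos += 1
    return b"".join(out)
```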

If it worked for the decompressor, what could I do to make the compressor more pythonic? There's no ready-made function for finding spans of identical bytes. Except, maybe, regular expressions... The Python re module has a fun function called split, which I could use to split the input into a list of chunks; each chunk would contain either run or skip data.

By joining that list of chunks back together I should get the original data. Or I could encode each chunk into its compressed form.

Armed with https://pythex.org I got to work. I'd need a regular expression that both matches and captures the repeated sequences, or the split would not work. "(.)\1{2,}" for instance matches fine but only captures the first byte. Eventually I found that the correct solution would be something like "(?=.)(\1{2,})", but that does not work with Python since you can't refer to a non-capture group (which makes them a bit useless). I ended up with "((.)\2{2,})", which does what I want, almost: it captures the whole sequence and then also captures its first byte. I added logic in Python to just skip the element after each run sequence. And as they say, we were in business.
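Put together, the regex-based compressor is roughly the sketch below (using encode_len from the earlier sketch; I'm assuming zero-length skip blocks are emitted whenever two runs are adjacent, so the blocks always alternate):

```python
import re

def rle_compress(data):
    # Split on runs of 3+ identical bytes. Each match contributes two
    # items to the result: the whole run (group 1) and its first byte
    # (group 2), so the list goes: skip, run, run-byte, skip, run, ...
    parts = re.split(rb"((.)\2{2,})", data, flags=re.DOTALL)
    out = []
    for i, chunk in enumerate(parts):
        if i % 3 == 0:
            # Literal ("skip") data between runs; may be empty, but the
            # decompressor expects alternating blocks, so emit it anyway.
            out.append(encode_len(len(chunk)) + chunk)
        elif i % 3 == 1:
            # A run: store its length and the repeated byte once.
            out.append(encode_len(len(chunk)) + chunk[:1])
        # i % 3 == 2 is the captured first byte of the run; skip it.
    return b"".join(out)
```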

Compressing the 150 meg file now took about 25 seconds. Still far from C's 0.78, and if someone really wanted, they could still use the C solution; but being 10 times faster than my original Python compressor, I figured it was a pretty good cross-platform solution.

I still haven't updated my Next to the new core or updated the operating system... I guess that's Next, then. =)