boredzo

Also @boredzo@mastodon.social.

Breaker of binaries. Sweary but friendly. See also @TheMatrixDotGIF and @boredzo-kitchen-diary.



Macs and Linux-based computers come with a tool called dd that copies data from one file—usually a device, such as a hard drive or SSD—to another.

A few years ago, as computers became multi-processor machines, I began to wonder whether dd was taking advantage of this. If you're copying from one physical device to another, separate physical device, you could theoretically read from one while writing to the other. Does dd do that, I wondered?

No. At least Apple's implementation doesn't, last I checked (which, again, was years ago).

Apple's implementation of dd is sequential. It reads a chunk, then writes it. Then it reads another chunk, and then writes it. And so on.

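This makes for a simple implementation, but not necessarily an efficient one. Sequential copying is an advantage when both files are on the same physical device, particularly a spinning medium such as an HDD, where reading and writing at the same time would send the drive head seeking back and forth between the two locations. But when you're copying from one physical device to another, or when both are SSDs and random access is essentially free, copying sequentially becomes a drawback: each device spends much of its time idle, waiting on the other.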

So I wrote a parallelized alternative. Lacking creativity in naming, I called it “dd-parallel”, despite making zero effort to reimplement dd's whole interface or feature set.

dd-parallel tries to always be reading and writing at the same time. In practice, reading is generally faster than writing (assuming similar devices, such as the same make and model of HDD), so the reading loop tends to have to wait for the writing loop to catch up. In other words, dd-parallel ends up write-bound: it copies data as fast as the writer can write data to the destination device.

The numbers are pretty good.

With two similar USB 3.0 HDDs:

  • macOS (Catalina) dd copies at 80 MiB/sec.
  • dd-parallel copies at 140 MiB/sec.

With two similar USB 3.2 (10 Gbps) SSDs:

  • macOS (Monterey) dd copies at 370 MiB/sec.
  • dd-parallel copies at 720 MiB/sec… until the SSDs overheat and throttle down to 100 MB/sec. (lol)
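A back-of-the-envelope model helps explain why the gain tops out near 2×. Assume, as an idealization rather than anything measured here, that the source reads at a steady rate R, the destination writes at a steady rate W, and per-chunk overhead is negligible. Copying sequentially, every byte costs 1/R of reading time plus 1/W of writing time; pipelined, the two overlap, so throughput approaches whichever side is slower:

```latex
T_{\text{seq}} = \frac{1}{\frac{1}{R} + \frac{1}{W}} = \frac{RW}{R + W},
\qquad
T_{\text{par}} \approx \min(R, W)

\text{If } R \ge W: \quad
\frac{T_{\text{par}}}{T_{\text{seq}}} \approx \frac{W (R + W)}{R W} = 1 + \frac{W}{R} \in (1, 2]
```

The measured ratios, 140/80 = 1.75 for the HDDs and 720/370 ≈ 1.95 for the SSDs, both fall inside that range, and the SSD figure sits close to the R = W ceiling.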

I also recently started a pure-POSIX port so this thing can be used on non-Macs such as Raspberry Pis and other Linux machines. Haven't tried it on a non-Mac yet, but on my Mac running Monterey, it performs almost as well (710 MiB/sec) as the Mac-only version. The Mac-only version uses GCD and Foundation, whereas the pure-POSIX one uses a couple of pthreads and some hand-rolled string-formatting machinery.

I'm pretty happy with this. If you want to try it out, the source is on GitHub.


 