sunshowers
@sunshowers

In part 2, I talked about how, when you press Ctrl-Z, the shell sends the SIGTSTP signal to the process that's running.

While that is true, it isn't quite the whole truth. In reality, the shell sends the signal to the entire process group.

What is a process group?

On Unix systems, a process group is a collection of processes. Each process group is created out of an initial or top-most process, and a process group ID (PGID) is the same as the process ID (PID) of the top-most process.

Process groups are most commonly used by shells. Whenever you run a command from a shell, a new process group is created for the command. Here’s some example output for ps fo pid,pgid,comm:

    PID    PGID  COMMAND
  16528   16528  zsh
 520283  520283   \_ cargo
 520359  520283       \_ rustc
 520387  520283       \_ rustc
 520642  520283       \_ rustc
 520644  520283       \_ rustc

In this example, zsh (PID 16528) has created a process group for cargo (PID/PGID 520283). The cargo process has spun up four rustc processes, and each of those has inherited its PGID from the cargo process.
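To make the PID/PGID relationship concrete, here's a minimal Python sketch (not taken from nextest, which is written in Rust). It starts one child that inherits the parent's process group and one that becomes the leader of a new group, which is roughly what a shell does for each command. The names SLEEPER, spawn_inheriting, spawn_as_group_leader and pgid_of are made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a long-running command: it just sleeps so that
# it stays alive long enough to be inspected.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def spawn_inheriting():
    """Start a child that stays in the caller's process group (the default)."""
    return subprocess.Popen(SLEEPER)


def spawn_as_group_leader():
    """Start a child in a brand-new process group."""
    # os.setpgrp() runs in the child between fork and exec. It is equivalent
    # to setpgid(0, 0), so the child's PGID becomes its own PID.
    # (On Python 3.11+, Popen(..., process_group=0) does the same thing.)
    return subprocess.Popen(SLEEPER, preexec_fn=os.setpgrp)


def pgid_of(proc):
    """Return the process group ID of a running Popen child."""
    return os.getpgid(proc.pid)


if __name__ == "__main__":
    inherited = spawn_inheriting()
    leader = spawn_as_group_leader()
    try:
        print("parent    pid/pgid:", os.getpid(), os.getpgid(0))
        print("inherited pid/pgid:", inherited.pid, pgid_of(inherited))
        print("leader    pid/pgid:", leader.pid, pgid_of(leader))
    finally:
        for proc in (inherited, leader):
            proc.kill()
            proc.wait()
```

The "inherited" child should report the same PGID as the parent, like the rustc processes above, while the "leader" child should report a PGID equal to its own PID, like cargo.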

Why do process groups exist?

The main purpose of a process group is to be able to send signals to it atomically. In the above example, if you press Ctrl-C while cargo is running, SIGINT is sent to all of the processes in the process group 520283—this means the cargo process, as well as the four child rustc processes.
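Here's a rough Python sketch of what "sending a signal to the group" looks like from a program's point of view. It uses os.killpg, which targets a PGID rather than a single PID. The helper names and the little sh script are assumptions for the demo, and it uses SIGTERM rather than SIGINT because a non-interactive sh starts background jobs with SIGINT ignored.

```python
import os
import signal
import subprocess


def spawn_group(shell_script):
    """Run `sh -c shell_script` as the leader of a new process group."""
    return subprocess.Popen(
        ["sh", "-c", shell_script],
        stdout=subprocess.PIPE,
        preexec_fn=os.setpgrp,  # child becomes the leader: PGID == its PID
    )


def signal_group(leader, sig):
    """Deliver `sig` to every process in the leader's group with one call."""
    # The leader's PID doubles as the PGID of the group it created.
    os.killpg(leader.pid, sig)


def terminate_whole_group():
    """Start a leader with two children, then SIGTERM the whole group."""
    leader = spawn_group("sleep 30 & sleep 30 & echo ready; wait")
    leader.stdout.readline()  # wait until both sleeps have been started
    signal_group(leader, signal.SIGTERM)
    # All three processes share the write end of the stdout pipe, so hitting
    # EOF here means every one of them has exited, not just the leader.
    leader.communicate(timeout=5)
    return leader.returncode


if __name__ == "__main__":
    print("leader exit status:", terminate_whole_group())
```

If only the leader had been signalled, the two sleep processes would keep the pipe open and communicate() would time out instead of returning.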

For nextest, you'd expect this to mean that when you hit Ctrl-C in the terminal, all child tests terminate, and nextest exits right away rather than having to wait for tests to finish running. Similarly, when you hit Ctrl-Z in the terminal, you'd expect that nextest as well as all child tests receive SIGTSTP and are suspended.

However, that's not what happens. In reality:

  • Nextest creates a separate process group for each test. (I have a blog post coming soon about why nextest does this. The tl;dr is that if a test times out, nextest needs to kill the test process as well as any children it starts.)
  • Also, each process can only be part of one process group. In other words, process groups don't form a tree.
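The timeout case from the first bullet can be sketched in a few lines of Python. This is only an illustration of the idea under my own assumptions, not nextest's actual implementation: run_with_timeout is a hypothetical helper, and SIGKILL is just one possible choice of signal.

```python
import os
import signal
import subprocess


def run_with_timeout(argv, timeout_secs):
    """Run argv in its own process group; kill the whole group on timeout.

    Returns a (returncode, timed_out) tuple.
    """
    proc = subprocess.Popen(argv, preexec_fn=os.setpgrp)
    try:
        return proc.wait(timeout=timeout_secs), False
    except subprocess.TimeoutExpired:
        # proc.kill() would only reach the direct child. killpg also reaches
        # anything the child started, as long as it stayed in the same group.
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait(), True


if __name__ == "__main__":
    # The "test" here is a shell that starts a background sleep and waits.
    print(run_with_timeout(["sh", "-c", "sleep 30 & wait"], 0.5))
```

Because each process belongs to exactly one group, putting the test in its own group is what makes this single killpg call safe: it can't accidentally reach the runner or any other test.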

Here's a snapshot of ps fo pid,pgid,comm while nextest is running:

    PID    PGID COMMAND
3931685 3931685 zsh
 689343  689343  \_ cargo-nextest
 690696  690696      \_ process_kill_on
 690711  690696      |   \_ bash
 690730  690696      |       \_ sleep
 691559  691559      \_ rt_threaded-fbe
 691627  691627      \_ rt_threaded-fbe
 696315  696315      \_ time_sleep-6107
 696905  696905      \_ tokio-4f33ad8bb

In this example, zsh has assigned cargo-nextest a new PGID (689343). In turn, cargo-nextest has assigned each test its own PGID (e.g. 690696). The individual test PGIDs are unrelated to the main nextest PGID.

The overall result of this is that only the nextest process receives the SIGTSTP from Ctrl-Z; none of the child processes do. But it also suggests a way forward: when cargo-nextest receives the signal, it simply needs to forward it to the child processes.
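As a rough illustration of the forwarding idea (again in Python, and again not nextest's real code), a runner could remember the PGID of every test it starts and re-send any signal it receives to each of those groups. Everything named here (child_pgids, spawn_test, forward_to_children, install_forwarding) is hypothetical.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a test binary.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]

# Registry of per-test process groups that the runner has created.
child_pgids = set()


def spawn_test(argv):
    """Start a test in its own process group and remember the PGID."""
    proc = subprocess.Popen(argv, preexec_fn=os.setpgrp)
    child_pgids.add(proc.pid)  # the leader's PID is the group's PGID
    return proc


def forward_to_children(signum, _frame=None):
    """Re-send signum to every known test process group."""
    for pgid in list(child_pgids):
        try:
            os.killpg(pgid, signum)
        except ProcessLookupError:
            child_pgids.discard(pgid)  # that group no longer exists


def install_forwarding(signum):
    """Make the runner forward signum instead of taking the default action."""
    signal.signal(signum, forward_to_children)
```

A runner would call install_forwarding(signal.SIGTSTP) once at startup. Note that installing a handler replaces the default action, so in this sketch the runner itself would no longer be suspended by Ctrl-Z; the sketch only covers the forwarding half of the problem.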

Did we shoot ourselves in the foot?

The change to put each test in its own process group was made in July, well after nextest first came out. At first glance, it seems like we've unnecessarily made things harder on ourselves by assigning each test its own process group.

That's not quite the case! Even setting aside the benefits of putting each test in its own process group, per-test process groups add very little complexity on top of what's required anyway, for reasons that we'll go into in future posts.

In the next part of this series, we're going to see what actually happens if you press Ctrl-Z while nextest is running.



in reply to @sunshowers's post:

Also, each process can only be part of one process group. In other words, process groups don't form a tree.

I hate this so much lol, what an accursed system. What the heck does this restriction make work better? Or is it just for OS efficiency?

Old unix jankiness afaict, there's no good reason. You'd probably use cgroups or something today except it's a lot more heavyweight.

Windows has a similar notion with job objects, and Microsoft actually fixed them in Windows 8 to be a tree.