Hi, cohost global feed! This is part 3 in a series, which you can find in full under the "anti-hardware club" tag, in which I detail the struggles of finding the least-annoying period-correct PC to play DOS games on, and why, as we approach 2023, wanting period-correct hardware might be more of a commodity fetish than a real need for an "accurate" system. Along the way, I want to talk about the issues such a system will inevitably face that make DOS gaming annoying, especially in the context of modern peripherals, and ways to deal with them. Here we're going to talk about ways we can slow down a CPU, and why it's arguably one of the easiest compatibility fixes available to us, but still very finicky to get right.
So we've established that there's a lot of software that's sensitive to CPU speed, and at least some of it is relevant to the period we want our theoretical gaming PC to cover, like Wing Commander, which is designed to run best on a 386 or 486 PC. The problem is that there are other, newer games we'd want to run that require a better system than that. DOOM is an obvious one (not really playable below a 486 DX), and of course any newer 3D game with software rendering (Tomb Raider, for example) is going to have more issues, and even hardware accelerators need a minimum level of CPU throughput in order to be useful (we'll talk about graphics acceleration in a future post, which is a nightmare all unto itself, especially for anyone who wants their DOS games with HDMI video).
So we are seemingly at an impasse. Pick a slower CPU and older games will run accurately and be able to detect our expansion hardware, but newer games might run poorly. Pick a faster CPU and newer games will run at acceptable levels, but old games will run too fast. I would argue, though, that we have a lot of options for dealing with this on a single machine, which leads us to favor, in general, a faster CPU.
Given we're looking at mid-90s gaming, when DOS support was ending and early Windows games might have to do somewhat hacky things in order to run with console-quality performance, we'll probably want to get a fast Pentium II; any motherboard using a PII should also be well-supported in Windows 98, so we shouldn't get compatibility issues with appropriate hardware.
The good news is that making a fast CPU slow is easy (and making a slow CPU fast is nearly impossible, which is why we're not focusing on that). One of the secrets of cycle-accurate CPU emulation is that it's not actually as difficult as people make it sound, as long as we look at a CPU in isolation. There's a video about it, focused on NES emulation (it's about coding a cycle-accurate 6502, the NES CPU, in C++), that makes this point very explicitly. You might enjoy watching it, and it lays out in essence what we need to do in order to get hardware compatibility:
The idea for how to do cycle-accurate emulation is this:
- Read in a CPU instruction
- Translate the CPU instruction into one our actual machine's CPU can run
- Execute the instruction
- Figure out how many cycles it would take for the CPU to run that instruction
- Wait as long as it would take for the instruction to run
When we're looking at a CPU in isolation, it really is that simple! (Well, we're still expecting the CPU to get input from somewhere and send output to somewhere -- a CPU truly in isolation does nothing, a point that the video also makes explicitly.) For our theoretical Pentium, we can ignore several of those steps because we don't actually have to do any instruction translation, we can just run the old software on the CPU we have, and then wait however long we need to.
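If it helps to see the loop written down, here's a minimal sketch of those five steps in Python. The three-opcode instruction set and its cycle costs are made up for illustration (they aren't real 6502 or x86 numbers), and `run`, `INSTRUCTIONS`, and `clock_hz` are names I've invented for this example.

```python
import time

# Hypothetical toy instruction set: opcode -> (cycle cost, operation on the accumulator).
# The cycle costs are invented for illustration, not taken from any real CPU.
INSTRUCTIONS = {
    "INC": (2, lambda acc: acc + 1),
    "DBL": (4, lambda acc: acc * 2),
    "NOP": (1, lambda acc: acc),
}


def run(program, clock_hz, sleep=time.sleep, now=time.perf_counter):
    """Run `program` (a list of opcodes) at roughly `clock_hz` emulated cycles per second.

    Returns (final accumulator value, total emulated cycles).
    """
    acc = 0
    total_cycles = 0
    start = now()
    for opcode in program:                        # 1. read in an instruction
        cycles, operation = INSTRUCTIONS[opcode]  # 2. "translate" it (here, a table lookup)
        acc = operation(acc)                      # 3. execute it
        total_cycles += cycles                    # 4. count the cycles it would have taken
        # 5. wait until the emulated CPU would have finished this instruction
        remaining = (start + total_cycles / clock_hz) - now()
        if remaining > 0:
            sleep(remaining)
    return acc, total_cycles


print(run(["INC", "DBL", "INC", "NOP"], clock_hz=1000))  # (3, 9)
```

Waiting after every single instruction is the simplest way to show the idea, but it's a design choice for readability; at realistic clock speeds you would more likely run a batch of instructions and then wait once for the whole batch.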
There are a few things we can do to the CPU speed, but they basically fall into one of three categories. The first is disabling CPU cache. This is a pretty blunt change, but one that can practically change CPU class. When the cache is on, the CPU keeps copies of the instructions and data it has used recently (or expects to use soon) in a small, fast memory, so it doesn't have to wait on much slower main RAM to fetch them again; disabling the cache means every fetch goes out to RAM at full cost, without the benefit of, to put it overly simply, remembering what was just read. There are different levels of cache (generally, two are common: one that's built into the CPU, and another that's basically a specialized module of RAM on the motherboard, separate from it), which can be disabled separately, and whose functionality can be controlled granularly for specific uses (e.g., caching what is being used regularly vs. what might be).
In general, we can tell what our CPU speed needs to be through the use of benchmarks. A significant one for DOS is the Superscape benchmark, which is a software rendering test. There are a couple of versions of it, one of which is designed more for fast processors, but using both will allow us to make some judgments about roughly where we want our CPU to be in order to be compatible with various older CPUs. I strongly encourage using this benchmark because it's the most commonly used and most general one, and it has a lot of results published online for various hardware configurations.
You can get it as part of a compilation of DOS benchmarks here from Phil's Computer Lab, which might be the most comprehensive single site for DOS hardware management. You can see a number of results for various levels of Pentium CPU cache settings here, and examples of targets for older hardware on the same page. Note that a 386, the sort of hardware we might expect to run Wing Commander comfortably on, returns results in the 10-16FPS range for the benchmark, so that's an example of the kind of number we want. Of course, the Pentium MMX shown on the results page, which is going to be at least a little slower than the Pentium II we're planning for our machine, only gets in that range with basically all cache features off on both the CPU and the motherboard. This may not be enough.
Fortunately, that's not the only option. While it's more common with laptops (which have a host of issues in serving as a retro gaming PC due to having much more fixed hardware), by this point in time a lot of computers had the ability to vary the system power level, in order to let idling notebooks use less power (in general, the higher the CPU speed, the higher the power draw).
One tool handles both of these tasks, and it's the same one used to control the cache levels on the Pentium in the benchmark results linked above: SetMul. Unfortunately, SetMul can't perform this power throttling on a Pentium II; it works instead on the VIA C3 (the Cyrix III line) and on some AMD processors like the K6-2+, K7, and K8. So maybe we should rethink our Pentium II option and instead take an AMD processor of similar capability. While a correspondingly powerful AMD K6 will run at a higher clock rate (and thus might be less power-efficient overall), we can use SetMul to control CPU speed more granularly than through cache controls alone.
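To make "more granular" a bit more concrete: on these chips the core clock is the bus clock times a multiplier, and the multiplier is the knob SetMul turns. Here's a quick sketch of the arithmetic in Python. The 100 MHz bus and the 2.0x to 6.0x multiplier range are assumptions for illustration, and the function names are mine; check what your own CPU and board actually expose.

```python
def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock is the front-side bus clock times the CPU multiplier."""
    return fsb_mhz * multiplier


def available_clocks(fsb_mhz, multipliers):
    """Pair each multiplier with the resulting core clock in MHz, slowest first."""
    return [(m, core_clock_mhz(fsb_mhz, m)) for m in sorted(multipliers)]


# Assumed values for illustration: a 100 MHz bus and multipliers from 2.0x to 6.0x
# in 0.5 steps. These are placeholders, not a statement about any specific chip.
multipliers = [2.0 + 0.5 * i for i in range(9)]
for m, mhz in available_clocks(100, multipliers):
    print(f"{m:.1f}x -> {mhz:.0f} MHz")
```

Under those assumptions the slowest step is still 200 MHz, so the multiplier gives us more rungs on the ladder, but not necessarily a low enough bottom rung by itself.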
By the way, if you want to see how a K6 compares to the Pentium II line, here's a benchmarking video:
It's difficult to overstate just how thorough Phil's Computer Lab is, and how invaluable a resource his various links and videos are; while the VOGONS forum is a necessarily broader resource, with more news and more information on less common hardware, there's so much there that general info can be hard to find.
But I should note: SetMul is not magic, and not all of its features are available on all processors. As VOGONS user gerwin warns in the SetMul thread linked above:
Just so you know, If the FSB software is not written for both a particular Southbridge and a particular PLL clock generator chip, it won't do anything. And usually all FSB options of the PLL are already available through jumpers or the BIOS.
So the K6 may not necessarily be the better choice here, as there's no guarantee we'll get as many options as the Pentium would support. It's complicated, and there aren't always guarantees. Hence why this guide is being sold as a big ol' warning post.
There is also of course a floor to how much we can lower the CPU speed through all of this. With cache disabled and the clock multiplier dropped to its lowest setting, we may still be getting benchmark results at the upper end of what a 386 or 486 is capable of. That could be fine, but what if we need more granular levels of CPU control? Well, let's go way back to near the start of this post and our other option might make a bit more sense:
For our theoretical Pentium, we can ignore several of those steps because we don't actually have to do any instruction translation, we can just run the old software on the CPU we have, and then wait however long we need to.
Quite simply, we tie up the CPU with a bunch of irrelevant work so that the stuff we actually want it to run proceeds at the reduced rate we want. The advantage to this is that we can get very precise control of how many game-relevant instructions per second our computer can run! The disadvantage is that this is going to be less efficient than power controls: the CPU can't take advantage of potential power savings in idle states, because we won't be letting it become idle.
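Here's a rough sketch of that idea in Python: work out what share of the CPU the game should get from two benchmark numbers, then pad every chunk of real work with a busy loop. The assumption that frame rate scales linearly with CPU share is mine and only approximately true, and `duty_cycle`, `burn`, and `run_throttled` are names invented for this example. Real DOS slowdown utilities work at a much lower level than this; this only illustrates the arithmetic.

```python
import time


def duty_cycle(measured_fps, target_fps):
    """Share of CPU time the game should get, assuming FPS scales linearly with it."""
    if measured_fps <= 0 or target_fps <= 0:
        raise ValueError("FPS values must be positive")
    return min(1.0, target_fps / measured_fps)


def burn(seconds, now=time.perf_counter):
    """Spin (rather than sleep) for `seconds`, keeping the CPU busy with irrelevant work."""
    end = now() + seconds
    spins = 0
    while now() < end:
        spins += 1
    return spins


def run_throttled(step, steps, fraction, now=time.perf_counter):
    """Call step() `steps` times, padding each call so it gets only `fraction` of the time."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    for _ in range(steps):
        began = now()
        step()
        busy = now() - began
        # If the real work took `busy` seconds and should be `fraction` of the total,
        # the padding needed is busy * (1 - fraction) / fraction.
        burn(busy * (1.0 - fraction) / fraction, now)


# Hypothetical numbers: the machine benchmarks at 40 FPS and we want about 10.
fraction = duty_cycle(measured_fps=40.0, target_fps=10.0)
run_throttled(lambda: sum(range(10_000)), steps=5, fraction=fraction)
```

Note that `burn` spins instead of sleeping on purpose, which is exactly why this approach saves no power.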
As you might suspect for a task that is, in a sense, very simple -- just feed the CPU irrelevant commands until we're ready for a new one -- there are a lot of different tools that can perform this task, some compatible with DOS and others requiring Windows. VOGONS once again comes to our aid with a pretty long list of tools that can help to slow down your computer. Amusingly, this list includes the DOS emulation utility DOSBox, not to actually run DOS software, but to control the level of cycles dedicated to its process at a granular level; obviously, it needs a host operating system like Windows to run, whereas other tools are able to run natively in DOS.
That said, because these tools only unlock their full potential under specific operating systems or on specific hardware configurations, and because their settings have to be tuned granularly against benchmarks, I can't actually give you some master list of all period-correct hardware that will guarantee we can get our 386 software to run at the pace we expect. And if we're going to use a program like DOSBox to try to burden our CPU with unused cycles -- why not just use something like DOSBox on a computer we already have to run the software in the first place? After all, if we're using an Intel-compatible CPU, we're not really translating instructions exactly, just controlling how fast they run and controlling the amount of RAM the programs have access to; if we're running, say, DOOM inside Windows 98, we're doing almost exactly the same thing as we would in DOSBox -- W98 is going to give the program a memory pen and use its drivers to manage sound rather than let the program handle them directly.
So even with this incredibly simple option, where we have a lot of tools for configuring one piece of hardware to behave, as far as we can tell, nearly exactly like some reference hardware (i.e., a 386 CPU for playing Wing Commander on), we still have a lot of unknowns that can require heavy tweaking to reach the state we expect, even though we know what that state is from the reference benchmarks. DOSBox and related software will save us many of these headaches, so we can focus on the experience the games give us rather than on trying to optimize the machine. Just wait until we get into more subjective hardware, like sound and graphics!