fools-pyrite

computer toucher; RPG enjoyer


Engineers who Know What They Are Doing often say "make error recovery your default path." I never really grasped what that meant until I worked through it for myself this week!

I was writing a little service that fetches a spreadsheet, parses its values, and then pushes the results to our database and two external providers. My naive design was "pull the spreadsheet, diff against the DB, and use the diff to push changes to the DB and both providers." This is nice and easy; you diff the incoming changes against the source of truth, and then make each change once.

Unfortunately this doesn't work if something goes wrong with either provider! If one of them has unexpected downtime, or the process crashes after the DB write but before the provider pushes, you'd need a separate recovery path. Otherwise, on the next run you'd check against the DB, find no changes, and never update the providers that have fallen behind.
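Here's a tiny sketch of that failure mode, just to make it concrete. Everything in it is made up for illustration: plain dicts stand in for the DB and the two providers, `diff` and `naive_sync` are hypothetical names, and the `down` argument fakes an outage. It also only handles added or changed values, not deletions.

```python
# Hypothetical in-memory stand-ins; the real service talks to a DB and two providers.

def diff(incoming, current):
    """Return the entries in `incoming` that are missing or different in `current`."""
    return {k: v for k, v in incoming.items() if current.get(k) != v}


def naive_sync(sheet, db, providers, down=()):
    """Diff once against the DB, then push that one changeset everywhere."""
    changes = diff(sheet, db)
    db.update(changes)
    for name, state in providers.items():
        if name in down:  # simulated outage: this provider never sees the push
            continue
        state.update(changes)
    return changes


sheet = {"a": 1, "b": 2}
db = {}
providers = {"A": {}, "B": {}}

naive_sync(sheet, db, providers, down={"A"})   # Provider A is down for this run
second_run = naive_sync(sheet, db, providers)  # A is back, but the DB diff is now empty
```

After the second run, `second_run` is empty and Provider A still has nothing, even though it's healthy again. The DB already matches the spreadsheet, so the service thinks there's no work to do.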

The solution is pretty simple: diff against the DB and each provider separately, and apply each changeset separately. If Provider A didn't get the last batch of changes for whatever reason, it will the next time we run the service! Because the code assumes that it's operating in an error state (a provider was down, the message was dropped unexpectedly, someone powered off the server halfway through), recovery is our default behavior.
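A minimal sketch of the per-target version, under some loud assumptions: dicts stand in for the DB and providers, `diff` and `sync` are hypothetical names, and I'm assuming each provider's current state can be read back so there's something to diff against (a real provider API may or may not allow that). Like before, `down` just fakes an outage and deletions are ignored.

```python
# Hypothetical in-memory stand-ins for the DB and both providers.

def diff(desired, current):
    """Return the entries in `desired` that are missing or different in `current`."""
    return {k: v for k, v in desired.items() if current.get(k) != v}


def sync(sheet, targets, down=()):
    """Diff the spreadsheet against each target on its own and apply each changeset."""
    applied = {}
    for name, state in targets.items():
        if name in down:  # simulated outage: skip now, catch up on a later run
            continue
        changes = diff(sheet, state)  # computed per target, not once against the DB
        state.update(changes)
        applied[name] = changes
    return applied


sheet = {"a": 1, "b": 2}
targets = {"db": {}, "provider_a": {}, "provider_b": {}}

first = sync(sheet, targets, down={"provider_a"})  # provider_a misses this run
second = sync(sheet, targets)                      # provider_a catches up by itself
```

On the second run the DB and `provider_b` get empty changesets, while `provider_a` gets the full batch it missed. There's no special recovery branch anywhere; the normal path is the recovery path.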

Maybe this example can help other people on the systems journey, or maybe it just makes sense for me lol

