chreke
@chreke

“Records” in dynamically typed languages keep being a source of vexation for me. For example, I really like the idea of Lisp systems, but the fact that Clojure (and several other dynamically typed programming languages) defaults to representing domain objects as maps freaks me out a little bit.

To be fair, I’ve never worked in a big Lisp code base (my experience is limited to experiments), but I have worked in large Erlang code bases with lots of business logic, where I would often run into issues with the lack of types for domain objects.

To illustrate my point, consider the following example—you’re asked to extend the following pseudocode to send an email after a user has finished their registration:

(defn register-user [user]
  (if (valid-password? (:password user))
    (save-to-db user)
    ...))

The question is, what is user? It looks like it will be a map that contains a :password key, but does it also have an email address? If so, is it under the key :email or :email-address? Is the email a string or is it some kind of “email object”? If there is an email address, is it mandatory? Has it been validated in a previous step or do we need to go back and add validation somewhere?

Clojure / Lisp devs, how do you deal with this? spec? Docs? I know "the Lisp way" is to do interactive and iterative development, but at some point I would assume you end up in a situation where you can't look at what's inside the user variable. (Or maybe this isn't a real problem and I'm just too ML-pilled)


noahtheduke
@noahtheduke

(I wrote the below in a comment on the original post)

I work in enterprise and hobby Clojure, so I have some experience with this question.

This is probably the primary issue with Clojure at scale. You have to either be strict in what kind of objects you create (peer review, extensive testing), or you mimic a type system in the runtime with assertions and validation checks. Both of these can be cumbersome and lead to issues which are hard to notice ahead of time and harder to debug.

However! These issues just aren't that big of a deal.

Most domain objects are bounded and known by developers, and they have common keys. Your example of a user object might seem opaque because you're looking at a single function, but if you look at the codebase as a whole, you'll see what the other keys are. But also, take a step back and ask yourself, why do you need to know what the other keys are? How does it matter at this moment? If you don't need them right now, then user here really is just a map that has the key :password and that's good enough. This will let you write tests that simply call (register-user {:password "hello123"}) instead of building a whole user object first.
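A minimal sketch of that idea. The bodies of valid-password? and save-to-db are hypothetical stand-ins (the original post never defines them), and the :error return value is an assumption made only so the example is complete:

```clojure
;; Hypothetical helpers, stand-ins for whatever the real codebase provides.
(defn valid-password? [password]
  (and (string? password) (>= (count password) 8)))

(defn save-to-db
  "Stand-in for a real persistence call; here it only tags the map."
  [user]
  (assoc user :saved? true))

(defn register-user [user]
  (if (valid-password? (:password user))
    (save-to-db user)
    {:error :invalid-password}))

;; The function only reads :password, so the smallest useful input is enough.
(register-user {:password "hello123"})
;; => {:password "hello123", :saved? true}

(register-user {:password "short"})
;; => {:error :invalid-password}
```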

However, if you do need a whole "user" object and want to have some sort of documentation, there are a couple other solutions:

  1. You can reach for defrecord which will act like a regular map but has defined keys (in addition to other niceties). That doesn't help you in the instance where your function takes a user with no docstring, unless you know to look for a (defrecord User [first-name last-name user-name password]) in some other file, but it is generally helpful and once you know, you're not going to forget.
  2. You can write assert calls yourself, or use defn's built-in :pre and :post conditions to make sure you have the correct types or shapes when executing a given function.
  3. You can use one of the many validation libraries such as spec or malli (my preferred library) or Schema. These allow you to define the shape of your data and then either manually call the validation function (eg (assert (s/valid? ::user user) "expected a user")) or instrument a given function (check inputs/outputs at runtime before passing to the original function).
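The first two options can be sketched with nothing but core Clojure. The sample user, the :email and :registered? keys, and the body of register-user are made up for illustration:

```clojure
;; A record behaves like a map but declares its expected keys up front.
(defrecord User [first-name last-name user-name password])

(def ada (->User "Ada" "Lovelace" "ada" "hello123"))

(:password ada)
;; => "hello123"

;; Map verbs still work, and the result is still a User.
(instance? User (assoc ada :email "ada@example.com"))
;; => true

;; :pre and :post conditions run on every call
;; (unless *assert* is false when the function is compiled).
(defn register-user [user]
  {:pre  [(map? user) (string? (:password user))]
   :post [(contains? % :registered?)]}
  (assoc user :registered? true))

(:registered? (register-user ada))
;; => true

;; A failing condition throws an AssertionError.
(try
  (register-user {:password 12345})
  (catch AssertionError e
    (println "rejected:" (.getMessage e))))
```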

Coming from a static type background, it's easy to reach for these cuz they do provide a lot of the same features as static type systems except at run time, but all except records are slower and I think you'd be surprised how infrequently there are type or data issues in Clojure programs.
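For the validation-library route, here is a rough sketch using clojure.spec, which ships with Clojure 1.9 and later. The particular keys and the eight-character rule are assumptions chosen for the example, not anything from a real codebase:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::email string?)
(s/def ::password (s/and string? #(>= (count %) 8)))

;; :req-un / :opt-un mean the map itself uses unqualified keys like :password.
(s/def ::user (s/keys :req-un [::password]
                      :opt-un [::email]))

(s/valid? ::user {:password "hello123"})
;; => true

(s/valid? ::user {:email "ada@example.com"})
;; => false, because :password is missing

;; Manual check at the top of a function, as in the list above.
(defn register-user [user]
  (assert (s/valid? ::user user) "expected a user")
  (assoc user :registered? true))
```

Every call to register-user now pays for the s/valid? check, which is the run-time cost mentioned above.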

Because Clojure is built on a mindset of "many functions that operate on few types", you rarely run into the issue of performing an operation on the wrong type. They do happen, but it's not like calling .append() when you meant to call .push() because one is a vector and the other is a linked list. So here, you have a user and need to operate on it. You use assoc, update, dissoc, get, etc the same as if you had an organization or a chat-room or a game-state. All of the verbs are the same because they're all maps. Vectors (when given integer keys) work the same way: (assoc [0 1 2] 1 100) returns [0 100 2]. If you have invariants, you can write custom functions like update-password which will perform validation, but generally you'll instead write (defn check-password [old-password new-password] ...) and then write (update user :password check-password new-password) which still relies on the basic verb for the user object and performs the check locally to the specific data.
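A small sketch of the "same verbs everywhere" point. The sample maps are invented, and this check-password body is only one guess at what such an invariant check might do (return the new password or throw):

```clojure
;; The same verbs work on any map, whatever it represents.
(def user      {:user-name "ada" :password "hello123"})
(def chat-room {:name "general" :members #{"ada"}})

(assoc user :email "ada@example.com")
(update chat-room :members conj "grace")

(dissoc user :password)
;; => {:user-name "ada"}

(assoc [0 1 2] 1 100)
;; => [0 100 2]

;; Hypothetical invariant check: returns the new password or throws.
(defn check-password [old-password new-password]
  (if (and (string? new-password)
           (>= (count new-password) 8)
           (not= old-password new-password))
    new-password
    (throw (ex-info "invalid password" {:reason :rejected}))))

;; update calls (check-password <current value> "correct-horse").
(update user :password check-password "correct-horse")
;; => {:user-name "ada", :password "correct-horse"}
```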

There's a common refrain/meme in the Clojure community about pushing side-effecting code to the boundaries, and that's where you can use validation libraries. Your "business logic" should be pure functions, which can be tested with unit tests, generative tests, or even just your large-scale integration tests. Because you wrote the code at the repl and verified as you went that everything worked correctly, you can trust that it won't change out from under you (if the input is immutable, then other parts of the code can't modify the data a given function is working on).
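One way that split might look, as a sketch only. The names normalize-user and handle-registration! are hypothetical, and the side effect is passed in as a function so the example needs no real database:

```clojure
(require '[clojure.string :as str])

;; Pure core: data in, data out. Easy to check at the repl or in a unit test.
(defn normalize-user [user]
  (update user :user-name str/lower-case))

;; Boundary: validate untrusted input once, then perform the side effect.
(defn handle-registration! [save! raw-input]
  (if (and (string? (:user-name raw-input))
           (string? (:password raw-input)))
    (do (save! (normalize-user raw-input))
        {:status :ok})
    {:status :invalid}))

;; An atom stands in for the database.
(def saved (atom []))

(handle-registration! #(swap! saved conj %)
                      {:user-name "Ada" :password "hello123"})
;; => {:status :ok}

@saved
;; => [{:user-name "ada", :password "hello123"}]
```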

In jinteki.net's codebase, we don't use any validation libraries and perform almost no validation at run-time. We have a couple records to define the core objects and then don't worry about it. A card is a card is a card so there's never any confusion about what a card is or likewise what a player is or even what state has on it at any given point. And yet, we rarely have type errors or NPEs, mostly just horrible logic bugs from years of tech debt and poor technical decisions long ago (but that's a rant for another day lol). Even when I've performed +9136 -7965 PRs, they went off mostly without a hitch.

At my job, we use json schema (legacy code) and malli (new code) for validating our endpoints, and I've introduced malli for some domain objects too, due to the sheer size of the code base. It's not caught on super well because for the most part, it's not needed. Our domain objects are relatively simple, our names are consistent, docstrings say what they need or provide, and our massive integration test suite -- while slow -- covers all of the ground one could hope for.

The places where it matters are the files that have 50+ functions, each expecting a different custom object ("I need to merge and pivot these two custom objects, so i'll write a new function and have it return a third custom object, but i won't write any docstrings or comments. i'm a genius") and tangled together. Adding some malli schemas and then even just using those in docstrings removes 90% of the issues.

I like TypeScript and I've liked the little bits of OCaml that I've tried (but fucking up polymorphic equality is such a mistake that I just can't keep it up lol), and I really like when there's ahead-of-time checking for object validity. TS is killer in that regard. It's a bummer that Typed Clojure never caught on or got good enough. That being said, I don't miss it much.


in reply to @chreke's post:

Yeah, it's an issue. I haven't done Enterprise Clojure for a while, but back when I was doing it my favorite way was:

  • Declare and validate data shapes for function parameters/return values at module boundaries. spec is a good tool for this, so are the various schema libraries.
  • Within a module, write a bunch of tests to make sure that your various internal functions behave sanely as long as you have valid data coming in. If you're doing spec, you can pretty easily test.check your way to happiness.

I wish the language was more prescriptive on this front.

I work in enterprise and hobby Clojure, so I have some experience with this.

bona fides:

~/work/redacted [main ≡]
$ tokei
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Clojure               974       248707       232202         3668        12837

~/personal/netrunner [master ≡]
$ tokei
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Clojure               172       123430       111828         4494         7108
 ClojureC                4         1023          858           38          127
 ClojureScript          48        14221        13185          142          894

Thanks for the great reply! Very interesting to hear from someone who actually uses Clojure “in anger”

Most domain objects are bounded and known by developers, they have common keys. Your example of a user object might seem opaque because you're looking at a single function, but if you look at the codebase as a whole, you'll see what the other keys are. But also, take a step back and ask yourself, why do you need to know what the other keys are? How does it matter at this moment? If you don't need them right now, then user here really is just a map that has the key :password and that's good enough. This will let you write tests that simply call (register-user {:password "hello123"}) instead of creating a whole test.

This is a good point; you’re right that most of the time it doesn’t matter. I guess you can say that the interface is defined by how the type is used; any map with a :password key implicitly conforms to this interface. I guess the problem has more to do with data flow, e.g. at this point in the program what data do I have access to? Do I already have the user’s email or do I have to fetch it from somewhere?

The places where it matters are the files that have 50+ functions, each expecting a different custom object ("I need to merge and pivot these two custom objects, so i'll write a new function and have it return a third custom object, but i won't write any docstrings or comments. i'm a genius") and tangled together. Adding some malli schemas and then even just using those in docstrings removes 90% of the issues.

This is exactly the kind of issue I had as well, lol. It’s a pattern I’ve seen in several code bases where you might have some “base object” that gets extended on demand (ad hoc); e.g. you might have a “user” object that gets extended with “user details” etc. This makes the code base really hard to reason about as you constantly have to second-guess whether you’re dealing with a “regular” or “extended” object or if you need to deal with both.

I guess the problem has more to do with data flow, e.g. at this point in the program what data do I have access to? Do I already have the user’s email or do I have to fetch it from somewhere?

Yeah, that can be annoying. I'm probably desensitized to the annoyance. I typically travel up and down the call stack to see the context of a given function as I read it, but when I worked in TypeScript, it was nice to be sure the thing had the expected shape no matter what. The solutions in Clojure are to read the calling functions or to add some validation (and then see which tests break lol). It's not ideal.

Fun fact: for my first 3 years of Clojure, I didn't use any linting or lsp. I just had vim syntax highlighting and used ripgrep to search the codebase when I needed to find out how things were getting used. It was absolute hell, but it also gave me a really strong ability to hold the whole call structure in my head at once lol. I wonder how much that has affected my current preferences.

One thing that I forgot to mention is that developing at the repl means that, unless there are side effects, you can just check any function you want at any moment. "What's this function do?" Just type in (register-user {:password "poop"}), eval it with your editor, and see what's returned. It's a marvelous way to learn and iterate on a codebase, imo.