tef


go has some tricks for handling dynamically typed values, and how it handles json makes for a good example.

in other statically typed languages, you might write a foo.toJSON() method for each and every type, along with a foo.fromJSON() function that returns a new object. the effort involved means this is often done with codegen or macros.

meanwhile, in golang? there are just two functions, bytes, err := json.Marshal(object) and err := json.Unmarshal(bytes, &output), which use reflection to avoid having to write special functions for every type.

for encoding, Marshal reflects over the struct passed in to get its field names and values, and converts them to json. for decoding, Unmarshal reflects over the struct that the passed-in pointer points to, finds its fields, and fills them in from the json.


type Output struct {
    FieldA int
    FieldB string
}

var output Output

err := json.Unmarshal(bytes, &output)

it's a trick used all over golang: anywhere you'd want to pass a type into a function so it knows what to produce, you pass a pointer to a value of that type instead. database libraries are one example. in one go ORM, you call .Scan(&results) at the end of a big chain of methods, which unpacks a row of results into the given struct.
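a rough sketch of how a Scan-style function can use reflection on the pointer it's handed. scanInto, the row map, and User are all hypothetical; real database libraries differ in how they map columns onto fields, so treat this as an illustration of the &results trick, not of any particular ORM.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// scanInto is a hypothetical stand-in for an ORM's Scan: it copies values
// from a row (column name -> value) into the struct that dest points to,
// matching columns to exported field names.
func scanInto(row map[string]any, dest any) error {
	v := reflect.ValueOf(dest)
	if v.Kind() != reflect.Pointer || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return errors.New("dest must be a non-nil pointer to a struct")
	}
	s := v.Elem()
	for name, value := range row {
		field := s.FieldByName(name)
		if !field.IsValid() || !field.CanSet() {
			continue // no matching exported field: skip the column
		}
		val := reflect.ValueOf(value)
		if !val.IsValid() {
			continue // nil column value: leave the field at its zero value
		}
		if !val.Type().AssignableTo(field.Type()) {
			return fmt.Errorf("column %q: cannot assign %s to %s", name, val.Type(), field.Type())
		}
		field.Set(val)
	}
	return nil
}

type User struct {
	ID   int
	Name string
}

func main() {
	var u User
	row := map[string]any{"ID": 7, "Name": "ada"}
	if err := scanInto(row, &u); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", u) // {ID:7 Name:ada}
}
```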

using the &output trick means not having to pass map[string]any values around, and it's one of the nicer things go lets you do when handling json.
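for contrast, here's roughly what decoding looks like without a target struct. decodeLoose is a hypothetical helper name; the behaviour it shows (json numbers arriving as float64, every access needing a type assertion) is how encoding/json decodes into map[string]any.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeLoose decodes a JSON object with no target struct, so
// encoding/json falls back to building a map[string]any.
func decodeLoose(data []byte) (map[string]any, error) {
	var m map[string]any
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	m, err := decodeLoose([]byte(`{"FieldA": 1, "FieldB": "hello"}`))
	if err != nil {
		panic(err)
	}

	// Every access needs a type assertion, and JSON numbers come back
	// as float64, not int.
	a, ok := m["FieldA"].(float64)
	fmt.Println(a, ok) // 1 true
	_, ok = m["FieldA"].(int)
	fmt.Println(ok) // false
	b, _ := m["FieldB"].(string)
	fmt.Println(b) // hello
}
```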

it isn't always nice, though. let's get onto the crimes.


let's say we have some json objects that we want to handle in go: a request message, and a response message.

{"MessageKind": "Request", "Verb": "GET", "Address":"/"}
{"MessageKind": "Response", "Code": 200, "Status": "OK", "Payload": "Hello, World!"}

and let's say we do this with an interface, Message, and several structures that implement it. you might write something like this:

type Message interface {
    Properties() map[string]any
}

type Request struct { MessageKind, Verb, Address string }
type Response struct { MessageKind string; Code int; Status, Payload string }

func (*Request) Properties() map[string]any { /* ... */ }
func (*Response) Properties() map[string]any { /* ... */ }

... and let's say we have some other struct, with some field M Message in it, which we want to turn to and from json.

the good news? encoding works: the field M points to some struct, the encoder reflects upon it to find out the fields, and spits out json with the right fields.

the bad news? decoding doesn't exactly work. you can't tell golang "hey, if you see an interface and need to unpack json into it, use this struct". you can't tell golang "here's a method for this interface to unmarshal it" either, because methods can only be defined on concrete types, not on interfaces.

so? what's a coder to do? you wrap the interface in another struct, and then tell go how to turn that struct to and from json:

type Envelope struct {
        M Message
}

// UnmarshalJSON peeks at MessageKind, picks the matching concrete struct,
// then decodes the full message into it.
func (e *Envelope) UnmarshalJSON(data []byte) error {
        var header struct{ MessageKind string }
        if err := json.Unmarshal(data, &header); err != nil {
                return err
        }

        switch header.MessageKind {
        case "Request":
                e.M = &Request{MessageKind: "Request"}
        case "Response":
                e.M = &Response{MessageKind: "Response"}
        default:
                return fmt.Errorf("unknown message kind: %q", header.MessageKind)
        }
        return json.Unmarshal(data, e.M)
}

// MarshalJSON encodes the wrapped message directly, so the envelope
// itself never appears in the output.
func (e Envelope) MarshalJSON() ([]byte, error) {
        return json.Marshal(e.M)
}

et voila

Wherever you want to use Message inside a serialized struct, you now use Envelope instead. When go loads json into the structure, the envelope works out which struct to use and decodes into it. When go creates json from the structure, it's as if the envelope weren't there.

crimes.



in reply to @tef's post:

You don't technically need the envelope - if you have a pointer to an interface, you can cast it to a pointer to a type which implements that interface (which is how a lot of Go machinery uses interface{} (now any) to shovel things around).

Also, if you don't want the whole double-deserialization thing, check out mapstructure

you don't need the envelope for encoding, which is why the MarshalJSON method there is effectively a pass-through to the underlying struct (cast as a Message)

but you do need the envelope for decoding

you can't implement func (m Message) UnmarshalJSON(bytes []byte) error { ... } in go: interfaces can't have methods defined on them, only concrete types can

this is why you need a container struct: to be able to implement UnmarshalJSON

as for the double unmarshalling, i could have written the code to use some third party map unpacking code, or hinted at something with a map of json.RawMessage fragments, or even used something like mongoose, which kinda supports the antics i'm up to

I guess I usually just ended up writing a helper like UnmarshalMessage or such, returning *Message, and then (as I'm sure you've run into) you'll still need some way to distinguish Request from Response to use the actual type in the right places.
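a sketch of that helper approach, with hypothetical names throughout (UnmarshalMessage, describe, and Kind() standing in for Properties()). the reply mentions returning *Message; this version returns a plain Message interface value, which is the more usual go idiom. the type switch in describe is the "some way to distinguish Request from Response" part.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is cut down to one hypothetical method for this sketch.
type Message interface{ Kind() string }

type Request struct{ MessageKind, Verb, Address string }
type Response struct {
	MessageKind string
	Code        int
	Status      string
	Payload     string
}

func (r *Request) Kind() string  { return r.MessageKind }
func (r *Response) Kind() string { return r.MessageKind }

// UnmarshalMessage peeks at MessageKind, then decodes into the matching
// concrete type.
func UnmarshalMessage(data []byte) (Message, error) {
	var header struct{ MessageKind string }
	if err := json.Unmarshal(data, &header); err != nil {
		return nil, err
	}
	var m Message
	switch header.MessageKind {
	case "Request":
		m = &Request{}
	case "Response":
		m = &Response{}
	default:
		return nil, fmt.Errorf("unknown message kind: %q", header.MessageKind)
	}
	if err := json.Unmarshal(data, m); err != nil {
		return nil, err
	}
	return m, nil
}

// describe shows the type switch callers still need to get at the
// concrete type's fields.
func describe(m Message) string {
	switch v := m.(type) {
	case *Request:
		return v.Verb + " " + v.Address
	case *Response:
		return fmt.Sprintf("%d %s", v.Code, v.Status)
	default:
		return "unknown"
	}
}

func main() {
	for _, raw := range []string{
		`{"MessageKind": "Request", "Verb": "GET", "Address": "/"}`,
		`{"MessageKind": "Response", "Code": 200, "Status": "OK", "Payload": "Hello, World!"}`,
	} {
		m, err := UnmarshalMessage([]byte(raw))
		if err != nil {
			panic(err)
		}
		fmt.Println(describe(m)) // "GET /", then "200 OK"
	}
}
```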

One reason I'm glad I don't have to write Go anymore is the number of shenanigans required to work around the lack of union/sum types...

if you have out of band information, like a content type, that makes life easier

i went for the wrapper type because when you have messages embedded inside other structures, say a Batch struct which contains an array of Message, i can't just lean on json.Unmarshal, as the decoder chokes without the wrapper

my solution to this, which i'm working on implementing in favor of a gnarly struct with 30 different struct pointers in it, is a static registry populated by any number of func init() scattered around the codebase:

type registeredType struct {
    parse func([]byte) (any, error)
    // more stuff
}

var (
    registry = make(map[string]registeredType)
    registryNamesByType = make(map[reflect.Type]string)
)

// RegisterType records how to decode a T stored under name, plus the
// reverse mapping from T's reflected type back to name.
func RegisterType[T any](name string) {
    registry[name] = registeredType{
        parse: func(b []byte) (any, error) {
            var t T
            err := json.Unmarshal(b, &t)
            return t, err
        },
    }
    var t T
    registryNamesByType[reflect.TypeOf(t)] = name
}

and you can marshal values back and forth with in-band data like

json.Marshal(struct {
        Kind string
        Value any
    }{registryNamesByType[reflect.TypeOf(foo)], foo})

and unmarshal like

var intermediate struct {
    Kind string
    Value json.RawMessage // defers parsing
}
if err := json.Unmarshal(b, &intermediate); ... { ... }
return registry[intermediate.Kind].parse(intermediate.Value)
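pulling the registry snippets together into one runnable sketch. marshalTagged and unmarshalTagged are hypothetical names for the two fragments above; the lookups check for unregistered kinds rather than calling a nil parse func. the reverse map is keyed with reflect.TypeOf((*T)(nil)).Elem(), which matches reflect.TypeOf(t) for concrete types but also works if T is an interface.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

type registeredType struct {
	parse func([]byte) (any, error)
}

var (
	registry            = make(map[string]registeredType)
	registryNamesByType = make(map[reflect.Type]string)
)

// RegisterType records how to decode T under name, and the reverse mapping.
func RegisterType[T any](name string) {
	registry[name] = registeredType{
		parse: func(b []byte) (any, error) {
			var t T
			err := json.Unmarshal(b, &t)
			return t, err
		},
	}
	registryNamesByType[reflect.TypeOf((*T)(nil)).Elem()] = name
}

// marshalTagged wraps a registered value with its in-band Kind name.
func marshalTagged(v any) ([]byte, error) {
	name, ok := registryNamesByType[reflect.TypeOf(v)]
	if !ok {
		return nil, fmt.Errorf("type %T is not registered", v)
	}
	return json.Marshal(struct {
		Kind  string
		Value any
	}{name, v})
}

// unmarshalTagged reads Kind first, then hands the raw Value bytes to the
// registered parser.
func unmarshalTagged(b []byte) (any, error) {
	var intermediate struct {
		Kind  string
		Value json.RawMessage // defers parsing until Kind is known
	}
	if err := json.Unmarshal(b, &intermediate); err != nil {
		return nil, err
	}
	rt, ok := registry[intermediate.Kind]
	if !ok {
		return nil, fmt.Errorf("unregistered kind %q", intermediate.Kind)
	}
	return rt.parse(intermediate.Value)
}

type Request struct{ Verb, Address string }

func init() {
	// registrations like this would be spread across many init funcs.
	RegisterType[Request]("request")
}

func main() {
	b, err := marshalTagged(Request{Verb: "GET", Address: "/"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"Kind":"request","Value":{"Verb":"GET","Address":"/"}}

	v, err := unmarshalTagged(b)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#v\n", v) // main.Request{Verb:"GET", Address:"/"}
}
```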

if you're registering functions that these types are meant to be fed to, the consumer of this API doesn't even have to spell out the type names in the register call, because they can all be deduced from the type of the passed function 👌
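a sketch of that inference trick, with hypothetical names (RegisterHandler, dispatch, handlers): because the handler's parameter type is T, go infers T from the function passed in, so the call site never writes RegisterHandler[Request].

```go
package main

import (
	"encoding/json"
	"fmt"
)

// handlers maps a registered name to a function that decodes raw JSON and
// passes the result on to the registered handler.
var handlers = make(map[string]func([]byte) error)

// RegisterHandler's type parameter T is inferred from fn's signature.
func RegisterHandler[T any](name string, fn func(T) error) {
	handlers[name] = func(b []byte) error {
		var t T
		if err := json.Unmarshal(b, &t); err != nil {
			return err
		}
		return fn(t)
	}
}

// dispatch looks up the handler by name and feeds it the raw bytes.
func dispatch(name string, b []byte) error {
	h, ok := handlers[name]
	if !ok {
		return fmt.Errorf("no handler registered for %q", name)
	}
	return h(b)
}

type Request struct{ Verb, Address string }

func handleRequest(r Request) error {
	fmt.Println("got", r.Verb, r.Address)
	return nil
}

func main() {
	// No RegisterHandler[Request]: T is deduced from handleRequest.
	RegisterHandler("request", handleRequest)

	if err := dispatch("request", []byte(`{"Verb": "GET", "Address": "/"}`)); err != nil {
		panic(err)
	}
}
```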

if you're getting wild with it and can tolerate that, you can even register things by their reflected names, but generally you want the freedom to fix typos and rename stuff.