wffl


I wonder how you parse markup languages like markdown, which allow just random text anywhere. Like... I know how to handle normal programming languages(TM) -- you tokenize/lex first, transforming text into a pile of "things" (tokens), then you match a couple of them at a time, building structure as you go... But what do you do with markdown? Does tokenization still make sense? What are the tokens? How do you handle it when things are just text (like a [ which is not part of a [link])? How do you properly specify it? [...]

This should not be so confusing. Yet it does confuse me. I should learn more...
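
For reference, here is a minimal sketch of the "normal" pipeline described above: lex first, then match tokens while building structure. The tiny arithmetic grammar and the names `tokenize` and `parse` are made up for illustration and are not from any particular tool.

```python
import re

def tokenize(text):
    """Turn text into a flat list of (kind, value) tokens, skipping whitespace."""
    tokens = []
    for number, other in re.findall(r"(\d+)|(\S)", text):
        if number:
            tokens.append(("num", int(number)))
        elif other in "+*()":
            tokens.append(("op", other))
        else:
            # A programming language can simply reject stray characters.
            raise SyntaxError(f"unexpected character {other!r}")
    return tokens

def parse(tokens):
    """Recursive descent: expr := term ('+' term)*, term := atom ('*' atom)*."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        kind, value = advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = atom()
        while peek() == ("op", "*"):
            advance()
            node = ("*", node, atom())
        return node

    def expr():
        node = term()
        while peek() == ("op", "+"):
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return tree
```

Note that the lexer here gets to raise an error on anything it does not recognize. A markup language cannot do that, since unrecognized input has to survive as plain text, which is exactly where the question gets awkward.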



in reply to @wffl's post:

i'm pretty sure you would do it in one pass so your parser looks more like a lexer with a bunch of added state. at least that's how mine works but it's also not exactly good so,,, ^^

also markdown is like, ridiculously hard to parse correctly so i don't think there is a simple answer to the [ as a link vs non-link thing
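
One way to picture the "lexer with a bunch of added state" idea, as a rough sketch for a made-up markup language rather than real markdown: scan left to right, collect plain text in a buffer, and when a `[` shows up, try to read a complete `[label](url)`. If that attempt fails, the `[` just goes into the text buffer like any other character. The names `parse_inline` and `try_link` and the node shapes are assumptions for this example, and the rule that a label may not contain another `[` is a simplification chosen here, not how markdown does it.

```python
def try_link(text, start):
    """Try to read [label](url) at text[start]; return (label, url, end) or None."""
    close = text.find("]", start + 1)
    if close == -1 or not text.startswith("(", close + 1):
        return None
    label = text[start + 1:close]
    if "[" in label:
        # Simplification: no nested brackets, so an earlier stray "[" loses.
        return None
    paren = text.find(")", close + 2)
    if paren == -1:
        return None
    return label, text[close + 2:paren], paren + 1

def parse_inline(text):
    """Scan left to right, returning ("text", str) and ("link", label, url) nodes."""
    nodes = []
    buf = []  # pending plain-text characters
    i = 0

    def flush():
        # Emit any buffered plain text as a single node.
        if buf:
            nodes.append(("text", "".join(buf)))
            buf.clear()

    while i < len(text):
        if text[i] == "[":
            link = try_link(text, i)
            if link is not None:
                label, url, end = link
                flush()
                nodes.append(("link", label, url))
                i = end
                continue
        # Not a link (or not a "[" at all): treat the character as plain text.
        buf.append(text[i])
        i += 1
    flush()
    return nodes
```

With this sketch, `parse_inline("a [ b [link](http://x) c")` keeps the first `[` as text and turns only the second one into a link node. The key design choice is that nothing is ever a syntax error: every construct has a "fall back to text" path.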

it's just that handling text directly seems like a nightmare...

and yeah, I'm not asking for markdown specifically, because it itself is a nightmare, but more in a "what do I do if I want to invent my own dumb markup language"