ysaie

so i'm writing a parser in Python for some language (specifics don't matter), and most of my sub-parsers end up looking like this:

consume_whitespace_and_comments()
keyword = parse_keyword()
consume_whitespace_and_comments()
identifier = parse_identifier()
consume_whitespace_and_comments()
operator = parse_operator()
consume_whitespace_and_comments()
...

it's an (admittedly somewhat scuffed) recursive descent parser, so i figured i could skip having a separate tokenization phase and just handle whitespace while parsing. it works, but it's really annoying to keep writing consume_whitespace_and_comments() everywhere.

does anybody have a good pattern for avoiding this? if it's "just tokenize it" that's alright, just wondering about other approaches



in reply to @ysaie's post:

this sort of thing is what really got me into parser combinators lol. i'd take a page from them here and introduce a "delimited by" parser that takes other parsers as parameters: the first parameter is your "delimiter" parser, and the rest (a list or varargs, whichever) are the sequence of parsers for the delimited data. the "delimited by" parser runs the delimiter parser first, then iterates through the other parsers, running the delimiter parser again after each one.
as for getting data back out: either the "delimited by" parser can return a list of the results of each non-delimiter parser, or you can pass lambdas that assign each value where it needs to go
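here's a rough sketch of what i mean (all names made up for illustration; a "parser" here is just a zero-argument function that consumes from some shared state and returns a value, matching the shape of your parse_keyword() / consume_whitespace_and_comments() calls):

```python
def delimited_by(delimiter, *parsers):
    """Build a parser that runs `delimiter` first, then each parser in
    `parsers` in order, running `delimiter` again after each one.
    Returns the non-delimiter results as a list."""
    def parse():
        results = []
        delimiter()                 # leading whitespace/comments
        for p in parsers:
            results.append(p())
            delimiter()             # trailing whitespace/comments after each item
        return results
    return parse


# tiny demo: a string cursor standing in for real parser state
class Cursor:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip_spaces(self):          # stands in for consume_whitespace_and_comments
        while self.pos < len(self.text) and self.text[self.pos] == " ":
            self.pos += 1

    def take_word(self):            # stands in for parse_keyword / parse_identifier / ...
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] != " ":
            self.pos += 1
        return self.text[start:self.pos]


cur = Cursor("  let   x  =")
parse_statement = delimited_by(cur.skip_spaces,
                               cur.take_word,   # keyword
                               cur.take_word,   # identifier
                               cur.take_word)   # operator
keyword, identifier, operator = parse_statement()
# keyword == "let", identifier == "x", operator == "="
```

the nice part is that the consume_whitespace_and_comments() calls now live in exactly one place, and each sub-parser just lists the pieces it expects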