what if you could match “any token that doesn't already have a specific meaning in this context”? wouldn't that be cool?

in particular, it would make throwing together parsers for semi-structured data much easier (think log files or other ill-specified formats)

this should definitely be possible for any parser that's amenable to static analysis, since you have to know, at each point in the parse, which tokens are meaningful there
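here's a minimal sketch of the idea, assuming a hand-written parser for a made-up `key=value` log format (`pair := ANY+ "=" ANY*`). the names `TERMINALS`, `lex`, and `parse_pair` are all hypothetical, and the `expected` sets are hardcoded; in a generated parser they would come from static analysis of the grammar (e.g. the terminals that have actions in the current LR state). `ANY` here means "a run of non-whitespace that isn't a terminal meaningful at this point", so `=` splits the key but is just an ordinary character in the value:

```python
import re

# Hypothetical terminal table for the toy grammar  pair := ANY+ "=" ANY*
TERMINALS = {"EQ": re.compile(r"=")}


def skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def lex(text, pos, expected, allow_any):
    """Return (kind, lexeme, new_pos) for the token at `pos`, or None at end of input.

    Only terminals in `expected` (those meaningful in the current parser state)
    are tried. If none matches and `allow_any` is set, ANY consumes characters
    up to whitespace or up to the next spot where an expected terminal matches.
    """
    pos = skip_ws(text, pos)
    if pos == len(text):
        return None
    for name in expected:
        m = TERMINALS[name].match(text, pos)
        if m:
            return name, m.group(), m.end()
    if not allow_any:
        raise SyntaxError(f"expected one of {sorted(expected)} at {pos}")
    end = pos
    while end < len(text) and not text[end].isspace():
        # Stop in front of anything that has a specific meaning in this context.
        if end > pos and any(TERMINALS[n].match(text, end) for n in expected):
            break
        end += 1
    return "ANY", text[pos:end], end


def parse_pair(text):
    """Parse `pair := ANY+ "=" ANY*`, returning (key_words, value_words)."""
    key, value, pos = [], [], 0
    # Key context: "=" is meaningful, so ANY never swallows it.
    while True:
        tok = lex(text, pos, {"EQ"}, allow_any=True)
        if tok is None:
            raise SyntaxError("missing '='")
        kind, lexeme, pos = tok
        if kind == "EQ":
            break
        key.append(lexeme)
    if not key:
        raise SyntaxError("empty key")
    # Value context: nothing is meaningful, so "=" is just part of ANY.
    while (tok := lex(text, pos, set(), allow_any=True)) is not None:
        value.append(tok[1])
        pos = tok[2]
    return key, value
```

so `parse_pair("retry count=3 (max=5)")` gives `(["retry", "count"], ["3", "(max=5)"])`: the second `=` has no specific meaning in the value context, so it just gets swallowed by `ANY`.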

it gets a little weird with lookahead, as in LR(1), but I don't think that's a technical challenge, just a conceptual one

I haven't seen this done in the wild, have you?


you could lean into this even more and have a more elaborate system for matching tokens, with multiple levels of specificity to resolve any ambiguities that arise, like CSS selectors

(it's very analogous to the problem of CSS selectors, actually)

you need some structure on token selectors anyway: if you're building an LR(1) parsing table, you definitely want to know ahead of time which token selectors are disjoint, since two overlapping selectors in the same state would be a conflict
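one way to get that structure, sketched under a strong simplifying assumption: selectors form a refinement tree, where each selector narrows exactly one parent and siblings accept disjoint sets of lexemes. then specificity is just depth in the tree, two selectors can overlap only if one is an ancestor of the other, and the most specific match is always unique. `Selector`, `disjoint`, `resolve`, and the example selectors are all made up for illustration (and this is not how CSS actually computes specificity):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Selector:
    name: str
    test: Callable[[str], bool]  # does this selector accept the lexeme on its own?
    parent: Optional["Selector"] = None  # the selector this one refines (assumed tree)

    @property
    def specificity(self) -> int:
        # Depth in the refinement tree: each level narrows its parent.
        return 0 if self.parent is None else self.parent.specificity + 1

    def ancestors(self):
        s = self
        while s is not None:
            yield s
            s = s.parent

    def matches(self, lexeme: str) -> bool:
        # A selector only accepts what all of its ancestors also accept.
        return all(s.test(lexeme) for s in self.ancestors())


def disjoint(a: Selector, b: Selector) -> bool:
    # Assuming siblings are disjoint, overlap requires one to refine the other.
    return a not in set(b.ancestors()) and b not in set(a.ancestors())


def resolve(lexeme: str, candidates):
    """Pick the most specific candidate selector that accepts `lexeme`."""
    hits = [s for s in candidates if s.matches(lexeme)]
    if not hits:
        return None
    best = max(s.specificity for s in hits)
    top = [s for s in hits if s.specificity == best]
    if len(top) > 1:  # only reachable if the disjoint-siblings assumption is broken
        raise ValueError(f"ambiguous: {[s.name for s in top]}")
    return top[0]


# Hypothetical selector hierarchy: any > {word, number}, word > level
ANY = Selector("any", lambda s: True)
WORD = Selector("word", str.isalpha, ANY)
NUMBER = Selector("number", str.isdigit, ANY)
LEVEL = Selector("level", lambda s: s in {"INFO", "WARN", "ERROR"}, WORD)
SELECTORS = [ANY, WORD, NUMBER, LEVEL]
```

with this, `resolve("ERROR", SELECTORS)` picks `level`, `"disk"` picks `word`, and `"/dev/sda1"` falls through to `any`; `disjoint(WORD, NUMBER)` is true while `disjoint(WORD, LEVEL)` is false, which is exactly the information a table builder would need up front.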


basically there are three sensible ways of disambiguating parses:

  1. order (regex engines tend to do this, preferring the leftmost parse; CSS does this too, of course, with later rules winning)
  2. precedence declarations (e.g. yacc's `%left`/`%right`), which specify the precedence and associativity of individual tokens (very common in LR parser generators)
  3. specificity (what I proposed above; also used by CSS; this subsumes 2)
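to make the difference between 1 and 3 concrete, here's a toy comparison over a hypothetical rule table where each rule carries an explicit specificity number (an assumption; deriving specificity automatically from the patterns is the harder part). order-based resolution changes when you reorder the rules, specificity-based resolution doesn't:

```python
import re

# Hypothetical token rules: (name, pattern, specificity). The catch-all is declared first.
RULES = [
    ("ANY", re.compile(r"\S+"), 0),
    ("LEVEL", re.compile(r"INFO|WARN|ERROR"), 2),
    ("WORD", re.compile(r"[A-Za-z]+"), 1),
]


def by_order(lexeme, rules):
    # Strategy 1: the first declared rule whose pattern matches the whole lexeme wins.
    for name, pat, _ in rules:
        if pat.fullmatch(lexeme):
            return name
    return None


def by_specificity(lexeme, rules):
    # Strategy 3: the matching rule with the highest specificity wins,
    # no matter where it was declared; equal top specificity is an error.
    hits = sorted(
        ((spec, name) for name, pat, spec in rules if pat.fullmatch(lexeme)),
        reverse=True,
    )
    if not hits:
        return None
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        raise ValueError(f"ambiguous: {hits[0][1]} vs {hits[1][1]}")
    return hits[0][1]
```

`by_order("ERROR", RULES)` returns `"ANY"` because the catch-all comes first (and `"WORD"` if you reverse the list), whereas `by_specificity` returns `"LEVEL"` for either ordering.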
