tef

here is the problem: I have a file with a tab in it, and i'm matching against it piecemeal.

i.e. if I go "Match 4 Columns of Whitespace, Match Another 4" and I see a single "\t", everything should still be fine™

the last time i tried to solve this, i did the simple and slow answer: if i do not parse all of a tab stop, store the leftovers in some variables, and when i parse a tab, check the leftovers first.
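a minimal sketch of that leftovers approach, in python. this is a reconstruction from the description, not the original code: `LeftoverParser`, `match_columns` and `TAB_STOP = 8` are made-up names and values for illustration.

```python
TAB_STOP = 8  # assumed tab stop size


class LeftoverParser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.offset = 0    # index into the text
        self.column = 0    # 0-based visual column
        self.leftover = 0  # unread columns of the tab sitting at self.offset

    def match_columns(self, want: int) -> bool:
        """Consume exactly `want` columns of whitespace, or nothing at all."""
        saved = (self.offset, self.column, self.leftover)
        while want > 0 and self.offset < len(self.text):
            ch = self.text[self.offset]
            if ch == " ":
                self.offset += 1
                self.column += 1
                want -= 1
            elif ch == "\t":
                full = TAB_STOP - (self.column % TAB_STOP)
                # check the leftovers first, otherwise use the whole tab
                remaining = self.leftover if self.leftover else full
                take = min(want, remaining)
                want -= take
                remaining -= take
                if remaining == 0:
                    # the tab is used up: move past it
                    self.offset += 1
                    self.column += full
                    self.leftover = 0
                else:
                    # half a tab: remember what is left for next time
                    self.leftover = remaining
            else:
                break
        if want > 0:
            self.offset, self.column, self.leftover = saved
            return False
        return True
```

the clunky part is visible here: `leftover` is extra state that has to be saved, restored and reset alongside the offset and the column.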

it's a little clunky and i've been wondering how I can make it easier, and eventually i realised: i do not need to do anything special. i am already doing the right thing, without realising it.

i'm already tracking which column the parser is at in order to calculate the correct tabstop width. that turns out to be everything i need:
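for reference, the tabstop width calculation looks roughly like this. it's a sketch that assumes 0-based columns and tab stops every 8 columns; `tab_width` and `TAB_STOP` are hypothetical names.

```python
TAB_STOP = 8  # assumed tab stop size


def tab_width(column: int, tab_stop: int = TAB_STOP) -> int:
    """Number of columns a tab covers when it starts at `column` (0-based)."""
    return tab_stop - (column % tab_stop)


print(tab_width(0))  # 8
print(tab_width(4))  # 4
print(tab_width(6))  # 2
```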

when i have a tab, and it's 8 spaces wide, and i only read four of them: i add four to the column. when i read that same tab again, i'm in column+4 and so the tabstop will be 4 columns shorter. the trick to parsing half a tab: increment the column variable, and not the file offset.
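a minimal sketch of the trick, assuming 8-column tab stops and a cursor that tracks both the file offset and the column. `Cursor` and `match_columns` are hypothetical names, not taken from a real parser.

```python
from dataclasses import dataclass

TAB_STOP = 8  # assumed tab stop size


@dataclass
class Cursor:
    text: str
    offset: int = 0  # index into the text (the "file offset")
    column: int = 0  # 0-based visual column


def match_columns(cur: Cursor, want: int) -> bool:
    """Consume exactly `want` columns of whitespace, or nothing at all."""
    start = (cur.offset, cur.column)
    while want > 0 and cur.offset < len(cur.text):
        ch = cur.text[cur.offset]
        if ch == " ":
            cur.offset += 1
            cur.column += 1
            want -= 1
        elif ch == "\t":
            width = TAB_STOP - (cur.column % TAB_STOP)
            if width <= want:
                # read everything left of the tab: advance the file offset too
                cur.offset += 1
                cur.column += width
                want -= width
            else:
                # half a tab: bump the column only, leave the offset alone
                cur.column += want
                want = 0
        else:
            break
    if want > 0:
        cur.offset, cur.column = start
        return False
    return True


for line in ("\tHi", "        Hi"):
    cur = Cursor(line)
    assert match_columns(cur, 4) and match_columns(cur, 4)
    print(repr(cur.text[cur.offset:]), cur.column)  # 'Hi' 8 for both lines
```

no leftover variable is needed: after the first match on the tab the cursor sits at offset 0, column 4, so the second match sees the same tab as only 4 columns wide.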

i am not sure if i feel very clever for working this out, or very stupid for missing what feels inherently obvious from the outset

edit:

the reason i am doing this? markdown. fucking markdown and four space indents


in reply to @tef's post:

...I can't make your problem parts 1 and 2 line up in my head.

If you have a file with a tab in it - that's just 0x09 right? How do you match half of that?

I feel like I'm getting from the context that you're matching the rendered contents of an editor pane?

if you have a grammar that says "here is a four space indent, and everything after is fixed width", these two lines should parse the same

<tab>Hi
<space x4><space x4>Hi

so yes "how do you match half a tab?"

the two answers above are

  • keep track of "i parsed half a tab at pos=1, and consumed 3 columns" and whenever you parse a tab in the same position, you update it

  • (the actual easy solution) if you're keeping track of columns (and you have to, to parse tabs), then when you don't fully parse a tab, you can just increment the columns

as a result, when you come back to parse the next rule, you see a tab at column +N, which makes the tab width N less

you just gotta make sure when you read all the columns in a tab, you advance the file offset