lexyeevee




i wanted to scrape a few pages on doomworld but this became An Adventure because it turns out

(a) the entire page layout from some years ago is just fucking tables in tables in tables so it's a massive pain in the ass to identify anything

(b) a bunch of prose seems to have been written by a wysiwyg thing so even though the formatting looks the same across several pages, it is a fucking nightmare to wade through. sometimes a thing is <a href="..."><font size="+1"><b> but sometimes the same thing is <b><font size="+1"><a href="..."> because of course there are no heading elements, that's too hard. there are spurious wrapper <div align="center"> and <div align="left"> and if they're around an <hr> then it's basically random which one you get. just when i thought i'd nailed something down i found <p><b></b><strong>Title</strong></p>, as in, there was an empty <b> immediately followed by a <strong> containing the actual text. what the fuck is going on here
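
for what it's worth, here is a minimal sketch of one way to sidestep the nesting-order problem, not what i actually ran. it assumes a "title" is any non-empty text with a <b> or <strong> somewhere among its ancestors, which is a guess based on the examples above and will also catch ordinary bold prose. the names BoldTextCollector and find_titles are made up for illustration, and it only uses python's stdlib html.parser.

```python
from html.parser import HTMLParser

# Assumption: bold-ish ancestors mark a title; tweak for the real pages.
BOLD_TAGS = {"b", "strong"}
# Void elements never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"hr", "br", "img", "input", "meta", "link"}


class BoldTextCollector(HTMLParser):
    """Collect text that has a bold ancestor, ignoring wrapper order."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.titles = []  # text chunks found under a bold tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray closers are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        # An empty <b></b> produces no data, so it is skipped for free.
        if text and BOLD_TAGS & set(self.stack):
            self.titles.append(text)


def find_titles(html):
    parser = BoldTextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.titles


samples = [
    '<a href="x"><font size="+1"><b>One</b></font></a>',
    '<b><font size="+1"><a href="x">Two</a></font></b>',
    '<div align="center"><hr></div><p><b></b><strong>Title</strong></p>',
]
for sample in samples:
    print(find_titles(sample))
```

because it only asks "is there a bold tag anywhere above this text", the <a>/<font>/<b> ordering and the random <div align> wrappers stop mattering. an unclosed <b> would leak into everything after it, so real pages would probably still need extra checks (font size, link presence, and so on).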

i think i could've just manually collected what i wanted in the time i've spent crawling my way to like 30% success



in reply to @lexyeevee's post: