RunawayDanish

Local Alien Degenerate opens blog

Time Elapses. The Past Recedes. Do you do the Dinosaur?

Furry artist from the internet, ancient and seasoned at the eon-spanning age of 32. I've been around on Furaffinity.net forever, watch me there too, maybe consider donating dosh to me over on kofi or SubscribeStar, I am unemployable.




All characters are over 18, by the by, and you should be too to visit this page.




PS: A random SFW account, no relation to me, of course.


caffeinatedOtter
@caffeinatedOtter

Incidentally, if you're on a platform where you can reasonably install cohost.py and you can responsibly vet an internet rando's python3 script before running it on your machine, you can use this to dump the text of all posts from a specific Cohost page of yours to individual files in the current working directory, named in the pattern postID_whenposted.md

I have made Choices in order to preserve some out-of-band information: post headline, CWs, and tags. I make no attempt to retrieve anything except text; archiving posted images is outside my use case.

#!/usr/bin/python3

from pathlib import Path
from cohost.models.user import User
from cohost.models.post import Post
from sys import exit


cookie = '' # copy-paste from web browser devtools as per https://github.com/valknight/Cohost.py#retrieving-your-cookie 
projectName = 'caffeinatedOtter' # or whatever your @pagename is

try:
	user = User.loginWithCookie(cookie)
	project = user.getProject(projectName)
except:
	exit('Cohost login failed!')

here = Path.cwd()
page = 0
while True:
	postlist = project.getPosts(page)
	if len(postlist) > 0:
		page = page + 1
		for post in postlist:
			md = post.plainTextBody
			if len(post.contentWarnings) > 0:
				cwList = '\n'.join(post.contentWarnings)
				md = f"<!-- CWs:\n{cwList}\n-->\n{md}"
			if len(post.tags) > 0:
				tagList = '\n'.join(post.tags)
				md = f"<!-- tags:\n{tagList}\n-->\n{md}"
			if len(post.headline) > 0:
				md = f"# {post.headline} <!-- Cohost headline -->\n\n{md}"
			if len(md) > 0:
				filename = here / f"{post.postId}_{post.publishedAt}.md"
				filename.write_text(md)
	else:
		print('Done.')
		exit(0)

MiserablePileOfWords
@MiserablePileOfWords

For people trying this on Windows that are like "Wait, it's complaining! Help!"
There's 2 potential problems:

  1. The filename created by this doesn't work on Windows, because post.publishedAt contains the invalid (for Windows) character : in the times.
    How do we fix this?
    On line 35
    filename = here / f"{post.postId}_{post.publishedAt}.md"
    becomes
    filename = here / f"{post.postId}_{post.publishedAt.replace(':','-')}.md"

  2. You might have, like me, pasted emoji straight into your text. The script doesn't like that either, because you'll get a UnicodeEncodeError: 'charmap' codec can't encode character '\Usomethingorother' in position somewhere: character maps to <undefined> error.
    Have no fear, however, the fix is simple enough:
    On line 36
    filename.write_text(md)
    becomes
    filename.write_text(md, encoding="utf-8")


MiserablePileOfWords
@MiserablePileOfWords

I have discovered there is at least one (1) typo in cohost.py1
(transparentShareOfPostId tries to get transparentShareofPostId)

and also that you can't trust all the data you get back.
Somehow, some of the page URLs for posts are 404?
*confused dog noises*
Edit: Actually, that's not true.
The page is there.
The page exists.
I can go to it.
The server sometimes returns a 404.
*dog noises continue*


MiserablePileOfWords
@MiserablePileOfWords

I have Pivoted My Code Again (because of course), and think the New Approach will work even better.
🤞


MiserablePileOfWords
@MiserablePileOfWords

yes of course I now also archive images because hello did you see my last chost?
Imagine coming across that in the future without the images.
Doesn't bear thinking about.


You must log in to comment.

in reply to @MiserablePileOfWords's post:

i am a contributor to OpenSolaris and like a dozen different programming languages and an untold number of projects great and small because i fixed typos in prs 😅

sending a pr saves the maintainer from having to do the work that the issue submitter clearly could fix, then the maintainer just has to click a button (or similar), i think it is nice to do