relia-robot

Trans married robot/doll

[Robot/doll/moth/slime/NHP]-girl. DGN-001. I like writing!

See post-cohost writing at https://reliarobot.dreamwidth.org/, on tumblr at https://www.tumblr.com/relia-robot-writes, or collected long-form pieces at https://reliarobot.itch.io/


caffeinatedOtter
@caffeinatedOtter

Incidentally, if you're on a platform where you can reasonably install cohost.py and you can responsibly vet an internet rando's python3 script before running it on your machine, you can use this to dump the text of all posts from a specific Cohost page of yours to individual files in the current working directory, named in the pattern postID_whenposted.md

I have made Choices in order to preserve some out-of-band information: post headline, CWs, and tags. I make no attempt to retrieve anything except text; archiving posted images is outside my use case.

#!/usr/bin/python3

from pathlib import Path
from cohost.models.user import User
from cohost.models.post import Post
from sys import exit


cookie = '' # copy-paste from web browser devtools as per https://github.com/valknight/Cohost.py#retrieving-your-cookie 
projectName = 'caffeinatedOtter' # or whatever your @pagename is

try:
	user = User.loginWithCookie(cookie)
	project = user.getProject(projectName)
except:
	exit('Cohost login failed!')

here = Path.cwd()
page = 0
while True:
	postlist = project.getPosts(page)
	if len(postlist) > 0:
		page = page + 1
		for post in postlist:
			md = post.plainTextBody
			if len(post.contentWarnings) > 0:
				cwList = '\n'.join(post.contentWarnings)
				md = f"<!-- CWs:\n{cwList}\n-->\n{md}"
			if len(post.tags) > 0:
				tagList = '\n'.join(post.tags)
				md = f"<!-- tags:\n{tagList}\n-->\n{md}"
			if len(post.headline) > 0:
				md = f"# {post.headline} <!-- Cohost headline -->\n\n{md}"
			if len(md) > 0:
				filename = here / f"{post.postId}_{post.publishedAt}.md"
				filename.write_text(md)
	else:
		print('Done.')
		exit(0)

MiserablePileOfWords
@MiserablePileOfWords

For people trying this on Windows that are like "Wait, it's complaining! Help!"
There's 2 potential problems:

  1. The filename created by this doesn't work on Windows, because post.publishedAt contains the invalid (for Windows) character : in the times.
    How do we fix this?
    On line 35
    filename = here / f"{post.postId}_{post.publishedAt}.md"
    becomes
    filename = here / f"{post.postId}_{post.publishedAt.replace(':','-')}.md"

  2. You might have, like me, pasted emoji straight into your text. The script doesn't like that either, because you'll get a UnicodeEncodeError: 'charmap' codec can't encode character '\Usomethingorother' in position somewhere: character maps to <undefined> error.
    Have no fear, however, the fix is simple enough:
    On line 36
    filename.write_text(md)
    becomes
    filename.write_text(md, encoding="utf-8")


MiserablePileOfWords
@MiserablePileOfWords

I have discovered there is at least one (1) typo in cohost.py1
(transparentShareOfPostId tries to get transparentShareofPostId)

and also that you can't trust all the data you get back.
Somehow, some of the page URLs for posts are 404?
*confused dog noises*
Edit: Actually, that's not true.
The page is there.
The page exists.
I can go to it.
The server sometimes returns a 404.
*dog noises continue*


1: Yes, yes, I'll open an issue on the github when I'm done with this very bad idea. Or like, what's the proper etiquette with someone else's code you don't know?I've opened a pull request with the fix.


You must log in to comment.

in reply to @MiserablePileOfWords's post:

i am a contributor to OpenSolaris and like a dozen different programming languages and an untold number of projects great and small because i fixed typos in prs 😅

sending a pr saves the maintainer from having to do the work that the issue submitter clearly could fix, then the maintainer just has to click a button (or similar), i think it is nice to do