Incidentally, if you're on a platform where you can reasonably install cohost.py and you can responsibly vet an internet rando's python3 script before running it on your machine, you can use this to dump the text of all posts from a specific Cohost page of yours to individual files in the current working directory, named in the pattern postID_whenposted.md
I have made Choices in order to preserve some out-of-band information: post headline, CWs, and tags. I make no attempt to retrieve anything except text; archiving posted images is outside my use case.
#!/usr/bin/python3

from pathlib import Path
from cohost.models.user import User
from cohost.models.post import Post
from sys import exit

cookie = '' # copy-paste from web browser devtools as per https://github.com/valknight/Cohost.py#retrieving-your-cookie
projectName = 'caffeinatedOtter' # or whatever your @pagename is

try:
    user = User.loginWithCookie(cookie)
    project = user.getProject(projectName)
except Exception:
    exit('Cohost login failed!')

here = Path.cwd()
page = 0 # posts come back a page at a time

while True:
    postlist = project.getPosts(page)
    if len(postlist) > 0:
        page = page + 1
        for post in postlist:
            md = post.plainTextBody
            if len(post.contentWarnings) > 0:
                cwList = '\n'.join(post.contentWarnings)
                md = f"<!-- CWs:\n{cwList}\n-->\n{md}"
            if len(post.tags) > 0:
                tagList = '\n'.join(post.tags)
                md = f"<!-- tags:\n{tagList}\n-->\n{md}"
            if len(post.headline) > 0:
                md = f"# {post.headline} <!-- Cohost headline -->\n\n{md}"
            if len(md) > 0: # skip posts with nothing to write
                filename = here / f"{post.postId}_{post.publishedAt}.md"
                filename.write_text(md)
    else: # an empty page means there are no more posts
        print('Done.')
        exit(0)
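If you want to see exactly what ends up in each file without logging in to anything, the formatting step can be pulled out on its own. This is a minimal sketch, not part of the script above: build_markdown and its parameter names are made up for illustration, and it takes plain strings and lists instead of a cohost.py post object.

```python
def build_markdown(body, headline="", content_warnings=(), tags=()):
    """Assemble the text for one file, in the same order as the script's loop.

    Hypothetical helper: takes plain values rather than a cohost.py post.
    """
    md = body
    if content_warnings:
        # CWs go into an HTML comment directly above the body
        cw_list = "\n".join(content_warnings)
        md = f"<!-- CWs:\n{cw_list}\n-->\n{md}"
    if tags:
        # tags go into their own HTML comment, above the CWs
        tag_list = "\n".join(tags)
        md = f"<!-- tags:\n{tag_list}\n-->\n{md}"
    if headline:
        # the headline becomes a level-1 heading at the very top
        md = f"# {headline} <!-- Cohost headline -->\n\n{md}"
    return md


if __name__ == "__main__":
    print(build_markdown("hello", headline="Hi",
                         content_warnings=["food"], tags=["a", "b"]))
```

Because everything is prepended, the file reads headline first, then tags, then CWs, then the post text. A post with an empty body but a headline or tags still produces a non-empty string, so it still gets written.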
For people trying this on Windows that are like "Wait, it's complaining! Help!"
There's 2 potential problems:
- The filename created by this doesn't work on Windows, because post.publishedAt contains the invalid (for Windows) character : in the times.
How do we fix this?
On line 35
filename = here / f"{post.postId}_{post.publishedAt}.md"
becomes
filename = here / f"{post.postId}_{post.publishedAt.replace(':','-')}.md"
- You might have, like me, pasted emoji straight into your text. The script doesn't like that either, because you'll get a
UnicodeEncodeError: 'charmap' codec can't encode character '\Usomethingorother' in position somewhere: character maps to <undefined> error.
Have no fear, however, the fix is simple enough:
On line 36
filename.write_text(md)
becomes
filename.write_text(md, encoding="utf-8")
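Both fixes can be folded into one small helper if you'd rather not patch lines by hand. What follows is a sketch under assumptions: safe_filename and write_post are hypothetical names, and instead of replacing only the colon it swaps out every character Windows refuses in file names, which is broader than the one-character fix above.

```python
import re
from pathlib import Path

# Characters Windows rejects in file names: < > : " / \ | ? * and control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(post_id, published_at, suffix=".md"):
    """Build postID_whenposted.md with Windows-hostile characters replaced by '-'."""
    stamp = _INVALID.sub("-", str(published_at))
    return f"{post_id}_{stamp}{suffix}"


def write_post(directory, post_id, published_at, md):
    """Write md as UTF-8 so emoji survive whatever the platform default encoding is."""
    path = Path(directory) / safe_filename(post_id, published_at)
    path.write_text(md, encoding="utf-8")
    return path
```

In the script, the two lines inside `if len(md) > 0:` would then become a single call along the lines of write_post(here, post.postId, post.publishedAt, md).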
I have discovered there is at least one (1) typo in cohost.py
(transparentShareOfPostId tries to get transparentShareofPostId)
and also that you can't trust all the data you get back.
Somehow, some of the page URLs for posts are 404?
*confused dog noises*
Edit: Actually, that's not true.
The page is there.
The page exists.
I can go to it.
The server sometimes returns a 404.
*dog noises continue*
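If the page is really there and the server only sometimes answers 404, one obvious thing to try is asking again after a short pause. This is a generic sketch of that idea, not what my code actually does and not anything from cohost.py: retry is a hypothetical helper, and fetch stands in for whatever call is flaking out.

```python
import time


def retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: fetch is any zero-argument callable that raises on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:
            last_error = error
            # pause before the next try, but not after the final failure
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

Something like retry(lambda: project.getPosts(page)) would wrap the page fetch from the script above. Whether retrying actually clears these particular 404s is an assumption.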
I have Pivoted My Code Again (because of course), and think the New Approach will work even better.
🤞
The only thing reasonably left to do is like... comments I guess?
Pictures, maybe?
But... eh.
The final leaf doesn't have a poster because duh, It Me! Always!
yes of course I now also archive images because hello did you see my last chost?
Imagine coming across that in the future without the images.
Doesn't bear thinking about.
