ceargaest

[tʃæɑ̯rˠɣæːst]

linguist & software engineer in Lenapehoking; jewish ancom trans woman.

since twitter's burning gonna try bringing my posts about language stuff and losing my shit over star wars and such here - hi!


username etymology
bosworthtoller.com/5952

nex3
@nex3

One of my main goals with my new blog is to preserve the concept of "reposting" that's both a successful and, I think, desirable aspect of the "social network" model. It's a great way to spread visibility of other people's work and engage in conversation across different blogs. It's a big part of why I'm putting so much work into the concept of "embeds", like the ones I already have for Cohost and Letterboxd.

These embeds are quite automated: I just drop a link in my blog and as long as it's one of the supported sites, a little post-processing tool automatically fetches the information and adds it as template params. All in all, I'm pretty happy with this flow, but there's one critical problem: it requires me to write a separate scraper for every website.
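
To make the "one scraper per site" cost concrete, here is a minimal sketch of what such a dispatch step might look like. Everything in it is an assumption for illustration: the function names, the `SCRAPERS` table, and the shape of the returned params are hypothetical, not the actual tool behind this blog.

```python
from urllib.parse import urlparse

# Hypothetical per-site scrapers. A real one would fetch the page and pull
# out author, text, date, and so on; these just return placeholder params.
def scrape_cohost(url):
    return {"site": "cohost", "url": url}

def scrape_letterboxd(url):
    return {"site": "letterboxd", "url": url}

# Every newly supported site needs another entry (and another scraper) here.
SCRAPERS = {
    "cohost.org": scrape_cohost,
    "letterboxd.com": scrape_letterboxd,
}

def embed_params(url):
    """Return template params for a supported URL, or None if unsupported."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    scraper = SCRAPERS.get(host)
    return scraper(url) if scraper else None
```

Any link whose host isn't in the table simply gets no embed, which is exactly the scaling problem.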

The great strength of independent websites for social interaction is that everyone's site can be as different as they'd like, but that also means that there's no standard way to understand or interact with them. Even something as simple as "what is the text of the post at this URL" is difficult to answer in general. We'd all hope that everyone would use semantic HTML so perfectly that you could just read the contents of the outermost <article> tag on the page, but the real world is never so pristine. We need a more explicit way to indicate the critical prose and metadata for something that's considered a "post".
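
For reference, the idealized "just read the outermost <article>" approach could be sketched like this with Python's standard library. The class and function names are made up for the example, and it assumes well-formed markup with a single meaningful top-level <article>, which is precisely the assumption that tends to fail in practice.

```python
from html.parser import HTMLParser

class ArticleText(HTMLParser):
    """Collects the text inside the first outermost <article> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # current <article> nesting depth
        self.done = False  # set once the first outermost <article> has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "article" and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(data)

def article_text(html):
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    # Join the text chunks and collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())
```

Note that it gets the text but none of the metadata (author, date, canonical URL), and it has no way to tell a post's <article> apart from, say, a comment's.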

I've been rolling this problem around in my head for the past week, and I have what is at least the germ of an idea. The core observation is that RSS and Atom feeds already have all the metadata that's strictly necessary for something like this, but they suffer from being time-limited—their use-case is focused on syndication, so a website can only be expected to have its most recent posts available in such a nice format.

So imagine if we co-opted this format for use in something more persistent. Something like:

<link rel="somens:reblog" href="post.xml">

where the XML file looks like

<?xml version="1.0" encoding="utf-8"?>
<reblog xmlns="https://nex-3.com/somens" xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link href="https://nex-3.com/blog/once-i-add-the/post.xml" rel="self"/>
  <atom:link href="https://nex-3.com/"/>
  <atom:entry>
    <atom:link href="https://nex-3.com/blog/once-i-add-the/" rel="alternate"/>
    <atom:id>https://nex-3.com/blog/once-i-add-the/</atom:id>
    <atom:published>2024-09-20T07:06:00Z</atom:published>
    <atom:updated>2024-09-20T07:06:00Z</atom:updated>
    <atom:author>
      <atom:name>Natalie</atom:name>
      <atom:uri>https://nex-3.com/</atom:uri>
    </atom:author>
    <atom:category term="meta"/>
    <atom:content type="html">&lt;p&gt;Once I add the ability to embed arbitrary blog posts from other blogs on here it&#39;s over. I&#39;m gonna be reblogging like a wild animal. Y&#39;all are gonna have your eyes blown clean outta your heads.&lt;/p&gt;</atom:content>
  </atom:entry>
</reblog>

Do you see my vision? This is mostly just stealing structure from Atom, but with the explicit guarantee that any logical "entries" on the page in question will also exist in the "reblog" XML. Maybe there should be some sort of way to explicitly link the two as well, idk. But in the general case, a page with one post would link to a single-post XML file which could then be used as the source for a reblog. Wouldn't that be neat?
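
A consumer of this hypothetical format might look roughly like the sketch below: find the link on the page, resolve it against the page's URL, then read the entries out of the XML. It uses the placeholder `somens` names from the example above; the function names and the shape of the returned dicts are assumptions, and the actual HTTP fetching is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

class ReblogLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="somens:reblog">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if tag == "link" and "somens:reblog" in rels and self.href is None:
            self.href = attrs.get("href")

def find_reblog_url(page_html, page_url):
    """Return the absolute URL of the page's reblog XML, or None."""
    finder = ReblogLinkFinder()
    finder.feed(page_html)
    return urljoin(page_url, finder.href) if finder.href else None

def parse_reblog(xml_text):
    """Return one dict per atom:entry in a reblog document."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link[@rel='alternate']", ATOM)
        posts.append({
            "url": link.get("href") if link is not None else None,
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "published": entry.findtext("atom:published", namespaces=ATOM),
            "author": entry.findtext("atom:author/atom:name", namespaces=ATOM),
            # The parser has already unescaped this into an HTML string.
            "content_html": entry.findtext("atom:content", namespaces=ATOM),
        })
    return posts
```

The appeal is that none of this is site-specific: the same two functions would work for any blog that published the file.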

Edit: Something like this already exists! Hooray!


nex3
@nex3

from @aaadelnrrv in the comments:

this reminds me of microformat's h-entry. it could be more flexible since it's in separate xml and not embedded in the html

Ohhh this is so close to what I'm talking about, it almost certainly doesn't make sense to reinvent the wheel. It doesn't necessarily make it easy to have syndicated content that's different than the original content (for example, I like to change my Cohost and Letterboxd embeds for RSS feeds so they don't require my site's stylesheets to be legible), but that's definitely still possible by creating separate elements with display:none. There are improvements that could be imagined, but it's way better to work within an existing specification than to create something new from scratch! I strongly encourage anyone who's interested in this to move in an h-entry direction, and I intend to do the same for my blog.
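
To give a feel for the h-entry approach, here is a deliberately simplified sketch that pulls a few properties out of marked-up HTML using only the standard library. It is not a conforming microformats parser: it assumes every non-void tag is explicitly closed, only looks at `p-name`, `dt-published`, and `e-content`, and flattens the content to plain text. The class and function names are made up for the example; a real implementation should lean on an existing microformats parser (mf2py is one for Python).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not touch the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class HEntryParser(HTMLParser):
    """Collects p-name, dt-published, and e-content found inside an h-entry."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class sets of the currently open elements
        self.props = {}

    def _inside_entry(self):
        return any("h-entry" in classes for classes in self.stack)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "dt-published" in classes and self._inside_entry():
            self.props.setdefault("published", attrs.get("datetime"))
        if tag not in VOID:
            self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not self._inside_entry():
            return
        # Append the text to every text property that is currently open.
        for classes in self.stack:
            for cls, key in (("p-name", "name"), ("e-content", "content")):
                if cls in classes:
                    self.props[key] = self.props.get(key, "") + data

def parse_h_entry(html):
    parser = HEntryParser()
    parser.feed(html)
    parser.close()
    return parser.props
```

Because the markup lives in the page itself, there's no second file to keep in sync, and a hidden (display:none) element carrying the `e-content` class would be picked up the same way as a visible one.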