#gif plays pkmn

An unfortunate side effect of working for overwhelmingly large tech companies for years is that you get used to the way they do things. One of the things @lifning and I were used to was the Launch Call, usually a conference call with multiple stakeholders on the line to make sure a new product launch or whatever went smoothly between the tech people, website people, Twitter people, whatever.

So a handful of us working on GIF Plays PKMN had a launch call the night of posting, after we had determined the setup was more or less ready. And during this call, I was nonplussed about one of our documented platform issues that impacted all browsers: whenever you clicked a button in the game, it would open an empty pop-up window.

The bottom of the player, with the original draft containing a link for "Blank tabs / Popups?" linking to the workarounds section

Our nearly-published workarounds section read:

  • Left-clicking the buttons opens a bunch of foreground popups/tabs. You can mitigate the annoyance by either middle-clicking the buttons or holding Ctrl while left-clicking them, but this still results in a bunch of empty tabs to close-to-the-right. cohost forces all hyperlinks to target="_blank" (which browsers will happily open even when they respond with 204 No Content).

target="_blank" is mostly sensible for a site like cohost: you're on a long timeline, you click a link, you don't want to lose your place. The behavior here is in line with, say, Twitter's; you click a profile or a tweet or whatever and it opens in your current window, and you click a link and it opens in a new tab, because despite the web being around for thirty-or-so years we keep finding new problems to invent.

Wouldn't it be nice if...

But damn, it'd be pretty nifty if clicking the buttons just did the right thing, at least in one major browser. So I started digging, beginning with a basic test case:

Cohost post editor, showing a lone <a> tag with a relative path

DOM inspector showing a link without target=_blank

Hey wait a minute. So target="_blank" doesn't get added for relative URLs?

Can we trick whatever is doing this into ignoring a URL on another domain somehow?

...

The post editor showing an <a> tag with a URL starting with "//example.invalid"

DOM inspector showing a link with a domain name without target=_blank

Holy shit.

I made sure it worked in a real post, we updated our draft at the eleventh hour, and the buttons worked exactly like you'd expect them to (except in non-Chromium because of differences in handling 204 No Content while fetching media on the page but whateverrrrr).

Anyway, let's dive into why this works.

Theory of URL relativity

A URL starting with // is known as a protocol-relative URL. RFC 3986 is the more-or-less current definition of what a URL looks like, and it describes URLs as hierarchical. This hierarchy allows for relative paths:

Section 4.2: Relative Reference. "A relative reference takes advantage of the hierarchical syntax to express a URI reference relative to the name space of another hierarchical URI." ABNF syntax demonstrating that the relative part of a relative reference can start with two slashes and an authority. "The URI referred to by a relative reference, also known as the target URI, is obtained by applying the reference resolution algorithm of Section 5."

Section 5.4.1: Normal Examples. The highlighted example shows that the reference "//g", resolved against the base URL http://a/b/c/d;p?q, becomes http://g.

Screenshot of Homsar, from Homestar Runner, talking about a floating blue letter "g".

"I'm not gonna lie to ya, that's a healthy piece of real estate."

The URL specification treats the authority (for HTTP, the domain name and port) as part of that hierarchy, and allows you to describe a reference that is relative to the current protocol.
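One way to poke at this hierarchy yourself is the `URL` constructor available in browsers and Node.js. It follows the WHATWG URL standard rather than RFC 3986 to the letter, but for these cases it resolves references the same way. This is a minimal sketch using the RFC's own example base; the `resolve` helper and the property names are just for illustration. Note that the WHATWG parser normalizes an empty path to `/`, so the last result has a trailing slash that the RFC's table doesn't show.

```javascript
// Base URI used by the examples in RFC 3986 section 5.4.
const base = "http://a/b/c/d;p?q";

// Resolve a reference against the base using the WHATWG URL parser.
const resolve = (ref) => new URL(ref, base).href;

const resolved = {
  pathRelative: resolve("g"),   // replaces only the last path segment
  absolutePath: resolve("/g"),  // keeps the scheme and the authority
  networkPath: resolve("//g"),  // keeps only the scheme
};

console.log(resolved);
```

Each step up the hierarchy throws away more of the base: first the last path segment, then the whole path, then the authority too, until only the scheme is left.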

Why the hell would you want this? Well, if you're in the late '00s to early '10s before widespread adoption of HTTPS, you might still be worried about the computational load of enabling encryption everywhere. Encrypted connections were generally limited to situations where credentials or payment information was going over the wire, not just to keep things running smoothly for clients, but to keep load down on servers. (Huge websites like Amazon actually bought hardware accelerators to handle the load!)

Browsers at the time would display mixed-content warnings to users: if a page was loaded over HTTPS, but an image or script was loaded over unencrypted HTTP, it might be possible that information is leaking despite the address bar saying https://. Thus, it's prudent to warn the user. But we don't want to have to serve everything over HTTPS (and incur the increased load on our CDNs) if the user doesn't care.

That's where the protocol-relative URL came in. You could use image tags like this:

<img src="//cdn.example.org/really-cool-wig.jpg" alt="a really cool wig" />

Because the URL is relative to the protocol, the browser will select HTTP if the user is browsing via HTTP, and will select HTTPS if the user is browsing via HTTPS. No mixed-content errors, no unnecessary load.
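To make that concrete, here's a small sketch that resolves the same image reference against two page addresses. The `shop.example.org` pages are made up for the example; only the `cdn.example.org` reference comes from the tag above.

```javascript
// The protocol-relative reference from the <img> tag above.
const wig = "//cdn.example.org/really-cool-wig.jpg";

// Hypothetical pages embedding the image, one per scheme.
const fromHttpPage = new URL(wig, "http://shop.example.org/wigs").href;
const fromHttpsPage = new URL(wig, "https://shop.example.org/checkout").href;

console.log(fromHttpPage);  // same host and path, scheme taken from the HTTP page
console.log(fromHttpsPage); // same host and path, scheme taken from the HTTPS page
```

The host and path never change; only the scheme follows whatever page did the embedding.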

In the early '10s, encryption got faster and certificates got cheaper. The benefits of encrypting all traffic far outweighed the problems. Facebook enabled HTTPS by default for all users in 2012; Let's Encrypt's free and automated CA launched in 2016; and the world mostly forgot about protocol-relative URLs, with web developers calling them an "anti-pattern".

And yet, because of the way the URL is defined, we have this type of reference that is not quite relative (in the sense of belonging to the same domain you're browsing), but not quite absolute, either.

[bill wurtz voice] how did this happen?

This sounds like a bug? I think this is a bug. There is no legitimate use case (in 2012 or 2022) for protocol-relative URLs to be treated as "relative URLs", in the sense that you want to avoid adding things like target="_blank" to them. Let's go figure out who to send a bug report to.

The post sanitization code is accessible to clients; previews are rendered client-side. I took a trip to the browser dev tools to see if I could find which bundle of code had it.

Using the debugger in Firefox's web developer tools to search all the code for "_blank".

The first result here looks like the winner; it seems to be assigning '_blank' to the target property of some element, and the other results look like React components for various parts of the website.

We can deobfuscate that code a little by running a formatter on it and take a look at the module as a whole:

const o = /^[a-zA-Z][a-zA-Z\d+\-.]*?:/,
  s = /^[a-zA-Z]:\\/;
var a = n(24740);
const l = ["nofollow", "noopener", "noreferrer"],
  c = ["http", "https"];
function u(e = {}) {
  const t = e.target,
    n = "string" == typeof e.rel ? (0, i.Q)(e.rel) : e.rel,
    u = e.protocols || c,
    p = e.content && !Array.isArray(e.content) ? [e.content] : e.content,
    h = e.contentProperties || {};
  return (e) => {
    (0, r.Vn)(e, "element", (e) => {
      if (
        "a" === e.tagName &&
        e.properties &&
        "string" == typeof e.properties.href
      ) {
        const r = e.properties.href,
          i = r.slice(0, r.indexOf(":"));
        (function (e) {
          if ("string" != typeof e)
            throw new TypeError(
              `Expected a \`string\`, got \`${typeof e}\``
            );
          return !s.test(e) && o.test(e);
        })(r) &&
          u.includes(i) &&
          (!1 !== t && (e.properties.target = t || "_blank"),
          !1 !== n && (e.properties.rel = (n || l).concat()),
          p &&
            e.children.push({
              type: "element",
              tagName: "span",
              properties: a(!0, h),
              children: a(!0, p),
            }));
      }
    });
  };
}

const l and c here look like default settings of some kind, so my intuition says this is a third-party module somewhere on npm. Having read about "the bee movie bug", I knew collections of libraries like remark and rehype were likely involved.

Web search for "rehype external url", with the first result a GitHub repository called "rehype-external-links".

A snippet of code from rehype-external-links, showing conditional statements and string slicing similar to parts of the above code.

Well hey, that code looks roughly the same shape. I think we have a root cause! (btw: the difference in settings between this code and the code pasted above comes from changes between v1.0.1 and v2.0.0.)
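Boiled down, the check in the deobfuscated module can be sketched like this. The two regular expressions and the protocol list are copied from the code pasted above; the function names are mine, and `treatedAsExternalStrict` is a hypothetical variant for comparison, not something either cohost or rehype-external-links ships.

```javascript
// Patterns copied from the deobfuscated module.
const absoluteUrl = /^[a-zA-Z][a-zA-Z\d+\-.]*?:/; // "scheme:" at the very start
const windowsPath = /^[a-zA-Z]:\\/;               // "C:\..." is a file path, not a URL
const protocols = ["http", "https"];

// Mirrors the condition guarding the target="_blank" assignment.
function treatedAsExternal(href) {
  const scheme = href.slice(0, href.indexOf(":"));
  return (
    !windowsPath.test(href) &&
    absoluteUrl.test(href) &&
    protocols.includes(scheme)
  );
}

// Hypothetical stricter variant: also counts "//host/..." references.
function treatedAsExternalStrict(href) {
  return href.startsWith("//") || treatedAsExternal(href);
}

for (const href of ["https://example.invalid/a", "//example.invalid/a", "/a"]) {
  console.log(href, treatedAsExternal(href), treatedAsExternalStrict(href));
}
```

Because `//example.invalid/a` doesn't begin with a letter followed by a colon, it never matches the absolute-URL pattern, so it gets lumped in with ordinary relative paths and `target` is left alone.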

Reporting the problem

An unsent, empty email to support@cohost.org with the subject line "i spil my html mods help help hep help"

Shortly after going live, I reached out to cohost support to report the issue. At first I was concerned that there were possible security implications given the use of rel="noopener", but noopener is only relevant when target="_blank" is set, and modern browsers have for some time behaved as if it's present whenever that target is used, so there aren't any security concerns here.

I reported the issue to rehype-external-links; we'll see what happens there.

@staff responded to the report today, opening two issues on their tracker:

Ultimately this means that this workaround will be going away at some unpredictable time in the future, so it's not a good idea to rely on it. But it was a useful discovery at the last possible moment (literally, we were doing a final pass on the post content on the launch call) that made the GIF Plays PKMN experience that much smoother: instead of opening the post, clicking a button, getting slapped in the face with an empty tab, and having to go back and read the errata about imperfect workarounds... you could simply open the post and start clicking buttons with no ill effects, just like your intuition wanted you to.



So if you don't know me (or otherwise haven't noticed yet), I have a habit of hiding cute little details in the things I make. If you were around to play the game (you win! you're free now!) when it was live, you might remember that when you clicked the "read more" it would show a little GBC-BIOS-like animation, and a cartridge would slide into the slot before the stream started in earnest. Well, go ahead, click "read more" and let's look at that again...

read more

these are all beautiful

(I'm still trying to figure out how you made the game only appear after read more was clicked?)


that's actually necessary for us to not start loading the image before someone opts-in to loading it! we just put all the streamy-stuff and hyperlinks under the fold and absolute-position them up on top of the GBC, and cohost's JavaScript takes care of the rest. This was fortunate, as it was not only a concern some folks had for people on metered connections, but it also saves us from wasting our own bandwidth and compute resources on people who don't actually wanna engage with it!


