mcc


Everything is so broken I can't even get the post embed explaining how broken things are to work.

I have this site, dryad.technology. It's my "static content" site. It has always been HTTP-only, no TLS. For reasons, I have to fix that now. It has to go HTTPS.

When I set this site up 7 years ago I decided I would finally use ~cloud technology~ (I self-host my other sites and was tired of it). The architecture is:

  • Content hosted in a secret S3 bucket
  • Cloudflare caches/proxies the S3 bucket and serves at the domain.

I pay almost nothing because Cloudflare is free, and Amazon charges me only for the traffic between Amazon and Cloudflare, which comes out to sub-penny levels. But I can't figure out how to move this setup to HTTPS. I tripped over this 7 years ago and I'm tripping over it now.

Can anyone help. Me. With this. Here is a list of things I tried that don't work.

Imagine my secret S3 bucket is named BEEFCAFE.

  1. Serve site from http://BEEFCAFE.s3-website-datacentername-1.amazonaws.com/
    Cloudflare proxies, https-encodes, adds a Cloudflare-signed https certificate

    Why this doesn't work: This is fake https. It's HTTPS signed from Cloudflare to the user, but it passes unencrypted over the open Internet between Amazon and Cloudflare. This is lying. It is giving a false sense of security to my users. I won't do it.

    The domain above does not respond to requests on port 443.

  2. Serve site from https://s3.amazonaws.com/BEEFCAFE/index.html
    Cloudflare designates site DNS as s3.amazonaws.com, and adds a rewrite rule that tacks on BEEFCAFE/ to all inner-website requests.

    Why this doesn't work: Although I can go to https://s3.amazonaws.com/BEEFCAFE/index.html in a web browser and see my site, they appear to be specifically denylisting Cloudflare. When accessing dryad.technology/BEEFCAFE/index.html after making the change I get:

     <Error>
     <Code>AccessDenied</Code>
     <Message>Access Denied</Message>
     <RequestId>R7QMB7JF9RKMTJ12</RequestId>
     <HostId>
     CCQMYXOnUVx1OTNPYl0W/GYL/xHLBm2kZyXq2aBls4YFiHSvNgUTRJfKD9J/znX05MkmYbngAc0=
     </HostId>
     </Error>
    

    … or similar nonsense, on every pageload.

    I cannot find any Amazon documentation explaining why requests from CloudFlare specifically would result in this AccessDenied response. However there is a Stack Overflow answer that purports to address this exact problem, for this exact scenario, by setting a JSON "bucket policy". Following this page's instructions had literally no effect. It did not change the error when accessed through CloudFlare, nor did it block me from accessing through non-CloudFlare. That is weird enough I wonder if I did something wrong.

  3. Follow the instructions

    There is a CloudFlare support page which purports to explain how to do exactly the thing I want to do [use CloudFlare to serve pages stored in an S3 bucket].

    The page is a bit oblique. It explains you need to go into the bucket policy editor and then "use this sample to fill out the needed JSON code". The referent of "this sample" is not in any way indicated, it is as if a sentence is missing. However I think they mean the IP based bucket policy. The CloudFlare support page then directs you to fill in the list of known CloudFlare IP ranges. This produces a JSON very similar to the one from the stackoverflow page listed above.

    I'm not sure whether this support page gives me what I want. For one thing, it has a step directing me to "redirect requests from this bucket’s URL to the subdomain bucket URL you created". This suggests I'm going through my BEEFCAFE.s3-website-datacentername-1.amazonaws.com domain from approach 1 above, which implies I'll founder on the same http/https problem as before. However they also make oblique reference to an "endpoint", and there is an "access points" menu in Amazon's interface I have not fully explored. So maybe this will solve one of my problems (it is theoretically possible someone could guess my bucket name and access it directly instead of going through dryad.technology, mostly harmless but still worth avoiding) and I can solve the https problem in a following step.

    Why this doesn't work: The Amazon examples page contains an interesting statement: "Warning: Before using this policy, replace the 192.0.2.0/24 IP address range in this example with an appropriate value for your use case. Otherwise, you will lose the ability to access your bucket." Editing the JSON, I think: Wait, does that mean I won't be able to read the bucket, or does it mean I won't be able to administrate it? Hopefully it's just reading the bucket!

    No, it was total. After applying the change, my S3 bucket was suddenly only accessible from CloudFlare IPs. Meaning the site still loaded (because it was loaded from a CloudFlare site) but all of the AWS administration pages for the bucket now showed big red error messages, and I assume if I had tried to edit it that wouldn't have worked either. Hilariously this meant I was also blocked from undoing the configuration change I had just made. (Fortunately Amazon foresaw someone might make this mistake, and if I logged in from my superuser account it unlocked the bucket policy JSON form only, allowing me to reverse the change.)

  4. CloudFront?

    Amazon's help pages do contain a page on how to serve S3 as HTTPS. What they recommend is using a different Amazon service called CloudFront. Not CloudFlare. CloudFront. So in this model S3 funnels to CloudFront which funnels to CloudFlare which funnels to the user. The problem here is that CloudFront is not a straightforward web endpoint but a full-featured CDN, which isn't exactly what I want. This would mean two layers of caching which would be odd and could even be glitchy; it also implies I'll be getting charged twice per page update, which could potentially increase my AWS costs from one-quarter of a cent per year to as high as one-half of a cent per year. Given that this very almost works— I can access AWS via http, and I can access it (at the s3 site) via https as long as I don't do so via CloudFlare— without CloudFront, I'd prefer not to jump into that particular pool of frigid water unless I'm assured by someone who has done this before that it really is the only way.

    A second problem is that it's not clear to me from these docs how CloudFront accesses S3. Is it http? Is it https? Is it… whatever Amazon-internal communication method "aws://" is, which I assume but have not specifically seen documented is confirmed secure?

My "win condition" is that I have an S3 bucket, access to which is unrestricted with appropriate AWS credentials, and access to which is possible via public HTTPS but only when accessed by CloudFlare. Data should be encrypted (to ensure integrity) via some key or other at each link between the S3 bucket and the user's browser. I don't pay for CloudFlare and the S3 charges for updating pages and transmitting them to the CloudFlare cache come out to a few pennies per decade, so I'm basically hosting high-availability static content for free. Can I make this work? Is there something I need to be reading? I am flailing.



in reply to @mcc's post:

As a bonus question it would be super great if I could get CloudFlare to give me slightly more fine-grained analytics than the current "requests per month" number. I am only curious to break that down into how many requests per html page. I don't even need cookies or referers, I'm just a little curious how many times each subdirectory got visited. But I think this is specifically part of the CloudFlare $20/mo plan so it seems like a waste of time to ask.

for (2): S3 website and normal S3 permissions are separate. i will find some reference links when i’m not on my phone in like 15 minutes, but it seems like what i’m about to tell you is pretty similar to what cloudflare is telling you for setting a bucket policy.

oh god cloudflare's documentation is obnoxious, and S3's has gotten worse [1]...

here is an S3 bucket policy to say "please allow any anonymous client to access files out of the bucket BEEFCAFE", which includes cloudflare, and also everyone else. (cloudflare wants you to make the policy only include their IP addresses because that's most secure, but that means you now have to update the list whenever cloudflare adds new locations, and it's a public website literally who cares if you can directly hit S3 for the content?)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::BEEFCAFE/*"
            ]
        }
    ]
}

this will allow you to set https://s3.{region}.amazonaws.com/BEEFCAFE/ [2] up as your cloudflare origin; you will need to add necessary rewrite rules to make requests for / show the content from /index.html though, because that's something the S3 website endpoints do that the normal S3 endpoints don't.


  1. this is basically impossible to find because amazon keeps getting in hot water for making it too easy to open up your S3 bucket full of, like, customer social security numbers or whatever to the internet.

  2. if your bucket name doesn't have any dots in it you can also do https://BEEFCAFE.s3.{region}.amazonaws.com/. you can't do it with dots because S3 only has a wildcard cert for *.s3.{region}.amazonaws.com and you can't do multiple levels of domains like that
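The choice between the two URL shapes mentioned here can be sketched as a small helper. This is only an illustration: the function name s3HttpsOrigin is made up, and the region and the dotted bucket name in the example calls are placeholders. It builds strings and does not contact S3.

```javascript
// Hypothetical helper: pick an HTTPS origin URL for an S3 bucket.
// Assumes the regional endpoint format s3.{region}.amazonaws.com
// described in the reply above.
function s3HttpsOrigin(bucket, region) {
  var host = 's3.' + region + '.amazonaws.com';
  if (bucket.indexOf('.') === -1) {
    // No dots: the bucket name fits under the *.s3.{region}.amazonaws.com
    // wildcard certificate, so the virtual-hosted form works over HTTPS.
    return 'https://' + bucket + '.' + host + '/';
  }
  // Dots in the name would add extra subdomain levels that the wildcard
  // certificate does not cover, so fall back to the path-style form.
  return 'https://' + host + '/' + bucket + '/';
}

console.log(s3HttpsOrigin('BEEF-CAFE', 'us-east-1'));
console.log(s3HttpsOrigin('dotted.bucket.example', 'us-east-1'));
```

With the path-style form the bucket name becomes part of the path, which is why a rewrite rule that prepends it is needed on the Cloudflare side.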

Thanks… I'm a little confused.

Suggestion 1 (JSON): This is actually the JSON I have right now. Like right this second. I am still not able to access the bucket from s3.amazonaws.com if the access is from a CloudFlare IP; I get the error message I included above. I don't know if this is about CORS or something else. It doesn't seem like a CORS issue but Stack Overflow seemed to think CORS was relevant.

Suggestion 2 (no dots): My BEEFCAFE bucket does have a dot in the name. However I have a test bucket BEEF-CAFE which does not. When I access https://BEEF-CAFE.s3.region.amazonaws.com, there is no response. It's not a certificate failure or anything. It just hangs, forever. Is this surprising?

I'm going to search and see if proxy headers make S3 sad. I could see how S3's concept of "CORS" might include allowing that sort of thing.

re: no dots, that is surprising, as for a bucket I know exists and another I know does not I get valid responses:

$ curl https://buttslol.s3.us-east-1.amazonaws.com
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>NJ8NDQVNQCBWR5SY</RequestId><HostId>dskf9bX9AivHzzwZO4tWn/UNrym+XoXsPN5Qe6GTLecM3i41sWHb0OFnr05wIu98DgZJ+D4PYN4=</HostId></Error>

$ curl https://kdjflskdjfslkdfj.s3.us-east-1.amazonaws.com
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>kdjflskdjfslkdfj</BucketName><RequestId>E268PJGW21Y8ETMM</RequestId><HostId>fizGxR1kzbPvQ37Kdvj1XCbw54B+5hddk8vo0OzvuZyYv4cVODwQKId8yAzYxG0m84xmKrNcNro=</HostId></Error>

i did something very similar to this, although i can't remember the exact details. i just checked my cloudfront and had a different s3 url format: {MYDOMAIN}.s3-website-us-east-1.amazonaws.com and i have a strange feeling that was important

cloudFLARE! (every time)...I thought I had HTTPS on (I don't have the fake SSL/TLS setting checked in cloudflare) but maybe this was wrong! The bucket is gone so I can't verify that end

4 is what I’ve done in the past

I don’t know what the cloudfront to s3 pipe looks like… haven’t ever dug into documentation to see and I don’t think it’s something I’ve ever chosen.

You do pay for CDN traffic in this scenario. Cloudfront is not free like Cloudflare can be. In my experience that’s also pennies for my site but I’ve never seen a traffic spike

Relatively easy to setup though since it’s all in aws’s universe

I have always used Cloudfront for this particular scenario, and it works well. I have my own static content set up like this. I wrote a whole "cloud computing 101" course for a bootcamp that has a chapter guiding through this kind of setup. The moment you step outside of AWS and try to get it to integrate with other stuff is usually where things get trickier. (This even goes down to using AWS to issue the SSL cert via its own ACM service. They like to make it easy to get locked into the ecosystem.)

That being said... I think you can do it with Cloudflare, and it's just a matter of getting the bucket policy right. It's probably something like having a statement with condition to allow read access via Cloudflare by IP PLUS allowing all access from, say, your AWS user as a principal.
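A rough sketch of what such a two-statement policy could look like, generated from a small function so the pieces are easy to see. Everything specific here is an assumption: buildBucketPolicy is a made-up helper, 192.0.2.0/24 is the documentation placeholder range from the Amazon example quoted in the post (not a real Cloudflare range), and the account ID and user name in the ARN are invented. It uses only Allow statements, on the understanding that an explicit Deny with an IP condition is what locks administrators out. It is untested against the AccessDenied problem described in the post.

```javascript
// Hypothetical helper: build a bucket policy with two Allow statements
// and no Deny statement.
function buildBucketPolicy(bucket, cloudflareCidrs, adminArn) {
  var bucketArn = 'arn:aws:s3:::' + bucket;
  return {
    Version: '2012-10-17',
    Statement: [
      {
        // Anonymous reads, but only from the listed source IP ranges.
        Sid: 'AllowReadFromCloudflare',
        Effect: 'Allow',
        Principal: '*',
        Action: 's3:GetObject',
        Resource: [bucketArn + '/*'],
        Condition: { IpAddress: { 'aws:SourceIp': cloudflareCidrs } }
      },
      {
        // Full access for one named IAM principal, from any IP.
        Sid: 'AllowAdminEverything',
        Effect: 'Allow',
        Principal: { AWS: adminArn },
        Action: 's3:*',
        Resource: [bucketArn, bucketArn + '/*']
      }
    ]
  };
}

// Placeholder values; replace all three before using anything like this.
var policy = buildBucketPolicy(
  'BEEFCAFE',
  ['192.0.2.0/24'],
  'arn:aws:iam::111122223333:user/example-admin'
);
console.log(JSON.stringify(policy, null, 4));
```

The printed JSON is what would be pasted into the bucket policy editor. The list of IP ranges would still need updating whenever Cloudflare publishes new ones.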

What's confusing me is there is nothing in my policy banning Cloudflare now, but it's giving that "access denied" nonetheless, which implies the thing causing the block is not my policy but some other policy, somewhere else.

That's definitely possible. There are a bunch of S3 permissions systems that overlap with each other.

@iliana is on the right track, I think. The one thing I would add is to see if you can get S3 access logs set up. It's about 2/3rds of the way down on the Properties tab of the AWS web GUI for the S3 bucket, and I think they make it pretty trivial to set up now.

i suspect this is not the way you want to solve this, but...

i am using Wasabi for s3-compatible file hosting, and if you set a bucket as publicly accessible then the URL Wasabi gives you has https on it. it is possible that Wasabi is blocking Cloudflare for some reason, but... i don't see any good reason they'd do that & also it would break a load of usecases, especially considering they pretty much just provide s3-compatible storage and don't have a CDN business to promote.

Update: Based on the advice here I set up CloudFront. It was pretty miserable, actually, ending with the site being offline for about six hours when it turned out that CloudFront gives you two options:

  1. Have an unencrypted HTTP link between S3 and CloudFront (after which there would be an encrypted HTTPS link between CloudFront and CloudFlare). Although objectively unencrypted HTTP within Amazon's network is safer than unencrypted HTTP over the open Internet between CloudFlare and Amazon, this was still the thing I was trying to avoid.
  2. If you have CloudFront load from S3 via ARN rather than HTTP, the site is unable to forward requests for directories to the corresponding /index.html page, because apparently despite being specifically given in the S3 documentation as the recommended web frontend for S3, CloudFront is not actually a web server. I had to write a JavaScript preload function, which will now load on every single CloudFront pageload, which guesses (based on the presence of a period in the filename— this is the way Amazon recommends doing it in the documentation) whether a URL is a directory or not, issues a 301 redirect to path/ if I think it is (because of course if you don't have the trailing slash the browser will load relative resources from the wrong place), and silently appends "index.html" if it ends in a / already. This is absurd and means I can no longer upload any file on the site that does not have a file extension.
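For anyone who lands here with the same problem, the function described in option 2 looks roughly like the sketch below. This is a reconstruction from the description above, not the exact code running on the site. It assumes the CloudFront Functions viewer-request shape, where a handler receives an event with a request.uri and returns either the request or a response object.

```javascript
// Sketch of a CloudFront Functions viewer-request handler.
// Guesses "directory or file" from the presence of a period in the
// last path segment, as described above.
function handler(event) {
  var request = event.request;
  var uri = request.uri;

  // Already ends in "/": silently serve that directory's index.html.
  if (uri.charAt(uri.length - 1) === '/') {
    request.uri = uri + 'index.html';
    return request;
  }

  // No period in the last segment: assume it is a directory and send
  // a 301 to the slash-terminated path so relative links resolve.
  var lastSegment = uri.substring(uri.lastIndexOf('/') + 1);
  if (lastSegment.indexOf('.') === -1) {
    return {
      statusCode: 301,
      statusDescription: 'Moved Permanently',
      headers: { location: { value: uri + '/' } }
    };
  }

  // Looks like a file: pass the request through unchanged.
  return request;
}
```

Under this guess a request for /notes gets redirected to /notes/, which then loads /notes/index.html, while /style.css passes straight through. A file uploaded without an extension can never be reached, which is the limitation complained about above. This sketch also drops any query string on redirect.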

So… the site works now! Or appears to! However the experience was miserable enough I'm about ready to switch away even though I now have a working website. I now have a list of two S3-compatible web hosts, and I'm keeping an eye on Cloudflare Pages, which would kind of make sense because I'm using Cloudflare already, but doesn't yet seem to be a functioning product (in that it does not fully support "uploading files").