Question

I'm providing unlimited storage for users on AWS S3. Every user has one folder in my S3 bucket (it must be a folder, not a separate bucket, because AWS limits the number of buckets per account).

PROBLEM: I want to provide a custom domain for each of my users (e.g. myuserdomain.com), so they can browse their storage on their own domain.

A possible solution I see: if I provide a custom subdomain for every user, users can point their own domains at my subdomain with a CNAME record and have their storage under their own domain, so it will work.

mybucket/user1folder -> user1folder.mybucket.awsS3.com
mybucket/user2folder -> user2folder.mybucket.awsS3.com

user1.com -> CNAME -> user1folder.mybucket.awsS3.com
user2.com -> CNAME -> user2folder.mybucket.awsS3.com

But I don't know how to redirect it (at the Amazon level, the DNS level, or some other level), and it needs to handle the traffic of all users.


Solution

I'm providing unlimited storage for users on AWS S3

Really? I need to store about 8 terabytes for free. Where do I sign up?

To me, that sounds like a fairly serious flaw in your plan, since I doubt you'd want to give away that much storage space for "free."

Aside from this, there's a second flaw, which is that CNAME isn't possible for the apex of a DNS zone. You can create a CNAME for www.example.com but it is impossible to declare example.com as a CNAME.

Yes, it can be done as an ALIAS record, but you have to host the DNS for your customers' domains on Route 53.

The third issue is this:

mybucket/user1folder -> user1folder.mybucket.awsS3.com

That should be user1folder.mybucket.s3.amazonaws.com, but I get the idea, and the idea doesn't work. This kind of magic is impossible through DNS configuration, because the full hostname sent in the HTTP Host: header has to exactly match the bucket name. CNAME (and ALIAS) records only determine the IP address the browser uses to establish the connection; they don't do anything to the Host: header. There is no capability in S3 to provide this behavior, either.
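To illustrate why (the hostname here is just an example): even with user1.com CNAMEd to the bucket endpoint, the browser's request still carries its original hostname, and S3 selects the bucket by matching that Host header against bucket names:

GET /somefile.txt HTTP/1.1
Host: user1.com

Since no bucket is named user1.com, S3 can't route the request to mybucket, no matter what the DNS says.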

It is technically possible to accomplish what you are trying to do by adding one or more EC2 instances into the mix, running a web server with rewrite and proxy functionality such as HAProxy, Varnish, Nginx, or Apache. The server rewrites each incoming hostname/URL to a different one, including substituting elements of the path, and fetches the object from a "back-end" server (which, in this case, would be S3). When a request hits your web server as user1.com/foo, you could rewrite the request to mybucket.s3.amazonaws.com/user1.com/foo, fetch the object from S3, and return it to the user. But you would have to have EC2 instances of sufficient capacity to do the rewrites and pass all of these bytes from S3 back to the user's browser; all of the data would flow through your EC2 instances.

In Apache, a primitive rewrite rule to take the hostname from the request and prepend it to the path (this requires mod_rewrite and mod_proxy_http) might look something like this:

RewriteEngine On
RewriteRule ^(.+) http://mybucket.s3.amazonaws.com/%{HTTP_HOST}$1 [P]

If the server received a request for http://example.com/foo, it would be rewritten as http://mybucket.s3.amazonaws.com/example.com/foo, and that object would be requested by the Apache server from S3, with its contents returned from Apache to the browser. This would, of course, be somewhat slower than accessing S3 directly, but that's not possible under the circumstances you describe. If the "username" is something other than "example.com" (the domain name of the web site), then you'd have to customize the Apache configuration for each user, since the substitution would not be a simple reconstruction of the URI by inserting the hostname into the path; one way to avoid per-user rules is sketched below.
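If the folder names don't match the hostnames, Apache's RewriteMap can look the folder up from a file instead of hard-coding a rule per user. This is only a minimal sketch, assuming a hypothetical map file /etc/apache2/usermap.txt containing one "hostname folder" pair per line (e.g. "user1.com user1folder"):

# requires mod_rewrite and mod_proxy_http; valid in server or virtual host context
RewriteEngine On
# hypothetical plain-text map translating incoming hostnames to folder names
RewriteMap usermap txt:/etc/apache2/usermap.txt
# look up the folder for the incoming Host header and proxy the request to S3
RewriteRule ^/(.*)$ http://mybucket.s3.amazonaws.com/${usermap:%{HTTP_HOST}}/$1 [P]

Adding a user then means adding one line to the map file rather than touching the server configuration.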

So, yes, it's technically possible, but practicality and viability are different matters.


Update (from comments):

my blog ondrek.me is CNAME to S3 bucket and it works well

You are indeed using a CNAME at the apex of your zone, thereby breaking the DNS protocol, making it impossible or at best unreliable to also use www.ondrek.me or any other subdomain, and additionally eliminating reliable delivery of inbound email for your domain. Your current configuration is working, but it is invalid and subject to unexpected problems.

is there ANY right way to provide my users with blogs where the domain customer.com maps to mybloggingplatform.com/customerId?

Other than what I described? No, I would say there isn't. DNS can't do it, the browser can't do it, and S3 won't do it.

It's a huge overhead to use EC2 and redirect every request of every user for every blog.

True, but importantly, I didn't say redirect. The solution I discussed above does not redirect, because redirecting would send the browser's address bar to a different location, which would be almost entirely unusable. I said rewrite and proxy: the server translates the address and fetches the object itself. So, yeah, it's some overhead, though perhaps less than you might think.
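To make the difference concrete, here is a minimal Apache sketch of the two behaviors (the URLs are placeholders):

# a redirect: the browser receives a 301 and its address bar changes to the S3 URL
Redirect permanent /foo http://mybucket.s3.amazonaws.com/user1.com/foo

# a rewrite-and-proxy: Apache fetches the object itself; the address bar never changes
RewriteEngine On
RewriteRule ^/foo$ http://mybucket.s3.amazonaws.com/user1.com/foo [P]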

there needs to be some clever solution,

Yes, the solution is "web servers."

You need one or more web servers, backed by some kind of storage, to serve the content from wherever it lives; the configuration determines where the content comes from, whether on the server itself or from elsewhere, based on the incoming Host: header.
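As one illustration of that idea, and purely an assumption on my part about how you might lay out the storage, Apache's mod_vhost_alias can derive the document root directly from the Host header, so a single configuration serves every hostname:

# requires mod_vhost_alias; %0 expands to the entire incoming hostname
UseCanonicalName Off
VirtualDocumentRoot /var/www/%0

With that, a request for user1.com/foo is served from /var/www/user1.com/foo without any per-user configuration.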

In AWS, I would think the more appropriate storage would be EBS, although S3 could technically work, as I have described. I have a web server that has a large number of very "fresh" files and a large amount of very ancient files that still need to be accessible. The server stores the fresh files on its high-performance SAN array, while the ancient files have been migrated to S3 but still need to be available at their old URLs. When a request comes in for /foo/bar, the server checks for a local file at /var/www/foo/bar and serves it up if present; otherwise it proxies the request through the back side to fetch /mybucket/foo/bar from S3. If the file is not there either, my server actually returns the 403 from S3 to the client.
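A minimal sketch of that local-first, S3-fallback arrangement in Apache terms (the bucket name and paths are placeholders, not my actual configuration):

# requires mod_rewrite and mod_proxy_http
RewriteEngine On
# if no matching file exists under the DocumentRoot...
RewriteCond %{REQUEST_FILENAME} !-f
# ...proxy the same path to the S3 bucket; S3's response, including any 403, passes through
RewriteRule ^/(.*)$ http://mybucket.s3.amazonaws.com/$1 [P]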

I have also successfully used HAProxy in front of an old, custom-written, multi-tenant web server which, because of its design, determined pathnames from the Host header and could not reasonably be reconfigured; it was so old that nobody remembered who wrote it or how it worked. This server needed to be accessed via a new hostname for the web site, and HAProxy worked brilliantly, translating the hostname in the HTTP headers so that the old server always saw the hostname it expected, even if you typed the IP address into your address bar. In such a setup, HAProxy could rewrite Host headers and URLs and handle several hundred concurrent connections pretty easily.

github pages have the same functionality,

I think you'll find GitHub to have a large number of web servers.

s3 static websites also

...are hosted on web servers, of course.

Licensed under: CC-BY-SA with attribution