Hosting a Gatsby site on S3 and CloudFront

Joris Portegies Zwart
April 23, 2018

Recently, we migrated the Ximedes website from WordPress to Gatsby.js. The resulting static site is hosted on AWS, using CloudFront backed by an S3 bucket that holds our static content.

Infrastructure and deployment

Our main setup is straightforward. The output of the Gatsby build directory is synced to an S3 bucket, which is configured as the origin server for a CloudFront distribution. (Note that we don't specifically configure the bucket as a web server, just as plain storage.)

[Figure: AWS setup]

While this gets you going rapidly, there are a few issues with this setup that need to be solved before launch.

  • By default, CloudFront caches the response of every request for 24 hours. This conflicts with the proper caching strategy for Gatsby sites.
  • While CloudFront allows you to specify a default object for requests targeting a directory (typically index.html), this only works for the root directory, not for subdirectories.
  • We take great pride in our secure coding practices, and so we wanted to include all the proper HTTP security headers in our responses. Unfortunately, at the time of writing, CloudFront does not support configuring custom response headers.
  • Migrating to a new site, with a whole new structure, means a lot of our old URLs don't work anymore. While this isn't a huge issue for us, we still wanted to make sure some basic pages, such as the About and Contact pages, were still reachable under their old URL.
  • Finally, requests for non-existing content by default return an HTTP 403 response. This is an S3 security feature; returning a 404 would let the client know that the requested object doesn't exist, which is information you shouldn't have if you lack sufficient access rights. However, in our case we would like to return a 404 to the end user, in line with the HTTP standard.

Introducing AWS Lambda@Edge functions

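We solve the first four issues by using AWS Lambda@Edge functions. These are Lambda functions, written in JavaScript for Node.js as in the examples below, that can intercept requests and responses to and from your site, and perform custom logic. (Regular AWS Lambda also supports languages such as Java, C# and Go, but Lambda@Edge supports a much smaller set of runtimes.)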

CloudFront allows up to four lambda functions to be configured, shown as λ1 through λ4 in the picture above.

  • λ1 (the viewer request event) is invoked on every request from the browser to CloudFront
  • λ2 (origin request) is invoked on every request from CloudFront to the S3 bucket (so only on a cache miss)
  • λ3 (origin response) is invoked on every response from S3 to CloudFront
  • λ4 (viewer response) is invoked on every response from CloudFront to the browser

These lambda functions receive either a request or response event as parameter, depending on where in the chain they are configured. The format of these events is documented on the AWS website.
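As a rough illustration of what such an event looks like, the sketch below builds a trimmed-down origin-response event and reads the fields used later in this article. Only a few fields are shown, the values (URI, status, IP address) are made up, and makeEvent is a hypothetical helper for local experimentation, not part of any AWS library.

```javascript
'use strict';

// Hypothetical helper: builds a minimal CloudFront event for local testing.
// Real events carry more fields (see the AWS documentation).
function makeEvent(eventType, uri, response) {
  const cf = {
    config: { eventType },
    request: {
      clientIp: '203.0.113.10', // example address
      method: 'GET',
      uri,
      querystring: '',
      headers: {}
    }
  };
  // Response events (λ3 and λ4) carry both the request and the response.
  if (response) {
    cf.response = response;
  }
  return { Records: [{ cf }] };
}

const event = makeEvent('origin-response', '/static/app.js', {
  status: '200',
  statusDescription: 'OK',
  headers: {}
});

// The handlers in this article all start by unpacking Records[0].cf.
const { config, request, response } = event.Records[0].cf;
console.log(config.eventType, request.uri, response.status);
```

Headers in these events are keyed by their lowercase name, and each value is an array of { key, value } objects, which is why the functions below assign arrays to entries such as headers['cache-control'].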

Adding correct cache headers

To make CloudFront implement the proper caching strategy, we first configure it to respect the Cache-Control headers returned by the origin server, instead of caching everything for the default TTL. Then, we create a lambda function at position λ3 that actually adds these headers to responses from S3 as follows:

'use strict';
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  if (request.uri.startsWith('/static/')) {
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=31536000, immutable'
      }
    ];
  } else {
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=0, must-revalidate'
      }
    ];
  }

  callback(null, response);
};

This boils down to 'cache anything in the directory /static forever, and always check for newer versions of everything else'. Gatsby V2 will also allow long-term caching of JavaScript files, which will make the cache dramatically more effective. But for now, this is it!
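Before deploying, the rule can be checked locally by calling the function with hand-made events. The sketch below pulls the path check into a hypothetical helper, cacheControlFor, and wraps it in a handler with the same shape as the one above. The helper names and the sample URIs are assumptions for illustration.

```javascript
'use strict';

// Hypothetical helper mirroring the rule above: long-lived caching for
// /static/, revalidation for everything else.
function cacheControlFor(uri) {
  return uri.startsWith('/static/')
    ? 'public, max-age=31536000, immutable'
    : 'public, max-age=0, must-revalidate';
}

// Same shape as the λ3 handler; in a deployed function this would be
// assigned to exports.handler.
function handler(event, context, callback) {
  const { request, response } = event.Records[0].cf;
  response.headers['cache-control'] = [
    { key: 'Cache-Control', value: cacheControlFor(request.uri) }
  ];
  callback(null, response);
}

// Invoke the handler with a minimal fake origin-response event and
// return the Cache-Control value it set.
function run(uri) {
  const event = {
    Records: [
      { cf: { request: { uri }, response: { status: '200', headers: {} } } }
    ]
  };
  let result;
  handler(event, {}, (err, res) => {
    result = res;
  });
  return result.headers['cache-control'][0].value;
}

console.log(run('/static/logo.png'));
console.log(run('/index.html'));
```

Because the handler calls its callback synchronously, the result is available as soon as handler returns, which keeps this kind of check free of async plumbing.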

Adding security headers

The OWASP Secure Headers Project provides an excellent summary of HTTP headers that should be used to increase the security of your website. While one of the great benefits of creating a static site is that the risk of attacks is minimized, it is still good practice to include these.

Adding security headers should be done in the same function λ3 we created to add caching headers, which we extend to this:

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  if (request.uri.startsWith('/static/')) {
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=31536000, immutable'
      }
    ];
  } else {
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=0, must-revalidate'
      }
    ];
  }

  [
    {
      key: 'Strict-Transport-Security',
      value: 'max-age=31536000'
    },
    {
      key: 'X-Content-Type-Options',
      value: 'nosniff'
    },
    {
      key: 'X-Permitted-Cross-Domain-Policies',
      value: 'none'
    },
    {
      key: 'Referrer-Policy',
      value: 'no-referrer'
    },
    {
      key: 'X-Frame-Options',
      value: 'deny'
    },
    {
      key: 'X-XSS-Protection',
      value: '1; mode=block'
    },
    {
      key: 'Content-Security-Policy',
      value:
        "default-src 'none' ; script-src 'self' 'unsafe-inline'; " +
        "style-src 'self' 'unsafe-inline' ; img-src 'self' data:; " +
        "font-src 'self' ; manifest-src 'self' ; " +
        'upgrade-insecure-requests; block-all-mixed-content; ' +
        'report-uri https://ximedes.report-uri.com/r/d/csp/enforce;'
    }
  ].forEach(h => (headers[h.key.toLowerCase()] = [h]));

  callback(null, response);
};

It is unfortunate that Gatsby requires the unsafe-inline directive in the Content-Security-Policy header for scripts and styles. One of the main purposes of CSP is to mitigate cross-site scripting (XSS) attacks, and disabling the execution of inline scripts goes a long way towards that. However, Gatsby creates large amounts of inline JS and CSS. Some open GitHub issues (#3758 and #3427) address this, but there doesn't seem to be an easy solution for now.

Also note the directive report-uri https://ximedes.report-uri.com/r/d/csp/enforce;. This makes the browser report any CSP violations to ReportURI, an easy way to quickly get your CSP reporting up and running. Of course you can also create your own reporting endpoint.
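Long policies are easier to review when each directive sits on its own line. As one possible way to organize this, the sketch below assembles the same kind of header value from a plain object. buildCsp is a hypothetical helper, and the reporting endpoint is a placeholder rather than our real one.

```javascript
'use strict';

// Hypothetical helper: turns { directive: [sources] } into a CSP header value.
// Directives without sources (e.g. upgrade-insecure-requests) use an empty array.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => [name, ...sources].join(' '))
    .join('; ');
}

const csp = buildCsp({
  'default-src': ["'none'"],
  'script-src': ["'self'", "'unsafe-inline'"],
  'style-src': ["'self'", "'unsafe-inline'"],
  'img-src': ["'self'", 'data:'],
  'font-src': ["'self'"],
  'manifest-src': ["'self'"],
  'upgrade-insecure-requests': [],
  'block-all-mixed-content': [],
  // Placeholder endpoint; substitute your own reporting URL.
  'report-uri': ['https://example.com/csp-report']
});

// The result can be used as the value of the Content-Security-Policy entry.
const header = { key: 'Content-Security-Policy', value: csp };
console.log(header.value);
```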

Serving index.html

To address the issue with serving index.html for directory-level requests, we create another lambda function λ2 that checks every request to S3, tries to detect whether it's requesting a directory, and if so adds index.html to the URI before proceeding. (Note that the logic here is simple to the point of naive, but so far it has worked without issues.)

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }

  callback(null, request);
};
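To see where this logic works and where it is naive, the rewrite can be extracted into a pure function and tried on a few paths. In the sketch below, rewriteUri is a hypothetical name and the sample paths are made up.

```javascript
'use strict';

// Hypothetical pure version of the rewrite rule used in λ2.
function rewriteUri(uri) {
  if (uri.endsWith('/')) {
    return uri + 'index.html';
  }
  if (!uri.includes('.')) {
    return uri + '/index.html';
  }
  return uri;
}

const samples = [
  '/', // root directory
  '/blog/', // directory with trailing slash
  '/blog', // directory without trailing slash
  '/static/app.js', // file: left untouched
  '/blog/release-1.2' // directory containing a dot: mistaken for a file
];

for (const uri of samples) {
  console.log(uri, '->', rewriteUri(uri));
}
```

The last sample shows the limitation: a page path that contains a dot is passed through unchanged, so it only resolves when requested with a trailing slash.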

Redirecting legacy URLs

To make sure old links and outdated search results pointing to our About and Contact pages would still work, we extend λ2 to detect requests for the old paths (/contact and /about-ximedes respectively). Instead of passing such a request on to S3, the function returns a 301 response redirecting the user to the new location (/contact-us/ or /about-us/).

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  // Redirect some popular search results to their new pages
  const redirects = [
    // No 'g' flag: each pattern is anchored and only needs to match once
    { test: /^\/contact\/?$/, targetURI: '/contact-us/' },
    { test: /^\/about-ximedes\/?$/, targetURI: '/about-us/' }
  ];

  const redirect = redirects.find(r => uri.match(r.test));
  if (redirect) {
    const response = {
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: {
        location: [
          {
            key: 'Location',
            value: 'https://www.ximedes.com' + redirect.targetURI
          }
        ]
      }
    };

    callback(null, response);
    return;
  }

  // Make sure directory requests serve index.html
  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }

  callback(null, request);
};
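A quick local check of both branches might look like the sketch below. It restates the redirect table in a hypothetical helper, resolve, that returns either a redirect target or a rewritten URI. The helper name and its return shape are assumptions for illustration and are not part of the Lambda@Edge API.

```javascript
'use strict';

const redirects = [
  { test: /^\/contact\/?$/, targetURI: '/contact-us/' },
  { test: /^\/about-ximedes\/?$/, targetURI: '/about-us/' }
];

// Hypothetical helper: decides what λ2 would do for a given URI.
function resolve(uri) {
  const redirect = redirects.find(r => r.test.test(uri));
  if (redirect) {
    return { action: 'redirect', status: '301', location: redirect.targetURI };
  }
  if (uri.endsWith('/')) {
    return { action: 'forward', uri: uri + 'index.html' };
  }
  if (!uri.includes('.')) {
    return { action: 'forward', uri: uri + '/index.html' };
  }
  return { action: 'forward', uri };
}

console.log(resolve('/contact'));
console.log(resolve('/about-ximedes/'));
console.log(resolve('/contact-us/'));
```

Keeping the decision in a pure function like this makes it easy to add more legacy paths to the table and verify them without deploying.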

Dealing with 404s

Finally, the issue of returning a 404 error code after S3 returns 403 is solved by properly configuring CloudFront. With our 404 page being the Gatsby default /404.html, we add a custom error response in the Error Pages tab of the CloudFront distribution that maps the 403 from S3 to the /404.html page with a 404 response code.

[Figure: 404 custom error response]
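For readers who configure CloudFront through the API or an infrastructure template rather than the console, a custom error response that turns the 403 from S3 into a 404 would look roughly like the entry below. The field names follow the CloudFront distribution configuration. The caching TTL is an assumed value, not one taken from our setup.

```javascript
'use strict';

// Sketch of the CustomErrorResponses part of a CloudFront distribution config.
const customErrorResponses = {
  Quantity: 1,
  Items: [
    {
      ErrorCode: 403, // what S3 returns for a missing object
      ResponsePagePath: '/404.html', // Gatsby's default 404 page
      ResponseCode: '404', // status sent to the browser
      ErrorCachingMinTTL: 300 // assumed value, in seconds
    }
  ]
};

console.log(JSON.stringify(customErrorResponses, null, 2));
```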

And there we are - a Gatsby site running in a serverless AWS infrastructure, with proper caching, error codes and directory root objects.