
Fast builds, isolated deploys and instant rollback — Battlefy’s frontend architecture Cont’d Part three

Ronald ChenMay 18th 2021

This is part three of a longer post. Read part one or part two.

I heard you like CloudFront so I put a CloudFront in your CloudFront so you can CloudFront while you CloudFront

One of the advantages of using a third-party service like Netlify is it helps you think about the problem and provides an example of how to solve it. It starts you down the journey to becoming an expert at the topic. Looking at Netlify’s solution and considering our own needs, we saw opportunities for optimization. There was stuff we simply didn’t need or could achieve more simply if we just had more control.

While microapps got us going with React and solved our bundle.js size problem, we still didn’t have a solution for the slow Webpack build.

Can we solve all our problems at the same time? Let’s go; clear the whiteboards; let’s build some prototypes!

After a few failed starts, what we ended up with was, if Webpack build speed is the problem, let’s just split the microapps into independent Webpack builds and only build what we need.

There was already no code dependency between the microapps. They didn’t need to be built together.

But how can we even do that?

We could put each microapp in its own folder and run the Webpack builds in parallel with a shell script. This would be fine for the production build, but what about development? This wouldn’t work for webpack-dev-server, since we can’t just run multiple webpack-dev-servers on the same port.

To make this work in development, we needed a single webpack-dev-server. Fortunately, webpack-dev-server has a feature that allows you to specify a context path and proxy it. Context paths save the day again!

We can now move each microapp into its own repository. This finally allows us to specify npm dependencies and have an independent Webpack configuration for each microapp.

In local development, we just spun up our main webpack-dev-server, which proxied to all the microapps, then started the microapps we were actually working on, each on a different port. This reduced the Webpack build time drastically!
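As a rough sketch of that proxy setup, here is how the main dev server’s proxy rules could be generated from a list of microapps. The context paths and ports are made up for illustration, and the rule shape (`context` plus `target`) assumes the array form of webpack-dev-server’s `devServer.proxy` option; our actual configuration may have looked different.

```typescript
// Hypothetical registry: context path -> port of that microapp's own
// webpack-dev-server. Paths and ports are invented for this example.
const microappPorts: Record<string, number> = {
  "/tournaments": 8081,
  "/profile": 8082,
};

// One entry of the devServer.proxy array (assumed shape).
interface ProxyRule {
  context: string[];
  target: string;
}

// Build one proxy rule per microapp so requests under its context path are
// forwarded to the dev server listening on its port.
function buildProxyRules(ports: Record<string, number>): ProxyRule[] {
  return Object.keys(ports).map((contextPath) => ({
    context: [contextPath],
    target: `http://localhost:${ports[contextPath]}`,
  }));
}

// Fragment of the main Webpack config; only the devServer part is shown.
const mainDevServerConfig = {
  devServer: {
    port: 8080,
    proxy: buildProxyRules(microappPorts),
  },
};
```

With this shape, a request to `/profile/settings` on port 8080 would be forwarded to the microapp dev server on port 8082, provided that microapp has been started.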

As for production, we decided to use AWS CloudFront, but we hit another snag. Each microapp is a single-page application and needed to rewrite the context path. There is a trick with AWS CloudFront to emulate the rewrites and it’s to configure the 404 document to be /index.html. This effectively rewrites /* to /index.html. This is fine if you only have one single-page application, but we had many.

Defeated. We were about to give up, but then we wondered…if you can only have one rewrite per CloudFront, what if we just added another layer of CloudFronts? What if we spun up a CloudFront for each microapp and then configured our root CloudFront to send traffic to it for a context path? And it worked, but we were sad. This was a monstrosity.

But time was up, we had a solution. We needed to get off of Netlify before our contract ended, let’s ship i-, WAIT. We forgot about prerendering.

When we moved to Netlify, we punched in our prerender.io key to keep link unfurling working, but we needed a solution for CloudFront. After yet another investigation, we learned that we could integrate prerender.io with CloudFront Lambda@Edge.

We added two lambdas, viewer-request (before caching) and origin-request (after cache miss). The viewer-request lambda simply checks the user-agent to see if it’s a link bot and sets a header. The origin-request lambda only acts if it sees the bot header and then calls prerender.io.
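Here is a minimal sketch of that two-lambda split. The bot list, the `x-link-bot` header name and the function names are all assumptions made for this example; the event types are a hand-written subset of the Lambda@Edge request event. For the header to reach the origin-request lambda, and for bot and human responses to be cached separately, it would likely also need to be part of the distribution’s cache key.

```typescript
// Hand-written subset of the Lambda@Edge request event shape.
interface CfHeader { key?: string; value: string; }
interface CfRequest { uri: string; headers: Record<string, CfHeader[]>; }
interface CfEvent { Records: Array<{ cf: { request: CfRequest } }>; }

// Assumed list of link-unfurling bots; the real list is not in this post.
const LINK_BOT_PATTERN = /facebookexternalhit|twitterbot|slackbot|discordbot|linkedinbot/i;

// Hypothetical internal header used to pass the decision along.
const BOT_HEADER = "x-link-bot";

// Tag the request if its user-agent looks like a link bot.
function tagLinkBot(request: CfRequest): CfRequest {
  // Drop any client-supplied copy of the internal header first.
  delete request.headers[BOT_HEADER];
  const firstAgent = (request.headers["user-agent"] || [])[0];
  const userAgent = firstAgent ? firstAgent.value : "";
  if (LINK_BOT_PATTERN.test(userAgent)) {
    request.headers[BOT_HEADER] = [{ key: "X-Link-Bot", value: "true" }];
  }
  return request;
}

// viewer-request: runs before the cache lookup and only tags the request.
function viewerRequestHandler(
  event: CfEvent,
  _context: unknown,
  callback: (error: Error | null, result: CfRequest) => void
): void {
  callback(null, tagLinkBot(event.Records[0].cf.request));
}

// origin-request: runs only on a cache miss. This check decides whether to
// take the bot branch or let the request continue to the origin untouched.
function isTaggedAsLinkBot(request: CfRequest): boolean {
  const tag = (request.headers[BOT_HEADER] || [])[0];
  return tag !== undefined && tag.value === "true";
}
```

Keeping the viewer-request side this small matters because it runs on every request, while the heavier origin-request work only happens on cache misses.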

We were unsatisfied with our link unfurling, as it was pretty slow. What prerender.io did was load our entire site and wait for the JavaScript to run. Our code then updated the title, description and image, and signalled that it was done loading.

This was all very complicated just to fetch 3 pieces of information. On top of that, we realized we hadn’t implemented link unfurling for our microapps at all.

Oops.

Can we do better? Can we speed up link unfurling and have it work for the microapps as well?

Our solution was to get rid of prerender.io. We didn’t need our JavaScript to load the entire site just to update 3 pieces of information. Why not just fetch the data in the origin-request lambda and render out a tiny HTML page for the link bots? Thus we implemented a new endpoint that served the link unfurling data to the lambda. The nice thing about this was that we now had control over the link unfurling data for all URLs, including our microapps!
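To give a feel for how tiny that page can be, here is a sketch of the rendering half. The `UnfurlData` shape and the function names are assumptions, the use of Open Graph meta tags is my choice for the example, and the call to the unfurling endpoint is left out because its details aren’t covered here.

```typescript
// Assumed shape of the three pieces of information from the endpoint.
interface UnfurlData {
  title: string;
  description: string;
  image: string;
}

// Escape text so it is safe inside HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a minimal page containing only what a link bot reads.
function renderUnfurlPage(data: UnfurlData): string {
  const title = escapeHtml(data.title);
  const description = escapeHtml(data.description);
  const image = escapeHtml(data.image);
  return [
    "<!DOCTYPE html>",
    "<html><head>",
    `<title>${title}</title>`,
    `<meta name="description" content="${description}">`,
    `<meta property="og:title" content="${title}">`,
    `<meta property="og:description" content="${description}">`,
    `<meta property="og:image" content="${image}">`,
    "</head><body></body></html>",
  ].join("\n");
}

// Wrap the page in the response shape an origin-request lambda can return
// to answer directly instead of forwarding the request to the origin.
function botResponse(data: UnfurlData) {
  return {
    status: "200",
    statusDescription: "OK",
    headers: {
      "content-type": [{ key: "Content-Type", value: "text/html; charset=utf-8" }],
    },
    body: renderUnfurlPage(data),
  };
}
```

No JavaScript has to run and nothing has to signal that loading is done; the bot gets a few hundred bytes of HTML back.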

This project also spared us from having to return to Jenkins. We sprinkled on AWS CodePipeline along with some AWS CodeBuild, and then we realized we had a lot of infrastructure for each microapp. We needed to automate this and reached for Terraform.

We automated the creation of microapps by forking our template repository and generating a Terraform script for it. The script spun up the entire build pipeline and CloudFront distribution. Finally, it updated our root CloudFront distribution to add the microapp.

It all worked. It was complex, but it worked. We finally had a small change to a single microapp only trigger the rebuild of a small webpack build and safely deploy only the affected microapp.

Fast builds and isolated deploys achievement unlocked! But it feels like we’re forgetting something…I’m sure it’s not important.

What we’ve eliminated thus far

  • the manual process to rev the version and cut a release
  • the manual process to pass version number into Jenkins job and start it
  • time waiting for Docker image to be uploaded to Dockerhub
  • time waiting for Docker image to be downloaded from Dockerhub
  • Dockerhub cost
  • AWS bandwidth cost uploading/downloading to/from Dockerhub
  • Jenkins build job
  • Elastic Beanstalk application
  • Docker container
  • Custom Express.js server
  • being stuck on Angular
  • uncontrolled bundle.js growth
  • slow webpack build
  • giant webpack config
  • big bang deploys
  • mono npm project for all microapps
  • Netlify cost
  • prerender.io cost

Everything should be made as simple as possible, but no simpler

The honeymoon period with our new frontend architecture didn’t last very long. The way Terraform worked simply didn’t gel with how our team operated. We had different teams all trying to maintain various Terraform states, but it became a tragedy of the commons. The rigour required to use Terraform well got in the way of teams just trying to get things done. The Terraform rules were violated; manual changes were made to environments; the Terraform state didn’t reflect reality anymore.

What didn’t help was the unreadable diff when adding a new microapp to our main CloudFront distribution. Even though there is a well-defined order for path behaviours in CloudFront, the Terraform module didn’t sort by that when showing the plan diff. This was the first broken window and we lost respect for Terraform.

And there was still the niggling issue of the second layer of CloudFront distributions for each microapp. That just didn’t seem right. This extra layer compounded the Terraform issues as it takes several minutes to create/update/delete CloudFront distributions.

When we eventually ran into our first bad deployment, we realized we were missing a lever. In our rush to move off of Netlify, we forgot to retain one of the most useful features, instant rollback.

If that wasn’t enough, the CodePipeline/CodeBuild setup was fairly slow. This one could easily be fixed by throwing more money at it, but given our Terraform woes, we would rather just get rid of it.

Remember that link unfurling solution we came up with? Well, it’s actually rather consequential as we added the origin-request lambda after we decided to add the second layer of CloudFronts.

Why does that matter? The origin-request lambda happens to be the place where one can do rewrites. Back then, we didn’t know if Lambda@Edge was a good solution for implementing rewrites, but since we were already using it for link unfurling, there was no additional cost.

Lovely, we can simplify. We can modify the shell script to not create the extra CloudFronts, but instead to modify the origin-request lambda. But why not take this one step further? We want to get rid of Terraform anyways and to do that we need a solution to AWS CodePipeline/CodeBuild first.
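To make the rewrite idea concrete, here is a sketch of what the origin-request lambda could do with the request URI. The context paths are invented, and the sketch assumes every microapp’s build sits under a prefix equal to its context path in one origin, with the root app’s `/index.html` as the fallback. Neither assumption is spelled out above, so treat this as one possible layout.

```typescript
// Hypothetical context paths, one per microapp.
const MICROAPP_CONTEXT_PATHS = ["/tournaments", "/profile"];

// Return the context path that owns this URI, or undefined if none does.
function findContextPath(uri: string, contextPaths: string[]): string | undefined {
  for (const contextPath of contextPaths) {
    if (uri === contextPath || uri.indexOf(contextPath + "/") === 0) {
      return contextPath;
    }
  }
  return undefined;
}

// Rewrite client-side routes to the owning microapp's index.html.
// Requests whose last path segment has a file extension are treated as
// real assets and pass through unchanged.
function rewriteUri(uri: string, contextPaths: string[]): string {
  const lastSegment = uri.substring(uri.lastIndexOf("/") + 1);
  if (lastSegment.indexOf(".") !== -1) {
    return uri;
  }
  const contextPath = findContextPath(uri, contextPaths);
  return contextPath === undefined ? "/index.html" : contextPath + "/index.html";
}
```

Under these assumptions, `/tournaments/123/bracket` becomes `/tournaments/index.html`, while a hashed bundle under `/tournaments/` is served as-is. One caveat of the extension check: a client-side route whose last segment contains a dot would be mistaken for a file, so a real implementation may need a stricter rule.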

What is the build even doing? It’s really not that complicated:

  1. Upon merge to master
  2. npm install
  3. npm run build
  4. Upload assets to S3

Instead of pushing the source to AWS to have it built there, we can just build with GitHub Actions and push the assets to S3.

OK! We have all the pieces to replace the existing shell script and since we no longer rely on Terraform, the automation can now just be implemented on an endpoint. The automation now all boils down to various GitHub API calls.

One last thing we need to restore. Let’s bring back instant rollbacks. In order to get the “instant” part of instant rollback to work, we need to make sure we are caching everything correctly. Notice I didn’t say, never cache anything, as that would require our users to redownload the bundle.js on every page reload.

I’ve been using bundle.js as shorthand, but really, it looks more like bundle.main.6483891676e878fb219e.js. Since the bundle.js has a unique hash per build, we can cache hashed files forever. There is never a reason why the content would be different for the same filename. The index.html refers to the exact bundle.js, so as long as we never cache index.html, instant rollback is simply restoring the index.html from a previous build. It’s fine to have the browser always redownload index.html as it’s a very small text file, but you could optimize even this by putting a really short 1 second cache on it. This would vastly improve performance on busy CloudFront edge nodes.
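The caching rule boils down to a small decision per file. Below is a sketch of it, plus what a rollback could look like as a single copy of an old index.html. The regular expression, the function names and the `builds/<buildId>/` bucket layout are assumptions for illustration, not a description of our actual bucket.

```typescript
// Matches hashed names such as bundle.main.6483891676e878fb219e.js:
// a dot, at least 8 hex characters, then a single file extension.
const HASHED_ASSET_PATTERN = /\.[0-9a-f]{8,}\.[a-z0-9]+$/;

// Pick the Cache-Control header for an uploaded file.
// Hashed files never change, so they can be cached for a year.
// Everything else (notably index.html) is revalidated on every request,
// or cached for indexMaxAgeSeconds when that is greater than zero.
function cacheControlFor(objectKey: string, indexMaxAgeSeconds: number = 0): string {
  if (HASHED_ASSET_PATTERN.test(objectKey)) {
    return "public, max-age=31536000, immutable";
  }
  return indexMaxAgeSeconds > 0 ? `public, max-age=${indexMaxAgeSeconds}` : "no-cache";
}

// Description of the single object copy that performs a rollback.
interface RollbackPlan {
  sourceKey: string;
  destinationKey: string;
  cacheControl: string;
}

// Hypothetical layout: every build keeps its own index.html under
// <microapp>/builds/<buildId>/, and the live one is <microapp>/index.html.
function planRollback(microapp: string, buildId: string, indexMaxAgeSeconds: number = 0): RollbackPlan {
  return {
    sourceKey: `${microapp}/builds/${buildId}/index.html`,
    destinationKey: `${microapp}/index.html`,
    cacheControl: cacheControlFor("index.html", indexMaxAgeSeconds),
  };
}
```

Because the old hashed bundles are still in the bucket and still cached, copying one small file is all it takes for users to get the previous build on their next page load.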

We’ve finally caught up to what we do today at Battlefy. Here’s what it looks like to add a microapp.

  1. Using admin panel, provide new microapp name
  2. Automatically check for microapp name conflicts
  3. Automatically clone frontend template
  4. Automatically create a pull request to add the proxy to the repository that runs the webpack-dev-server. This can be merged immediately
  5. Automatically create a pull request to add microapp to origin-request lambda. This is merged when we are ready to go live. This also has its own GitHub action that deploys the lambda to the main CloudFront distribution
  6. Develop new microapp in the new repository
  7. Merge to master to kick off the GitHub Actions build, which uploads to S3
  8. Users enjoy the new microapp
  9. Later, if needed, use the admin panel to select an old build and instantly roll back to it

The final tally of everything we’ve removed in this journey

  • the manual process to rev the version and cut a release
  • the manual process to pass version number into Jenkins job and start it
  • time waiting for Docker image to be uploaded to Dockerhub
  • time waiting for Docker image to be downloaded from Dockerhub
  • Dockerhub cost
  • AWS bandwidth cost uploading/downloading to/from Dockerhub
  • Jenkins build job
  • Elastic Beanstalk application
  • Docker container
  • Custom Express.js server
  • being stuck on Angular
  • uncontrolled bundle.js growth
  • slow webpack build
  • giant webpack config
  • big bang deploys
  • mono npm project for all microapps
  • Netlify cost
  • prerender.io cost
  • second layer of CloudFront distributions
  • CodePipeline/CodeBuild
  • Terraform

There are many things we learned over this journey and I want to just highlight two of them.

Terraform was just a really bad fit for our team and it wasn’t something that was easy to see ahead of time. For any future tool, we need to consider whether it is the kind of tool designed for centralized command and control, or whether it can actually be used in a distributed manner.

Microapps is an example of a pattern I love to use, the switch pattern. If you squint, the context path each microapp is mounted to is like a case in a switch statement. The key to applying this pattern is first to figure out the domain of the thing you are switching over; for microapps it’s the context path. Second, figure out the architectural mechanism to do the actual switching. For microapps, to switch over the context path, we used proxies in webpack-dev-server and the origin-request lambda in CloudFront. If you have these two things, you can apply the switch pattern.

While our current frontend architecture is working great, we’re never satisfied. There are still many improvements we can make; we will just keep them on the backlog until we see a business need or opportunity to realize them.

If that sounds interesting to you and you want to learn from me how to build simple, effective systems, you’re in luck, Battlefy is hiring! Check out our open positions.

If you have any questions or comments, feel free to tweet me at pyrolistical
