Battlefy Blog

Minor oversight? Lose millions

Ronald Chen, November 8th 2021

Under the GDPR, data leaks can result in fines in the hundreds of millions. Hackers don’t even need to be involved: one common cause of leaks is mixing public data with personally identifiable information (PII), which can lead to PII being leaked accidentally when the intent was only to display public data.

This blog post shows how a simple oversight while modelling a problem can lead to a data leak. We then show how to fix the design so that leaking the data becomes impossible.

Pending approval

At Battlefy, we are building esports for competitive players, and we want to give them the ability to express themselves. We want to promote the people in esports and grow esports for everybody.

Furthermore, Battlefy’s clients are large, well-known companies, and we have to ensure the esports events we run uphold our clients’ brands.

These two goals end up butting heads even for a feature as simple as uploading a team logo. We want teams to be able to bolster their brand with a logo, but at the same time we need to prevent trolls from uploading offensive images that would hurt our clients’ brands.

We solve this problem by adding an approval process: our esports operations staff approve each team’s logo before it is displayed publicly.

Leaky implementation

Here is a system diagram for the team logo approval process. Let’s describe each step in more detail.

  1. During the team registration process, the captain uploads the team logo. There are safe and secure ways to let users upload directly to a CDN.
  2. The team logo URL is returned to the frontend. This URL contains a randomly generated UUID, which makes it effectively impossible for anybody to guess. Using a UUID lets us simplify image permissions and make the images publicly readable.
  3. The team registration is completed by submitting the team logo URL to the backend, along with all the other registration details.
  4. The backend inserts a new document into the team collection, with logoState initialized to pending.
  5. Some time later, an admin checks whether any team logos need approval. The admin backend makes a find query on the team collection with the filter logoState: 'pending'.
  6. The logo is loaded for the admin to review, and the admin approves it. The admin backend then sends an update that sets logoState to approved.
  7. Viewers then load the page with all the teams for the event. This sends a request to the backend.
  8. The backend makes a find query on the team collection. It is critical that the backend logic deletes the logoUrl field if logoState is not equal to approved. Failure to implement this logic will result in logoUrl leaking.
  9. The frontend receives all the teams and loads the team logo for each team that has a logoUrl field. If the logoUrl field is missing, the frontend displays a placeholder image.
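To make step 8 concrete, here is a minimal in-memory sketch of the leaky design. An array stands in for the team collection, and the TeamDoc shape and function names are hypothetical; only logoUrl, logoState, and the pending/approved values come from the steps above.

```typescript
// Minimal in-memory sketch of the leaky design. The array stands in for the
// team collection; TeamDoc and all function names are hypothetical.

type LogoState = "pending" | "approved";

interface TeamDoc {
  id: string;
  name: string;
  logoUrl?: string;
  logoState?: LogoState;
}

const teams: TeamDoc[] = [];

// Step 4: registration stores the logo URL with logoState set to pending.
function registerTeam(id: string, name: string, logoUrl: string): void {
  teams.push({ id, name, logoUrl, logoState: "pending" });
}

// Step 5: the admin backend finds teams whose logos await approval.
function findPendingLogos(): TeamDoc[] {
  return teams.filter((t) => t.logoState === "pending");
}

// Step 6: the admin approves a team's logo.
function approveLogo(id: string): void {
  const team = teams.find((t) => t.id === id);
  if (team !== undefined) team.logoState = "approved";
}

// Step 8: the critical rule. logoUrl must be deleted from every team whose
// logo is not approved before the result leaves the backend.
function listTeamsForViewers(): TeamDoc[] {
  return teams.map((t) => {
    const copy = { ...t }; // shallow copy so the stored document is untouched
    if (copy.logoState !== "approved") delete copy.logoUrl;
    return copy;
  });
}

// The landmine: another endpoint that forgets the rule. The team page still
// renders, but pending logo URLs are sent to every viewer.
function listTeamsForViewersLeaky(): TeamDoc[] {
  return teams.map((t) => ({ ...t }));
}
```

Nothing in the leaky variant fails or throws; it simply returns pending logo URLs to anyone who loads the page, which is exactly why the bug is easy to ship.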

Step 8 is the critical step that can lead to a data leak. Notice how easy it would be to forget this logic while the team pages continue to work. Also notice that every other query in the system that loads teams must either avoid returning the logoUrl field or implement this same critical logic. This is a landmine waiting for a victim.

How do we fix this leak? One natural inclination would be to make all images uploaded to the CDN private and only make them public upon approval. But this makes it awkward for teams to see their own logo while it is pending approval. Sure, we could add roles so that teams and admins can always read these images, but this is getting complicated. There is a better way.

Fix the design to make the leak impossible

The root cause of the problem is that logoUrl and logoState don’t belong on the team document. logoUrl serves two different purposes: it is simultaneously the pendingLogoUrl and the approvedLogoUrl. The pendingLogoUrl should never be returned to the frontend, whereas the approvedLogoUrl should always be returned to the frontend.

We can fix this by moving logoUrl and logoState into their own collection. We could call this collection team logo review, but since this problem isn’t a one-off, we can generalize it into a reusable feature with just a few additional fields.

This updated system diagram shows how we segregated logoUrl and logoState into a new image review collection. Each document in it has a type field; for our use case, type: 'team logo' means the document also has a teamID field.
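A minimal TypeScript sketch of what an image review document might look like. The post only specifies the type field, the 'team logo' value, and teamID; the imageUrl and state field names are hypothetical stand-ins for the moved logoUrl and logoState, and the union is meant to grow as other review types are added.

```typescript
// Hypothetical image review document. Only `type`, the 'team logo' value,
// and teamID come from the post; imageUrl and state are assumed names for
// the moved logoUrl and logoState.

type ReviewState = "pending" | "approved";

interface TeamLogoReview {
  type: "team logo";
  teamID: string;
  imageUrl: string; // UUID-named upload URL returned at upload time
  state: ReviewState;
}

// Other review types would become additional members of this union, each
// with its own reference field in place of teamID.
type ImageReview = TeamLogoReview;

// The team document no longer has any logo fields, so no team query,
// careful or careless, can leak a pending logo URL.
interface TeamDoc {
  id: string;
  name: string;
}

// Only the admin backend reads the image review collection.
function findPendingReviews(reviews: ImageReview[]): ImageReview[] {
  return reviews.filter((r) => r.state === "pending");
}
```

The key design choice is that the pending URL lives in a collection the player and viewer backends never query, so forgetting a filter there is not possible.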

Neither the player nor viewer backends ever have a reason to query the image review collection. Only the admin backend uses it to manage the approval process.

Instead of relying on an optional logoUrl field on the team document, the frontend simply loads the image at <cdn uri>/team/<team id>/logo.png. The frontend already has the team ID from the list of team documents it received. If the team logo doesn’t exist in the CDN, the frontend shows a placeholder.
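The frontend side can be sketched as follows. The CDN base URI and placeholder path are hypothetical constants, and ImageLike is a minimal stand-in for a browser image element so the sketch runs outside a browser; the fallback relies on the image’s error handler firing when nothing exists at the derived path.

```typescript
// Hypothetical stand-ins: the CDN base URI and placeholder path are assumed,
// and ImageLike is a minimal stand-in for a browser <img> element.
const CDN_URI = "https://cdn.example.com";
const PLACEHOLDER_LOGO = "/images/team-placeholder.png";

interface ImageLike {
  src: string;
  onerror: (() => void) | null;
}

// The logo URL is derived purely from the team ID; the team document
// carries no logo field at all.
function teamLogoUrl(teamId: string): string {
  // encodeURIComponent keeps unusual IDs from breaking the path.
  return `${CDN_URI}/team/${encodeURIComponent(teamId)}/logo.png`;
}

// Point the image at the approved logo. If nothing has been published at
// that path yet (not approved, or never uploaded), the load fails and the
// error handler swaps in the placeholder.
function showTeamLogo(img: ImageLike, teamId: string): void {
  img.onerror = () => {
    img.onerror = null; // avoid looping if the placeholder also fails
    img.src = PLACEHOLDER_LOGO;
  };
  img.src = teamLogoUrl(teamId);
}
```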

The object at <cdn uri>/team/<team id>/logo.png is only populated after the admin approves the image, by copying the previously uploaded image to that path.
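The approval step can be sketched as follows, assuming an object store that can copy one key to another (object stores commonly offer such an operation). FakeCdn, the TeamLogoReview shape, and the uploads/ key are hypothetical; only the team/<team id>/logo.png destination comes from the post.

```typescript
// In-memory sketch of the approval step. FakeCdn and the review shape are
// hypothetical; only the team/<team id>/logo.png path comes from the post.

type ReviewState = "pending" | "approved";

interface TeamLogoReview {
  type: "team logo";
  teamID: string;
  uploadKey: string; // key of the UUID-named upload
  state: ReviewState;
}

class FakeCdn {
  private objects = new Map<string, Uint8Array>();

  put(key: string, body: Uint8Array): void {
    this.objects.set(key, body);
  }

  has(key: string): boolean {
    return this.objects.has(key);
  }

  copy(fromKey: string, toKey: string): void {
    const body = this.objects.get(fromKey);
    if (body === undefined) throw new Error(`missing object: ${fromKey}`);
    this.objects.set(toKey, body);
  }
}

// Admin backend: approving a review publishes the image at the path the
// frontend derives from the team ID. Until this runs, that path is empty
// and viewers see the placeholder.
function approveTeamLogo(review: TeamLogoReview, cdn: FakeCdn): void {
  if (review.state !== "pending") return;
  cdn.copy(review.uploadKey, `team/${review.teamID}/logo.png`);
  review.state = "approved";
}
```

Because the public path is written only by the admin backend on approval, no team query can expose an unapproved logo: there is simply nothing at that path to find.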

These are not the collections you are looking for

Comparing the two designs, note how the minor oversight of adding logoUrl and logoState to the team document led to a leak. The decision seemed trivial at the time, yet it had dire consequences down the road.

To avoid this trap, one must think critically when designing schemas. Separate concepts belong in separate collections. Putting everything into one document is sus. Err on the side of having too many collections rather than too few.

Extra strength required for PII

This example only covered a very minor leak. While this kind of leak would reflect poorly on Battlefy and could potentially hurt our clients, it would not constitute a GDPR violation.

When PII is being handled, it is not as simple as segregating the data into different collections. Segregation would prevent accidental leaks like the logoUrl one, but handling PII requires more auditing and security. One simply does not colocate normal data with PII.

One solution would be to create a separate database that only houses PII and to limit access to it severely, even more so than access to the regular production database. It becomes much easier to handle PII appropriately when one can put a wall around it.

Do you want to sweat the details on every single database field? You’re in luck: Battlefy is hiring.
