Headlines have been breaking over the last few weeks about high-profile data breaches caused by company databases and other information being stored in public Amazon Web Services (AWS) Simple Storage Service (S3) buckets. See here and here for two examples. The question I get most often about these breach notices is, “Why does anyone leave these buckets public, and isn’t that AWS’s fault?” The answer is straightforward, but comes as a bit of a shock to many – even many who work with AWS every day.
A quick refresher on S3
For those not familiar with S3 or what it does: S3 is a cloud-based Object Storage platform – in essence, an online file system of a very specific type. Object Storage is designed to hold unstructured collections of data that are typically written once and read often, are overwritten in their entirety when changed, and are not time-dependent. The last point simply means that keeping multiple copies in multiple locations doesn’t require real-time synchronization; the copies can be “eventually consistent” and it won’t break whatever you’re doing with that data.
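To make the “whole objects only” model concrete, here’s a minimal sketch using boto3, the AWS SDK for Python (the article doesn’t name a language, so Python is an assumption here, and the bucket and key names are hypothetical placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Objects are written and read in their entirety -- there is no
# seek-and-update as with a traditional file system.
s3.put_object(Bucket="example-bucket", Key="reports/q3.csv",
              Body=b"col1,col2\n1,2\n")

# "Changing" an object means uploading a complete replacement
# under the same key.
s3.put_object(Bucket="example-bucket", Key="reports/q3.csv",
              Body=b"col1,col2\n1,2\n3,4\n")

# Reads likewise return the whole object (or an explicit byte range).
body = s3.get_object(Bucket="example-bucket",
                     Key="reports/q3.csv")["Body"].read()
print(body.decode())
```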
S3 organizes these objects into “buckets” – the loose equivalent of a folder on more common operating system file systems like NTFS or EXT. A bucket holds objects, and object key names can include prefixes that behave like sub-folders; both the bucket and each object within it carry security permissions that determine who can see the bucket, who can see the contents of the bucket, who can write to the bucket, and who can write to the objects. These permissions are set by S3 administrators, and can be delegated to other S3 users from the admin’s organization or to other organizations/people that have authorized AWS credentials and API keys.
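As a rough illustration of those per-bucket and per-object permissions, here’s a boto3 sketch that simply reads them back – again with hypothetical names:

```python
import boto3

s3 = boto3.client("s3")

# Every bucket carries an ACL: a list of grants saying who may
# read or write it.
acl = s3.get_bucket_acl(Bucket="example-bucket")
for grant in acl["Grants"]:
    print(grant["Grantee"], "->", grant["Permission"])

# Object-level ACLs are tracked separately from the bucket's ACL.
obj_acl = s3.get_object_acl(Bucket="example-bucket", Key="reports/q3.csv")
for grant in obj_acl["Grants"]:
    print(grant["Grantee"], "->", grant["Permission"])
```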
It’s not AWS’s fault
Let’s begin with the second half of the question. These breaches are not a failure of AWS’s security systems or of the S3 platform itself. You see, S3 buckets are *not* set to public by default. An administrator must purposely set the bucket’s permissions to public and also set the permissions of the objects within it to public – or use scripting and/or policy to make that happen. “Out of the box,” so to speak, a newly created bucket can only be accessed by its owner and those the owner has granted at least read permissions on it. Since accessing the bucket requires those permissions and/or the API keys associated with them, buckets are buttoned up and not visible to the world as a whole by default. Making a bucket and its objects public is also not a single-step operation. You must normally designate each object as public individually – a relatively simple operation, but a time-consuming one, as it has to be done over and over. Luckily, AWS has a robust API, and many programming languages have libraries geared toward leveraging it. This means that the administrator of a bucket can run a script that turns on the public attribute of everything within the bucket – but it still must be done as a deliberate and purposeful act.
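For illustration, a script of the kind described above might look something like this boto3 sketch – note how it has to explicitly flip each object to public, one API call at a time (bucket name hypothetical):

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical placeholder

# Deliberately mark every object in the bucket as world-readable.
# Nothing about this is accidental: each object gets an explicit
# 'public-read' ACL, one PUT at a time.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        s3.put_object_acl(Bucket=bucket, Key=obj["Key"], ACL="public-read")
        print("made public:", obj["Key"])
```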
So why make them public at all?
This is the first part of the question, and the most difficult to understand in many of the cases we’ve seen recently. S3 is designed to allow for the sharing of object data, either in the form of static content for websites and streaming services (think Netflix) or as information shared between components of a cloud-based application (Box and other file-sharing systems). In these instances, making the content of a bucket public (or at least visible to all users of the service) is a requirement – otherwise no one would be able to see or share anything. So leveraging a script to make everything that goes into a specific bucket public is not, in itself, an incorrect use of S3 and related technologies.
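For a legitimate use case like a static website, the deliberate “make it public” step often takes the form of a bucket policy rather than per-object ACLs. A minimal sketch, assuming boto3 and a hypothetical bucket:

```python
import json

import boto3

s3 = boto3.client("s3")
bucket = "example-static-site"  # hypothetical placeholder

# A bucket policy granting anonymous read on every object --
# appropriate only for genuinely public content such as a
# static website's assets.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadForStaticSite",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```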
No, the issue here is that buckets are made public as a matter of convenience, or by mistake, when the data they contain should *not* be visible to the outside world. Since a non-public bucket requires explicit permissions for each and every user (be it direct end-user access or API access), some administrators set buckets to public to make it easier to use the objects in the bucket across teams or business units. This is a huge problem, as “public” means exactly that – anyone can see and access that data, whether or not they work for your organization.
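The safer alternative to “public” in this situation is a policy that names the exact principals who need access. A sketch of that approach, assuming a hypothetical partner account ID:

```python
import json

import boto3

s3 = boto3.client("s3")
bucket = "example-internal-data"  # hypothetical placeholder

# Grant read access only to a named principal (a placeholder
# partner account here), not to the world at large.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ReadOnlyForPartnerTeam",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{bucket}",     # ListBucket applies to the bucket
            f"arn:aws:s3:::{bucket}/*",   # GetObject applies to its objects
        ],
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```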
There’s also the potential for mistakes. Instead of making only certain objects in a bucket public, an administrator accidentally makes ALL of them public. They might also accidentally put non-public data in a public bucket that has a policy making every object within it visible. In both cases, making the objects public is a mistake, but the end result is the same – everyone can see the data in its entirety.
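Mistakes like these are exactly why it’s worth auditing ACLs after any permissions change. A minimal audit sketch – the group URIs below are the real AWS identifiers for anonymous and any-authenticated-user access, while the bucket name is hypothetical:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical placeholder

# Flag any object whose ACL grants access to "AllUsers" (anonymous)
# or "AuthenticatedUsers" (anyone with any AWS account).
PUBLIC_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        acl = s3.get_object_acl(Bucket=bucket, Key=obj["Key"])
        for grant in acl["Grants"]:
            if grant["Grantee"].get("URI") in PUBLIC_URIS:
                print("PUBLIC:", obj["Key"], grant["Permission"])
```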
It’s also important to point out that the data in these breaches was uploaded to the public buckets in unencrypted form. There are lots of reasons for this, too; but encrypting data not meant for public consumption is a good practice to implement – especially if you’re putting that data in the cloud. That way, even if the data is accidentally put in a public bucket, the bad actors who steal it are far less likely to be able to use or sell it. Encryption isn’t foolproof, and it should never be used as an alternative to making sure you’re not putting sensitive information in a public bucket, but it is a good safety catch should accidents happen.
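As a sketch of “encrypt before upload,” here’s one way to do it with boto3 plus the third-party cryptography package; the key handling is deliberately simplified and the names are hypothetical:

```python
import boto3
from cryptography.fernet import Fernet  # third-party 'cryptography' package

s3 = boto3.client("s3")

# Encrypt locally before the data ever leaves your machine. Key
# management is glossed over here -- in practice the key would live
# in a KMS or vault, never alongside the data.
key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"ssn,name\n123-45-6789,Jane Doe\n")

s3.put_object(Bucket="example-bucket", Key="hr/sensitive.enc",
              Body=ciphertext)

# Even if the bucket is accidentally made public, a thief gets
# only ciphertext.
```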
Whether the buckets were made public through operator error or for the sake of short-sighted convenience, the fact that the buckets and their objects were made public is the prime reason for the breaches that have happened. AWS S3 sets buckets to private by default, meaning these companies had the opportunity to simply do nothing and protect the data; instead, for whatever reason, they took the active steps required to break down the walls of security. The lesson here is to be very careful with any sensitive data you put in a public cloud. Double-check any changes you make to security settings, limit access to only the necessary users and programs via credentials and API keys, and encrypt sensitive data before uploading. Object Stores are not traditional file systems, but they still contain data that bad actors will want to get their hands on.