2024-07-24

Patrick Walsh

Snowflake and AT&T Breaches Were Preventable With Application-layer Encryption

Misleading Encryption Claims and a Lack of “Security by Default” are the Root Causes

Snowflake claims that it encrypts all its customers’ data:

All Snowflake customer data is encrypted by default using the latest security standards and best practices. Snowflake uses strong AES 256-bit encryption with a hierarchical key model rooted in a hardware security module.

Sounds good, but if Snowflake is encrypting customer data to protect it, then why in the world are so many of their customers getting breached?

What Happened?

Beginning in April, a series of Snowflake customers including Advance Auto Parts (2.3M customers impacted says the company; 380M says the hacker), LendingTree (190M customers impacted per the hacker), Ticketmaster/Live Nation (560M customers impacted), Santander Bank (30M customers impacted), Neiman Marcus Group (31M customers impacted), Pure Storage, and AT&T (242M customers impacted) have all had their data breached via Snowflake. And that’s just a few of the 165 organizations who’ve been notified that their Snowflake accounts were hacked.

For reference, there are only about 260 million adults in the U.S., so the Ticketmaster breach alone covers more than twice the adult population of the country. The AT&T breach is smaller by comparison, but it leaked more sensitive data, including records of who is calling and texting whom, and the impact will spread to non-AT&T customers who aren’t included in this count.

In each of these cases, the culprit was stolen credentials. Information-stealing malware on an employee’s or contractor’s computer leaked valid Snowflake login credentials, which allowed hackers to impersonate trusted users and steal sensitive data. That’s it.

And this is extremely common.

According to Verizon’s 2024 Data Breach Investigations Report, compromised credentials are the top cause of breaches again this year. For web attacks with compromised data, 71% involve stolen credentials. And per the report, this just represents the use of previously stolen credentials and brute force attacks. Credentials stolen via phishing are a different category leading to the same problem.

Victim Blaming

Snowflake has come out and made a big point of saying that these compromised accounts didn’t have multifactor authentication turned on. And that’s a problem. That simple measure might have prevented several of these attacks.

Essentially, Snowflake is pointing the finger at their customers and saying, “You could have had a secure instance of Snowflake but you didn’t and that’s your fault.”

But hold on a second there, Snowflake. It’s interesting that it was only a week ago that you added a feature allowing customers to force MFA for their employees. Sadly, it’s still up to them to turn this on, but now they can at least audit their people and force the issue. Similarly, they only just released a “Trust Center” that allows customers to evaluate and understand their Snowflake security – or some aspects of it.

And yet all of this absolutely, completely fails to address the real problems.

Security By Design and By Default

Even highly trusted companies holding exabytes of sensitive data, like Snowflake, are failing to build systems that are secure by design and by default. If Snowflake were secure by default, for example, then MFA would automatically be required for everyone. It wouldn’t be a per-user or per-company option. Snowflake can and should do better.

Building a feature that lets customers see who doesn’t have MFA turned on is a waste of time, both for Snowflake and for the customer’s admins, who now have one more thing to monitor and one more set of knobs they have to know how to turn just to get basic security.

And when it comes to security by design, data companies should have data protection. Period. Snowflake has cumbersome and expensive options where the customer can take control of their own data protection (and customers should since obviously Snowflake hasn’t), but this is a tacked-on feature that puts the burden again on the customer.

Snowflake Data Protection: Real or Bollocks?

Going back to the start of this post, we quoted Snowflake as saying that they encrypt all of their customers’ data. But if that’s true, then what happened with these breaches?

Here are some more claims from Snowflake:

With Snowflake, your data is automatically encrypted by default. No setup, no configuration, no add-on costs for high security features. Data is encrypted during its entire lifecycle. From loading data to storing data at rest, we apply end-to-end encryption, such that only the customer can read the data, and no one else. … For all data within Snowflake, we use strong AES 256-bit keys. Your data is encrypted as you load it. … We provide our customers best-in-class data security as a service.

It turns out that it’s somewhat disingenuous to simply say that you encrypt data. Anyone can say that. In fact, anyone can do that. It’s a checkbox for many cloud providers and for some it isn’t even an option, it’s automatic (hurray security by default!). But the type of encryption we’re talking about here is low level. It’s infrastructure-layer. In a running service, the data is always fully available in decrypted form.

This type of encryption has a purpose: it makes the data inaccessible when a hard drive goes bad or is removed from a computer. If it’s applied to removable drives, it’s actually good security (assuming the removable drive isn’t just always plugged in and mounted).

But in this era of cloud computing when servers and services are running 24/7, this type of encryption does absolutely nothing to stop access to data by an attacker.
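To make that concrete, here is a minimal Python sketch of transparent, infrastructure-layer encryption. Everything in it is hypothetical: the `TransparentStore` class, the password check, and the toy cipher (a SHAKE-256 keystream plus an HMAC tag, standing in for AES-256 only so the example runs on the standard library). It is not how Snowflake or any particular cloud implements at-rest encryption; it only illustrates the pattern.

```python
import hashlib
import hmac
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy keystream for illustration only; real systems use a vetted AEAD
    # such as AES-256-GCM.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    body = _xor_stream(key, nonce, plaintext)
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered ciphertext")
    return _xor_stream(key, nonce, body)


class TransparentStore:
    """Hypothetical data service with at-rest encryption under a service-held key."""

    def __init__(self, valid_passwords: set[str]) -> None:
        self._disk_key = os.urandom(32)  # held by the service, not the customer
        self._disk: dict[str, bytes] = {}
        self._valid_passwords = valid_passwords

    def _check(self, password: str) -> None:
        if password not in self._valid_passwords:
            raise PermissionError("invalid credentials")

    def write(self, password: str, name: str, value: bytes) -> None:
        self._check(password)
        self._disk[name] = seal(self._disk_key, value)

    def read(self, password: str, name: str) -> bytes:
        self._check(password)
        # The running service decrypts for anyone who presents a valid login.
        return unseal(self._disk_key, self._disk[name])

    def raw_disk_block(self, name: str) -> bytes:
        # What someone holding the physical drive would see.
        return self._disk[name]


store = TransparentStore({"employee-password"})
store.write("employee-password", "call_log", b"555-0100 called 555-0199")

stolen_drive = store.raw_disk_block("call_log")          # ciphertext
stolen_login = store.read("employee-password", "call_log")  # plaintext
```

The encryption is real, but it only helps against the thief who walks off with the drive. The thief who walks in with a password gets the same decrypted view as the employee the password was stolen from.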

Many companies will go on to talk about how they encrypt “at-rest and in-transit” which usually means they do the transparent infrastructure encryption and also require TLS (the “s” part in an https URL). In fact, Snowflake uses this language in a press release, “Snowflake automatically encrypts all customer data by default, in transit and at rest, using the latest security standards”. Others like to say they use military-grade encryption, which is a sure red flag for me when I’m reviewing the security of a site.

For many buyers of software, seeing phrases like “encryption in transit and at rest by default” is reassuring. For me, it shines a spotlight on a company that is bragging about doing the bare minimum for data protection.

But Snowflake isn’t just making these claims. They’re going much further. They claim that they use end-to-end encryption by default. If that were true, no Snowflake employee or server would ever see unencrypted data. But that isn’t how it works unless the customer makes an effort to make it true, as Snowflake’s own docs show. By default, there is a “staging area” for processing unencrypted data, and by default that’s run by Snowflake. Also by default, Snowflake holds the keys (“Snowflake manages data encryption keys to protect customer data”). A customer has to set up their own staging area and implement client-side encryption in order to get to a better level of security.

And even then, anyone with credentials to see data can just browse it in unencrypted form.

So does Snowflake have strong security options? In fact, they do. For example, they do offer a form of customer managed keys as an option. Unfortunately, they took a bizarre approach.

In a properly designed CMK/BYOK/HYOK (so many acronyms for the same idea!) system, the customer’s key never leaves the customer’s possession. But in Snowflake’s design, the customer key is combined with a Snowflake key to make a master key that Snowflake holds for all encryption and decryption operations. Because of these funky choices, customers who use the CMK offering find themselves in a weird position: they can never rotate their encryption key, because another key is derived from that secret.
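Here is a rough Python sketch of why a derived composite key makes rotation awkward. The `derive_composite` function and its HMAC-based combination step are assumptions made purely for illustration; this is not Snowflake’s actual derivation, only the general shape of "combine two keys into one master key."

```python
import hashlib
import hmac
import os


def derive_composite(customer_key: bytes, provider_key: bytes) -> bytes:
    # Hypothetical combination step, not any vendor's real algorithm.
    return hmac.new(provider_key, customer_key, hashlib.sha256).digest()


provider_key = os.urandom(32)

customer_key_v1 = os.urandom(32)
composite_v1 = derive_composite(customer_key_v1, provider_key)

# The customer "rotates" by generating a fresh key.
customer_key_v2 = os.urandom(32)
composite_v2 = derive_composite(customer_key_v2, provider_key)

# The derived master key changes with it, so everything protected under
# composite_v1 is unreadable under composite_v2 until it is re-protected.
rotation_changes_master = composite_v1 != composite_v2
```

In a design like this, changing the customer key is no longer a local operation on the customer’s side. It ripples through whatever the provider has protected with the derived key, and the derived key itself sits with the provider.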

Even if you don’t understand that, your takeaway should be this: the Snowflake approach is not considered best practice for this type of security feature.

So is it real or bollocks? In their default configuration, it’s bollocks. With a lot of effort by the customer, it can be made more real, but with disappointing compromises. But why is that burden put on the customer?

Application-layer Encryption (ALE) and Customer Managed Keys (CMK)

How It Should Work

Note: if you don’t want to get into the weeds on cloud application security architecture, skip to the next section.

First, let’s explain application-layer encryption (ALE) and debate for a moment what can qualify: if you encrypt the data BEFORE it goes to the data store, then it’s ALE. And so for most companies, that would mean encrypting before sending it to a data lake or SQL database or whatever. See our full explainer on application-layer encryption for more details and diagrams.

But where things become murky is when the datastore is a service provider like Snowflake. If they have an API for accessing data and they return data on that API, does it matter if the data is encrypted before it hits the API versus after? Where is the application layer? There’s a Snowflake application and they encrypt at that layer before they store. But for someone who is using Snowflake, if they send the data in plaintext to Snowflake, is that ALE?

That’s the crux of what could be debated, but in my view, there’s only one answer: Snowflake effectively IS the data store. ALE means encrypting before sending to Snowflake – what they call “client-side encryption” and what they mean when they say “end-to-end encryption” even though they are absolutely not an end-to-end encrypted service as they claim.

So to do ALE right, Snowflake would have to make client-side encryption the default – and in fact the only – option. Then someone gaining access to the Snowflake service does not necessarily gain access to the data because they’d ALSO have to have access to the encryption key, which is separately guarded and better protected. The application that does the encrypting is the thing that can access both the key and the data; the user cannot. This ensures that the business logic in the application can be enforced. And then that business logic simply disallows any sort of wholesale exfiltration of data.
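A minimal sketch of that arrangement, in Python, might look like the following. The `HostedStore` and `TicketApp` classes, the sample records, and the "only your own order" rule are all hypothetical, and the toy cipher (a SHAKE-256 keystream plus an HMAC tag) is a stand-in for a real AEAD such as AES-256-GCM so that the example runs on the standard library alone.

```python
import hashlib
import hmac
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy keystream for illustration only; use a vetted AEAD in practice.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    body = _xor_stream(key, nonce, plaintext)
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered ciphertext")
    return _xor_stream(key, nonce, body)


class HostedStore:
    """Hypothetical hosted data store; it only ever sees the bytes it is given."""

    def __init__(self) -> None:
        self.rows: dict[str, bytes] = {}

    def put(self, row_id: str, blob: bytes) -> None:
        self.rows[row_id] = blob

    def get(self, row_id: str) -> bytes:
        return self.rows[row_id]

    def dump_all(self) -> dict[str, bytes]:
        # What an attacker with stolen store credentials can pull in bulk.
        return dict(self.rows)


class TicketApp:
    """Hypothetical application that holds the key and enforces business rules."""

    def __init__(self, store: HostedStore) -> None:
        self._store = store
        self._key = os.urandom(32)  # in practice, guarded by a separate key service

    def save_order(self, customer_id: str, order: bytes) -> None:
        # Encrypt BEFORE the data leaves the application.
        self._store.put(customer_id, seal(self._key, order))

    def my_order(self, logged_in_as: str, customer_id: str) -> bytes:
        # Business rule: one record at a time, and only your own. No bulk export.
        if logged_in_as != customer_id:
            raise PermissionError("customers may only read their own order")
        return unseal(self._key, self._store.get(customer_id))


store = HostedStore()
app = TicketApp(store)
app.save_order("alice", b"alice: 2 tickets, row F")
app.save_order("bob", b"bob: 4 tickets, row A")

loot = store.dump_all()                     # stolen store credentials: ciphertext only
own_order = app.my_order("alice", "alice")  # normal use, through the application
```

The attacker who logs into the store directly walks away with `loot`, which is useless without the key. The only path to plaintext runs through the application, and the application has no "give me everything" button.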

Secondly, when it comes to customers bringing their own keys – holding their own keys – the right pattern here is for encryption and decryption operations to work on an envelope encryption basis where the master key is a key encryption key (KEK) that never leaves the possession of the customer. It remains fully secret from the data provider. The customer’s key management server (KMS) then has to be asked to decrypt keys as needed. There are then techniques to lease wrapped keys that can minimize performance and uptime concerns.
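The envelope pattern can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not a reference implementation: `CustomerKMS`, `DataProvider`, the `revoked` flag, and the time-boxed cache used to model "leasing" are all hypothetical names and simplifications, and the toy cipher (SHAKE-256 keystream plus HMAC tag) stands in for a real AEAD such as AES-256-GCM.

```python
import hashlib
import hmac
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy keystream for illustration only; use a vetted AEAD in practice.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    body = _xor_stream(key, nonce, plaintext)
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered ciphertext")
    return _xor_stream(key, nonce, body)


class CustomerKMS:
    """Hypothetical customer-run KMS. The KEK never leaves this object."""

    def __init__(self) -> None:
        self._kek = os.urandom(32)
        self.revoked = False
        self.unwrap_calls = 0

    def new_data_key(self) -> tuple[bytes, bytes]:
        dek = os.urandom(32)
        return dek, seal(self._kek, dek)  # (plaintext DEK, wrapped DEK)

    def unwrap(self, wrapped_dek: bytes) -> bytes:
        if self.revoked:
            raise PermissionError("customer has revoked access")
        self.unwrap_calls += 1
        return unseal(self._kek, wrapped_dek)


class DataProvider:
    """Hypothetical provider: stores ciphertext and wrapped DEKs, never the KEK."""

    def __init__(self, kms: CustomerKMS, lease_seconds: float) -> None:
        self._kms = kms
        self._lease_seconds = lease_seconds
        self._leases: dict[bytes, tuple[bytes, float]] = {}  # wrapped DEK -> (DEK, expiry)

    def encrypt(self, plaintext: bytes) -> tuple[bytes, bytes]:
        dek, wrapped_dek = self._kms.new_data_key()
        return wrapped_dek, seal(dek, plaintext)

    def decrypt(self, wrapped_dek: bytes, ciphertext: bytes, now: float) -> bytes:
        lease = self._leases.get(wrapped_dek)
        if lease is None or lease[1] <= now:
            # No live lease: the customer's KMS must agree to unwrap the DEK.
            dek = self._kms.unwrap(wrapped_dek)
            self._leases[wrapped_dek] = (dek, now + self._lease_seconds)
        else:
            dek = lease[0]
        return unseal(dek, ciphertext)


kms = CustomerKMS()
provider = DataProvider(kms, lease_seconds=300.0)
wrapped_dek, ciphertext = provider.encrypt(b"who called whom")

first = provider.decrypt(wrapped_dek, ciphertext, now=0.0)     # asks the KMS
second = provider.decrypt(wrapped_dek, ciphertext, now=100.0)  # served from the lease
calls_during_lease = kms.unwrap_calls

kms.revoked = True  # the customer pulls access
try:
    provider.decrypt(wrapped_dek, ciphertext, now=1000.0)  # lease has expired
    locked_out = False
except PermissionError:
    locked_out = True
```

The lease keeps the provider from calling the customer’s KMS on every read, which is where the performance and uptime relief comes from. The trade-off is that a revocation only takes full effect once outstanding leases expire, so the lease length is a dial between convenience and control.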

It should be noted that we often hear people say that they can’t encrypt data at the application layer because it would make it impossible for them to make use of the data. This is a view that hasn’t been true for over twenty years.

In fact, the place where it’s easiest to fix these problems is in data lakes like those provided by Snowflake. All of the processing happens on big chunks of data that get loaded off of disk together anyway, so decrypting those chunks before processing them is easy. But I’ll set that, and a discussion of homomorphic encryption, aside for another blog post.

Changes Snowflake Needs to Make

Stolen credentials of a Snowflake customer are enough to compromise all of the data that company has inside Snowflake. This shows that Snowflake lacks data protection that provides layered security.

Snowflake needs to re-architect their encryption layers so that client-side encryption is the default and so that customers have to go through applications and business logic that protect their data on the client side from wholesale theft. To the extent that they have various features that today require Snowflake to be able to process the data in unencrypted form, Snowflake needs to invest in ways to accomplish those same things without compromising data protection or layered security.

Furthermore, Snowflake needs to fix their key management scheme. It has layers of complexity that don’t bring more security but do bring serious drawbacks, such as a customer’s inability to both hold their own key and rotate it. This makes the feature unusable for security-conscious customers, leaving them without the ability to retain control of their data.

Common sense and expected security measures like MFA need to be on for everyone. There shouldn’t even be an option to disable something this important.

Finally, Snowflake needs to stop blaming their customers for security failures that are a direct result of Snowflake’s poor defaults and the consequences of their security designs. These were not customer failures. They’re the failure of a leading data company to help their customers stay safe.

IronCore’s Application-layer Encryption Platform

Application-layer encryption is not easy, mainly because it isn’t yet a common pattern. But it’s becoming more prevalent as large companies start to demand it of their vendors and as standards like PCI DSS v4.0 require it. Standards like this force action from companies that would otherwise prefer to invest in almost anything else in their software stack (like adding fancy new AI features) rather than commit time and resources to improving security.

But for the leading companies who adopt security by design and who start with the data as a core piece of what has to be secure, there are a great many challenges. It’s easy to encrypt. It’s less easy to figure out how to manage keys, keep things performant, use encrypted data, allow for crypto agility to be ready for a post-quantum future, let customers hold their own keys, and so on. It’s critical that companies get these things right.

IronCore has spent years working with some of the biggest names in software to build out a platform that gets it right and makes it hard for developers to get wrong. Our suite of products, from our SaaS Shield Application Layer Encryption Platform to our encrypted search offerings, Cloaked Search for encrypted keyword searches and Cloaked AI for encrypted AI searches and RAG workflows, helps cloud vendors to give their customers strong security, data sovereignty, and more, without having to become experts in the mountain of edge cases that these systems sometimes encounter.

If you’re building systems like Snowflake – or really if you’re building any software in the cloud even if you’re not a SaaS provider – set up a time to talk with us about your needs. If we can’t help you, we can still likely point you in the right direction.