Breaches Happen — But Data Theft Shouldn't
Why Hackers Keep Winning (and How One Change Could Flip the Script)
Let’s recap some of the major breach headlines from this past week in order to make a point:
- Coinbase says its data breach affects at least 69,000 customers | TechCrunch
  - What was stolen: names, addresses, and account balances
  - Who’s affected: 69,000 Coinbase customers
  - First-level cause of breach: bribed insiders
- How the knock-off Signal app used by Trump officials got hacked in 20 minutes
  - What was stolen: private and sensitive text messages
  - Who’s affected: 44,503 users, including people in government, the VC firm Andreessen Horowitz, JP Morgan, and many other companies (per Micah Lee’s analysis)
  - First-level cause of breach: unsecured Java heap dump endpoint, per Micah Lee
- Mysterious Database of 184 Million Records Exposes Vast Array of Login Credentials | WIRED
  - What was stolen: collected usernames and passwords from many breaches
  - Who’s affected: 184 million records
  - First-level cause of breach: exposed Elasticsearch database
- 19-year-old accused of largest child data breach in U.S. agrees to plead guilty
  - What was stolen: name, school, birthday, address, parent, social security number, health concerns, disciplinary records
  - Who’s affected: 62 million U.S. school children
  - First-level cause of breach: reused username/password and no 2FA
- Legal Aid Agency data breach - UK
  - What was stolen: addresses of applicants, dates of birth, national ID numbers, criminal history, employment and financial data such as debts and payments
  - Who’s affected: lawyers and clients, including domestic abuse victims; number of impacted people is unknown, but supposedly 2.1 million “pieces of data”
  - First-level cause of breach: undisclosed
In each of these hacks, articles blame some simple security failure like an unsecured endpoint, a lack of two-factor authentication, or untrustworthy insiders with access. All of that is accurate, but it perpetuates an impossible problem for security teams: any miss becomes a single point of failure that results in catastrophe.
In complex systems, it’s nearly impossible to be 100% perfect, but getting 99 out of 100 endpoints secured is still a failure. A single employee installing malware that steals cookies from their browser can lead to a complete compromise of data.
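To put rough numbers on that, here is a back-of-the-envelope sketch. It rests on a simplifying assumption that is ours, not something taken from any breach report: every control holds independently with the same probability. The function names and the 5% figure for a second layer are hypothetical and purely illustrative.

```python
def p_all_controls_hold(per_control_success: float, controls: int) -> float:
    """Chance that every one of `controls` independent controls holds."""
    return per_control_success ** controls


def p_data_breach(
    per_control_success: float,
    controls: int,
    second_layer_failure: float = 1.0,
) -> float:
    """Chance that at least one control fails AND the second layer fails too.

    second_layer_failure=1.0 models having no second layer at all: any
    single miss becomes a data breach.
    """
    p_some_gap = 1.0 - p_all_controls_hold(per_control_success, controls)
    return p_some_gap * second_layer_failure


# 100 controls, each 99% likely to be configured correctly (assumed numbers).
no_depth = p_data_breach(0.99, 100)
# Same perimeter, plus a hypothetical second layer that fails 5% of the time.
with_depth = p_data_breach(0.99, 100, second_layer_failure=0.05)

print(f"single layer: {no_depth:.1%}  with a second layer: {with_depth:.1%}")
```

Under these assumptions, the chance that all 100 controls hold is only about 37%, so a gap somewhere is more likely than not. Real controls are neither independent nor equally reliable, so treat this as intuition, not measurement.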
In fact, the three top causes of data breaches are:
- Phishing or stolen/compromised credentials: 31%
- Cloud misconfigurations: 11%
- Software vulnerabilities: 11% (tied with cloud misconfigurations)
Source: Verizon Data Breach Investigations Report 2023. Note: the newer report rolls the above up into a single “system intrusion” category but excludes misconfigurations, which makes a more recent “top causes” number difficult to obtain.
These sorts of stats drive tons of money to technologies that are intended to catch software vulnerabilities, cloud misconfigurations, phishing emails, etc., even while buyers know that none of these technologies can ever catch 100% of these issues. That means organizations are collectively spending billions of dollars on technologies that, at best, reduce how often attackers are successful.
Of course, security teams are in an impossible position. Cloud misconfigurations are a problem, and having tools and processes that can minimize or hopefully (if they’re lucky) eliminate that issue is important.
But where most organizations get it wrong – and everyone is to blame here, including the journalists who report on these breaches – is in thinking that the problem begins and ends with whatever that initial breach vector is.
Security professionals like to talk about Defense-in-Depth, but we don’t often see much depth. Every single one of the above breaches was a single point of failure leading to a data breach. We rarely read articles where people point out the missing layers. For every one of these breaches, when I read the article, I shook my head and thought, “if they had just encrypted the data using proper application-layer encryption (ALE), that [INSERT COMMON ISSUE HERE] wouldn’t have resulted in such a mess”.
Encryption by itself, of course, is not data protection. The patterns of where and when things are encrypted, using what keys, where the keys are stored, how they can be accessed, and the pathways for interacting with sensitive data all matter tremendously.
Application-layer encryption (ALE) is the first and most important step toward a meaningful layer of protection that can keep a hack from becoming a data breach. ALE just means encrypting the data before sending it to a data store (as opposed to the industry-standard approach of low-level transparent encryption, which only protects data on media or in services that are offline). Accessing the protected data in readable form then requires compromising both the data and the key (or something that brings the two together). That is never impossible, even with end-to-end encryption (which is a form of ALE), but it makes a data breach significantly more difficult and less likely. It also lets companies force data access to go through applications instead of allowing backdoor access directly via databases. This protection alone restricts insider access: even if a database admin’s credentials are stolen (or, as in the Coinbase hack, an insider is bribed), protected data remains safe.
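To make the pattern concrete, here is a minimal sketch of encrypting a field in the application before it reaches the data store. It assumes the third-party `cryptography` package for AES-256-GCM and uses an in-memory SQLite database as a stand-in for the real store. The table, the column, the sample value, and the `encrypt_field`/`decrypt_field` helpers are hypothetical, and key management (where the key lives and who can fetch it) is deliberately left out even though it matters at least as much as the encryption call.

```python
import os
import sqlite3

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_field(key: bytes, plaintext: str, context: bytes) -> bytes:
    """Encrypt one field with AES-256-GCM; returns nonce + ciphertext + tag."""
    nonce = os.urandom(12)  # must never repeat under the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), context)


def decrypt_field(key: bytes, blob: bytes, context: bytes) -> str:
    """Reverse encrypt_field; raises InvalidTag on a wrong key or context."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, context).decode("utf-8")


# Assumption: a real deployment would fetch this key from a KMS or HSM.
# It should never sit in the database it protects.
key = AESGCM.generate_key(bit_length=256)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, ssn BLOB)")

# The application encrypts before the write. The context string ties the
# ciphertext to its column and row so it cannot be swapped into another.
context = b"customers.ssn:1"
db.execute(
    "INSERT INTO customers (id, ssn) VALUES (?, ?)",
    (1, encrypt_field(key, "078-05-1120", context)),
)

# What a database admin, a stolen backup, or an exposed endpoint yields:
stored = db.execute("SELECT ssn FROM customers WHERE id = 1").fetchone()[0]

# What the application, holding the key, can recover:
recovered = decrypt_field(key, stored, context)
```

Someone who can read the database sees only `stored`. Reading the value requires the key as well, which is exactly the second compromise described above.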
ALE is not trivial to do right, which is why it makes sense to use a platform that is well built. We talk about problematic design approaches elsewhere, but it can be done well, as we show in this notes app demo. And while the number one objection from developers is about the usability of the data, that objection is largely based on outdated assumptions. There are a lot of data-in-use technologies, like encrypted search, that allow data to be found, mined, and otherwise used even while it stays encrypted. In fact, if that exposed Elasticsearch database with the 184 million usernames and passwords, which was likely being used by researchers, had used ALE-based encrypted search like Cloaked Search, anyone who found it would have gotten only encrypted data. Ditto for Microsoft’s customer support Elasticsearch misconfiguration that accidentally exposed 250 million customer records. Microsoft said it has systems to detect such misconfigurations, but those were themselves misconfigured and didn’t catch this one, which is a beautiful example of how complexity leads to mistakes, which lead to compromise if the defense-in-depth is weak.
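For a feel of how data can stay searchable without being readable, here is a toy sketch of one general technique, often called a blind index: the index stores keyed hashes (HMACs) of words instead of the words themselves. This is not a description of how Cloaked Search or any other product works internally. The `BlindIndex` class and its methods are hypothetical, and the sketch handles only exact whole-word matches.

```python
import hashlib
import hmac
import re
import secrets


def blind_tokens(index_key: bytes, text: str) -> set[str]:
    """Map each normalized word to a keyed hash so the index holds no plaintext."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        hmac.new(index_key, word.encode("utf-8"), hashlib.sha256).hexdigest()
        for word in words
    }


class BlindIndex:
    """Toy inverted index keyed by HMAC'd words (exact-match search only)."""

    def __init__(self, index_key: bytes) -> None:
        self._key = index_key
        self._postings: dict[str, set[int]] = {}

    def add(self, doc_id: int, text: str) -> None:
        for token in blind_tokens(self._key, text):
            self._postings.setdefault(token, set()).add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return the ids of documents containing every word in the query."""
        tokens = blind_tokens(self._key, query)
        if not tokens:
            return set()
        return set.intersection(*(self._postings.get(t, set()) for t in tokens))

    def dump(self) -> dict[str, set[int]]:
        """What someone who steals the index would get."""
        return self._postings


# Assumption: like any ALE key, this would live in a KMS, not beside the index.
index = BlindIndex(secrets.token_bytes(32))
index.add(1, "Password reset for alice@example.com")
index.add(2, "Billing question from Bob")

alice_hits = index.search("ALICE")
billing_hits = index.search("billing bob")
no_hits = index.search("alice bob")
```

A stolen copy of this index contains only hex digests and document ids. The trade-off is that equal words still produce equal digests, so frequency patterns leak. Production systems add further measures on top of this idea.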
Every time a new breach hits the news, affected parties and journalists should ask, “why didn’t you application-layer encrypt the data?”
Because of the terrible inertia of the status quo and a poor understanding of how to do things better, we’re stuck in this Groundhog Day of breaches. Without ALE, single points of failure will continue to result in data being stolen over and over.