An Unpopular Opinion: Apple's CSAM Detection Is A Net Good
I work in cryptography. I’ve published a paper on the topic and hold patents as well. To undersell a bit: I care a great deal about privacy and security. And there’s very rarely any daylight between the opinions I hold and the position of the Electronic Frontier Foundation.
Yet, everyone I follow and admire in the privacy sphere is angry at Apple right now for their attempt to combat child sexual abuse photos in the Apple ecosystem. And I’m not.
Here are the significant points that the EFF and others are making:
- It’s a slippery slope: once you build the infrastructure to match known child sexual abuse photos, then it’s only a tiny stretch to use it to match other types of photos. Governments can apply pressure on Apple to use it for other things that endanger dissidents and others.
- It’s a form of mass surveillance: any time law enforcement searches everyone’s phone or computer, it’s mass surveillance, which is the domain of authoritarian Governments and deserves to be widely condemned.
- It’s no longer end-to-end encryption: your data may be encrypted and decrypted on the clients, but with this system, the data is inspected before it’s encrypted, and it could be separately encrypted for later review by Apple or others. Because of this, it is effectively a law enforcement encryption backdoor.
- There will be false positives: the scanning uses a fuzzy matching algorithm to detect similar images, and this could cause it to match against something that is not child pornography, causing privacy to be invaded when not warranted.
- Big corporations should not be in the morality or law enforcement business: many countries limit invasions of privacy by the Government, but those laws are often bypassed when private companies are involved, so Governments love to partner with private companies to invade the privacy of citizens legally, and now Apple has fallen into this trap.
To me, each of these points is compelling. And without deeper context and understanding, I think I’d be lining up with my pitchfork and yelling along with the (highly educated and respected) mob. So I’m here to discuss why I think these points don’t apply or carry less weight than they might seem to at first blush.
I believe Apple is doing the right thing here, not simply because of what they’re doing but because of how they’re doing it: a responsible and privacy-preserving approach to an important issue.
I have stayed silent until now because my views on this topic risk the scorn of people I respect and because I don’t want to tarnish my own reputation as an advocate of privacy and security. But after fielding questions from friends and family who have been alarmed by the headlines, I think I need to provide the narrative that Apple seems unable to articulate well. The privacy community values diversity in many ways. Hopefully, that extends to a diversity of ideas.
A Meaningful Preamble: My Past Exposure to NCMEC and CSAM
A little more than a decade ago, I was the CTO at a company named zvelo (formerly known as eSoft). We started as a Unified Threat Management (UTM) company providing all-in-one security appliances with network-layer anti-virus, intrusion prevention, firewall, VPN, spam filtering, and web filtering capabilities. When I started there, we pulled most of these capabilities together from other vendors and OEM’d them in our product.
At some point, we started bringing some of these technologies in-house. We built a threat-prevention team that could write intrusion detection signatures, reverse engineer viruses and create signatures to defend against them. And then, we started on web filtering.
The web filtering database we licensed at the time had around 100,000 sites in it. All were categorized at the top level, and all had been categorized by hand. It hardly ever changed, but we paid a hefty price for it.
We decided to build a system that used machine learning to classify web pages automatically. We could then aggregate classifications up to a path, including the root of the site. If the classifications were largely the same, we rolled them up into a single parent entry. But if pages or sections varied, we’d categorize them independently. This was a huge boon for improving the meaningfulness of the web filtering on sites like Yahoo Pages, which hosted home pages for millions of people. We had some 50 categories ranging from business to news to porn, and we allowed for multiple categorizations so you could have a page that was labeled both business and news at the same time.
Finally, we created a system whereby any pages where we had relatively low confidence in the categorization would go to humans. We also fed some percentage from each category to a team of humans so we could blindly check and track our accuracy. On the human side of things, we built an army of contractors who were paid according to their earned level, with pay ultimately a function of accuracy and speed. For accuracy, we treated humans much like the machine classifiers: some sampling of their categorizations went to someone at a higher level, and we’d check for agreement. If there was disagreement, it went to a full-time person who would break the tie and educate anyone who had miscategorized a site.
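The routing logic described above can be sketched roughly like this. The thresholds, sample rate, and function names are my own illustrative choices, not zvelo's actual values:

```python
import random

# Illustrative sketch of confidence-based review routing (assumed values).
CONFIDENCE_THRESHOLD = 0.85  # below this, a human reviews the classification
AUDIT_RATE = 0.05            # fraction of confident results spot-checked anyway

def route_classification(url, category, confidence, rng=random.random):
    """Decide whether a machine classification ships directly, goes to a
    human reviewer, or is sampled into the blind accuracy audit."""
    if confidence < CONFIDENCE_THRESHOLD:
        return ("human_review", url, category)
    if rng() < AUDIT_RATE:
        return ("blind_audit", url, category)  # used to track accuracy
    return ("publish", url, category)
```

The same pattern applied to the human tiers: a sample of each contractor's work was re-checked by someone at a higher level, which is just this routing function with a different audit pool.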
The entire system proved to be very effective, and we quickly built the world’s largest and most accurate web filtering database. And that business became so promising that we spun off the UTM business, selling it to a competitor, and kept just the web filtering business under the zvelo name. That database is sold as an OEM product and used under the hood in a variety of applications.
But here’s the thing: when you’re classifying the world’s web pages, you find a lot of garbage. And our to-do list of uncategorized pages was fed by real traffic flowing through web filtering software, which means we saw new and unlisted websites very quickly. Most of these weren’t indexed by Google (yet, at least) or anyone who relies on crawling.
But we started running into problems where our human categorizers were seeing websites with extremely violent imagery in some cases, disquieting pornography in other cases, and sometimes, child pornography. Of these, the child pornography sites were by far the most disturbing and the most upsetting to the contractors, as you might imagine.
Even though it may be upsetting to hear, let me be clear: we aren’t talking about pictures of nude children getting baths. We’re talking about photos of young children being raped. But we use phrases like “child sexual content,” “child pornography images,” and “child sexual abuse material (CSAM)” because we don’t want to conjure the images by describing their content. The content is horrific. So we’ll use CSAM going forward.
We ended up building an entire system for reporting those images and videos. We sent reports directly to the National Center for Missing and Exploited Children (NCMEC), so they could work with the FBI to deal with the sites we found. We also got access to NCMEC’s database of hashes of known child pornography images so we could, whenever possible, detect those and report the URLs automatically. It dramatically cut down on the amount of CSAM that made it through to our people.
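At its core, that kind of known-image detection is just a set-membership check against a curated list of fingerprints. A minimal sketch, with SHA-256 standing in for whatever hash format the NCMEC database actually uses (which I won't detail here), and a hypothetical `should_report` helper:

```python
import hashlib

# Hypothetical stand-in for the curated database of known-image fingerprints.
# In a real system this would be populated from NCMEC's hash list.
KNOWN_CSAM_HASHES = set()

def should_report(image_bytes: bytes) -> bool:
    """Return True if the image exactly matches a known fingerprint."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return digest in KNOWN_CSAM_HASHES
```

Note that an exact cryptographic hash like this only matches byte-identical files; the fuzzy matching discussed later exists precisely because re-encoded or resized copies defeat exact hashing.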
That experience has left me with a powerful hatred for these sorts of photos and videos. They are bad. Evil. Many are worse than you can imagine. And the NCMEC database that helps select software companies detect these images is an incredible resource for good.
What About Those Anti-Privacy Arguments?
They all have at least some validity, so I’ll take them one by one:
1. The Slippery Slope Argument
I understand this argument. And with Facebook or almost anyone else, I’d be pretty worried about it. But Apple has earned a bit more benefit of the doubt from me, and I’m willing to wait and watch and see what they do. If something happens to make that theoretical future bad behavior look more likely, then I’ll reassess. But in the meantime, here’s why I think Apple as an organization has earned a bit more slack than most anyone else:
- When the San Bernardino shooting happened, and the FBI got ahold of an iPhone and asked Apple to subvert its security measures to get in, Apple fought that publicly and in court. They weren’t willing to change their software and reduce its security even under tremendous Government pressure. I don’t know why people suddenly think Apple is going to change their posture on similar attempts to coerce them into subverting their security and privacy, but I, for one, presume they will continue to hold that line. And in the U.S., it’s unlikely they can be compelled to change their software. Court warrants are for the production of records and data, not for a software project that a software company would have to undertake.
- Apple has gone out of its way to make this CSAM detection as private and as resilient to false positives as possible. It’s an impressive design, and well thought through. They put a similar amount of effort into making the “Find My” feature useful to people while still preserving privacy. They have serious cryptographers taking serious research and making it a reality, deployed at scale. Apple runs the largest-scale end-to-end encryption software in the world, and they really do go the extra mile (the extra hundreds of miles) to keep data private and secure. Throw them under the bus as much as you want, but the alternatives do none of that. Show me the better alternative. I prefer to reward the company that is most thoughtful about, and most technically capable of, creating and deploying advanced privacy. I should note that Google has a huge staff of awesome cryptographers, too, but we see far less of that talent brought to bear to protect customer data. Apple, by contrast, takes research like differential privacy, which was largely academic, and puts it into practice across the board. They move the needle in the right direction. Ideal security is sometimes the enemy of improved security, and the same could be said of privacy.
2. It’s a Form of Mass Surveillance
The idea that this is something of a warrantless search has merit. It’s slightly mitigated by the fact that this data is stored (even if in encrypted form) by Apple’s services. To me, what they’re doing is more akin to what they already do when automatically scanning laptops for viruses than to the Snowden revelations of 2013. When I think about mass surveillance, here are some things that alarm me:
- Keeping track of who talks to whom;
- profiling people based on sites they like to visit, things they purchase, even public postings they make;
- searching people’s communications, including emails, text messages, voicemails, etc., for keywords;
- storing any records on people not currently under suspicion so it can be mined in the future if they come under suspicion for something;
- tracking where people are or where they have been;
- and I suspect there are quite a few more if I keep brainstorming.
But one thing that doesn’t particularly bother me: a filter that matches photos against known child pornography images before uploading them to a service that doesn’t want to store such images. Is that really surveillance? I don’t think so. To me, it’s the prerogative of the service provider to determine and enforce its policies. Apple has gone the extra mile in an attempt to enforce those policies without themselves learning anything about your photos or other data unless a significant amount of illegal content has been detected.
3. It Undermines End-to-End Encryption
My company builds technology that helps other software companies add application-layer encryption and, if desired, end-to-end encryption to their apps, and I’m extremely sensitive to the idea of backdoors. I’m alarmed every time a politician or law enforcement leader agitates for them.
Backdoors aren’t just a slippery slope; they’re a hole in a system that actively undermines security. The holes are certain to be used by the wrong people eventually, and they undermine all security guarantees. There’s just no way to make sure that only “the good guys” can go through them and only when they have a legal and legitimate need to. Everyone should be strongly against cryptographic backdoors.
But what Apple is doing is very different from a cryptographic backdoor. They aren’t weakening the encryption. They’re processing data on the client before it is encrypted. And this is a very important distinction.
When we talk to software companies looking to add end-to-end encryption, we get many questions about how to do things with the data. Much of the time, the answer is, “you just need to do that on the client instead of the server.”
And to illustrate this concept, we often point to Apple as an example of a company that uses a lot of end-to-end encryption and does a lot of processing on the client side. For example, Apple does image analysis on your photos to categorize them so you can search through your photos with descriptions like “mountains.” No one is yelling, “it’s a backdoor!” at that. Unlike Google, Apple does this on your phone and keeps your search index secure, so that not even Apple knows what’s in your photo repository.
These are clever ways to keep data secure and encrypted and still enrich and use that data. Processing data before it’s encrypted is not, in my mind, any kind of cryptographic backdoor.
4. There Will Be False Positives
Back in those web categorization days, we licensed some software that was supposed to detect when there was nudity in a photo. That software could do basic categorization, too, but the nudity detection was their headline feature. Unfortunately, in our tests, we found that it frequently couldn’t tell the difference between a sandy beach and a nude woman. And often, it couldn’t tell if a woman was wearing a bikini or not. The false positives made our system less accurate instead of more accurate, and we had to turn it off.
But you know something that we never got false positives on? The NCMEC child pornography database. That wasn’t machines guessing at the content of a photo; it was something altogether different: a curated list of fingerprints for specific, vetted photos.
Some tech folks have made a big deal about the fuzzy matching aspect. This is how a photo can still match even if the resolution is different, the aspect ratio has changed a bit, or the brightness or contrast has been adjusted. It’s as if the machine squints its eyes and looks at a blurry version of the photo before matching. As a result, two nearly identical photos that differ in a small detail can both produce a match even though they’re different. And in some contrived cases, you can get photos that look different but still match, though that is very unlikely without someone deliberately crafting an image to fool the algorithm.
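A toy "average hash" makes the squinting intuition concrete. This is a deliberately simplified stand-in for real perceptual hashing systems (Apple's actual matching is far more sophisticated); it just shows why a global brightness change doesn't break the match:

```python
# Toy perceptual hash: each bit records whether a pixel is brighter than
# the image's own mean brightness.

def average_hash(pixels):
    """pixels: a small grayscale image as a flat list of 0-255 values."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_distance(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

def is_match(h1, h2, max_distance=3):
    """Squint: hashes within a few bits of each other count as the same image."""
    return hamming_distance(h1, h2) <= max_distance

# A brightness-adjusted copy hashes to the same bits, because every pixel
# and the mean shift together.
original = [10, 200, 30, 220, 15, 210, 25, 230]
brighter = [p + 20 for p in original]  # global brightness bump
assert is_match(average_hash(original), average_hash(brighter))
```

Real systems use much richer features than per-pixel brightness, but the principle is the same: derive a compact fingerprint that survives benign transformations, then compare fingerprints by distance rather than equality.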
So here’s the thing: false positives are possible, but they will be extremely rare. And if there is a false positive, well, that’s not enough to trigger a report. You have to have 35 matches before action is taken. The odds of that happening to someone who holds only innocent pictures are vanishingly slim. And the idea that a malicious person could send crafted photos to someone to frame them is also pretty ridiculous on a lot of levels, including that they’d have to have access to the originals, and, in the end, a human still checks the photos to make sure they’re the real deal.
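To put "vanishingly slim" in rough numbers: if we assume a purely hypothetical one-in-a-million false positive rate per photo and a 20,000-photo library (both numbers are my own illustrative choices, not Apple's published figures), the match count is approximately Poisson-distributed, and the chance of an innocent library crossing the threshold can be back-of-enveloped:

```python
import math

# Back-of-envelope estimate under assumed (illustrative) parameters:
# per-photo false positive rate of 1e-6 and a 20,000-photo library.
lam = 20_000 * 1e-6   # expected false matches per library: 0.02
threshold = 35        # matches required before action, per the article

# With lam this small, the tail P(X >= 35) is dominated by its first term:
# e^(-lam) * lam^35 / 35!
p_false_flag = math.exp(-lam) * lam**threshold / math.factorial(threshold)
print(f"~{p_false_flag:.1e}")  # on the order of 1e-100
```

Even if the assumed per-photo rate were off by several orders of magnitude, the threshold drives the combined probability to effectively zero, which is the point of requiring many matches before anything happens.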
5. Big Corporations Should Not Be in the Morality or Law Enforcement Business
I agree. It’s not great when companies force their morality on others, as often happens to LGBTQ folks, sex workers, and other marginalized folks. But we’re not talking about the gray areas here. We’re not talking about something where even Supreme Court justices can’t create a clear definition. This is something that is universally abhorrent.
If Apple suddenly decided no one could have nude photos on their devices, for example, this would be a very different conversation. But this isn’t a gray area or even an area where there are just different standards and beliefs around the world.
And while companies shouldn’t be in the morality enforcement business, all companies have policies and terms of service, and they have the right to enforce them. It’s not unreasonable for Apple to try to detect when those terms are being broken.
When people want law enforcement backdoors, they almost always bring up child pornography and terrorism. These two topics are like the backdoors to get the backdoors. For those of us who understand the tactic and see what their requests would do to security and privacy, we fight back. But that doesn’t change the fact that stopping CSAM is a societal good. That stuff fuels the violent fantasies of child predators.
I believe that everyone wants strong security and privacy, and almost everyone wants to stop child predators. If Apple can find a way to protect the privacy of its users while preventing this shit from sitting on its servers, which I believe they have done, then I’ll stand up for them. And I’ll give them the benefit of the doubt on the whole slippery slope thing even as I continue to watch them closely for any sign that their commitment to privacy is wavering or their foot is aiming for that slope.