How SaaS Shield CMK for Amazon S3 Works

The core of the SaaS Shield CMK for Amazon S3 product is the new S3 Proxy service. This service provides a transparent proxy that handles requests destined for the Amazon Simple Storage Service (S3). The proxy supports all methods available on Amazon’s S3 service. For some methods, the client’s request is simply passed through to S3. For other methods, the proxy makes modifications before sending the request to S3.

Here, we describe in more detail how the S3 proxy works.

Terms and Definitions

The following discussion uses these terms:

DEK: Document encryption key, used to encrypt an object stored in S3
EDEK: Encrypted document encryption key, a DEK encrypted using a customer-controlled key, which is stored in an object’s tags
TSP: Tenant Security Proxy, a companion service to the S3 proxy that is used to generate DEKs and EDEKs

S3 Client Changes

The proxy was designed to require minimal changes to an S3 client to enable transparent object encryption. One change is required, however: the endpoint to which the client directs its requests. Instead of an endpoint such as s3.amazonaws.com, the client must point to the URL that has been configured for the proxy. This change can be made through the configuration of Amazon’s S3 SDKs.

This is an example configuration using AWS SDK for Java v1.11.880:

Java

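import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Send S3 requests to the proxy URL instead of s3.amazonaws.com.
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
    .withEndpointConfiguration(new EndpointConfiguration("proxy-url.example.com", "us-west-2"))
    .build();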

AWS Credentials

The proxy supports calls from any AWS user, but each user that is allowed must have credentials stored in the S3 proxy configuration file. These credentials include the user’s access key and secret key. If no credentials are stored, the proxy will not start up. When a request is received by the proxy, the access key is looked up in the configuration’s credentials map. If the access key cannot be found, the request is rejected. Otherwise, the proxy uses the stored secret key to sign and send the new request.

Request Signing

Requests must be signed with version 4 of the AWS signature algorithm. Both the Authorization header and query parameter methods of authentication are supported; support for the query parameter method was added in 1.4.0.

When the proxy receives a request, it reads the credential from the Authorization header or the X-Amz-Credential query parameter, looks up the corresponding AWS account, and re-creates the request’s signature. If no credential is present, if the credential refers to an AWS account that is not present in the configuration file, or if the signatures do not match, the proxy returns a 403 error whose message body conveys the reason for the failure.

If the signatures match, the proxy transforms the request before sending it to S3. The proxy submits the transformed request to the same region that the original request was intended for. The submitted request retains most of the headers that were received, but some are modified or removed (see Appendix B). Using the same AWS account that sent the request, the proxy generates a new Authorization header with a new signature. This signature is calculated over every header possible, except the Connection header.
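The credential lookup and key derivation described above can be sketched as follows. This is a minimal illustration, not the proxy’s actual code: the class and method names (SigV4Sketch, accessKey, signingKey, isKnownCaller) are hypothetical, only the Authorization header form is handled, and the canonical-request and signature comparison steps are omitted. The signing key derivation itself follows the standard AWS Signature Version 4 chain of HMAC-SHA256 operations.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: names are illustrative and not part of the S3 proxy.
class SigV4Sketch {
    // Credential scope format: <access key>/<yyyyMMdd>/<region>/<service>/aws4_request
    private static final Pattern CREDENTIAL =
        Pattern.compile("Credential=([^/,\\s]+)/(\\d{8})/([^/]+)/([^/]+)/aws4_request");

    /** Extracts the access key ID from a SigV4 Authorization header, if one is present. */
    static Optional<String> accessKey(String authorizationHeader) {
        Matcher m = CREDENTIAL.matcher(authorizationHeader);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Derives the SigV4 signing key from a stored secret key and the credential scope. */
    static byte[] signingKey(String secretKey, String date, String region, String service) {
        byte[] kDate = hmacSha256(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    /** True when the header names an access key found in the configured credentials map. */
    static boolean isKnownCaller(String authorizationHeader, Map<String, String> credentials) {
        return accessKey(authorizationHeader).filter(credentials::containsKey).isPresent();
    }
}
```

A caller for which isKnownCaller returns false corresponds to the 403 case; for a known caller, the stored secret key would feed signingKey both to verify the incoming signature and to sign the transformed request.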

Determining Tenant ID for Encryption

In order to allow tenants to control their own data, SaaS Shield CMK for Amazon S3 must determine the appropriate tenant for each object being stored in S3. It can do this in two ways: with a folder mapping file or explicitly with a request’s header.

Folder Mapping

With a folder mapping, the proxy determines the tenant for a request based on the bucket and key of the object being stored. The S3 proxy configuration includes two parameters, ConfigBucket and ConfigKey, that refer to a folder mapping file that must be stored in S3. The file can be stored in any bucket within the same region as the proxy, but all AWS accounts present in the configuration must be able to read the file.

The file is HOCON, a JSON superset, with the following form:

Conf

mapping = [
   { explicit-tenant-regex = {BUCKET_AND_KEY_REGEX}, tenant-id = {TENANT_ID} },
   { capture-tenant-regex = {CAPTURE_REGEX} }
]

This mapping is a single list of JSON objects. Each object takes one of two forms: explicit tenant mapping or capture tenant mapping. An explicit tenant mapping is the most straightforward: it contains a regular expression and the tenant ID to associate with any object that matches the expression. A capture tenant mapping contains only a regular expression and uses the contents of the first capture group as the tenant ID.

In both cases, the regular expressions are matched against the object’s bucket and key, joined with a / character. Each expression is automatically anchored at its start and end. The expressions are evaluated in order and the first one that matches determines the tenant, so overlapping expressions are allowed.

This mapping is used only for object encryption: when a tenant ID is determined for an object that is being added to S3, that tenant’s encryption key is used for the object. Decryption instead uses metadata encoded into the object’s tags. This means that you are free to modify the mapping at any time without compromising the decryption of previously encrypted objects.

The mapping file is downloaded from S3 when the proxy starts up, and the proxy checks for updates to the file every 10 minutes thereafter. If there is no file present at the location specified by ConfigBucket and ConfigKey, the proxy will upload an example mapping file to the location. This example contains multiple sample entries, but they are all commented out. This will allow the proxy to start up successfully, but no files will be encrypted until the mapping has been updated and reloaded. An invalid mapping file is treated as a fatal error on proxy startup, but a running proxy will continue to function if an updated mapping is invalid. In this case, the proxy logs that it was unable to update the mapping and continues to use the previous version.
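The reload behavior described above (fail on an invalid file at startup, keep the previous mapping when a later update is invalid) can be sketched as follows. This is a minimal sketch under stated assumptions: MappingReloaderSketch and its parser function are hypothetical, the parser is assumed to throw a RuntimeException on invalid input, and the S3 download and 10-minute scheduling are left out.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical sketch: M stands for whatever type the parsed mapping has.
class MappingReloaderSketch<M> {
    private final AtomicReference<M> current;
    private final Function<String, M> parser; // assumed to throw RuntimeException on invalid text

    /** Startup: an invalid mapping propagates the parser's exception, which is fatal. */
    MappingReloaderSketch(String initialText, Function<String, M> parser) {
        this.parser = parser;
        this.current = new AtomicReference<>(parser.apply(initialText));
    }

    /** Periodic refresh: returns false and keeps the previous mapping if the new text is invalid. */
    boolean reload(String newText) {
        try {
            current.set(parser.apply(newText));
            return true;
        } catch (RuntimeException e) {
            // A real service would log the failure here.
            return false;
        }
    }

    M get() {
        return current.get();
    }
}
```

In a running service, reload could be invoked from a scheduled task such as one created with ScheduledExecutorService.scheduleAtFixedRate.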

Example mapping:

Conf

mapping = [
   { explicit-tenant-regex = "tenant_foo/.*", tenant-id = "TENANT_FOO" },
   { explicit-tenant-regex = "tenant_bar/.*", tenant-id = "TENANT_BAR" },
   { capture-tenant-regex = "my_bucket/(.*?)/.*" },
   { explicit-tenant-regex = ".*", tenant-id = "FALLBACK" }
]

The last regular expression is a catch-all, intended to match any bucket and key that is submitted to the proxy. If you map this catch-all to a tenant whose key you control, you can guarantee that every object written to S3 will be encrypted. If you omit the catch-all and an incoming bucket and key are not matched by any regular expression, the object is written to S3 unencrypted.
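The matching rules for this mapping (fully anchored expressions, evaluated in order, first match wins) can be sketched with java.util.regex. The class and method names below are hypothetical and parsing of the HOCON file is omitted; the sketch only shows how a bucket and key could be resolved to a tenant ID once the entries are loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of folder-mapping resolution; not the proxy's actual code.
class FolderMappingSketch {
    /** One mapping entry; tenantId is null for a capture-tenant-regex entry. */
    private static final class Entry {
        final Pattern regex;
        final String tenantId;

        Entry(String regex, String tenantId) {
            this.regex = Pattern.compile(regex);
            this.tenantId = tenantId;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    FolderMappingSketch explicit(String regex, String tenantId) {
        entries.add(new Entry(regex, tenantId));
        return this;
    }

    FolderMappingSketch capture(String regex) {
        entries.add(new Entry(regex, null));
        return this;
    }

    /** Returns the tenant for bucket/key, or empty when no entry matches (object stays unencrypted). */
    Optional<String> tenantFor(String bucket, String key) {
        String path = bucket + "/" + key;
        for (Entry e : entries) {
            Matcher m = e.regex.matcher(path);
            if (m.matches()) { // matches() requires the whole path to match, i.e. anchored at both ends
                return Optional.ofNullable(e.tenantId != null ? e.tenantId : m.group(1));
            }
        }
        return Optional.empty();
    }
}
```

With the example mapping above, tenantFor("my_bucket", "acme/report.pdf") resolves to the captured value acme, while a path that matches none of the first three entries falls through to FALLBACK.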

Request Tenant ID Header

The proxy supports a custom header on encryption requests that can allow you to explicitly set the desired tenant ID for a request. This header is x-icl-tenant-id and is enabled by default in the configuration. If the header is present and enabled, it will take priority over any folder mapping matches for an encryption request.
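The priority rule can be sketched as a small selection function. This is an illustration only: TenantSelectionSketch and its parameters are hypothetical, header names are assumed to be lower-cased already, and the folder mapping is represented by a supplier so that it is consulted only when the header does not apply.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of tenant selection for an encryption request.
class TenantSelectionSketch {
    static final String TENANT_HEADER = "x-icl-tenant-id";

    /**
     * Prefers the x-icl-tenant-id header when header processing is enabled;
     * otherwise falls back to the folder mapping lookup.
     */
    static Optional<String> select(Map<String, String> headers,
                                   boolean headerEnabled,
                                   Supplier<Optional<String>> folderMappingLookup) {
        if (headerEnabled) {
            String fromHeader = headers.get(TENANT_HEADER);
            if (fromHeader != null && !fromHeader.isEmpty()) {
                return Optional.of(fromHeader);
            }
        }
        return folderMappingLookup.get();
    }
}
```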

AWS Encryption

AWS supports different types of encryption for objects stored in S3. The proxy uses Server-Side Encryption with Customer-Provided Keys (SSE-C) for all of its encryption and decryption operations. For this type, Amazon S3 manages the encryption and decryption automatically as objects are written/accessed, and the proxy supplies the encryption keys. These keys are passed to S3 through encryption headers on specific requests. Though S3 uses five different encryption headers to allow various forms of encryption, the proxy only uses three of them: x-amz-server-side-encryption-customer-algorithm, x-amz-server-side-encryption-customer-key, and x-amz-server-side-encryption-customer-key-MD5. These headers, as well as the two other encryption headers, are considered reserved for the proxy; they will be removed from incoming requests so that the proxy can generate its own. If a caller wishes to supply its own encryption headers, it must make the request directly to S3 instead of through the proxy.
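For reference, the three SSE-C headers can be formed from a 256-bit key as sketched below. The class and method names are hypothetical; the header values follow the S3 SSE-C convention of AES256 as the algorithm, the Base64-encoded key, and the Base64-encoded MD5 digest of the key.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: builds SSE-C request headers from a raw 256-bit key.
class SseCHeadersSketch {
    static Map<String, String> headersFor(byte[] dek) {
        if (dek.length != 32) {
            throw new IllegalArgumentException("SSE-C requires a 256-bit (32-byte) key");
        }
        try {
            byte[] md5 = MessageDigest.getInstance("MD5").digest(dek);
            Map<String, String> headers = new LinkedHashMap<>();
            headers.put("x-amz-server-side-encryption-customer-algorithm", "AES256");
            headers.put("x-amz-server-side-encryption-customer-key",
                Base64.getEncoder().encodeToString(dek));
            // S3 uses the digest to check that the key arrived intact.
            headers.put("x-amz-server-side-encryption-customer-key-MD5",
                Base64.getEncoder().encodeToString(md5));
            return headers;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```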

ICL Reserved Tags

S3 allows every object to have 10 associated tags. These tags are key-value pairs that are stored as unencrypted metadata for the object. The S3 proxy adds between one and four tags to each encrypted object containing the information necessary to decrypt it. This information includes the tenant ID and an encrypted form of the key used to encrypt the object (EDEK). The number of tags necessary depends on the length of both the tenant ID and the encryption key. These tags all have a key that begins with the reserved prefix ICL@. Any requests that pass through the proxy that attempt to delete or modify these tags are changed to preserve them. In addition, any tags included in a request that begin with this prefix are removed from the request. It is extremely important that these tags are not modified or removed through other means (Web interface, CLI, etc.), as doing so will prevent the object from being decrypted.

Encrypting Objects

PutObject

When the proxy handles a request to PUT an object into S3, it first determines the tenant ID to use for encryption. If the request contains the x-icl-tenant-id header and header processing is enabled in the configuration, the header’s value is used as the tenant ID. Otherwise, the proxy evaluates the folder mapping file and uses the first match as the tenant ID. If there are no matches, the object is uploaded unencrypted and the rest of this section does not apply.

After selecting the tenant ID, the proxy passes it to the TSP. The TSP generates a random document encryption key (DEK) for the object and uses the identified tenant’s configured KMS to encrypt the DEK, creating an encrypted document encryption key (EDEK). The proxy forms encryption headers for the request that use the DEK as the x-amz-server-side-encryption-customer-key. This causes S3 to encrypt the object using the provided key before storing it. After S3 uses the key to encrypt the object, it discards the key.

The EDEK and the tenant ID are encoded as ICL reserved tags and added to the object’s tags. Since the DEK is randomly generated per object and forgotten after the object is encrypted, these ICL reserved tags are required to decrypt the object and must never be deleted or modified.

CopyObject

When copying an object from one location to another, the source’s encryption key is not copied as well; new encryption headers are generated for the new object. This means that you can copy an object that is encrypted with one tenant ID to a location that encrypts to a different tenant ID. This also allows you to copy an unencrypted object into a location that maps to a tenant ID; the new copy is automatically encrypted based on the destination. In addition, this enables an easy approach to key rotation: copying an object on top of itself preserves the object’s data while changing its encryption key. Note: because the proxy needs to generate new ICL reserved tags for the object, a supplied x-amz-tagging-directive header is ignored, and the header has a value of REPLACE in the final request.

Because the source’s encryption keys are required to perform the copy, the proxy makes a request to get the source’s tags prior to the CopyObject request. The proxy assumes that the source is in the same region as the destination, and it will attempt to follow the redirect if this is not the case. This means that cross-region copying of files will be slightly slower than same-region copying due to the extra requests that are required.

Multipart Uploads

Multipart uploads are different from regular PutObject requests because they are done with multiple calls that all require the same encryption headers. When a request to CreateMultipartUpload comes into the proxy, many of the same steps take place as for PutObject. The folder mapping supplies the tenant ID, and this is used by the TSP to create a DEK and an EDEK. The DEK is then used to create the encryption headers for the request, and the Tenant ID and EDEK are stored in the future object’s tags. The difference is that these headers are then saved in the server’s memory for later. When requests come in for UploadPart or UploadPartCopy, these encryption headers are retrieved from memory instead of being re-generated. Finally, the headers are cleared from memory by a call to AbortMultipartUpload or CompleteMultipartUpload. This means that even if the completion call to S3 fails, you can no longer add additional parts to the upload because the encryption headers were forgotten.
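The in-memory lifecycle of the multipart encryption headers can be sketched as a small cache. This is a hypothetical illustration: the source does not say how the proxy indexes the saved headers, so keying them by upload ID is an assumption, as are the class and method names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: encryption headers held in memory for in-progress multipart uploads.
class MultipartHeaderCacheSketch {
    // Assumption: headers are keyed by the upload ID returned from CreateMultipartUpload.
    private final Map<String, Map<String, String>> headersByUploadId = new ConcurrentHashMap<>();

    /** CreateMultipartUpload: remember the encryption headers generated for this upload. */
    void onCreate(String uploadId, Map<String, String> encryptionHeaders) {
        headersByUploadId.put(uploadId, encryptionHeaders);
    }

    /** UploadPart / UploadPartCopy: reuse the saved headers; empty if they were lost or cleared. */
    Optional<Map<String, String>> forPart(String uploadId) {
        return Optional.ofNullable(headersByUploadId.get(uploadId));
    }

    /** AbortMultipartUpload / CompleteMultipartUpload: forget the headers. */
    void onCompleteOrAbort(String uploadId) {
        headersByUploadId.remove(uploadId);
    }
}
```

Because the cache lives only in process memory, an empty result from forPart models both cases the text describes: an upload that was already completed or aborted, and headers lost to a restart.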

Decrypting Objects

GetObject

When the proxy receives a request to GET an object from S3, the proxy first makes a call to get the object’s tags. If the object contains no ICL reserved tags, the object is assumed to be unencrypted, and the call is simply passed on. If the object does have ICL reserved tags, those tags are decoded to get the EDEK and tenant ID that were used when encrypting the object. These are then sent to the TSP, which decrypts the EDEK and returns the DEK that was used to encrypt the object. The proxy then re-creates the encryption headers that were used when uploading the object and adds them to the new request. S3 automatically decrypts the object using the DEK and returns the unencrypted data.

Modifying Objects

PutObjectTagging

Calls to PutObjectTagging don’t require object decryption, but they do require special care to avoid changing or removing the ICL reserved tags. With regular S3, the XML body of the request contains the new tags the object should have, replacing all old tags. To preserve the ICL reserved tags necessary for decryption, the proxy changes the request’s body. First, the body of the request is filtered, removing all tags with a key that begins with the reserved ICL prefix ICL@. Then, the proxy does a GetObjectTagging request to get the object’s current tags. It finds all current ICL reserved tags and adds them to the request’s XML tags. The request is finally sent to S3. Note that S3 rejects the request if the body now contains more than 10 tags.
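The tag rewriting steps can be sketched as a merge over two maps. The class, method, and tag names here are hypothetical, and XML parsing of the request body is omitted; the sketch only shows the filtering and merging logic and the 10-tag limit that S3 enforces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of how a PutObjectTagging body could be rewritten.
class TagMergeSketch {
    static final String RESERVED_PREFIX = "ICL@";
    static final int S3_TAG_LIMIT = 10;

    /**
     * Drops caller-supplied tags that use the reserved prefix, then re-adds
     * the reserved tags currently stored on the object.
     */
    static Map<String, String> merge(Map<String, String> requested, Map<String, String> current) {
        Map<String, String> result = new LinkedHashMap<>();
        requested.forEach((k, v) -> {
            if (!k.startsWith(RESERVED_PREFIX)) result.put(k, v);
        });
        current.forEach((k, v) -> {
            if (k.startsWith(RESERVED_PREFIX)) result.put(k, v);
        });
        return result;
    }

    /** S3 rejects the request when the merged set exceeds the per-object tag limit. */
    static boolean withinS3Limit(Map<String, String> tags) {
        return tags.size() <= S3_TAG_LIMIT;
    }
}
```

Since the proxy adds between one and four reserved tags, a caller can rely on at most six to nine tags of its own fitting within the limit for an encrypted object.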

DeleteObjectTagging

Similar to PutObjectTagging, calls to DeleteObjectTagging must be modified by the proxy to avoid removing reserved ICL tags. Because this method deletes all tags on the object, it can never simply be passed through. Instead, the proxy makes a GetObjectTagging request to retrieve all of the object’s ICL reserved tags, then sends them as the XML body of a PutObjectTagging request. Since these are the only tags in the body, this deletes all other tags. Because PutObjectTagging and DeleteObjectTagging share valid headers (at the time of writing, this is only x-amz-expected-bucket-owner), the original request’s headers are preserved when forming the new request.
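The conversion from DeleteObjectTagging to PutObjectTagging can be sketched as building a Tagging XML body that contains only the reserved tags. The class and method names are hypothetical, and the string-based XML construction is a simplification; the element names follow the S3 PutObjectTagging request format.

```java
import java.util.Map;

// Hypothetical sketch: builds the PutObjectTagging body that replaces a DeleteObjectTagging call.
class DeleteTaggingSketch {
    static final String RESERVED_PREFIX = "ICL@";

    /** Keeps only the reserved tags from the object's current tags and renders them as XML. */
    static String putBodyFor(Map<String, String> currentTags) {
        StringBuilder xml = new StringBuilder(
            "<Tagging xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><TagSet>");
        for (Map.Entry<String, String> tag : currentTags.entrySet()) {
            if (tag.getKey().startsWith(RESERVED_PREFIX)) {
                xml.append("<Tag><Key>").append(escape(tag.getKey())).append("</Key>")
                   .append("<Value>").append(escape(tag.getValue())).append("</Value></Tag>");
            }
        }
        return xml.append("</TagSet></Tagging>").toString();
    }

    /** Minimal escaping for XML element content; & must be replaced first. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```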

Unsupported at This Time

Browser-Based Upload using HTTP POST
CORS
Chunked authentication

Notes

Unauthenticated requests are not supported.
CreateMultipartUpload holds the encryption headers in the server’s memory until AbortMultipartUpload or CompleteMultipartUpload. If the server restarts or crashes and loses these headers, the multipart upload needs to be restarted.
The Expect header is not honored by the proxy and is not included when making the call to S3.
If the caller signs over the request payload, the proxy needs to load the entire request into memory.

Appendix A: Which Requests Are Passed Through and Which Are Modified?

Passthrough Requests

Requests to the root
- ListBuckets
Requests to a bucket
- CreateBucket
- DeleteBucket
- DeleteBucketAnalyticsConfiguration
- DeleteBucketCors
- DeleteBucketEncryption
- DeleteBucketInventoryConfiguration
- DeleteBucketLifecycle
- DeleteBucketMetricsConfiguration
- DeleteBucketPolicy
- DeleteBucketReplication
- DeleteBucketTagging
- DeleteBucketWebsite
- DeleteObjects
- DeletePublicAccessBlock
- GetBucketAccelerateConfiguration
- GetBucketAcl
- GetBucketAnalyticsConfiguration
- GetBucketCors
- GetBucketEncryption
- GetBucketInventoryConfiguration
- GetBucketLifecycle
- GetBucketLifecycleConfiguration
- GetBucketLocation
- GetBucketLogging
- GetBucketMetricsConfiguration
- GetBucketNotification
- GetBucketNotificationConfiguration
- GetBucketPolicy
- GetBucketPolicyStatus
- GetBucketReplication
- GetBucketRequestPayment
- GetBucketTagging
- GetBucketVersioning
- GetBucketWebsite
- GetObjectLockConfiguration
- GetPublicAccessBlock
- HeadBucket
- ListBucketAnalyticsConfigurations
- ListBucketInventoryConfigurations
- ListBucketMetricsConfigurations
- ListMultipartUploads
- ListObjects
- ListObjectsV2
- ListObjectVersions
- PutBucketAccelerateConfiguration
- PutBucketAcl
- PutBucketAnalyticsConfiguration
- PutBucketCors
- PutBucketEncryption
- PutBucketInventoryConfiguration
- PutBucketLifecycle
- PutBucketLifecycleConfiguration
- PutBucketLogging
- PutBucketMetricsConfiguration
- PutBucketNotification
- PutBucketNotificationConfiguration
- PutBucketPolicy
- PutBucketReplication
- PutBucketRequestPayment
- PutBucketTagging
- PutBucketVersioning
- PutBucketWebsite
- PutObjectLockConfiguration
- PutPublicAccessBlock
Requests to objects that don’t require encryption headers
- AbortMultipartUpload
- CompleteMultipartUpload
- DeleteObject
- GetObjectAcl
- GetObjectLegalHold
- GetObjectRetention
- GetObjectTagging
- GetObjectTorrent
- ListParts
- PutObjectAcl
- PutObjectLegalHold
- PutObjectRetention
- RestoreObject

Requests Needing Modifications

All others
- CopyObject
- CreateMultipartUpload
- DeleteObjectTagging
- GetObject
- HeadObject
- PutObject
- PutObjectTagging
- SelectObjectContent
- UploadPart
- UploadPartCopy

Appendix B: Which Headers Are Modified/Removed and Which Are Passed Through?

Headers That Could Be Modified or Removed

Authorization
Expect
Host
x-amz-content-sha256
x-amz-server-side-encryption-*
x-amz-tagging-directive
x-forwarded-*

Passthrough Headers

All other headers