Feature-rich services and systems often have unintended side effects when nobody considers how features affect one another, because not all features were designed with every use case in mind. Azure Storage certainly qualifies as a feature-rich service, and one set of issues associated with unintended side effects is all about metadata.
In Azure Storage, blob objects carry different kinds of metadata. Every blob comes with system properties. Some of them are read-only and are maintained by the service. Some of them can be set. On top of that, Azure’s blob storage supports custom metadata that can be set for containers or individual blobs. Read up on the REST API documentation about retrieving properties. The main things to take away from the documentation:
- All properties, system and custom, are represented as response headers when doing a `get blob` or `get blob properties` API call.
- Retrieving a blob or its properties retrieves ALL properties, not just system properties. There is no API call to retrieve a subset of properties.
- As all properties are represented in HTTP response headers, if a client of your service consumes a blob directly from Azure blob storage, that client can see ALL properties.
This has impacts on your system design if you plan to have consumers of a service accessing content directly from blob storage:
- Consider whether there are system properties that external consumers should not be privy to. See below for more on that.
- Be mindful of what you store in custom properties if external consumers retrieve blobs directly from storage or through any service that passes HTTP response headers through from storage (e.g. a CDN).
A closer look at HTTP response headers
Property values are mapped to HTTP response headers.
- Certain system properties are tied to standardized HTTP response headers, such as `Content-MD5`, `Content-Type` or `Content-Length`.
- Other system properties are tied to proprietary header names starting with `x-ms-*`, such as `x-ms-creation-time`, `x-ms-blob-type` or `x-ms-server-encrypted`.
- Custom properties set by you are tied to header names starting with `x-ms-meta-*`. For example, if you set a metadata property named FooBar, it will be represented with a header named `x-ms-meta-FooBar`.
This allows you to apply more selective control when exposing blob storage through a service like a CDN or a proxy/gateway where you can filter HTTP response headers.
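To make the three header groups concrete, here is a minimal Python sketch of how a proxy or an audit script might sort response headers by the naming rules described above. The function names `classify_header` and `group_headers` are made up for this illustration, and the set of standard headers only contains the three examples named in this post; a real response carries more.

```python
from collections import defaultdict

# Only the standard headers named in the text; real responses carry more.
STANDARD_HEADERS = {"content-md5", "content-type", "content-length"}


def classify_header(name: str) -> str:
    """Return 'custom', 'system', 'standard' or 'other' for a header name."""
    lowered = name.lower()  # HTTP header names are case-insensitive
    # Check the more specific prefix first: x-ms-meta-* also matches x-ms-*.
    if lowered.startswith("x-ms-meta-"):
        return "custom"
    if lowered.startswith("x-ms-"):
        return "system"
    if lowered in STANDARD_HEADERS:
        return "standard"
    return "other"


def group_headers(headers: dict) -> dict:
    """Group a header dict by the category of each header name."""
    grouped = defaultdict(dict)
    for name, value in headers.items():
        grouped[classify_header(name)][name] = value
    return dict(grouped)
```

Running `group_headers` over the headers of a `get blob properties` response gives a quick view of what an external client would see in each group.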
The special case of copy blob
`copy blob` is an API operation that allows you to copy blobs to an Azure Storage location, with the source being pretty flexible. Even a URL outside of Azure (like Amazon S3) can serve as a source. Every time a blob is copied (within Azure Storage), the target blob receives both blocks and original metadata from the source and, in addition, a set of service properties associated with the copy operation. This set of service properties is represented through HTTP response headers starting with `x-ms-copy-*`.
As with other system properties, you need to consider whether you can or want to expose `x-ms-copy-*` HTTP response headers to (external) clients. The intent of these system properties is to provide information that allows you to track down issues with the copy operation. Usually, that is of no concern to a client just interested in getting the blob that was the target of a copy operation. One property to call out in particular is the `CopySource` property, represented by the `x-ms-copy-source` HTTP response header. It contains the URL of the source of the `copy blob` operation. The URL also includes query parameters. This can be problematic for two reasons in cases where untrusted clients can do `get blob` on these `copy blob` target blobs:
- Untrusted clients know where the blob they are getting has been copied from.
- More importantly, if a SAS token was used to access the source blob in the `copy blob` operation, that SAS token is now embedded as part of the request query parameters in the `CopySource` property and hence the `x-ms-copy-source` response header. If you’re unfortunate enough to have used an account SAS token with lots of permissions and a long expiry date, you have just given away full access to your storage account to any client capable of doing a `get blob` on a target blob that exposes these `copy blob` properties.
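As a rough way to check for the second problem, the following Python sketch looks at a set of response headers and reports whether `x-ms-copy-source` carries a `sig` query parameter, which is the signature part of a SAS token. The function name `copy_source_leaks_sas` and the URL in the usage example are made up for illustration; the check is a heuristic under that assumption, not an exhaustive SAS detector.

```python
from urllib.parse import parse_qs, urlsplit


def copy_source_leaks_sas(headers: dict) -> bool:
    """True if x-ms-copy-source carries a `sig` query parameter."""
    for name, value in headers.items():
        if name.lower() == "x-ms-copy-source":  # header names are case-insensitive
            query = parse_qs(urlsplit(value).query, keep_blank_values=True)
            return any(key.lower() == "sig" for key in query)
    return False  # no copy source header present at all


# Hypothetical header values, not taken from a real storage account.
example = {
    "Content-Type": "application/octet-stream",
    "x-ms-copy-source": "https://example.invalid/container/blob.bin?sv=1&sp=r&sig=abc",
}
print(copy_source_leaks_sas(example))
```

A check like this could run against the headers of a `get blob properties` response for each copied blob before it is exposed to untrusted clients.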
So what’s the recommendation here? You have a couple of options.
- Don’t use `copy blob` when your target blobs are accessed by `get blob` from untrusted clients. You might not have control over this.
- If you’re only concerned about query parameters leaking SAS tokens, don’t use a SAS token when accessing the source blob for `copy blob`. Use shared key or AAD based authentication.
- If you have control over the workflow, you can do a `copy blob` and then clean up the properties of the target blob by adding or modifying any custom property on the target blob. This triggers a removal of the copy properties on the target (other system properties remain, of course). This needs to happen before you expose the target blob to any untrusted clients.
- Probably the best advice right now: Don’t expose your blobs directly to untrusted clients, especially if you don’t have full control over the workflow used on the storage account(s). Use a CDN or proxy/gateway service that allows you to filter particular HTTP headers. The most reasonable approach is probably to remove headers of the `x-ms-*` pattern completely from `get blob` responses for untrusted clients that only need to access the blob.
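The last option boils down to a simple filter rule. How you configure it depends entirely on your CDN or gateway product, so the following Python sketch only illustrates the logic; `strip_storage_headers` is a hypothetical helper name and the header values are invented.

```python
def strip_storage_headers(headers: dict) -> dict:
    """Return a copy of the headers without any x-ms-* entries.

    This drops system properties, custom metadata (x-ms-meta-*) and
    copy properties (x-ms-copy-*) in one go, since all share the prefix.
    """
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith("x-ms-")  # case-insensitive match
    }


# Hypothetical upstream response headers from blob storage.
upstream = {
    "Content-Type": "image/png",
    "Content-Length": "2048",
    "x-ms-blob-type": "BlockBlob",
    "x-ms-meta-FooBar": "internal note",
    "x-ms-copy-source": "https://example.invalid/container/blob.png?sig=abc",
}
print(strip_storage_headers(upstream))
```

Filtering on the prefix rather than on a list of known header names means that headers the service adds in the future are dropped by default.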