Amazon S3 is one of the oldest and most popular cloud services, containing exabytes of capacity, spread across tens of trillions of objects and millions of drives. Given its scale and significance to so many organizations, AWS doesn't make changes to the storage service lightly.
Nevertheless, modifications and updates are sometimes required to improve scalability and functionality, or to add features. That was the apparent rationale for planned changes to the S3 REST API addressing model, which deprecate one URL syntax in favor of another. Given the wide-ranging implications for existing applications, AWS wisely gave developers plenty of notice, with support for the older, S3 path-style access syntax not ending until Sept. 30, 2020.
This announcement might have gone unnoticed by S3 users, so our goal is to provide some context around S3 bucket addressing, explain the S3 path-style change and offer some tips on preparing for S3 path deprecation.
Get to know the Amazon S3 REST API
For starters, it's critical to understand some basics about S3 and its REST API.
Unlike hierarchical file systems made up of volumes, directories and files, S3 stores data as individual objects -- along with related objects -- in a bucket. S3 buckets organize the object namespace and link to an AWS account for billing, access control and usage reporting.
Objects within a bucket are uniquely identified by a key name and a version ID. Every object has only one key, but versioning allows multiple revisions or variants of an object to be stored in the same bucket. The crux of the impending change to the S3 API entails how objects are accessed via URL.
Differentiate between the Amazon S3 addressing styles
S3 currently supports two forms of URL addressing: path-style and virtual-hosted style. The latter, also known as V2, is the newer option.
Objects in S3 are labeled through a combination of bucket, key and version. However, the two addressing styles vary in how they incorporate the key elements of an S3 object -- bucket name, key name, regional endpoint and version ID.
For example, let's say you encounter a website that links to S3 objects with the following URL:
If versioning is enabled, you can access revisions by appending "?versionId=<the version ID>" to the URL like this:
In this example, which illustrates virtual-hosted style addressing, "s3.amazonaws.com" is the endpoint, "acmeinc" is the name of the bucket, and "2019-05-31/MarketingTesst.docx" is the key. Without a version ID, the request returns the most recent object version. Thus, the bucket name becomes part of the host name in the address.
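As a rough illustration, the virtual-hosted style address described above can be assembled from its parts in a few lines of Python. This is a minimal sketch, not AWS code: the function name `virtual_hosted_url`, the `https` scheme and the version ID value `EXAMPLE123` are assumptions made for the example, while the bucket, key and endpoint come from the text.

```python
from urllib.parse import quote, urlencode


def virtual_hosted_url(bucket, key, endpoint="s3.amazonaws.com", version_id=None):
    """Build a virtual-hosted style URL: the bucket name is a prefix of the host."""
    # quote() leaves "/" unescaped by default, so key prefixes stay readable.
    url = f"https://{bucket}.{endpoint}/{quote(key)}"
    if version_id is not None:
        # A specific revision is selected with the versionId query parameter.
        url += "?" + urlencode({"versionId": version_id})
    return url


# Bucket and key from the example in the text; the version ID is hypothetical.
print(virtual_hosted_url("acmeinc", "2019-05-31/MarketingTesst.docx"))
print(virtual_hosted_url("acmeinc", "2019-05-31/MarketingTesst.docx",
                         version_id="EXAMPLE123"))
```

The first call yields an address whose host is "acmeinc.s3.amazonaws.com"; the second appends "?versionId=EXAMPLE123" to the same address.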
Note that our example doesn't include a region-specific endpoint, but instead uses the generic "s3.amazonaws.com," which is a special case for the U.S. East (N. Virginia) region.
If you wanted to request an object from a bucket hosted in, say, the U.S. West (Oregon) region, it would look like this:
Alternatively, the original -- and soon-to-be-obsolete -- path-style URL expresses the bucket name as the first part of the path, following the regional endpoint address. Sticking with our U.S. West (Oregon) region example, the address would instead appear like this:
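To make the difference concrete, here is a small Python sketch that produces both address forms for the same object. It assumes the region code "us-west-2" for U.S. West (Oregon) and the dotted endpoint form "s3.us-west-2.amazonaws.com"; the helper name `s3_url` and the `https` scheme are illustrative choices, not part of any AWS library.

```python
from urllib.parse import quote


def s3_url(bucket, key, region, style="virtual"):
    """Return the URL of an object in either addressing style."""
    endpoint = f"s3.{region}.amazonaws.com"  # assumed regional endpoint form
    path = quote(key)  # "/" is kept as-is by default
    if style == "virtual":
        # Virtual-hosted style: the bucket is part of the host name.
        return f"https://{bucket}.{endpoint}/{path}"
    if style == "path":
        # Path-style: the bucket is the first segment of the path.
        return f"https://{endpoint}/{bucket}/{path}"
    raise ValueError(f"unknown addressing style: {style!r}")


key = "2019-05-31/MarketingTesst.docx"
print(s3_url("acmeinc", key, "us-west-2", style="virtual"))
print(s3_url("acmeinc", key, "us-west-2", style="path"))
```

The only thing that moves between the two results is the bucket name: it sits before the endpoint in the virtual-hosted form and after it in the path-style form.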
Here is a complete example from AWS documentation of the alternative syntaxes using the REST API, with the command to delete the file "puppy.jpg" from the bucket named "examplebucket," which is hosted in the U.S. West (Oregon) region. First, the virtual-hosted style request:
Next, the S3 path-style version of the same request:
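The following Python sketch shows how the two request forms differ at the HTTP level, using the bucket and file names from the description above. It only builds the request line and the Host header. A real request would also need date and authorization headers, which are left out here. The helper name `delete_request_head` and the endpoint string "s3.us-west-2.amazonaws.com" are assumptions for illustration.

```python
def delete_request_head(bucket, key, endpoint, style):
    """Return (request line, Host header) for a DELETE in the given style."""
    if style == "virtual":
        # Bucket goes into the Host header; the path holds only the key.
        return (f"DELETE /{key} HTTP/1.1", f"Host: {bucket}.{endpoint}")
    if style == "path":
        # Host is the bare endpoint; the bucket leads the path.
        return (f"DELETE /{bucket}/{key} HTTP/1.1", f"Host: {endpoint}")
    raise ValueError(f"unknown addressing style: {style!r}")


endpoint = "s3.us-west-2.amazonaws.com"  # assumed U.S. West (Oregon) endpoint
for style in ("virtual", "path"):
    request_line, host_header = delete_request_head(
        "examplebucket", "puppy.jpg", endpoint, style
    )
    print(request_line)
    print(host_header)
    print()
```

In the virtual-hosted form the request line names only "/puppy.jpg" and the Host header carries "examplebucket"; in the path-style form the Host header is the bare endpoint and the request line names "/examplebucket/puppy.jpg".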
Why the Amazon S3 path-style is being deprecated
AWS initially said it would end support for path-style addressing on Sept. 30, 2020, but later relaxed the obsolescence plan. AWS will continue to support path-style requests for all buckets created before that date. There are two reasons for the original S3 path-style change, according to AWS evangelist Jeff Barr:
- The path-style model makes it increasingly difficult to manage Domain Name System (DNS) resolution, traffic management and security as S3 continues to expand in scale and add web endpoints. When problems arise, the virtual-hosted model is better equipped to limit the scope of the damage.
- S3 features currently under development depend on unique, virtual-hosted style subdomains. Users of path-style addressing might miss out on these potential features, such as greater control over the security configuration, including ciphers and cipher versions for each bucket.
Preparing for the switch
It's not too soon for users to prepare for S3 path deprecation, and AWS has already called out several pointers:
- First, identify path-style URL references. Use S3 access logs and scan the Host header field. You can also check the host element of the requestParameters entry of CloudTrail Data Events to find applications making path-style requests.
- AWS SDKs use the virtual-hosted reference, so IT teams don't need to change applications that use those SDKs, as long as they use the current versions.
- Consider changing the name of any buckets that contain the "." character or other nonroutable characters, also known as reserved characters, due to known issues with Secure Sockets Layer and Transport Layer Security certificates and virtual-hosted requests.
- If you aren't already doing so, use the virtual-hosted style when building any new applications without the help of an AWS SDK.
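Two of the checks above lend themselves to a quick script. The Python sketch below flags Host header values that look like path-style requests (the host is a bare S3 endpoint with no bucket prefix) and bucket names that contain a dot. It is a simplified heuristic under stated assumptions: the regular expression only covers the "s3.amazonaws.com", "s3.<region>.amazonaws.com" and "s3-<region>.amazonaws.com" forms, and the function names are hypothetical. Extracting the Host values from S3 access logs or CloudTrail events is left to the reader.

```python
import re

# A bare S3 endpoint with no bucket prefix suggests a path-style request.
# Other endpoint variants (for example, dual-stack) are not covered here.
_ENDPOINT_ONLY = re.compile(r"^s3(?:[.-][a-z0-9-]+)?\.amazonaws\.com$")


def is_path_style_host(host):
    """Return True if the Host value is an S3 endpoint without a bucket name."""
    name = host.strip().lower().split(":", 1)[0]  # drop any port suffix
    return _ENDPOINT_ONLY.match(name) is not None


def has_dot(bucket):
    """Return True if the bucket name contains a '.' character."""
    return "." in bucket


# Hypothetical Host values, as they might be pulled from logs.
hosts = [
    "s3.amazonaws.com",
    "s3-us-west-2.amazonaws.com",
    "examplebucket.s3.us-west-2.amazonaws.com",
]
for host in hosts:
    print(host, "path-style" if is_path_style_host(host) else "virtual-hosted")

# Hypothetical bucket names to review before moving to virtual-hosted requests.
for bucket in ["examplebucket", "reports.acmeinc"]:
    print(bucket, "contains a dot" if has_dot(bucket) else "ok")
```

Any host flagged as path-style points to an application worth reviewing, and any bucket flagged for a dot is a candidate for renaming before switching addressing styles.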