Amazon S3 remains the most popular storage option from AWS, but it has some limitations. Enterprises, and more importantly their legacy applications, can't mount it and work with it like a standard network file system. Instead, they have to use REST APIs to access individual objects.
AWS addresses this issue through its Elastic File System (EFS), a service that provides standards-based file shares on the cloud. But despite the benefits of Amazon EFS, some users struggle to determine if it's the best AWS storage option for them. And, if they do choose EFS, configuring the service can also pose a challenge.
Amazon EFS basics
EFS is a managed network-attached storage (NAS) filer for EC2 instances based on version 4 of the Network File System (NFS) protocol. Unlike DIY NFS implementations that use one or more EC2 instances with Elastic Block Store (EBS) volumes as an NFS server, EFS is distributed across servers that span multiple availability zones (AZs). This avoids the I/O bottleneck of a single NFS server and improves performance.
This distributed design also makes Amazon EFS highly available, reliable and scalable up to petabytes, with I/O throughput that increases as the file system grows. In the default Bursting throughput mode, EFS volumes deliver a baseline throughput of 50 MiB/s per TiB of storage, and any file system can burst to 100 MiB/s for short periods. For file systems larger than 1 TiB, burst throughput scales linearly at 100 MiB/s per TiB.
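The Bursting-mode figures above can be turned into a small calculator. This Python sketch simply encodes the quoted rates (50 MiB/s of baseline per TiB, a 100 MiB/s burst floor, then 100 MiB/s per TiB); the function names are illustrative, and it does not query AWS for the limits that apply to a real file system.

```python
"""Rough model of EFS Bursting-mode throughput, using the rates quoted above.

Assumption: these constants come from the article's figures, not from an AWS API.
"""

GIB_PER_TIB = 1024

BASELINE_MIBPS_PER_TIB = 50.0   # sustained throughput earned per TiB stored
BURST_MIBPS_PER_TIB = 100.0     # burst rate per TiB once a share exceeds 1 TiB
MIN_BURST_MIBPS = 100.0         # every share can burst to at least this rate


def baseline_mibps(size_gib: float) -> float:
    """Sustained (baseline) throughput in MiB/s for a share of size_gib GiB."""
    return BASELINE_MIBPS_PER_TIB * size_gib / GIB_PER_TIB


def burst_mibps(size_gib: float) -> float:
    """Burst throughput in MiB/s: a 100 MiB/s floor, then 100 MiB/s per TiB."""
    return max(MIN_BURST_MIBPS, BURST_MIBPS_PER_TIB * size_gib / GIB_PER_TIB)


if __name__ == "__main__":
    for size in (100, 1024, 5 * 1024):
        print(f"{size:>5} GiB: baseline {baseline_mibps(size):6.1f} MiB/s, "
              f"burst {burst_mibps(size):6.1f} MiB/s")
```

With binary units, a 100 GiB share works out to just under 5 MiB/s of baseline throughput, which is usually rounded to 5 MiB/s.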
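Like other NFS file shares, EFS requires compute instances to mount the remote file system to access data. Administrators create a "mount target" in each AZ of a VPC, and instances in that AZ mount the share through it, so instances in different AZs can all mount the same share. A file system can have mount targets in only one VPC at a time; instances in other VPCs reach it through VPC peering or a transit gateway.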
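Because a file system gets at most one mount target per AZ, and all of its mount targets sit in a single VPC, planning them amounts to choosing one subnet per AZ. The Python sketch below assumes a hypothetical `Subnet` record and `plan_mount_targets` helper; it only computes the plan and makes no AWS calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    """Hypothetical minimal description of a subnet."""
    subnet_id: str
    vpc_id: str
    availability_zone: str


def plan_mount_targets(subnets: list[Subnet]) -> dict[str, str]:
    """Pick one subnet per AZ for a file system's mount targets.

    Enforces two EFS rules: all mount targets live in one VPC, and there is
    at most one mount target per availability zone.
    Returns a mapping of {availability_zone: subnet_id}.
    """
    vpcs = {s.vpc_id for s in subnets}
    if len(vpcs) > 1:
        raise ValueError(f"mount targets must share one VPC, got {sorted(vpcs)}")
    plan: dict[str, str] = {}
    # Sort so the chosen subnet per AZ is deterministic across runs.
    for s in sorted(subnets, key=lambda s: s.subnet_id):
        plan.setdefault(s.availability_zone, s.subnet_id)
    return plan
```

Each resulting subnet could then be passed to boto3's `efs.create_mount_target(FileSystemId=..., SubnetId=..., SecurityGroups=[...])` to create the actual mount target.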
Once mounted, an Amazon EFS share looks like any other file system that complies with the Portable Operating System Interface (POSIX), and it uses standard NFS permissions to control access for users and groups. Mount targets also let on-premises systems access EFS shares over an AWS Direct Connect link into the VPC. EFS is also available to VMs hosted on VMware Cloud on AWS.
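To make the mount step concrete, here is a Python sketch that builds the standard Linux NFS client command for an EFS share. It assumes the regional DNS name format `<file-system-id>.efs.<region>.amazonaws.com` and the NFS mount options AWS documents as recommended defaults; `fs-12345678` and `/mnt/efs` are placeholders.

```python
import shlex

# NFS client options AWS recommends for EFS mounts (NFSv4.1, 1 MiB I/O sizes,
# hard mounts with a 60-second timeout, and a fresh source port on reconnect).
RECOMMENDED_OPTIONS = (
    "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
)


def efs_dns_name(file_system_id: str, region: str) -> str:
    """Regional DNS name; from EC2 it resolves to the mount target in the caller's AZ."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def mount_command(file_system_id: str, region: str, mount_point: str) -> list[str]:
    """Build the argv for mounting the root of an EFS share with the NFS client."""
    source = f"{efs_dns_name(file_system_id, region)}:/"
    return ["mount", "-t", "nfs4", "-o", RECOMMENDED_OPTIONS, source, mount_point]


if __name__ == "__main__":
    # fs-12345678 is a placeholder ID, not a real file system.
    argv = mount_command("fs-12345678", "us-east-1", "/mnt/efs")
    print("sudo " + shlex.join(argv))
```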
When to use Amazon EFS
Because its I/O performance can be tuned, EFS is versatile and well suited to a range of workloads, including data analytics, database backups, rich media storage, content management, collaboration, user home directories and container image storage.
Enterprises typically compare EFS to S3 and EBS when they weigh their AWS storage options. EFS is generally best for traditional file-based applications, while S3 is best for cloud-native applications. EBS is ideal when users require maximum control over the block volume configuration. Other characteristics of each include:
- EFS: This service can be mounted as a file system, provides virtually unlimited scale, supports huge files, has multi-AZ redundancy, offers high and customizable throughput with reasonable latency and supports traditional applications.
- S3: This service offers virtually unlimited scale for objects, handles objects up to 5 TB each, supports highly parallel applications, is widely accessible over the internet and, in the S3 Standard storage class, stores data redundantly across at least three AZs.
- EBS: This option has the lowest latency, provides high throughput, supports various file systems, boot volumes and databases, and provides redundancy only within an AZ. File size is limited only by the volume size and the file system on it. A volume attaches only to instances in its own AZ and, apart from io1/io2 Multi-Attach, to only one instance at a time.
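The rules of thumb in this comparison can be condensed into a tiny decision helper. This Python sketch is a hypothetical illustration of the article's guidance with three yes/no inputs, not an AWS tool, and real choices also weigh cost, durability and access patterns.

```python
from enum import Enum


class Storage(Enum):
    EFS = "Amazon EFS"
    S3 = "Amazon S3"
    EBS = "Amazon EBS"


def suggest_storage(
    needs_file_system_mount: bool,
    shared_by_many_instances: bool,
    needs_lowest_latency: bool,
) -> Storage:
    """Rough rule of thumb distilled from the EFS/S3/EBS comparison."""
    if not needs_file_system_mount:
        # Cloud-native apps that can use the S3 REST API directly.
        return Storage.S3
    if shared_by_many_instances:
        # EBS volumes normally attach to one instance; EFS is a shared mount.
        return Storage.EFS
    if needs_lowest_latency:
        # Single-instance databases and boot volumes favor block storage.
        return Storage.EBS
    # Traditional file-based applications default to EFS.
    return Storage.EFS
```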
Amazon EFS configuration
EFS offers two performance modes, which users configure at setup to accommodate different workloads:
- General Purpose mode: This mode balances low latency with reasonable I/O throughput and is well suited to most workloads, including web servers, content management systems and user home directories. General Purpose shares are limited to 7,000 file operations per second.
- Max I/O mode: This mode can scale to much higher levels of total I/O throughput, with the tradeoff of higher access latency. Max I/O shares are best suited for parallelized workloads, such as big data analysis, video processing and technical applications, including genomics.
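Both the performance mode and the throughput mode are chosen when the file system is created, through the `PerformanceMode` and `ThroughputMode` parameters of the EFS `CreateFileSystem` API. The sketch below uses a hypothetical `create_file_system_kwargs` helper that only validates and assembles those parameters for the modes discussed here; passing the result to `boto3.client("efs").create_file_system(**kwargs)` would make the actual call.

```python
from typing import Any, Optional

# Mode values covered in this article, as spelled in the EFS API.
PERFORMANCE_MODES = {"generalPurpose", "maxIO"}
THROUGHPUT_MODES = {"bursting", "provisioned"}


def create_file_system_kwargs(
    creation_token: str,
    performance_mode: str = "generalPurpose",
    throughput_mode: str = "bursting",
    provisioned_mibps: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble CreateFileSystem parameters, rejecting invalid combinations."""
    if performance_mode not in PERFORMANCE_MODES:
        raise ValueError(f"unknown performance mode: {performance_mode}")
    if throughput_mode not in THROUGHPUT_MODES:
        raise ValueError(f"unknown throughput mode: {throughput_mode}")
    kwargs: dict[str, Any] = {
        "CreationToken": creation_token,  # idempotency token for the request
        "PerformanceMode": performance_mode,
        "ThroughputMode": throughput_mode,
    }
    if throughput_mode == "provisioned":
        if provisioned_mibps is None or provisioned_mibps <= 0:
            raise ValueError("provisioned mode needs a positive MiB/s value")
        kwargs["ProvisionedThroughputInMibps"] = float(provisioned_mibps)
    elif provisioned_mibps is not None:
        raise ValueError("provisioned_mibps applies only to provisioned mode")
    return kwargs
```

A Max I/O share for a parallel analytics job would use `create_file_system_kwargs("analytics-share", "maxIO")`; since the performance mode is fixed at setup, it is worth deciding before creation.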
Users can also configure EFS in one of two throughput modes that control a share's I/O capacity:
- Bursting mode: This is the default option, and it lets throughput burst up to 100 MiB/s for short periods. Because EFS scales baseline throughput with the size of the share, both the burst capacity and how long a burst can last depend directly on the size of the file system. For example, a 100 GB share sustains 5 MiB/s and a 1 TB volume sustains 50 MiB/s, but each can burst to 100 MiB/s.
EFS uses a system of burst credits to determine how long bursts can last. A file system accumulates credits while it runs below its baseline rate, so larger file systems earn credits faster. In the previous example, a 100 GB share can burst for up to 72 minutes per day, while the 1 TB file system earns enough credits to burst for up to 12 hours per day. File systems larger than 1 TB can burst proportionately above 100 MiB/s; for example, a 5 TB share can burst to 500 MiB/s for up to 12 hours per day. To monitor burst credits, users can set up a CloudWatch alarm that fires when the BurstCreditBalance metric drops below a chosen threshold.
- Provisioned mode: This mode is for applications that need high sustained throughput or whose throughput requirements exceed the file system's burst capacity. It also lets smaller file shares achieve high I/O rates without overprovisioning storage capacity. Users can increase provisioned throughput at any time, but AWS allows only one decrease per day. Likewise, users can switch a file system between throughput modes no more than once per day.
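The burst durations quoted for Bursting mode are consistent with a simple ratio: if a share earns a day's worth of credits at its baseline rate and spends them at its burst rate, the fraction of the day it can burst is baseline divided by burst. The Python sketch below encodes that simplification (it ignores any credit balance carried over from earlier) and also assembles arguments for a CloudWatch alarm on `BurstCreditBalance`, which EFS reports in bytes under the `AWS/EFS` namespace with a `FileSystemId` dimension. The helper names, alarm name, 300-second period and threshold are illustrative assumptions.

```python
MINUTES_PER_DAY = 24 * 60


def burst_minutes_per_day(baseline_mibps: float, burst_mibps: float) -> float:
    """Minutes per day a share can run at burst_mibps, earning credits at baseline_mibps.

    Simplified model: one day's baseline credits are spent at the burst rate.
    """
    if burst_mibps <= 0:
        raise ValueError("burst rate must be positive")
    return min(MINUTES_PER_DAY, MINUTES_PER_DAY * baseline_mibps / burst_mibps)


def burst_credit_alarm_kwargs(file_system_id: str, threshold_bytes: int,
                              sns_topic_arn: str) -> dict:
    """Arguments for CloudWatch PutMetricAlarm that fire when credits run low."""
    return {
        "AlarmName": f"efs-{file_system_id}-low-burst-credits",  # hypothetical name
        "Namespace": "AWS/EFS",
        "MetricName": "BurstCreditBalance",  # reported in bytes
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "Statistic": "Minimum",
        "Period": 300,  # seconds; an illustrative choice
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    for label, baseline, burst in (("100 GB", 5, 100), ("1 TB", 50, 100), ("5 TB", 250, 500)):
        minutes = burst_minutes_per_day(baseline, burst)
        print(f"{label}: {minutes:.0f} min/day at {burst} MiB/s")
```

The alarm dictionary would be passed to `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`; a suitable threshold depends on how much burst headroom the workload needs.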
Provisioned throughput is an extra-cost option, so use it judiciously. For example, a 100 GB file system with 10 MiB/s of provisioned throughput would cost $30 per month for the basic (burstable) storage, plus an additional $30 per month for provisioned throughput. The throughput charge covers only the 5 MiB/s above the 5 MiB/s (50 KiB/s per GB of capacity) already included with burstable storage.
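The arithmetic in this example can be checked with a few lines of Python. The unit prices are not quoted in the text; they are back-calculated from its figures ($30 for 100 GB implies $0.30 per GB-month, and $30 for 5 MiB/s implies $6 per MiB/s-month), and actual prices vary by region.

```python
# Unit prices back-calculated from the example above (assumptions, not a price list).
STORAGE_USD_PER_GB_MONTH = 0.30         # $30 / 100 GB
PROVISIONED_USD_PER_MIBPS_MONTH = 6.00  # $30 / 5 MiB/s
INCLUDED_MIBPS_PER_GB = 0.05            # 50 KiB/s per GB, rounded as in the example


def monthly_cost(size_gb: float, provisioned_mibps: float = 0.0) -> tuple[float, float, float]:
    """Return (storage, provisioned throughput, total) in USD per month."""
    storage = size_gb * STORAGE_USD_PER_GB_MONTH
    included = size_gb * INCLUDED_MIBPS_PER_GB
    # Only throughput above what the storage already includes is billed.
    extra = max(0.0, provisioned_mibps - included)
    throughput = extra * PROVISIONED_USD_PER_MIBPS_MONTH
    return storage, throughput, storage + throughput


if __name__ == "__main__":
    storage, throughput, total = monthly_cost(100, 10)
    print(f"storage ${storage:.2f} + throughput ${throughput:.2f} = ${total:.2f}")
```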
Because Amazon EFS is based on NFS version 4, users should follow best practices for NFS file volumes. For instance, they should unmount the file system from clients before deleting its mount target; use a current Linux version with the latest NFS client code and bug fixes; configure the client OS so that open and close operations can run in parallel; and ensure that the number of open files and simultaneous active users on each client instance doesn't exceed the EFS limits of about 32,000 and 128, respectively.
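A pre-flight check against the per-client limits quoted above could look like the following Python sketch. The limit values are the article's figures, `check_client_limits` is a hypothetical helper, and how the caller gathers the open-file and active-user counts (for example, from `lsof`) is left out.

```python
# Per-client figures quoted in the article; confirm against current EFS quotas.
MAX_OPEN_FILES_PER_CLIENT = 32_000
MAX_ACTIVE_USERS_PER_CLIENT = 128


def check_client_limits(open_files: int, active_users: int) -> list[str]:
    """Return warnings; an empty list means both counts are within the limits."""
    warnings = []
    if open_files > MAX_OPEN_FILES_PER_CLIENT:
        warnings.append(f"{open_files} open files exceeds {MAX_OPEN_FILES_PER_CLIENT}")
    if active_users > MAX_ACTIVE_USERS_PER_CLIENT:
        warnings.append(f"{active_users} active users exceeds {MAX_ACTIVE_USERS_PER_CLIENT}")
    return warnings
```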
Like other NFS implementations, EFS adds some latency to each file operation, so accessing large numbers of small files is much slower than reading one large file. To work around this, parallelize small-file access across many threads and EC2 instances to achieve higher aggregate throughput. Also, read and write requests to an EFS share consume system memory and CPU, so choose larger instance types for applications that make thousands of EFS accesses.
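The parallelization advice can be sketched in Python: split the file list deterministically across instances, then read each instance's share with a thread pool so per-file round trips overlap. The `shard` and `read_all` helpers are hypothetical, and the default of 32 threads is an arbitrary starting point rather than a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def shard(paths: list[Path], num_workers: int, worker_index: int) -> list[Path]:
    """Deterministic round-robin split, e.g. one shard per EC2 instance."""
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return sorted(paths)[worker_index::num_workers]


def read_all(paths: list[Path], max_threads: int = 32) -> int:
    """Read files concurrently so per-file NFS round trips overlap; return total bytes."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return sum(pool.map(lambda p: len(p.read_bytes()), paths))
```

On instance `i` of `n`, a job might call `read_all(shard(paths, n, i))`. Threads fit this case because the work is dominated by waiting on network I/O rather than by Python computation.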