AWS was one of the first major cloud providers to offer a managed container service, preceded only by Google,...
which gave rise to modern containerization. But overall, AWS container services still seem like a work in progress.
AWS introduced Amazon Elastic Container Service (ECS) over three years ago, but its complexity deterred all but the most sophisticated users. ECS has received a steady stream of improvements since shortly after its debut.
With ECS, users still must run, monitor, restart and update a cluster of Elastic Compute Cloud (EC2) instances behind the scenes. ECS uses EC2 instances to create a virtual server pool for the Docker runtime and other components, but it doesn't automate the operation of the underlying AWS resources.
An IT team must use a collection of other services to automate tasks, including CloudFormation for resource deployment templates, CloudWatch for resource monitoring and Application Load Balancer for traffic routing. Elastic Beanstalk can manage most of the deployment, configuration and ongoing monitoring, but that adds even more complexity.
Amazon ECS issues
Users face a number of challenges with Amazon ECS; cluster management is just one of them. Cluster scaling, in particular, can be troublesome when you need to accommodate variable workloads. ECS doesn't handle this task. Instead, users must run EC2 instances in an Auto Scaling group.
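The sizing arithmetic that ECS leaves to the user can be sketched in a few lines. The function below is a hypothetical helper, not part of any AWS API. It assumes every task reserves the same CPU units and memory, that all instances are identical, and it ignores the resources the ECS agent and operating system consume.

```python
import math


def instances_needed(task_count, task_cpu, task_mem, instance_cpu, instance_mem):
    """Estimate how many identical instances are needed to place task_count tasks.

    CPU is in ECS-style CPU units and memory is in MiB. This is a hypothetical
    helper for illustration; real placement also depends on ports, placement
    strategies and resources already in use.
    """
    # A task must fit on both dimensions, so the tighter one sets the limit.
    tasks_per_instance = min(instance_cpu // task_cpu, instance_mem // task_mem)
    if tasks_per_instance == 0:
        raise ValueError("a single task does not fit on this instance size")
    return math.ceil(task_count / tasks_per_instance)


# Ten tasks of 512 CPU units and 1,024 MiB on instances with 2,048 CPU units
# and 4,096 MiB: four tasks fit per instance, so three instances are needed.
print(instances_needed(10, 512, 1024, 2048, 4096))  # 3
```

A number like this could feed the desired capacity of the Auto Scaling group, which is the part ECS itself does not set.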
Health checks can also cause issues. The service will automatically start tasks until it reaches the configured capacity, and it will restart tasks that fail. The EC2 cluster must coordinate with a load balancer to perform these health checks. But some applications might reach a state in which a container starts but can't execute due to a missing dependency, such as a database that isn't online. This problem can cause an endless loop within ECS, as it stops and restarts the container indefinitely. Developers must carefully configure health check parameters to avoid these cascading restarts.
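A toy simulation shows why the health parameters matter. It is a simplified model under stated assumptions, not ECS's actual scheduler logic: checks run at a fixed interval, a container is replaced after a set number of consecutive failures, replacement is instantaneous, and the hypothetical `grace_period` parameter suppresses failures for a while after each start.

```python
def count_restarts(dependency_ready_at, check_interval, failures_to_restart,
                   grace_period=0, horizon=600):
    """Count restarts of a container that stays unhealthy until a dependency
    (for example, a database) comes online. All times are in seconds.

    Simplified model for illustration; every parameter name is hypothetical.
    """
    restarts = 0
    started_at = 0             # when the current container was started
    consecutive_failures = 0
    t = check_interval
    while t <= horizon:
        if t >= dependency_ready_at:
            return restarts    # dependency is up, so the check passes
        in_grace = (t - started_at) < grace_period
        if not in_grace:
            consecutive_failures += 1
            if consecutive_failures >= failures_to_restart:
                restarts += 1  # scheduler replaces the unhealthy container
                started_at = t
                consecutive_failures = 0
        t += check_interval
    return restarts


# Database comes online after 120 seconds; checks every 10 seconds,
# three consecutive failures trigger a restart.
print(count_restarts(120, 10, 3))                    # 3
print(count_restarts(120, 10, 3, grace_period=150))  # 0
```

With no grace period the container is restarted three times before the database is reachable. If the dependency never comes online, the restarts continue until the simulation's horizon, which is the endless loop described above.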
Also, when you decouple container and instance management, the resulting log data often lacks context. For example, a log might not record that a container restarted as a result of host failure and an ECS restart. Developers are on the hook to provide meaningful telemetry in application logs.
ECS uses a rolling deployment model in which a minimum number of containers will remain running -- 50% by default -- as others stop and update with new code. While this technique eliminates downtime, there will be a window when two versions of an application or microservice run at the same time. This could be a problem for other applications when you need to update APIs and I/O formats.
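The mixed-version window is easy to see in a small simulation. The sketch below is a simplified model rather than ECS's scheduler: it assumes new tasks become healthy the moment they start, that the minimum healthy count rounds up, and that the maximum running count rounds down. The function and parameter names are hypothetical.

```python
import math


def rolling_update(desired, min_healthy_percent=50, max_percent=100):
    """Simulate a rolling update and return (old, new) task counts per step.

    Simplified model for illustration: tasks are healthy as soon as they
    start, and the rounding of both limits is an assumption of this sketch.
    """
    min_healthy = math.ceil(desired * min_healthy_percent / 100)
    max_running = math.floor(desired * max_percent / 100)
    old, new = desired, 0
    steps = [(old, new)]
    while new < desired:
        # Stop old tasks, but never drop below the minimum healthy count.
        stoppable = min(old, old + new - min_healthy)
        if stoppable > 0:
            old -= stoppable
            steps.append((old, new))
        # Start new tasks, but never exceed the maximum running count.
        startable = min(desired - new, max_running - (old + new))
        if startable > 0:
            new += startable
            steps.append((old, new))
        if stoppable <= 0 and startable <= 0:
            raise ValueError("deployment cannot make progress with these limits")
    if old > 0:
        # All new tasks are running, so the remaining old ones can stop.
        old = 0
        steps.append((old, new))
    return steps


print(rolling_update(4))
# [(4, 0), (2, 0), (2, 2), (0, 2), (0, 4)]
```

The `(2, 2)` step is the window in which both versions serve traffic at once, so any API or I/O format change needs to be compatible with both during that period.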
One of ECS' substantial technical achievements is its state management and consistency over a distributed system, which ensures that simultaneous changes on different containers in the cluster don't create conflicts. For example, this can prevent writes to the same database before a previous write commits. Nevertheless, the state management system can produce problems if container instances lose contact with the management engine.
Finally, ECS doesn't establish a full application stack. It requires other components to build a complete containerized, service-based application. For example, developers will need a container image registry, like Amazon Elastic Container Registry, to store and manage Docker images.
Recent ECS improvements
Containerization is still one of the hottest areas in cloud computing. So it's not a surprise that AWS container services, including ECS, constantly evolve. Recent enhancements include:
- Cloud-native networking for ECS, which enables containers to use elastic network interfaces provisioned for EC2 instances within an Amazon Virtual Private Cloud (VPC).
- Enhanced Auto Scaling capabilities within ECS. Users can set scaling policies, triggered by CloudWatch alarms, that launch additional tasks for a service in a container cluster. But these policies only apply to the container tasks, not cluster instances, which must still use an EC2 Auto Scaling group and a load balancer to spin up more capacity.
- ECS discovery via domain name system (DNS). This process uses an agent in container instances to register services to the Route 53 DNS service. The agent watches Docker events and registers the service name and each task's metadata, such as IP address and ports, into a Route 53 private hosted zone. With ECS, you can also update agents that run on EC2 cluster instances and deregister task definitions that are no longer needed or refer to outdated revisions.
- Improvements to the ECS console. These enhancements include easier initial setup and better error logging of Docker events, which now include task start and stop time and a brief reason for the error, such as a failed Elastic Load Balancing health check.
At re:Invent 2017, AWS also introduced its Fargate service, which removes the need to deploy and manage EC2 clusters. Fargate, which is similar to Azure's new Container Instances service, takes a container image and deployment requirements for CPU, memory, networking and Identity and Access Management (IAM) policies and then automatically provisions instances on an AWS-managed cluster. Fargate supports ECS primitives and APIs, and it integrates with VPCs, including support for elastic network interfaces, IAM, CloudWatch and load balancers.
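To make those inputs concrete, the sketch below assembles a task definition request as a plain Python dictionary. The field names follow the ECS RegisterTaskDefinition request as best understood here; the helper function, the image name and the default sizes are hypothetical, and the code only builds the dictionary without calling AWS.

```python
import json


def fargate_task_definition(family, image, cpu="256", memory="512",
                            container_port=80, execution_role_arn=None):
    """Build a request body for a Fargate-compatible ECS task definition.

    Hypothetical helper for illustration; it performs no validation and
    makes no AWS API calls.
    """
    request = {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",   # each task gets its own elastic network interface
        "cpu": cpu,                # task-level CPU units, passed as a string
        "memory": memory,          # task-level memory in MiB, passed as a string
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [
                    {"containerPort": container_port, "protocol": "tcp"}
                ],
            }
        ],
    }
    if execution_role_arn:
        # IAM role the platform assumes to pull images and write logs.
        request["executionRoleArn"] = execution_role_arn
    return request


# "example/web:1.0" is a placeholder image name.
print(json.dumps(fargate_task_definition("web", "example/web:1.0"), indent=2))
```

A dictionary like this could be passed to the `register_task_definition` call of an ECS client in an AWS SDK such as boto3. Note that nothing in it names an EC2 instance type or cluster size, which is the point of the service.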
Managed Kubernetes vs. ECS
At re:Invent 2017, AWS also introduced native Kubernetes support via the new Elastic Container Service for Kubernetes (EKS). This service eliminates most of the manual work required to run Kubernetes on an EC2 cluster, such as deployment, upgrades and patching. Kubernetes handles the deployment and scaling of containers, which it groups into units called pods, across the server instances in a cluster. As a result, EKS eliminates much of the extra work of using ECS. It also automatically replicates Kubernetes masters across three availability zones for high reliability. And because EKS is built on open source Kubernetes, users can migrate applications that run on private Kubernetes clusters to EKS without changing code.
AWS' release of EKS is a tacit admission that Kubernetes is the de facto standard for container cluster management, and it obviates the need for the AWS Blox scheduler that was introduced in 2016. AWS Blox has all but disappeared from the AWS website, apart from a handful of references.
Future of AWS container services
The new features and services out of re:Invent 2017 significantly change the AWS container roadmap. ECS could take a secondary role in the future.
New container users and experienced developers who don't need Kubernetes' scale and degree of control will likely opt for Fargate instead of running ECS on EC2 clusters they manage themselves, as it eliminates many of those challenges. Experienced container users, particularly those who need massive scalability and deploy applications in a microservices architecture, will likely gravitate to EKS, since it uses an orchestration engine familiar to many developers. EKS also enables easy application and task migration between AWS and other clouds or private infrastructure.