Public cloud encourages the use of distributed applications, enabling companies to take advantage of a large number of servers to run multiple systems -- from large-scale enterprise applications to targeted microservices. But how can administrators easily move all that data to the appropriate servers? One way is through a message bus. Admins looking to ingest large volumes of data in near real time from multiple sources should consider Amazon Kinesis.
Amazon Kinesis is a managed messaging service from Amazon Web Services (AWS) that offers high performance and low administrative overhead for real-time data processing. The service is designed to accept messages from a large number of sources and distribute them to a variety of consuming applications. Kinesis is modeled along the lines of Apache Kafka, which provides a publish-and-subscribe messaging service.
Amazon Kinesis setup
The first step to set up an Amazon Kinesis publish-and-subscribe platform is to define a Kinesis stream. This is typically done in the AWS Management Console.
A stream is a set of resources that receive, store and transfer messages. High-volume data streams can be divided across multiple shards, much like the process of scaling server clusters by using multiple servers. The number of shards you need depends on the average size of messages, the rate at which records are written and the number of consumer applications. The AWS Management Console features a tool to help admins estimate the number of shards needed to meet their requirements.
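As a rough illustration of how those three inputs combine, the sketch below estimates a shard count by hand. It assumes each shard accepts about 1 MB per second of writes and serves about 2 MB per second of reads. The function name and constants are hypothetical, and the console estimator remains the authoritative tool.

```python
import math

# Assumed per-shard limits, used only for this estimate.
WRITE_KB_PER_SHARD = 1000  # roughly 1 MB/s of incoming data
READ_KB_PER_SHARD = 2000   # roughly 2 MB/s of outgoing data


def estimate_shards(avg_record_kb, records_per_second, consumer_count):
    """Return a rough shard count for the described workload."""
    incoming_kb = avg_record_kb * records_per_second
    # Every consumer application reads the full stream.
    outgoing_kb = incoming_kb * consumer_count
    needed = max(incoming_kb / WRITE_KB_PER_SHARD,
                 outgoing_kb / READ_KB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, math.ceil(needed))


print(estimate_shards(avg_record_kb=2, records_per_second=1000, consumer_count=3))  # 3
```

In this example, writes alone would need two shards, but three consumers reading the same data push the requirement to three.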
As with any AWS resource, you need to define access controls. Kinesis privileges allow you to specify the users and roles that can place messages in the queue, get status and details about the queue, and read from the queue.
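One way to picture those privileges is as an IAM policy document. The sketch below builds one as a Python dictionary. The helper name and the example ARN are hypothetical, and the action list is a minimal assumption about what a producer, a monitoring user or a consumer would need, so adjust it to your own requirements.

```python
import json


def kinesis_policy(stream_arn, can_write=False, can_describe=False, can_read=False):
    """Build an IAM policy document granting selected Kinesis actions."""
    actions = []
    if can_write:      # place messages on the stream
        actions += ["kinesis:PutRecord", "kinesis:PutRecords"]
    if can_describe:   # get status and details about the stream
        actions += ["kinesis:DescribeStream"]
    if can_read:       # read messages from the stream
        actions += ["kinesis:GetShardIterator", "kinesis:GetRecords"]
    if not actions:
        raise ValueError("grant at least one privilege")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": stream_arn}
        ],
    }


# Hypothetical ARN, for illustration only.
arn = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
print(json.dumps(kinesis_policy(arn, can_read=True, can_describe=True), indent=2))
```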
Consumer applications invoke the GetRecords API function, typically in a continuous loop. Each record can be up to 50 KB in size, and consumers can read up to 2 MB of data per second from a single shard. If you need additional throughput, add more shards to your stream. The GetRecords function also supports a Limit parameter that sets the maximum number of records returned in a single invocation. Administrators can use this to pace the volume of data accepted by the consumer application, especially during periods of peak writing to the message queue.
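The polling loop described above might look like the following minimal sketch. It assumes a client object that exposes get_shard_iterator and get_records with the keyword arguments used by the AWS SDK for Python. The function name, the handle callback and the max_polls and idle_sleep parameters are hypothetical additions for illustration.

```python
import time


def read_shard(client, stream_name, shard_id, handle,
               limit=100, max_polls=None, idle_sleep=1.0):
    """Poll one shard, passing each record's data to handle()."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest stored record
    )["ShardIterator"]

    polls = 0
    while iterator and (max_polls is None or polls < max_polls):
        # Limit caps how many records a single call may return.
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        for record in response["Records"]:
            handle(record["Data"])
        # A missing next iterator means the shard has been closed.
        iterator = response.get("NextShardIterator")
        polls += 1
        if not response["Records"]:
            time.sleep(idle_sleep)  # back off briefly when the shard is idle
    return polls
```

With the AWS SDK for Python installed and credentials configured, the client argument would typically come from boto3.client("kinesis"). Lowering limit slows the consumer's intake during write peaks, at the cost of more API calls.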
Where Amazon Kinesis falls short
When it comes to real-time data processing, Kinesis has a few limitations. The service keeps messages for up to 24 hours. This is different from Kafka, which can be configured to store messages for much longer time periods. IT teams should allocate sufficient resources to consuming applications to read all messages within 24 hours.
Amazon CloudWatch can help monitor the load on a message queue and the throughput of consuming applications. Elastic Load Balancing or Auto Scaling can help ensure there are sufficient compute resources to keep up with the message stream.
If you exceed Kinesis' limits on accepting messages, the message is rejected and you receive a ProvisionedThroughputExceededException error. Exceeding the limits during read operations produces the same error. To add capacity, you can add more shards to your stream.
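Until extra shards are in place, callers usually need to retry throttled requests. The sketch below shows one common approach, exponential backoff with jitter. It assumes the error arrives as an exception carrying a response dictionary with an error code, as the AWS SDK for Python does. The helper names and default delays are hypothetical.

```python
import random
import time

THROTTLE_CODE = "ProvisionedThroughputExceededException"


def is_throttled(exc):
    """Return True if the exception carries the Kinesis throttling code."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == THROTTLE_CODE


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying with growing random delays when throttled."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failure.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # Wait a random time up to base_delay * 2^attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

A write could then be wrapped as call_with_backoff(lambda: client.put_record(...)). Retries only smooth over short bursts, so sustained throttling still calls for more shards.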
To add shards to a stream, split existing shards. Each split operation only takes a few seconds; however, you can only split one shard at a time. This becomes an issue if you have hundreds of shards, as it will take some time to significantly increase the relative capacity of the stream.
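The one-at-a-time constraint can be sketched as a simple loop. This example assumes a client exposing split_shard with the SDK's keyword arguments and shard descriptions in the shape returned by DescribeStream. The function names and the wait_until_active callback, which would block until the stream is usable again, are hypothetical.

```python
def midpoint_hash_key(shard):
    """Return the hash key that divides a shard's key range in half."""
    key_range = shard["HashKeyRange"]
    start = int(key_range["StartingHashKey"])
    end = int(key_range["EndingHashKey"])
    # The second child shard begins at this key.
    return str((start + end + 1) // 2)


def split_all(client, stream_name, shards, wait_until_active):
    """Split every shard in turn, waiting for the stream between splits."""
    for shard in shards:
        client.split_shard(
            StreamName=stream_name,
            ShardToSplit=shard["ShardId"],
            NewStartingHashKey=midpoint_hash_key(shard),
        )
        # Only one split can be in progress, so pause until it finishes.
        wait_until_active()
    return len(shards)
```

Doubling a stream with several hundred shards means several hundred sequential passes through this loop, which is why large capacity increases take time.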
Development and integration
Kinesis provides a REST API so you can use almost any programming language to write and read from a message queue. The AWS software development kits also provide language-specific bindings for Kinesis functions.
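As an illustration of the write side, here is a minimal producer sketch. It assumes a client that exposes put_record with the keyword arguments used by the AWS SDK for Python. The publish function, the JSON encoding and the device_id field used as the partition key are hypothetical choices.

```python
import json


def publish(client, stream_name, event, key_field="device_id"):
    """Serialize an event as JSON and write it to the stream."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Records that share a partition key land on the same shard.
        PartitionKey=str(event[key_field]),
    )
```

Choosing a partition key with many distinct values helps spread writes evenly across shards.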
AWS offers several connectors to streamline integration with other services, including DynamoDB, Redshift, Simple Storage Service and Elasticsearch. Kinesis is billed by the shard hour, at $0.015 per hour, and by the number of PUT operations, at $0.028 per 1,000,000 PUTs.
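To see how the two charges add up, the sketch below estimates a monthly bill from the rates quoted above. The function name, the 720-hour month and the example workload are assumptions for illustration, and current pricing may differ.

```python
SHARD_HOUR_USD = 0.015        # per shard, per hour (rate quoted above)
PUT_PER_MILLION_USD = 0.028   # per 1,000,000 PUT operations (rate quoted above)


def monthly_cost(shards, puts_per_second, hours=720):
    """Estimate the monthly charge in U.S. dollars for a steady workload."""
    shard_cost = shards * hours * SHARD_HOUR_USD
    total_puts = puts_per_second * hours * 3600
    put_cost = total_puts / 1_000_000 * PUT_PER_MILLION_USD
    return round(shard_cost + put_cost, 2)


# Four shards receiving 1,000 PUTs per second for a 720-hour month.
print(monthly_cost(shards=4, puts_per_second=1000))  # 115.78
```

In this example, shard hours account for $43.20 and PUT operations for about $72.58, so write volume can outweigh shard count on the bill.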