
Set up data analytics apps with this Amazon Kinesis tutorial

The Amazon Kinesis platform is a suite of managed services -- Kinesis Video Streams, Data Streams, Data Firehose and Data Analytics -- used to collect, process and analyze real-time streaming data in AWS. In this Amazon Kinesis tutorial, you'll learn how to navigate data streams in AWS, create a Kinesis Data Analytics application and connect it to a Kinesis data stream for real-time analytics.

Kinesis gets its streaming data from an input -- what AWS calls a producer. Kinesis Data Streams or Data Firehose then delivers that data to a consumer, such as a Lambda function, an EC2 instance, Amazon S3, Amazon Redshift or -- and this will be the focus of the tutorial -- the Amazon Kinesis Data Analytics service.

To get started, navigate to the Amazon Kinesis dashboard. If you don't have a data stream already set up, create one within the dashboard. You can use the AWS SDK or the language-specific Kinesis Producer and Client Libraries to connect the right producers and consumers to the stream.
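If you prefer the command line over the dashboard, the stream can also be created there. A minimal sketch, assuming the demo's stream name and a single shard (both placeholders for your own values):

```shell
# Create a stream with one shard (name and count are illustrative)
aws kinesis create-stream \
    --stream-name techsnips-demo \
    --shard-count 1

# Creation is asynchronous; block until the stream reaches ACTIVE
aws kinesis wait stream-exists --stream-name techsnips-demo
```

Both commands require the AWS CLI to be installed and configured with credentials for your account.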

If you need to write to or get more details on your data stream, you can use the AWS Command Line Interface (CLI). To see each stream listed in your account, go into the AWS CLI and run the command aws kinesis list-streams. Then, to get more details on each stream, run the command aws kinesis describe-stream --stream-name <Stream Name>. Follow the Amazon Kinesis tutorial directions to learn how to put data into the stream and retrieve additional information, such as the stream's partition key and shard ID.
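As a sketch of those two commands together -- the stream name is an example, and both calls assume configured AWS credentials -- the --query flag can narrow describe-stream's output down to just the shard ID you'll need later:

```shell
# List every stream in the account and region
aws kinesis list-streams

# Describe one stream; --query (a JMESPath expression) pulls out
# the first shard's ID from the full description
aws kinesis describe-stream \
    --stream-name techsnips-demo \
    --query 'StreamDescription.Shards[0].ShardId'
```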

To start analyzing real-time data, go back to the Kinesis dashboard and open the Data Analytics tab. It will ask you to create an application, which will consume data from your selected stream and aggregate it for real-time analysis. In the Kinesis setup dashboard, select the data stream to connect it to your application. From there, Kinesis will attempt to detect the schema of your data and present an SQL command window where you can watch the data update in real time. More importantly, you can write queries against the data in your stream and filter out data you're not interested in for easier analysis.


Transcript - Set up data analytics apps with this Amazon Kinesis tutorial

Hello. Today, we're going to talk about how to use Amazon Kinesis Data Streams and Data Firehose for real-time analytics. So to start off, let's look at some of the documentation. We get this data into Data Streams from an application or some process we've set up that produces data. In the Amazon terminology, it's called a producer. Kinesis will take that produced data and hand it off to a Lambda function or an EC2 instance or a Kinesis Data Analytics application, which is what we're going to be taking a look at later.

If you look at the page for Data Firehose, it's very similar: it still gets its data from a producer. But on the consumer side, instead of invoking a Lambda or putting it in an EC2 instance, it can put it into Elasticsearch, Redshift or S3. Aside from that, [Kinesis Streams and Data Firehose] are pretty similar. You'll definitely want to take some time to evaluate which is better for your use case. But as I mentioned earlier, I'm going to focus today on AWS' data analytics service.

So, here I am on the Amazon Kinesis dashboard. As you can see, I already have a data stream up and running, and I'll go into that a little bit more later. But I want to show you how to create a data stream as well. So to start off, let's create our data stream, and I'll call this techsnips-demo.

And if we scroll down a little bit, we see that it's asking us for one input, and that's the number of shards. Now, a shard is a unit of throughput capacity -- the rate at which the stream can take in and serve data. So if I put in one shard, that means I can write up to 1 MB per second, or 1,000 records per second, and I can read up to 2 MB per second. And I can scale this as high as I need it to go for my application, but each shard comes with its own cost. So keep that in mind as well. For this demo, I'm just going to use one shard, and then click Next.
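As a back-of-the-envelope sketch of that sizing math -- the 4 MB/s ingest rate here is an assumed example workload, not a figure from the demo:

```shell
# Each shard accepts up to 1 MB/s (or 1,000 records/s) of writes
# and serves up to 2 MB/s of reads.
write_rate_mb=4           # assumed peak ingest for this example
per_shard_write_mb=1

# Ceiling division: shards needed to absorb the write rate
shards=$(( (write_rate_mb + per_shard_write_mb - 1) / per_shard_write_mb ))
echo "Shards needed: $shards"   # prints: Shards needed: 4
```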

So my stream has been created. I'm going to go ahead and access it now. And we'll see we have different monitoring and tags that we can take a look at while we're using this.

We also have our ARN if we want to reference this in an API call or a script or something like that. And we have producers that we can set up to send data to the stream. Now, depending on your application, there are several different ways to do this. I can show you one using the AWS CLI, but this is one of those things where you'll want to look up the documentation and see what fits your use case best. So let's get started on the CLI.

I've got my WSL terminal open on my Windows 10 machine, where I've already installed the AWS CLI and configured it for use with my account. If you haven't done that yet, go ahead and take some time to do that now. I'm going to use aws kinesis list-streams to see what streams I have available. And it shows my kinesis-analytics-demo-stream, which I'll be going into more in just a moment, and the techsnips-demo stream. Let's go ahead and look at our techsnips-demo stream by using aws kinesis describe-stream, and I'll pass a parameter of stream-name techsnips-demo.

Great, so I see I have some information on my shards and my stream ARN and name, as well as some other metrics I can use here. Now keep that shard ID in mind; we will be going back to it later. For now, let's put a record into our stream. I'll use aws kinesis put-record and I'll pass the stream name again, a partition key -- which Kinesis hashes to decide which shard the record lands in; this is not something you have to worry about for the time being, so just put 1 -- and the data I want to write, which is HelloTechsnips. Great, I've got a confirmation that I've written to the shard, so now let's go ahead and get my record.
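As a sketch of that write -- the stream name and payload mirror the demo, and note that AWS CLI v2 treats --data as Base64 by default, so the binary-format flag is needed there to pass plain text:

```shell
# Write one record; the partition key is hashed to choose a shard
aws kinesis put-record \
    --stream-name techsnips-demo \
    --partition-key 1 \
    --data HelloTechsnips \
    --cli-binary-format raw-in-base64-out   # AWS CLI v2 only

# The response includes the ShardId and SequenceNumber of the new record
```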

I have to make sure I have the sequence number, so let's go ahead and copy that down for now. And then, unfortunately, there's no way to get records from Kinesis directly from the stream. You have to access them through a shard. So first we'll get our shard iterator, which tells Kinesis which shard to read and where in that shard to start. If I use aws kinesis get-shard-iterator, I'll need to pass it three parameters. The first is the stream name. The second is the shard iterator type -- there are several of these, so go ahead and check the documentation. In this case, since I know my sequence number, I'm going to set my shard iterator type to AT_SEQUENCE_NUMBER and include the starting-sequence-number from the previous step. The third is my shard ID, which I also have from the previous command. So let's go ahead and paste that in there and now we can run it.

All right, we've got our shard iterator now, let's go ahead and get the record that's in that shard. So this command is aws kinesis get-records, and then we're going to pass it the shard iterator.
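Chained together, the two read steps look roughly like this sketch -- the shard ID is a placeholder, and $SEQ stands in for the sequence number copied from put-record:

```shell
# Get an iterator positioned at a known sequence number
ITERATOR=$(aws kinesis get-shard-iterator \
    --stream-name techsnips-demo \
    --shard-id shardId-000000000000 \
    --shard-iterator-type AT_SEQUENCE_NUMBER \
    --starting-sequence-number "$SEQ" \
    --query 'ShardIterator' --output text)

# Read the records the iterator points at
aws kinesis get-records --shard-iterator "$ITERATOR"
```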

Now we have our sequence number, the timestamp and the partition key. We don't recognize what's in Data, though. It's not exactly what we wrote in there to begin with, and that's because Kinesis Base64-encodes the data in every record it stores. So if I go ahead and run an echo with this data and then pipe that into base64 with the decode flag, I can see HelloTechsnips. It runs into the beginning of my prompt here because the decoded data has no trailing newline, but there it is: HelloTechsnips in our shard.
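That round trip can be demonstrated locally without touching Kinesis at all; this sketch encodes the demo payload the same way Kinesis does and then decodes it back:

```shell
# Base64-encode the payload, as Kinesis does for stored record data
encoded=$(printf 'HelloTechsnips' | base64)
echo "$encoded"

# Decode it back to the original text
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"   # prints: HelloTechsnips
```

Because base64 --decode emits exactly the original bytes, a payload with no trailing newline will run into the shell prompt, which is the quirk seen in the video.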

Now, let's go back to our other stream. So as you may remember, I have two streams in my demo here: techsnips-demo, which we used to describe how to put things into a stream using the CLI, and we also have the kinesis-analytics-demo-stream. So for the rest of the demo, I'm going to be working in the AWS Console and show you how to use the Data Analytics application to parse that stream, as it's already got stock ticker data streaming to it.

Okay, so I'm back in my AWS Kinesis console, and under Data Analytics in the left navigation bar I can Create an application. We'll call this application StreamParser, and what it's going to do is consume data from the stream and then aggregate it in a way where we can analyze it in real time. So let's go ahead and Connect our stream.

And the reason we're using the kinesis-analytics-demo-stream here instead of our techsnips-demo is that Amazon has already set it up to be updated in real time. If I wanted to do the same thing with techsnips-demo, I would need to write a whole application around it. So let's pick this one for now. We see we have some options for pre-processing some of these records with Lambda or changing our access. We don't really need that right now, so let's go ahead and skip straight to Discover schema. There we go.

All right, so it's detected our schema, and it looks like we have a ticker symbol, sector, change and price. This is a stock ticker, after all, and this is how it looks formatted in a SQL-like way. And if we go to Raw, we see what's actually in the stream row by row. So keep that in mind as you're writing your applications. Let's go ahead and Save and continue.

So that's set up. Let's go ahead and go to our SQL editor and I'll start the application, because it is a lot easier to run this way. This is going to take a few seconds to start up. So let's wait.

All right, our application has started and we see the data coming in real time from our stream. Let's go ahead and scroll here. We see there are a lot of different sectors here, a lot of different organizations. We don't really care about all of them. So let's see if we can filter out just the technology sector. If I go up here to the Real-time analytics tab, it gives me different in-application streams. These are defined by the SQL queries that you run here. So I've got one here that's going to create a new stream out of all of the data in the other stream and then filter it based on whether the sector has the word T-E-C-H in it. So let's go ahead and run that. Great. Now I have my new in-application stream called DESTINATION_SQL_STREAM, and it's only got records coming in from our Kinesis stream where the sector matches tech or technology. All right, that's how we get started with Data Analytics on Kinesis. Thank you for watching.
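The in-application SQL behind that kind of filter typically looks like the following sketch; the stream and column names follow the AWS sample stock-ticker schema and are assumptions, not a transcript of the demo's exact query:

```
-- Define the output in-application stream
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM"
    (ticker_symbol VARCHAR(4), sector VARCHAR(16), change REAL, price REAL);

-- A pump continuously inserts matching rows from the source stream
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, sector, change, price
    FROM "SOURCE_SQL_STREAM_001"
    WHERE sector SIMILAR TO '%TECH%';
```

The pump is what makes the query continuous: rather than running once, it keeps feeding DESTINATION_SQL_STREAM as new records arrive.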
