Amazon SageMaker makes machine learning accessible. Developers and data scientists can use it to build and deploy machine learning models on AWS without additional infrastructure management tasks.
Amazon SageMaker provides prebuilt algorithms and support for open source Jupyter notebook instances to make it easier to get a machine learning model running in applications. In this Amazon SageMaker tutorial, we'll break down how to get a notebook instance up and running and how to train and validate your machine learning model. For this particular Amazon SageMaker demo, we'll use precurated marketing data from banking customers as an example dataset.
To get started, set up the necessary AWS Identity and Access Management (IAM) roles and permissions, and then create a Jupyter notebook that will run Python code. That code will do everything from harmonizing and manipulating your dataset to creating and managing AWS resources with the Boto3 library.
After you create the notebook, load the data into an S3 bucket, and then split it for training and testing. In this case, use the majority of your data -- about 70% -- to train the model. Then, use the remaining 30% to test the validity of the model. Do not use the same data for training and testing, as that may compromise the reported effectiveness of the model.
With AWS, you can either bring your own models or use a prebuilt model with your own data. In this Amazon SageMaker tutorial, we use XGBoost, a popular open source algorithm. You can set the model's parameters in Python code before training it on your data. The training and testing process will take some time, but once the model has been tested, it can be applied to incoming data using other Amazon services, such as Amazon Kinesis or CloudWatch.
Once you have trained and tested the model, you can destroy the resources you created using the Boto3 library. This isn't something you'd do in a production system, but if you're just testing the service, it's a crucial step to avoid paying for resources you no longer need.
After you've watched the Amazon SageMaker tutorial above, find a use for your AWS machine learning capability, such as product forecasting, fraud protection, marketing predictions and data analysis.
Transcript - Get started with machine learning in this Amazon SageMaker tutorial
Hello, today we're going to learn how to get started with AWS SageMaker. I'm here in the SageMaker dashboard, and the first thing I want to do just for the demo today is go ahead and open up a notebook instance. These are Jupyter notebooks that are geared towards machine learning algorithms. Let's go ahead and create a new notebook instance. And we'll call this one TechsnipsDemo.
Great, and if we scroll down a little bit, we see we have the ability to create a new IAM role or use an existing one. I'm going to use an existing one, but if you need to make a new one, we can also go to create a new role. The purpose of this role is to give our SageMaker instance access to an S3 bucket, so whether it's a specific bucket or any bucket, this is where you'd set that in the role. Because I already have my role defined, I'm going to go ahead and use that one. And I'm fine with all of my other options. So let's go ahead and create it. It will take some time for this to provision, so let's wait for that to finish up.
All right, it looks like our instance has been created. Let's go ahead and open it up. The first thing I'm going to do with this empty notebook is I'm going to create a notebook using conda_python3. This is a Python3 notebook that we can use to run our code. I've got my prompt here. And the first thing I'm going to do is go ahead and import some Python libraries.
There are two things I want to draw your attention to. The first is that when I click Run, the "In [ ]" brackets on the top left side will show an asterisk, and some text will be printed at the bottom of the screen. When the code is finished running, the asterisk on the top left turns into a number -- the number of the step that just finished running. So let's go ahead and watch that happen.
Now that we know our code is finished running, let's go ahead and do a few more things here. I'm going to create a bucket that I will then use to continue the rest of the demo.
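That bucket-creation step can be sketched with Boto3. This is a minimal sketch, not the exact code from the video: the bucket name `techsnipsdemo-data` is a hypothetical placeholder, and the calls that actually talk to AWS are commented out because they require live credentials. One real wrinkle worth knowing: `us-east-1` is the default region and must not be passed as a location constraint, while every other region must be.

```python
def bucket_kwargs(name: str, region: str) -> dict:
    """Build the keyword arguments for S3 create_bucket.

    us-east-1 must omit CreateBucketConfiguration entirely; any other
    region must pass itself as the LocationConstraint.
    """
    kwargs = {"Bucket": name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Inside the notebook (requires AWS credentials and the IAM role set up earlier):
# import boto3
# region = boto3.Session().region_name
# s3 = boto3.client("s3", region_name=region)
# s3.create_bucket(**bucket_kwargs("techsnipsdemo-data", region))
```

The same bucket is then used to hold the dataset and, later, the model artifacts.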
For full disclosure, we are using a predefined data set here. If you look at this URL, this is some bank data that we're going to use to determine whether bank customers are interested in taking out a certificate of deposit. This is precurated data, just for demo purposes. But we're also loading that into our S3 bucket. Next, we need to download the data from that bucket into our SageMaker instance.
The next thing we need to do is separate our training data from our testing data. As far as machine learning goes, you want to train the model on one set of data and then validate that the model is running successfully on a second set of data. The split is usually about 70/30, and that's what we'll use here: we'll train our model on 70% of our data and test it on the remaining 30%. So let's go ahead and separate our data. You can see one of those sets is about two-thirds of the whole and the other is about one-third. Next, we're going to use a prebuilt model in SageMaker called XGBoost. Before continuing with that, we need to reformat the header and columns of the training data and load it into the S3 bucket again.
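The 70/30 split described above can be sketched with pandas. This is illustrative only -- the tiny stand-in DataFrame below is not the actual bank-marketing data; the key points are shuffling before splitting and keeping the two sets disjoint:

```python
import pandas as pd

def split_70_30(df: pd.DataFrame, seed: int = 1729) -> tuple:
    """Shuffle the rows, then split ~70% for training and ~30% for testing."""
    shuffled = df.sample(frac=1, random_state=seed)  # shuffle all rows, reproducibly
    cut = int(0.7 * len(shuffled))                   # row index of the 70% boundary
    return shuffled.iloc[:cut], shuffled.iloc[cut:]

# Tiny stand-in frame; the real notebook would use the bank data loaded from S3.
demo = pd.DataFrame({"age": range(100), "y": [i % 2 for i in range(100)]})
train, test = split_70_30(demo)
```

Because the same rows never appear in both sets, the test accuracy you measure later actually reflects how the model handles data it hasn't seen.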
All right, now we need to actually create an instance of this model, and then define the parameters. Now that the parameters have been defined and the data is loaded into our model, we need to go ahead and train it. This is going to take some time. So let's just wait for this to finish.
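The parameter-setting step looks roughly like this. The hyperparameter values below are illustrative defaults for binary classification, not the exact ones from the video, and the estimator calls are commented out because they need the SageMaker SDK, an AWS session, and the role and bucket from earlier (all names here are placeholders):

```python
# Hypothetical hyperparameters for a binary-classification XGBoost training job.
hyperparameters = {
    "objective": "binary:logistic",  # predict the probability a customer subscribes
    "max_depth": 5,                  # tree depth; controls model complexity
    "eta": 0.2,                      # learning rate
    "gamma": 4,                      # minimum loss reduction required to split a node
    "min_child_weight": 6,
    "subsample": 0.8,                # row sampling per tree, guards against overfitting
    "num_round": 100,                # number of boosting rounds
}

# Inside the notebook, these feed a SageMaker Estimator:
# import sagemaker
# xgb = sagemaker.estimator.Estimator(
#     image_uri=sagemaker.image_uris.retrieve("xgboost", region, "1.5-1"),
#     role=role,
#     instance_count=1,
#     instance_type="ml.m5.large",
#     output_path="s3://techsnipsdemo-data/output",  # placeholder bucket
# )
# xgb.set_hyperparameters(**hyperparameters)
# xgb.fit({"train": train_input})  # train_input: TrainingInput pointing at the S3 CSV
```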
Okay, and we can see the job is finished. So let's scroll down to the bottom here. There are a few log outputs in this. If we go down to the bottom, it says the training job is complete, and it tells us roughly how long it took to train the model. Next, we need to deploy that model onto a server and create the endpoint we can access to test it. This is also going to take a little bit of time, because we're actually deploying a new server for this.
Okay, our model is deployed. Let's go ahead and run the test data against it to see what we can find. And now that that's been run, let's print some of the results from it in a way that we can read.
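Running the test data against the endpoint can be sketched like this. The SageMaker XGBoost endpoint accepts CSV rows and returns comma-separated probabilities; the two helpers below are illustrative, and the endpoint invocation is commented out because it requires a live deployed endpoint (the endpoint name is a placeholder):

```python
def to_csv_payload(rows) -> str:
    """Serialize feature rows into the CSV body the XGBoost endpoint expects."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)

def parse_predictions(body: str, threshold: float = 0.5) -> list:
    """Turn the endpoint's comma-separated probabilities into 0/1 labels."""
    probs = [float(p) for p in body.split(",")]
    return [1 if p > threshold else 0 for p in probs]

# Inside the notebook (requires a deployed endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# resp = runtime.invoke_endpoint(
#     EndpointName="techsnipsdemo-xgb",        # placeholder name
#     ContentType="text/csv",
#     Body=to_csv_payload(test_features),      # feature columns only, no label
# )
# labels = parse_predictions(resp["Body"].read().decode())
```

Comparing those labels against the held-out 30% is what tells you whether the model generalizes.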
All right, excellent. It looks like we have a prediction rate, and our model has done what it's supposed to do. Now we're all done, so let's go ahead and clean up by deleting our resources and buckets, so we're not paying for them anymore.
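The teardown can be sketched as follows. Everything that talks to AWS is commented out since it needs live credentials, and both resource names are hypothetical placeholders. Deleting the endpoint is the important part, since the hosting instance behind it bills by the hour; note also that an S3 bucket must be emptied before it can be deleted:

```python
# Placeholder names for the resources created during the demo.
ENDPOINT_NAME = "techsnipsdemo-xgb"
BUCKET_NAME = "techsnipsdemo-data"

# Teardown (run inside the notebook with AWS credentials):
# import boto3
# boto3.client("sagemaker").delete_endpoint(EndpointName=ENDPOINT_NAME)
# bucket = boto3.resource("s3").Bucket(BUCKET_NAME)
# bucket.objects.all().delete()   # a bucket must be empty before deletion
# bucket.delete()
```

Remember that the notebook instance itself also bills while running, so stop or delete it from the SageMaker console when you're finished.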
All right, that's it. Thank you for watching.