Successful machine learning involves several steps that can be repeated periodically to refine and update the model. Amazon Web Services defines four general steps to create a machine learning application.
1. Define the problem and answer. Decide what you want to see and how you want to see it. For example, a manufacturer might want to know how many products to make in the next quarter based on the number of sales over the past several quarters. After defining what you want to see, decide how to model the answer. For example, should you use a regression model or a binary classification model?
2. Prepare the data set to train the machine learning application. The data set should represent the answer you want. For instance, if you're creating a spam email filter, supply email messages that are spam and ones that aren't to help the machine learning application learn the differences between the two. For sales predictions, you'll need previous sales data. Failure predictions might require normal vs. abnormal performance data, error message logs and so on.
Additionally, you may need to clean the data and prepare it as a standard comma-separated values (CSV) file. Drawing out specific variables from the data, such as spam sender addresses or specific error log messages, can help train the machine learning application effectively. Data preparation can be difficult and time-consuming, particularly when first creating a model. Check the data to verify that it's accurate.
3. Feed the prepared data to the machine learning algorithm. AWS Machine Learning uses logistic regression for binary classification, multinomial logistic regression for multi-class classification and linear regression for regression problems. Developers can also control training parameters, such as learning rate, regularization (a penalty that discourages overfitting), the number of passes made through the training data and model size. Once the model is created, test it using additional test data not included in the training process. For the spam filter example, take a quantity of known-spam and known-good email messages that were not used in training and see if the model distinguishes between them properly. If not, you may need to refine the data and model.
4. Use the proven model to make actual predictions in a live environment using real data. AWS supports bulk predictions, such as reviewing customers to target the top candidates for a marketing campaign. AWS Machine Learning can also handle one-off predictions, such as spotting spam or identifying a bogus transaction. Trends change over time, so reevaluate and retrain models as needed.
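To make step 3 more concrete, here is a minimal sketch of a binary spam classifier trained with logistic regression and then checked against held-out messages. It is not AWS code. The synthetic data, the two features (link count and an all-caps subject flag), the function names and the use of plain stochastic gradient descent are all assumptions made for illustration. The `learning_rate`, `l2` and `passes` arguments stand in for the training parameters mentioned above; model size is not modeled here.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(rows, labels, learning_rate=0.1, l2=0.001, passes=20, seed=0):
    """Fit logistic regression with stochastic gradient descent.

    learning_rate: step size for each update
    l2:            strength of the L2 regularization penalty on the weights
    passes:        number of full passes over the training data
    """
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(passes):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = p - y
            weights = [w - learning_rate * (error * v + l2 * w)
                       for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, x, threshold=0.5):
    """Return 1 (spam) or 0 (not spam) for one feature vector."""
    weights, bias = model
    score = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return 1 if score >= threshold else 0


def accuracy(model, rows, labels):
    """Fraction of examples the model labels correctly."""
    hits = sum(predict(model, x) == y for x, y in zip(rows, labels))
    return hits / len(rows)


def make_toy_messages(n, seed):
    """Synthetic stand-in for labeled email: [link_count, all_caps_subject]."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        is_spam = rng.random() < 0.5
        links = rng.randint(4, 9) if is_spam else rng.randint(0, 2)
        caps = 1 if rng.random() < (0.8 if is_spam else 0.1) else 0
        rows.append([float(links), float(caps)])
        labels.append(1 if is_spam else 0)
    return rows, labels


# Training data and test data come from separate draws, so the test
# messages are never seen during training.
train_rows, train_labels = make_toy_messages(200, seed=1)
test_rows, test_labels = make_toy_messages(50, seed=2)

model = train(train_rows, train_labels)
held_out_accuracy = accuracy(model, test_rows, test_labels)
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

If the held-out accuracy is poor, that is the signal to go back and refine the data or adjust the training parameters before moving the model into a live environment.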
AWS also provides a variety of tools to support model management. The AWS Management Console includes wizards such as Create ML Model, which separates training and evaluation data, and Create Evaluation, which tests the resulting model. In addition, AWS supplies APIs such as the Predict API, which handles one-off predictions, and offers detailed documentation on its APIs and other machine learning tools.
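As a rough illustration of a one-off prediction call, the sketch below wraps the `predict` operation exposed by the boto3 `machinelearning` client. The helper name `predict_one`, the model ID, the endpoint URL and the record fields are hypothetical, and the response shape is assumed from the boto3 documentation, so check it against the current AWS reference before relying on it. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def predict_one(client, model_id, record, endpoint_url):
    """Request a single real-time prediction and return the predicted label.

    client is expected to behave like boto3.client("machinelearning").
    """
    response = client.predict(
        MLModelId=model_id,
        # The service expects every record value as a string.
        Record={name: str(value) for name, value in record.items()},
        PredictEndpoint=endpoint_url,
    )
    # Assumed response shape: {"Prediction": {"predictedLabel": "...", ...}}
    return response["Prediction"].get("predictedLabel")


# Hypothetical usage (requires AWS credentials and an existing model):
#
#   import boto3
#   client = boto3.client("machinelearning")
#   label = predict_one(
#       client,
#       model_id="ml-EXAMPLE",                    # placeholder ID
#       record={"link_count": 7, "all_caps_subject": 1},
#       endpoint_url="https://example.invalid",  # placeholder endpoint
#   )
```

Bulk predictions follow a different path from this call, because they run as a batch job over a stored data set rather than one record at a time.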