Amazon Machine Learning platform packs punch but lacks tooling

AWS' machine learning service eases the onramp for developers new to AI, but tools from cloud competitors like Google and Azure offer more advanced deep learning capabilities.

Machine learning tools enable organizations to gather insights and create new services -- making use of mountains of data. This algorithm-driven technology is one of the most consequential trends in the cloud computing market, but enterprises need to know how best to use the Amazon Machine Learning platform and similar cloud offerings versus other artificial intelligence technologies.

Artificial intelligence (AI) refers to a variety of approaches to self-learning and self-optimizing algorithms; machine learning specifically denotes a class of statistical methods that glean patterns and trends from raw data. In contrast, deep learning deals with neural network models that are analogous to synaptic interconnects in the brain, which developers can train to identify specific classes of information, such as objects in a photograph and words in digitized speech.

Amazon Machine Learning falls strictly into the machine learning category, not deep learning: It is a service that helps developers build predictive applications by using historical data to automatically create and train statistical models.

These sophisticated mathematical methods have long been a roadblock to broader use, as only statisticians and data scientists could make sense of the process. Developers and business managers, on the other hand, were often flummoxed, which made it difficult to select a technique for a particular problem or type of data. Amazon's Machine Learning platform is built for developers -- not statisticians or AI experts -- and focuses on a few popular scenarios and associated models to remove technical barriers. Developers commonly use the service to build product recommendation engines based on past purchases or fraud-detection systems based on the pattern of current and prior transactions.

Because the platform is based on Amazon's internal predictive systems, it is packaged in a way that makes the models relatively easy to configure and use. Furthermore, the Amazon Machine Learning platform can work with other Amazon cloud services to build complex application workflows and create data-based models. AWS requires enterprises to store data intended for use with the Machine Learning service in a Simple Storage Service bucket. Developers can move an Amazon Machine Learning model from training and testing to production in one click.

But for all of Amazon Machine Learning's benefits, the service still requires an understanding of the problems that can arise and a plan to fix them. And the service has some notable limitations that can preclude its use in certain situations.

Problem one: Model creation, evaluation and testing

It takes several steps to build a machine learning model, starting with the selection of an overall optimization strategy. This requires an understanding of available training data and the questions you want the system to answer.

The Amazon Machine Learning platform handles three model types: binary classification, multiclass classification and regression. An IT team uses a binary classification model, for example, to build a product satisfaction and refund model in which the fundamental question is whether the customer will likely return a particular item. The answer is a simple "yes" or "no."

In the multiclass model, an IT team might want the system to display product recommendations based on a customer's prior purchase, a question that can have many possible answers. Developers can use this model to present answers to the customer in order of the likelihood that they will buy each item.
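As a rough illustration of presenting multiclass answers in order of likelihood, the sketch below sorts per-class scores and keeps the top few. The function name, product labels and score values are hypothetical; they only stand in for whatever per-class scores a multiclass model returns.

```python
def rank_recommendations(scores, top_n=3):
    """Return the top_n class labels, ordered from most to least likely."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:top_n]]


# Hypothetical per-class scores for one customer (not real model output).
scores = {"gloves": 0.15, "scarf": 0.35, "boots": 0.40, "hat": 0.10}

print(rank_recommendations(scores))  # ['boots', 'scarf', 'gloves']
```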

A regression model uses past data to make numerical predictions, such as the demand for winter parkas over Thanksgiving weekend when it's expected to snow, or the amount of ice cream a restaurant might sell when it's over 90 degrees on a Saturday in July.

The service, however, sacrifices more detailed functionality for the sake of simplicity and uses a set optimization technique for each model. It also lacks flexibility in model construction that other cloud machine learning services offer. For example, Azure Machine Learning has more than two dozen models that cover five different problem areas; Google Cloud Machine Learning Engine supports custom-built models using TensorFlow. Enterprises with experienced data scientists might need wider model selection and customization than what AWS provides.

Problem two: Model tuning

Although the Amazon Machine Learning platform is designed for novices, it isn't foolproof. IT teams can struggle with the next phase of machine learning modeling. After choosing the type of analysis to perform and the associated model, follow these steps to develop the model:

  • Prepare the dataset by transforming it into a CSV file and create a data schema.
  • Create and train the machine learning model.
  • Test and evaluate the model. By default, AWS splits the data set into 70% for training and 30% for evaluation, but IT teams can customize those percentages.
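To make the 70/30 split in the last step concrete, here is a minimal sketch of partitioning rows locally before upload. It assumes the rows fit in memory, and it shuffles with a fixed seed before splitting; both are choices of this sketch, not a description of how AWS performs its own split. The function name and the stand-in rows are hypothetical.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows, then split into (training, evaluation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in records; in practice these would be the parsed CSV rows.
rows = list(range(100))
train, evaluation = split_rows(rows)

print(len(train), len(evaluation))  # 70 30
```

Changing `train_fraction` gives a custom split, and drawing the training set at random is one way to keep it from reflecting only the first part of the file.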

AWS doesn't provide automatic tools to prevent overfitting the model to a training set, particularly one with noisy inputs that may not be representative of all future data. To offset this, select a smaller random subset of the total data available for training, and tweak model parameters using data visualization and multiple trials.

Aside from data pruning, parameter selection is the other key element of model optimization. Amazon Machine Learning has default training parameters for the target model size, the number of passes over the data, and the type and amount of regularization applied to the model. Experienced professionals can override these parameters, but machine learning novices should account for some trial and error if they plan to customize parameters. Amazon Machine Learning provides statistical visualization tools, including a Model Insights feature, to assist with model evaluation and tuning.
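The sketch below shows what overriding those training parameters might look like as a plain dictionary. The `sgd.*` key names are written as I recall them from the Amazon Machine Learning API (the `Parameters` argument of `create_ml_model`) and should be checked against the official reference. The helper name and the numeric values are illustrative assumptions, and no AWS call is made.

```python
def build_training_parameters(max_passes=10,
                              max_model_bytes=33554432,
                              l2_amount=1e-6,
                              shuffle=True):
    """Build a string-valued training-parameter map (all values illustrative)."""
    return {
        # Number of passes over the training data.
        "sgd.maxPasses": str(max_passes),
        # Target (maximum) model size in bytes.
        "sgd.maxMLModelSizeInBytes": str(max_model_bytes),
        # Amount of L2 regularization; an L1 key could be used instead.
        "sgd.l2RegularizationAmount": f"{l2_amount:.1E}",
        # Whether the training data is shuffled between passes.
        "sgd.shuffleType": "auto" if shuffle else "none",
    }


params = build_training_parameters(max_passes=20)
for key, value in sorted(params.items()):
    print(key, "=", value)
```

Keeping the overrides in one function makes it easier to run the multiple trials described above and compare the results.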

In comparison, Azure offers a hyper-parameter tuning module that automatically sweeps through a range of values. And Google Cloud Machine Learning Engine recently introduced an automated hyper-parameter tuning feature that runs multiple trials in a single training job and optimizes them according to metrics the user sets.

Problem three: All data must be prepared and reside in AWS

All cloud machine learning services require IT teams to upload both the initial training and subsequent predictive data. Enterprises that want to use Amazon Machine Learning on internally hosted data, such as a financial transaction database, network security logs, or manufacturing process data, must migrate that data to AWS and prepare it for the service. Facilitate data transfer with AWS Direct Connect or another cloud cross-connect service like the AWS Storage Gateway.

Teams can extract and transform data either on premises or in the cloud, and AWS offers tools to streamline the process. Amazon Machine Learning requires data to be in CSV format with variable names in a header line and a schema that describes data types. The service also provides a Data Insights feature that assists IT pros with selecting a training data set; it calculates descriptive statistics and creates visualizations.
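As a sketch of the CSV-plus-schema requirement, the code below reads the header line of a CSV file and builds a schema document describing each column. The JSON field names follow the Amazon Machine Learning schema format as I understand it and should be verified against the AWS documentation. The column names, the sample rows and the helper function are hypothetical.

```python
import csv
import io
import json


def build_schema(csv_text, target, attribute_types):
    """Derive a schema dict from the CSV header and a name-to-type mapping."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if target not in header:
        raise ValueError(f"target column {target!r} not found in header")
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            # Columns without an explicit type default to CATEGORICAL here,
            # which is a choice of this sketch.
            {"attributeName": name,
             "attributeType": attribute_types.get(name, "CATEGORICAL")}
            for name in header
        ],
    }


# Hypothetical training data for the product-return example.
sample_csv = "order_total,category,returned\n59.99,outerwear,0\n12.50,accessories,1\n"
schema = build_schema(
    sample_csv,
    target="returned",
    attribute_types={"order_total": "NUMERIC", "returned": "BINARY"},
)
print(json.dumps(schema, indent=2))
```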

Amazon Machine Learning can apply recipes to transform data before using it to create a model, and it selects a default recipe automatically based on the raw input. But IT teams might have to select or transform raw data to make it usable in a model. For example, transaction log files might include the customer address as a single string, but its constituent parts, such as street address, city, state and zip, are more useful.
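The address example can be sketched as a small preprocessing step that runs before the data is uploaded. The pattern below assumes a U.S.-style address written as "street, city, ST 12345"; real log data is messier, so treat the regular expression, the function name and the sample address as hypothetical.

```python
import re

# Assumed layout: "<street>, <city>, <two-letter state> <5-digit zip or zip+4>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def split_address(raw):
    """Split one address string into parts, or return None if it doesn't match."""
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


print(split_address("123 Main St, Springfield, IL 62704"))
# {'street': '123 Main St', 'city': 'Springfield', 'state': 'IL', 'zip': '62704'}
```

Rows that return `None` can be set aside for manual review instead of being passed to the model with a malformed address.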

While Amazon Machine Learning makes powerful data science algorithms accessible to non-specialists, its lack of flexibility limits those with statistical and machine learning expertise -- or customers with highly specialized needs. For these IT professionals, commercial products like SAS and SPSS and open source frameworks like Apache Spark MLlib, TensorFlow and Theano -- all of which run on either Linux or Windows -- might be a better fit.
