Amazon preps AWS Elasticsearch to ease EC2 integration

Elasticsearch is an increasingly popular search engine that can be tricky to integrate onto EC2. AWS plans to solve that with its Elasticsearch service.

Amazon Web Services is preparing to launch an Elasticsearch service next month that will make running the open-source search engine on the Elastic Compute Cloud easier for users.

AWS already offers a search engine as a service called CloudSearch, based on the Apache Solr search platform. Both CloudSearch and Elasticsearch are built on the Apache Lucene text-processing engine, but Elasticsearch has eclipsed CloudSearch and Solr in popularity in recent years, according to industry sources.

"CloudSearch hasn't been widely adopted," said one tech industry CTO whose product uses Elasticsearch. Solr has been "plodding along" as an open-source project, the CTO said, while Elasticsearch has been growing since its debut in 2010.

Elasticsearch has seen nearly 10 million downloads since 2012, and some 5 million in 2014 alone, according to Elasticsearch Inc. officials in a webinar that aired last December.

The Java-based Elasticsearch is often used with big data applications, industry sources say. It's built for horizontal scalability and comes with a data ingestion engine called Logstash and a data visualization tool called Kibana. Together these form a composite service known as ELK. Elasticsearch is commonly used with open-source data ingestion tools such as Kafka, while CloudSearch requires some pre-processing to put data into JSON or XML format.
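As a rough illustration of the pre-processing CloudSearch expects, the sketch below converts plain records into a JSON document batch. It assumes the batch layout described in AWS documentation (a list of operations, each with a "type", an "id" and a "fields" object). The function name and the sample record are hypothetical and used only for illustration.

```python
import json


def to_cloudsearch_batch(records, id_field="id"):
    """Convert a list of dicts into a CloudSearch-style JSON document batch.

    Hypothetical helper: each record becomes one "add" operation, with the
    record's id pulled out and the remaining non-empty values sent as fields.
    """
    batch = []
    for record in records:
        fields = {
            key: value
            for key, value in record.items()
            if key != id_field and value is not None
        }
        batch.append({"type": "add", "id": str(record[id_field]), "fields": fields})
    return json.dumps(batch)


# Illustrative record only; not taken from any real data set.
sample_batch = to_cloudsearch_batch(
    [{"id": 1, "title": "Elasticsearch on EC2", "author": None}]
)
```

The resulting string would then be uploaded to a search domain's document endpoint, typically through the AWS SDK.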

Elasticsearch offers application programmers direct access through RESTful APIs. Direct API access in CloudSearch, by contrast, requires users to generate cryptographic signatures, according to AWS documentation, and users are encouraged to interface with CloudSearch through a software development kit (SDK) furnished by AWS.
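To make the contrast concrete, here is a minimal sketch rather than a definitive client. The first function builds an Elasticsearch match query as an ordinary HTTP request, without sending it. The second shows the signing-key derivation from AWS Signature Version 4, on the assumption that this is the signature scheme the AWS documentation refers to; a complete signed request also needs a canonical request and a string to sign, which are omitted here. The host, index name, secret and date values are placeholders.

```python
import hashlib
import hmac
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text):
    """Build (but do not send) an Elasticsearch match query as a plain HTTP request."""
    body = json.dumps({"query": {"match": {field: text}}}).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_sigv4_signing_key(secret_key, date_stamp, region, service):
    """Derive a Signature Version 4 signing key (one step of the full signing process)."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder values for illustration only.
request = build_search_request("http://localhost:9200", "articles", "title", "aws")
signing_key = derive_sigv4_signing_key(
    "example-secret", "20150101", "us-east-1", "cloudsearch"
)
```

The Elasticsearch request needs nothing beyond a URL and a JSON body, while the signed path adds key handling and hashing before a request can be sent, which is the kind of work an SDK hides.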

"If AWS were to release a managed Elasticsearch service, it would definitely be of interest to us," the CTO said, comparing it to supporting both MySQL and Postgres databases in the Relational Database Service (RDS), which manages and optimizes the underlying infrastructure for those databases the same way an AWS Elasticsearch service would.

This is key, say Elasticsearch users, because integrating Elasticsearch onto the Elastic Compute Cloud (EC2) platform can be difficult. ELK clusters can suffer memory shortages and leaks, according to a director of IT for an enterprise that uses Elasticsearch on AWS and has been briefed on the upcoming service.

"AWS has assigned people who work very hard to solve these issues," the director said. "I'd like to see a services-based approach, because we don’t want to get involved in that kind of troubleshooting."

Maintaining ELK cluster health also involves some fairly advanced data storage management to get the best performance from underlying system disks. In the Elasticsearch Inc. webinar, much of the broadcast time was devoted to choosing the correct underlying storage for ELK clusters, as well as advice on how to modify I/O schedulers to get the best performance.
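For a sense of what checking an I/O scheduler involves on a self-managed Linux node, the sketch below reads the scheduler setting that the kernel exposes under /sys/block. It assumes the usual sysfs format, in which the available schedulers are listed on one line and the active one appears in square brackets. The device name is a placeholder, and the webinar's specific recommendations are not reproduced here.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split a sysfs scheduler line into (active, available).

    Example input: "noop deadline [cfq]", where the bracketed name is active.
    """
    active = None
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device, sysfs_root="/sys/block"):
    """Read the scheduler setting for a block device such as "sda" (Linux only)."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())
```

Changing the scheduler means writing one of the available names back to the same file with root privileges, a per-node chore that a managed service would take off users' hands.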

It’s also important not to saturate back-end disks with I/O since some overhead is needed for data movement around the cluster and data recovery operations. None of these things would be a worry for users any longer with AWS managing the underlying clusters as part of a managed service.

"I wouldn't be surprised to see this kind of offering," said Dan Sullivan, a consultant with DS Applied Technologies in Portland, Ore., who had no direct knowledge of the upcoming service but said it would make sense. "Elasticsearch is growing in popularity … and [an AWS service] would be something a lot of people would be interested in."

AWS declined to comment for this story.

Beth Pariseau is senior news writer for SearchAWS. Follow @PariseauTT on Twitter.
