AWS chief data scientist talks cloud pricing, big data trends

AWS' data science chief addresses burning questions for enterprises considering cloud, covering cloud price wars, federation, big data and more.

SAN FRANCISCO -- Amazon Web Services has made frequent changes to its cloud in recent years, from price cuts to tighter integration with private data centers, to accommodate the enterprise market and branch out beyond its original clientele of developers and startups.

Among the executives leading the charge is Matt Wood, general manager of data science for Amazon Web Services (AWS). SearchCloudComputing caught up with him at the AWS Summit here this week to discuss hot-button issues for enterprise customers. Here is what he had to say about cloud pricing, cloud federation, compliance and data localization.

Cloud pricing is a hot topic right now, as Google and Amazon both cut prices steeply this week. Exactly how low can prices go before everybody's offering everything for free?

Matt Wood: We've always known that, a little bit like our retail side of the business, cloud computing is a high-volume, low-margin game, and it's a business model that we are very, very comfortable with.

If you look back over the past eight years, we've reduced prices 42 times, without any real competitive pressure to do so. Reducing prices is just part of what we do -- it's part of the pulse of the organization, and we have this virtuous circle … where we have more customers that adopt the platform, and they drive more use, and because of that we get to go out and do custom deals with our vendors, and we get to go out and take advantage of that economy of scale, and that results in cost savings for us. We could just pocket that as profit. That's a perfectly reasonable thing to do. But we choose to pass those savings back to customers. … We just keep doing these things, and you can expect us to keep doing them.

Your particular area of focus is data science and big data analytics. Are you seeing any new trends in that area?

Wood: One of the biggest trends is the augmentation -- not the replacement, but the augmentation -- of traditional business intelligence reporting with more real-time services. … Being able to use the two together is very empowering.

[Finnish gaming company] Supercell [is] a really good example of that. … They run popular mobile games, like Clash of Clans, and they've got 8 million people playing this in one day, on iOS alone. Ideally, if you're a gaming company, you want to capture as much of that value as possible. You want to know how people are interacting with the game world. You want to know how your in-game economies are performing. You want to know who's buying what and who's talking to whom and what the drop-out is at a particular level, if it's too hard, so you can take that information and improve the game.

Some of that is done by collecting all of the data. There's no limit to what you collect … with Amazon Kinesis, which is a real-time managed streaming service -- it's just a big pipe, where you can throw your data in and connect up sensors onto that which sample at different rates, and you can do different things with the same data stream.
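To make the "big pipe" idea concrete, here is a minimal sketch in plain Python of one stream being read by two consumers at different sampling rates. It is an in-memory toy and does not use the Kinesis API; the `Stream` class, the event fields and the two consumer functions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stream:
    """Hypothetical in-memory stand-in for a stream: an append-only list of records."""
    records: List[dict] = field(default_factory=list)

    def put(self, record: dict) -> None:
        # Producers only ever append; nothing is dropped at ingest time.
        self.records.append(record)

    def sample(self, every_nth: int) -> List[dict]:
        # A consumer reads every `every_nth` record, starting with the first.
        return self.records[::every_nth]


def revenue_per_player(records: List[dict]) -> Dict[str, int]:
    """Sum purchase amounts per player."""
    totals: Dict[str, int] = {}
    for r in records:
        if r["event"] == "purchase":
            totals[r["player"]] = totals.get(r["player"], 0) + r["amount"]
    return totals


def dropouts_by_level(records: List[dict]) -> Dict[int, int]:
    """Count 'quit' events per game level."""
    counts: Dict[int, int] = {}
    for r in records:
        if r["event"] == "quit":
            counts[r["level"]] = counts.get(r["level"], 0) + 1
    return counts


stream = Stream()
for event in [
    {"player": "b", "event": "quit", "amount": 0, "level": 3},
    {"player": "a", "event": "purchase", "amount": 5, "level": 1},
    {"player": "c", "event": "quit", "amount": 0, "level": 3},
    {"player": "a", "event": "purchase", "amount": 2, "level": 2},
]:
    stream.put(event)

# Same stream, two consumers: one reads every record, one reads every second record.
revenue = revenue_per_player(stream.sample(1))
dropouts = dropouts_by_level(stream.sample(2))
```

The point of the sketch is only that ingestion and consumption are decoupled: the data is collected once, and each consumer decides how much of it to read and what to compute from it.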

Amazon describes itself as 'customer-obsessed.' What kinds of features and services are customers asking you for right now?

Wood: They're asking for things like, 'Can it be easier to access high-value, public data sets?' That's a request we get a lot. There's a lot of data out there. … We spend a lot of time identifying and working with publicly available data and making it as easy as possible to use.

A good example of that would be the Common Crawl, which is a routinely updated, very large set of the Web -- it's every page on the Web, downloaded and pre-computed and put into an index, which makes it very easy to run Hadoop on. You don't have to go off and do all the spidering and crawling yourself -- someone's done that. You don't have to go to the raw materials to pre-compute all the tags and remove the HTML and all those sort of things -- that's been done for you as well.

What you get is data in a format that is very easy to use in a distributed way. … You can start querying billions of webpages in less than 10 minutes from a cold start. We take that data and store that and host it for free, because there's benefit to it for the community, and then we make sure it follows all the best practices in terms of [Simple Storage Service (S3)] access, so that it is very easy to fire up big Hadoop clusters and run queries against them.

Some customers that we've been talking with see a future of cloud federation -- how does Amazon view that?

Wood: That's not something that we hear routinely from customers at the moment, but I'm not saying it won't be important in the future.

What we do hear today is that customers at some larger organizations have typically already made a large investment in an infrastructure. They've got a footprint already. When we talk to those guys, we try to guide them toward the idea that there's not a choice, where you have to run everything on-premises or you have to run everything on AWS. …

We've spent a lot of time over the last 18 months building out integration points, making it much easier for customers to run workloads where it makes sense. We've built out direct connectivity between their data centers and ours, we've got private storage options, we've got private compute options, we've got identity federation options and things like WorkSpaces, which integrate with Active Directory on the back end. … All those integration points help customers make the right choice for their workload.

Some customers need to keep data in particular regions for compliance reasons -- is Amazon able to sign legal agreements guaranteeing that customers' data won't leave a certain availability zone or a certain region?

Wood: You actually can't use our platform at all without choosing where your data resides. Customers have to make a conscious decision about where their data will reside on a regional basis. We have these regions, and inside each region are multiple availability zones, and availability zones have data centers inside them. With a service like S3, for example, we will mirror data across availability zones, but what we don't do is mirror data between regions.
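The hierarchy Wood describes, with regions containing availability zones and data mirrored across zones but not between regions, can be sketched as a toy data model. This is plain Python and not the AWS API; the `Region` class, the region and zone names, and the "copy to every zone" behavior are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    """Hypothetical model of a region holding several availability zones."""
    name: str
    zones: List[str]
    # Maps zone name -> {object key: object bytes}; built in __post_init__.
    storage: Dict[str, Dict[str, bytes]] = field(init=False)

    def __post_init__(self) -> None:
        self.storage = {zone: {} for zone in self.zones}

    def put(self, key: str, data: bytes) -> None:
        # Assumption: a write is copied to every zone in this region,
        # and never to any other region.
        for zone in self.zones:
            self.storage[zone][key] = data

    def zones_holding(self, key: str) -> List[str]:
        return [zone for zone in self.zones if key in self.storage[zone]]


# Hypothetical region and zone names.
region_a = Region("region-a", ["region-a-1", "region-a-2", "region-a-3"])
region_b = Region("region-b", ["region-b-1", "region-b-2"])

# The customer picks the region explicitly; the object stays there.
region_a.put("report.csv", b"id,total\n1,42\n")

in_a = region_a.zones_holding("report.csv")
in_b = region_b.zones_holding("report.csv")
```

In this model, `in_a` lists all three zones of the chosen region and `in_b` is empty, which mirrors the residency behavior described in the answer above.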

Beth Pariseau is senior news writer for SearchCloudComputing. Write to her at bpariseau@techtarget.com or follow @PariseauTT on Twitter.
