One of the big drawbacks to operating in a highly regulated industry like healthcare is that the IT team is never going to be on the leading edge of technology adoption. Given the number of rules, regulations and privacy constraints, no organization that manages patient data and health records will lead the way by adopting an Apache project that is still in incubation, let alone by installing software with a track record of only six to eight months. The risk is too great, and the penalties that regulatory bodies can hand down when something goes wrong are too high. In healthcare, it is especially prudent to wait for the unknown bugs to shake out of a piece of software before integrating it into the architecture.
But of course, there does come a time when the value proposition of adopting new software grows so large that pushing into new areas of computing begins to make sense. According to Vin Sharma, a big data analytics strategist at Intel, that time is now. Between producers, payers, providers and regulators, there are massive savings to be had by properly applying big data techniques to the healthcare sector. "We know from various studies that there is a $450 billion savings that is available from the use of big data analytics in the healthcare industry alone," said Sharma.
Herding the Hadoop cats
Now when Sharma talks about bringing big data analytics to the healthcare industry, he's talking primarily about the Hadoop family of projects, which includes Hive, Pig, MapReduce and a variety of other popular Apache projects. The open source community can rightly be proud of the number of successful projects that can be mixed and matched into highly customized solutions for big data problems. But from the standpoint of polishing Hadoop to support a variety of compliance rules, the sheer number of projects on which Hadoop depends becomes a problem in itself.
"There are a lot of interoperating components. They are loosely coupled. Each project has a team that focuses on their requirements and goes off in their quasi-independent direction," said Sharma. That means there is no common security mechanism, no common encryption technology and no common access control framework across the projects that make up the Hadoop ecosystem. But Intel is intent on changing all of that.
Through the open source Project Rhino, Intel is working on bringing encryption, token-based authentication, granular access control and enhanced auditing to the Hadoop and HBase family of products. It's no small challenge, but the belief is that the goal is achievable. Let's hope it is, because the industry has lacked an effective solution for processing sensitive and legally protected data for far too long.
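To make the goals concrete, here is a minimal sketch of what granular access control and at-rest encryption look like in practice, using the HBase ACL shell and HDFS encryption-zone commands that appeared in later Hadoop and HBase releases (work Project Rhino contributed toward). The user, table, key and path names are invented for illustration, and the commands assume a secured cluster with these features enabled:

```shell
# Granular access control (hypothetical names): grant one user
# read/write on a single column family of a patient-records table,
# instead of handing out cluster-wide access.
echo "grant 'analyst1', 'RW', 'patient_records', 'vitals'" | hbase shell

# Encryption at rest: create a key in the Hadoop KMS, then an HDFS
# encryption zone backed by it, so files written under /secure/ehr
# are encrypted transparently without application changes.
hadoop key create ehrKey
hdfs crypto -createZone -keyName ehrKey -path /secure/ehr
```

The point of a common framework is that these controls apply uniformly, rather than each ecosystem project inventing its own.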