The following are our security recommendations to address security issues with Hadoop and NoSQL database clusters.

The last time we made recommendations we joked that many security tools broke Hadoop scalability; you’re cluster was secure because it was likely no one would use it. Fast forward four years and both commercial and open source technologies have advanced considerably, not only addressing threats you’re worried about, but were designed specifically for Hadoop. …

In fact, it’s because of the rapid technical advancements in the open source community that we have done an about-face on where to look for security capabilities. We are no longer focused on just 3rd party security tools, but largely the open source community, who helped close the major gaps in Hadoop security.

Our Recommendations

In the end, our big data security recommendations boil down to a handful of standard tools which can be effective in setting a secure baseline for Hadoop environments:

1. Use Kerberos for node authentication: We believed – at the outset of this project – that we would no longer recommend Kerberos. Implementation and deployment challenges with Kerberos suggested customers would go in a different direction. We were 100% wrong. Our research showed that adoption has increased considerably over the last 24 months, specifically in response to the enterprise distributions of Hadoop have streamlined the integration of Kerberos, making it reasonably easy to deploy.

Now, more than ever, Kerberos is being used as a cornerstone of cluster security. It remains effective for validating nodes and – for some – authenticating users. But other security controls piggy-back off Kerberos as well. Kerberos is one of the most effective security controls at our disposal, it’s built into the Hadoop infrastructure, and enterprise bundles make it accessible so we recommend you use it.

2. Use file layer encryption: Continue reading

Apache Hadoop logo