Here is a look at some of the plumbing involved in Big Data Analytics (BDA) solutions. When architecting a BDA solution, you cannot ignore its production and operational requirements. Getting your analytics to work against Big Data sources in a proof of concept, trial or laptop environment is a great first step and a very important phase of your project. It helps you win the executive-level buy-in that you need to secure funding and approval to move forward.
Now, as you move toward operationalizing a production-ready version of your BDA solution for your enterprise, you will find a mixed bag of available security solutions for hardening the data layer. Here is a look at some of the options:
- Hadoop as a source
This is, in my opinion, the most complicated aspect of your BDA solution. You may already be providing security on reports, analytical models and portals, but when it comes to securing the data in Hadoop itself, the traditional method of relying on Linux file permissions may be insufficient for your enterprise IT requirements, auditing and standards. Here are 3 options to look at:
a. Secure the data stored in HDFS with Kerberos (which authenticates the users and services that access the cluster) or with a 3rd-party security provider such as Voltage Security or Protegrity.
b. If you are using Cloudera, it now offers its own security layer on Hadoop called Sentry, which provides role-based authorization and is aimed at supporting compliance with regulations including SOX, HIPAA and PCI.
c. If you are using Hortonworks, it is steering users toward the Apache project Knox: http://hortonworks.com/blog/introducing-knox-hadoop-security/. Knox is a gateway that sits between the clients accessing data in HDFS and your Hadoop cluster.
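To make the gateway idea in option (c) a bit more concrete, here is a minimal sketch of how a client might address HDFS through Knox rather than talking to the cluster directly. It only builds the request and does not send it. The host name, topology name, user and password are made up for illustration, and the port (8443) and "gateway" path segment are the commonly documented Knox defaults, so treat them as assumptions to check against your own deployment.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_request(gateway_host, topology, hdfs_path, user, password,
                         op="LISTSTATUS", port=8443, gateway_path="gateway"):
    """Build (but do not send) a WebHDFS request routed through a Knox gateway.

    hdfs_path must be absolute (start with "/"). The port, gateway_path and
    topology values are assumptions; use the ones from your own Knox setup.
    """
    quoted_path = urllib.parse.quote(hdfs_path)
    query = urllib.parse.urlencode({"op": op})
    url = (f"https://{gateway_host}:{port}/{gateway_path}/{topology}"
           f"/webhdfs/v1{quoted_path}?{query}")

    # The client authenticates to the gateway (HTTP Basic here), not to the
    # individual Hadoop services behind it.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, for illustration only.
req = knox_webhdfs_request("knox.example.com", "default", "/data/sales",
                           "analyst", "secret")
print(req.full_url)
```

The point of the sketch is that the client only ever sees the gateway's address, so the cluster's internal hosts and ports stay hidden behind a single, auditable entry point.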
- MPP databases as source
If your organization has invested in MPP databases (Teradata, PDW, Vertica, etc.) and you store your data there, you enjoy the added benefit of data security, auditing, authentication, etc. from the database layer.
- OLAP as source
If your analytics will be built from an OLAP engine (SAS, SSAS, Mondrian, etc.), then you can secure the data at this layer with ACLs and roles. However, if you allow detailed reporting directly off the source data, below the OLAP layer in your solution, then you still need to secure the data layer for those BDA solutions.
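As a rough illustration of what securing data with roles at the OLAP layer means, here is a small sketch in which a role limits which regions and measures a user may see. The Role class, the cell layout and the sample figures are all hypothetical and do not correspond to the API of SAS, SSAS, Mondrian or any other engine; real engines express the same idea in their own role definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: which regions and measures its members may see."""
    name: str
    allowed_regions: frozenset
    allowed_measures: frozenset


def visible_cells(cells, role):
    """Return only the cube cells that the given role is allowed to read."""
    return [
        cell for cell in cells
        if cell["region"] in role.allowed_regions
        and cell["measure"] in role.allowed_measures
    ]


# Made-up sample data standing in for aggregated cube cells.
cells = [
    {"region": "EMEA", "measure": "revenue", "value": 120},
    {"region": "EMEA", "measure": "margin", "value": 35},
    {"region": "APAC", "measure": "revenue", "value": 90},
]

emea_sales = Role("emea_sales", frozenset({"EMEA"}), frozenset({"revenue"}))
print(visible_cells(cells, emea_sales))
```

Note that this filter only protects queries that go through the OLAP layer. A report that reads the underlying source tables directly bypasses it entirely, which is why the data layer still needs its own controls.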