In this blog post, we look at some of the OpDB-related security features of a CDP Private Cloud Base deployment. We cover auditing, different security levels, the security features of Data Catalog, and client considerations. Part 1 of this series is linked at the end of this post.
Comprehensive auditing is provided to enable enterprises to effectively and efficiently meet their compliance requirements by auditing access and other types of operations across OpDB (through HBase).
Access audits are mastered centrally in Apache Ranger, which provides a comprehensive, non-repudiable audit log for every access event to every resource, with rich access event metadata such as:
- User, business classification of asset accessed
- Policy outcome (allow or deny)
- Specific access policy that granted or blocked access
- Actual query run
The audit framework scales to very high volumes (billions of audit events per day), and the audit data is indexed and searchable using the intuitive UI within Ranger.
Ranger supports Read, Write, Create, and Admin access controls, which means it audits read, write, create, and admin operations.
Cloudera’s platform can pipe audit data to HDFS, Kafka, Syslog, or SIEM systems for long-term retention and archival. For access audits, the most recent 90 days' worth of audits is indexed and can be accessed in the Apache Ranger UI.
The system provides all of the information and components required to audit security, but the security assessment process is not fully automated. Our platform provides a rich set of APIs that enable audit automation and gap identification.
For example, the Data Catalog service can provide indirect summaries of how effective the security policies are by summarizing access audits to determine how many accesses to a particular asset were allowed and how many were blocked due to security policies. It also summarizes the specific policies that exist to secure a particular asset. In this manner, the security posture of specific assets can be assessed across the platform.
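The kind of summary described above can be pictured with a small, self-contained sketch. It does not call any Ranger or Data Catalog API. The `AuditEvent` class, its fields, and the asset names are hypothetical stand-ins for exported audit records, and the code only counts allowed and denied accesses per asset.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AuditSummary {

    /** Hypothetical, simplified audit record: who touched which asset, and the policy outcome. */
    static final class AuditEvent {
        final String user;
        final String asset;
        final boolean allowed;

        AuditEvent(String user, String asset, boolean allowed) {
            this.user = user;
            this.asset = asset;
            this.allowed = allowed;
        }
    }

    /** Returns asset -> {allowedCount, deniedCount}, sorted by asset name. */
    static Map<String, long[]> summarize(List<AuditEvent> events) {
        Map<String, long[]> counts = new TreeMap<>();
        for (AuditEvent e : events) {
            long[] c = counts.computeIfAbsent(e.asset, k -> new long[2]);
            c[e.allowed ? 0 : 1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample events are made up for illustration only.
        List<AuditEvent> events = Arrays.asList(
                new AuditEvent("alice", "sales:customers", true),
                new AuditEvent("bob", "sales:customers", false),
                new AuditEvent("alice", "sales:orders", true),
                new AuditEvent("carol", "sales:customers", false));

        summarize(events).forEach((asset, c) ->
                System.out.println(asset + " allowed=" + c[0] + " denied=" + c[1]));
    }
}
```

An asset with a high share of denied accesses may point to a policy that is too strict or to users probing data they should not see; either case is worth a manual review.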
Database object security
Database object-level security is available through the centralized authorization framework of Apache Ranger.
Both fine-grained access control of database objects and access to metadata are provided. Protected database objects include databases, tables, columns, views, and User Defined Functions (UDFs).
Fine-grained access control for special administrative operations that can be performed on OpDBMS is also supported.
Cloudera’s platform supports row-level security in two ways:
- Through views, access to specific rows, or more typically to ranges of rows, can be restricted based on the condition specified in the view.
- Apache Ranger fine-grained policies enable dynamic row filtering at SQL query compile time when SQL-based relational constructs are used on OpDB (Hive on HBase).
Dynamic row filtering inserts additional SQL predicates at compile time so that rows are filtered out at query time. Because the query result sets are not rewritten afterwards, performance holds up at very high scale.
For example, a dynamic row filter can be written on a customer table to restrict query results to rows that belong to EU customers who have given explicit consent to use their data for specific purposes (such as marketing or loyalty programs) and whose consent has not expired.
Since this filter predicate is inserted dynamically, the end-user does not have to know about these restrictions. They will only be shown the rows in compliance with these data consent requirements as required by recent regulations such as GDPR or CCPA.
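To make the consent example concrete, here is a minimal sketch of the filtering logic in plain Java. It is not how Ranger implements row filters. The `CustomerRow` class and its fields (`region`, `marketingConsent`, `consentExpires`) are hypothetical, and the code only shows the idea of combining the user's own predicate with an injected one before any rows are returned.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ConsentRowFilter {

    /** Hypothetical customer row; the column names are assumptions for this sketch. */
    static final class CustomerRow {
        final String id;
        final String region;
        final boolean marketingConsent;
        final LocalDate consentExpires;

        CustomerRow(String id, String region, boolean marketingConsent, LocalDate consentExpires) {
            this.id = id;
            this.region = region;
            this.marketingConsent = marketingConsent;
            this.consentExpires = consentExpires;
        }
    }

    /** Injected predicate: EU customer, consent given, consent not yet expired. */
    static Predicate<CustomerRow> euConsentFilter(LocalDate today) {
        return row -> "EU".equals(row.region)
                && row.marketingConsent
                && !row.consentExpires.isBefore(today);
    }

    /** Evaluates the user's predicate AND the injected filter in a single pass over the rows. */
    static List<String> query(List<CustomerRow> table, Predicate<CustomerRow> userPredicate, LocalDate today) {
        Predicate<CustomerRow> effective = userPredicate.and(euConsentFilter(today));
        List<String> ids = new ArrayList<>();
        for (CustomerRow row : table) {
            if (effective.test(row)) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2024, 1, 1);
        List<CustomerRow> table = Arrays.asList(
                new CustomerRow("c1", "EU", true, LocalDate.of(2025, 1, 1)),
                new CustomerRow("c2", "EU", true, LocalDate.of(2023, 6, 30)),
                new CustomerRow("c3", "US", true, LocalDate.of(2025, 1, 1)),
                new CustomerRow("c4", "EU", false, LocalDate.of(2025, 1, 1)));

        // The user asks for "all customers"; only c1 passes the injected filter.
        System.out.println(query(table, row -> true, today));
    }
}
```

The caller never sees the extra predicate, which mirrors the point that end users do not need to know the restriction exists.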
Cloudera’s platform provides column-level security through Apache Ranger and Apache Atlas. Access can be restricted at both the column and column family levels, in addition to the table, namespace, global, and cell levels.
Permissions include: admin, create, write, read, and execute.
Classification-based security is also supported. For example, you can tag columns or column families as containing PII and provide conditional access to users and groups based on such classifications.
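The following is a rough, self-contained model of how a tag-driven check could behave. It does not use the Ranger or Atlas APIs. The class, the column name `info:email`, the tag `PII`, and the group names are all hypothetical, and the rule that untagged columns are readable is an assumption made only to keep the sketch short.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TagBasedAccess {

    /** column (family:qualifier) -> classification tags attached to it. */
    private final Map<String, Set<String>> columnTags = new HashMap<>();
    /** tag -> groups allowed to read columns carrying that tag. */
    private final Map<String, Set<String>> tagReaders = new HashMap<>();

    void tag(String column, String tag) {
        columnTags.computeIfAbsent(column, k -> new HashSet<>()).add(tag);
    }

    void allow(String tag, String group) {
        tagReaders.computeIfAbsent(tag, k -> new HashSet<>()).add(group);
    }

    /**
     * A column is readable when, for every tag on it, the user belongs to at
     * least one group allowed for that tag. Untagged columns are readable here
     * (an assumption of this sketch, not a statement about Ranger).
     */
    boolean canRead(Set<String> userGroups, String column) {
        for (String tag : columnTags.getOrDefault(column, Collections.<String>emptySet())) {
            Set<String> readers = tagReaders.getOrDefault(tag, Collections.<String>emptySet());
            if (Collections.disjoint(readers, userGroups)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TagBasedAccess policy = new TagBasedAccess();
        policy.tag("info:email", "PII");
        policy.allow("PII", "compliance");

        Set<String> analyst = new HashSet<>(Arrays.asList("analytics"));
        Set<String> officer = new HashSet<>(Arrays.asList("analytics", "compliance"));

        System.out.println(policy.canRead(analyst, "info:email"));   // false
        System.out.println(policy.canRead(officer, "info:email"));   // true
        System.out.println(policy.canRead(analyst, "info:country")); // true (untagged)
    }
}
```

The useful property is that the policy is attached to the classification rather than to each column, so tagging a new column as PII protects it without writing a new rule.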
Metadata security is also provided to prevent users from viewing or updating specific metadata about such objects.
Rich user groups are provided in the security model which supports grouping by functional departments or other organizations.
In addition, users and groups can be synced from enterprise user directories (LDAP, AD, and others) and those mappings can be used to dynamically control access at a fine-grained level to various resources in the OpDB using Apache Ranger.
Cloudera’s OpDB offers multitenancy. Namespaces can be used in multi-tenant environments where individual tenants can create tables in their respective namespaces.
Quotas can be configured at the namespace and table level to prevent noisy neighbor problems. Tenants can also be restricted to subsets of the cluster, or specified groups of tenants can be created that share subsets of the hardware, isolating them from other groups of tenants.
The security model allows you to create tenant admins, which have restricted access compared to the global admins for the system.
Data Catalog is a service within CDP’s Shared Data Experience (SDX). It enables you to understand, manage, secure, and govern data assets across enterprise data clouds. Data Catalog helps you understand data across multiple clusters and across multiple environments (on-premises, cloud, and IoT).
Activity monitoring is supported through a combination of audit data aggregation and summarization in Data Catalog. It can aggregate and summarize access patterns from multiple data lakes. From the profiled data summaries of access patterns, one can put in place security policies using Apache Ranger to detect and handle any problematic access.
In addition, Data Catalog can also provide summarized views of access patterns for sensitive data types across multiple data lakes.
Sensitive data identification
Data Catalog can profile a variety of sensitive data types (such as personal data types for GDPR, CCPA) and identify such sensitive data across relational assets sitting across multiple data lakes.
Data Catalog also provides functionality to configure policies to protect sensitive data from unauthorized access using classification-based policy features in Apache Ranger.
Authentication between HBase and ZooKeeper is configured automatically. The HBase Thrift gateway supports impersonation out of the box. The HBase REST service uses Simple authentication by default, but it can be configured for Kerberos.
Java client applications accessing a secure HBase cluster through the HBase Java Client API must authenticate against the same security domain as HBase, using one of the following approaches:
- The user running the client application must have acquired Kerberos credentials prior to launching the application. For example, considering an application called HBaseSecureClientAccess, the following actions would allow it to successfully access HBase:
    $ kinit -kt mykeytab myuser
    $ java -cp $(hbase classpath):hbase-secure-client.jar com.cloudera.hbase.client.HBaseSecureClientAccess
- The application can authenticate programmatically, using the Hadoop Security UserGroupInformation API with a keytab file. The following code snippet shows how to acquire credentials programmatically before creating the HBase client connection:
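    Configuration config = HBaseConfiguration.create();
    …
    // Log in from the keytab; loginUserFromKeytab returns void and sets the process-wide login user.
    UserGroupInformation.setConfiguration(config);
    UserGroupInformation.loginUserFromKeytab("myuser", "mykeytab");
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    …
    // Connections created after the login use the acquired credentials.
    Connection connection = ConnectionFactory.createConnection(config);
    Table table = connection.getTable(TableName.valueOf("mytable"));
    Get get = new Get(Bytes.toBytes("my_rowkey"));
    Result r = table.get(get);
    …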
- Preferred method: Defining extra configurations on the client hbase-site.xml configuration file for transparent authentication:
    <property>
      <name>hbase.client.keytab.file</name>
      <value>/local/path/to/client/keytab</value>
    </property>
    <property>
      <name>hbase.client.keytab.principal</name>
      <value>[email protected]</value>
    </property>
This method is preferred because it removes the need for the client application to implement its own credential renewal logic.
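Because the transparent method depends on both properties being present, a quick sanity check of the client configuration can catch a missing entry before the application starts. The sketch below uses only the JDK's XML parser, not the HBase configuration classes. The class name, the sample path, and the principal `myuser@EXAMPLE.COM` are hypothetical placeholders.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ClientKeytabConfigCheck {

    /** Reads every <property> name/value pair from Hadoop-style configuration XML. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = textOf(property, "name");
                String value = textOf(property, "value");
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    private static String textOf(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    /** True only when both keytab properties are present and non-empty. */
    static boolean hasKeytabLogin(Map<String, String> props) {
        String file = props.get("hbase.client.keytab.file");
        String principal = props.get("hbase.client.keytab.principal");
        return file != null && !file.isEmpty() && principal != null && !principal.isEmpty();
    }

    public static void main(String[] args) {
        // Placeholder values for illustration; substitute your own path and principal.
        String xml = "<configuration>"
                + "<property><name>hbase.client.keytab.file</name>"
                + "<value>/local/path/to/client/keytab</value></property>"
                + "<property><name>hbase.client.keytab.principal</name>"
                + "<value>myuser@EXAMPLE.COM</value></property>"
                + "</configuration>";
        System.out.println("keytab login configured: " + hasKeytabLogin(readProperties(xml)));
    }
}
```

In practice the XML would be read from the client's hbase-site.xml rather than from an inline string; the inline form just keeps the example runnable on its own.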
This was Part 2 of the Operational Database Security blogpost. We looked at various security features and capabilities that Cloudera’s OpDB provides.
For more information about the security-related features and capabilities of Cloudera’s OpDB, read Part 1 of this blog post: Operational Database – Security Part 1.
For more information about Cloudera’s Operational Database offering, see Cloudera Operational Database.