For Your Eyes Only: Using Vertica Access Policies

Posted April 8, 2015 by Sarah Lemaire, Manager, Vertica Documentation

For Your Eyes Only: Using Vertica Access Policies

Everyone knows there’’s virtually no limit to the amount of data you can store, manage, and query in Vertica. But you may not be aware of how easy it is to control who sees what in your database. In this blog, Alexey Kireyev, the lead security developer for Vertica, explains how you can use the new Access Policies feature to manage your private and sensitive data.

Let’s say you spend months setting up your database, creating tables, projections, and other good stuff. Now, let’s say you have a table like this:

CREATE TABLE customers (ssn VARCHAR(11), name VARCHAR(255), dob DATETIME);

And it has some potentially sensitive customer data, like Social Security numbers and birthdates:

SELECT * FROM customers;
ssn     |    name    |         dob
------------+------------+---------------------
123-45-6789 | Bill Smith | 1960-01-01 00:00:00

Obviously, it’s not good that everybody and their cat can see this private data. And, you may also need to make sure your database is compliant with things like HIPPA, PCI, and SOX. So, what can you do? Well, you can create views that contain only the information you want certain employees to see; and you can grant permissions on those views to different roles. Like this:

CREATE VIEW customers_data_for_operators AS SELECT SUBSTR(ssn, 8, 4) AS ssn, name FROM customers;
CREATE VIEW customers_data_for_managers AS SELECT ssn, name, dob FROM customers;
REVOKE ALL ON customers FROM managers;
REVOKE ALL ON customers FROM operators;
GRANT ALL ON customers_data_for_managers TO managers;
GRANT ALL ON customers_data_for_operators TO operators;

Hey! That wasn’’t hard, right? Now, you can verify that users with the managers role use the customers_data_for_managers view, users with the operators role use customers_data_for_operators, and so on. Managers can see everything, and operators can see only some data. And access to the original table is restricted to dbadmin.
c - super_manager;
SET ROLE managers;
SELECT * FROM customers;
ERROR 4367:  Permission denied for relation CUSTOMERS
SELECT * FROM customers_data_for_managers;
SSN     |    NAME    |         DOB
-------------+------------+---------------------
123-45-6789 | Bill Smith | 1960-01-01 00:00:00
c - just_an_operator;
SET ROLE operators;
SELECT * FROM customers;
ERROR 4367:  Permission denied for relation CUSTOMERS
SELECT * FROM customers_data_for_operators;
SSN  |    NAME
------+------------
6789 | Bill Smith

If there’s another role that needs to see a different set of data, just create another view, and another, and so on. Yes, like this:

CREATE VIEW customers_data_for_analysts AS SELECT name, dob FROM customers;
GRANT ALL ON customers_data_for_analysts TO analysts;

But, wait! After you created these views, it looks like your client application doesn’’t work. Why is that? Well, your application has no idea about all these views, and it cannot access the original table anymore. It’s probably not a big deal. Someone just needs to update the application. You do have source code and that someone, right? But, later on, what if there is a new role and new view? Yep, you need to update the application again and again. And what if there are more tables like this——say, 10 tables and 4 different roles? You end up with 40 views.

Well, this is not fun anymore.

Is there a better way?

Of course!

Introducing Access Policies

Vertica 7.1.1 introduced a new feature called access policies. Access policies allow the Vertica server to take full control of what a user or role can or cannot see in each table. An application can use the same queries for all users, but will get different results from the Vertica server depending on the access policy defined on the table:

c - analyst_user
SELECT * FROM customers;
ssn  |    name    |         dob
------+------------+---------------------
| Bill Smith | 1960-01-01 00:00:00
c - just_an_operator
SELECT * FROM customers;
ssn  |    name    |         dob
------+------------+---------------------
6789 | Bill Smith | 1960-01-01 00:00:00
c - super_manager
SELECT * FROM customers;
ssn     |    name    |         dob
------------+------------+---------------------
123-45-6789 | Bill Smith | 1960-01-01 00:00:00

With access policies, there’s no need for views and no additional work needed to make sure the application is aware of users or roles. Everything is taken care of on the server itself.

How to Create an Access Policy

Now, how do you create an access policy? Like so:
CREATE ACCESS POLICY ON customers FOR COLUMN SSN
CASE WHEN ENABLED_ROLE('managers') THEN ssn
WHEN ENABLED_ROLE('operators') THEN SUBSTR(ssn, 8, 4)
ELSE NULL
END
ENABLE;

Currently, columns can only have one access policy, but each column in the table can have its own access policy. To create an access policy on customers for the name column:

  CASE
WHEN ENABLED_ROLE('analysts') THEN NULL
ELSE name<
END
ENABLE;

Access policies can be modified, like so:

  ALTER ACCESS POLICY ON customers FOR COLUMN name
CASE
WHEN ENABLED_ROLE('managers') THEN name
WHEN ENABLED_ROLE('operators') THEN name
WHEN ENABLED_ROLE('analysts') THEN 'Dummy Name'
ELSE NULL
END
ENABLE;

Access policies can also be disabled, like so:

  ALTER ACCESS POLICY ON customers FOR COLUMN name DISABLE;

ENABLED_ROLE is a new runtime function. It returns true when a role is set for the user, either by default or manually.

How Does This Work? And What About Performance?

Internally, access policies are implemented by query rewriting——the original query is modified with the access policy expression.

Let’s say you have a table t1 with two columns a and b and an access policy for column a:

CREATE TABLE t1 (a VARCHAR, b INT);
CREATE ACCESS POLICY ON t1 FOR COLUMN a CASE WHEN ENABLED_ROLE('R') THEN SUBSTR(a, 4, 8) ELSE a END;

Let’s say the running query is:

SELECT * FROM t1;

With an access policy, this query becomes:

SELECT * FROM (SELECT CASE WHEN ENABLED_ROLE('R') THEN SUBSTR(a, 4, 8) ELSE a END, b FROM t1) t1;

All this happens during query planning at the same time when views are parsed——right after query parsing, but before optimizing. As such, no real performance changes are expected, and the optimizer will work very, very hard to return this rewritten query as fast as possible.

Also, the use of stable functions (values that don’’t change during a query) in an access policy expression will additionally help to ensure the best performance.

The optimizer constant folds stable functions. In our example, ENABLED_ROLE is a stable function, so based on the current user’s role, the optimizer rewrites the query as follows:

For a user who belongs to role R:

SELECT * FROM (SELECT SUBSTR (a, 4, 8), b FROM t1) t1;

For all other users:

SELECT * FROM (SELECT a, b FROM t1) t1;

Yes, if you try really hard, you can kill the performance of your queries. Like putting something nonlinear, and not optimizable into an access policy expression. So, please, don’t do this.

If you’’re considering using Access Policies, keep in mind that they are not supported on all schemas. You can find out more about these limitations, as well as more detailed information about Access Policies in the Vertica documentation, available at www.vertica.com/documentation. We hope you’’ll find this feature as useful as we think it is… …and we hope to make it better and better in future releases.