Introduction to entitlements engines
Modern Analytics Data Platforms should support their users with seamless and robust access to data. Impediments and bottlenecks occurring at any stage of producing, delivering, processing, and provisioning data may negatively impact the overall end-user experience.
This is not a hypothetical scenario, especially when we look at heavily regulated environments such as financial and administration institutions, where users sometimes struggle for weeks to complete all approvals and get access to required datasets.
To efficiently manage access, one must recognize an obvious conflict of interest between cybersecurity officers and business users. A security team is focused on ensuring that data is not used in an inappropriate way or by individuals not entitled to use it. In their ideal world, data would be ring-fenced, with no access to it at all. Such a scenario is unacceptable to business users, who want data to be as useful and accessible as possible.
To find the optimal compromise between appropriate security measures and easy access to data, we must take into account all obligations and regulations, collect necessary information about our data, about end users and their needs, and finally implement a solution that will work at speed and scale.
In this article, I will introduce the topics you need to consider when implementing an entitlement engine, sometimes also called an Entitlement Management System (EMS).
Initial considerations
I intentionally avoid referring to any specific vendor that provides entitlement management solutions, leaving vendor selection decisions to your own judgment. My intention is to empower you with knowledge that will help you take into consideration all key factors relevant to a successful EMS adoption.
Let’s start with a few example obligations to see what challenges an entitlements management system might need to solve.
Personal data
When an organization deals with the personal data of their clients, employees, and third parties, such data must be treated in a special way.
Let’s imagine a massive dataset with hundreds of columns, where 5 of them contain personal and sensitive data.
- Shall we restrict access to the entire dataset just because of these 5 columns?
- Would moving the restricted data into a separate dataset be a better option?
- Are we able to manage user access at the column level?
- Do we have our data tagged or described at the column level?
Cross-border data transfers
When an organization operates in multiple markets, or there are overseas teams working on data, such data is subject to regulations for cross-border data transfers.
- Do we know which country and jurisdiction the specific records come from?
- Do we know which countries’ residents may access, process, or store data coming from another country?
- Are we able to codify such knowledge?
- Can we restrict access to specific rows when data from multiple jurisdictions is collected in one table?
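One way to codify such knowledge is to express it as data. Below is a minimal sketch, assuming a hypothetical transfer matrix and a `jurisdiction` field on every record. The country pairs are purely illustrative and do not reflect any actual regulation.

```python
# Hypothetical transfer matrix: data origin -> countries whose residents may access it.
# The pairs below are illustrative only, not legal guidance.
ALLOWED_TRANSFERS = {
    "DE": {"DE", "IE", "PL"},
    "IE": {"IE", "DE"},
    "US": {"US"},
}


def can_access(user_country: str, record_jurisdiction: str) -> bool:
    """Return True if a resident of user_country may access data from record_jurisdiction."""
    # Unknown jurisdictions are denied by default.
    return user_country in ALLOWED_TRANSFERS.get(record_jurisdiction, set())


def filter_rows(rows: list, user_country: str) -> list:
    """Keep only the rows whose jurisdiction permits access from user_country."""
    return [row for row in rows if can_access(user_country, row["jurisdiction"])]


rows = [
    {"id": 1, "jurisdiction": "DE"},
    {"id": 2, "jurisdiction": "US"},
    {"id": 3, "jurisdiction": "IE"},
]
visible = filter_rows(rows, "IE")
print([row["id"] for row in visible])
```

Keeping the rules in a lookup table rather than in application logic means they can be reviewed and updated without touching the code that enforces them.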
Security access and control management
Before explaining how entitlement management systems are built, let’s look closer at the access provisioning process from a cyber security perspective.
There are three fundamental steps in such a process: authentication, authorization, and access control.
Step 1: Authentication
The aim of authentication is to ensure that the individual requesting access is verified. The most commonly used method of authentication utilizes a username and password. There are alternative methods, such as fingerprint scanning, PINs, software tokens, smart cards, and one-time codes sent by e-mail or text message.
In many cases, access provisioning processes implement multi-factor authentication (most commonly two-factor), where a user proves that they really are who they claim to be using more than one method.
The ultimate result of authentication is a yes/no answer, reflecting whether a user passed the verification successfully.
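The yes/no nature of this step can be illustrated with a minimal password check. This is only a sketch: the in-memory `USERS` store, the sample credentials, and the iteration count are assumptions made for the example, and a real deployment would delegate this step to an existing SSO/IAM solution.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (iteration count is illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical in-memory user store: username -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"daniel": (_salt, hash_password("correct horse", _salt))}


def authenticate(username: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, expected = record
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(hash_password(password, salt), expected)


print(authenticate("daniel", "correct horse"))
print(authenticate("daniel", "wrong password"))
```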
Step 2: Authorization
Authorization happens after authentication (once the user is verified). Simply speaking, it determines what the user is allowed to do and what they are forbidden from doing.
Depending on whether we are talking about a user of a single service or an individual who works for a company, the number of assigned roles, responsibilities, and associated access rights may vary drastically.
In a service such as Facebook, the set of rules and privileges is, in the vast majority of cases, uniform and limited to matters such as which posts you can see, edit, comment on, or tag, who can contact you, what messages should be displayed to you, etc.
In a company, employees usually have multiple user roles. Let’s give an example:
- Daniel, a software developer, is involved in several development projects. In these projects, he is granted access to code base repositories, application development environments, communication channels, and emailing groups.
- Daniel is based in Dublin; therefore, he has access to Dublin office communication channels.
- He is also an active member of a cyclist group in Dublin; thus, he has access to all group-related resources on SharePoint and the group’s Teams channel, and is a member of its email group.
- Six months ago, he was asked to help with a serious production issue that caused service unavailability to the company’s clients for almost 20 hours. At that time, he got access to one of the production databases where he was analyzing and patching application configuration data.
- Recently, Daniel’s manager asked him to take part in an agile adoption process as a representative of a software developers team. He is now granted access to the Agile group resources, comms channels, and mailing lists.
Regardless of membership in all groups and the resulting rights, the company may have additional policies (such as cross-border data transfer rules needed to comply with regulations), which will take precedence over the granted access rights. The authorization must take into account all of them and decide what effective set of privileges an individual possesses in regard to the specific resource (so only authorized users have access rights to specific datasets).
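The interplay between group-based grants and overriding policies can be sketched as follows. All group names, resources, and the cross-border rule are hypothetical and chosen only to mirror Daniel’s example. The key point is that a grant is necessary but not sufficient, because every policy must also agree.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    country: str
    groups: set = field(default_factory=set)


# Hypothetical group grants: group -> set of (resource, action) pairs.
GROUP_GRANTS = {
    "dev-project": {("code-repo", "read"), ("code-repo", "write")},
    "dublin-office": {("dublin-channel", "read")},
    "prod-incident": {("prod-db", "read"), ("prod-db", "write")},
}

# Hypothetical metadata: the country each resource's data originates from.
RESOURCE_ORIGIN = {"code-repo": "IE", "dublin-channel": "IE", "prod-db": "US"}


def cross_border_policy(user: User, resource: str) -> bool:
    """Illustrative rule: a resource may only be accessed from its country of origin."""
    return RESOURCE_ORIGIN.get(resource) == user.country


# Company-wide policies that take precedence over group grants.
POLICIES = [cross_border_policy]


def is_authorized(user: User, resource: str, action: str) -> bool:
    """Effective privilege = granted by at least one group AND allowed by every policy."""
    granted = any(
        (resource, action) in GROUP_GRANTS.get(group, set()) for group in user.groups
    )
    return granted and all(policy(user, resource) for policy in POLICIES)


daniel = User("Daniel", "IE", {"dev-project", "dublin-office", "prod-incident"})
print(is_authorized(daniel, "code-repo", "write"))
print(is_authorized(daniel, "prod-db", "read"))
```

In this sketch, Daniel keeps his production database grant from the incident six months ago, yet the cross-border policy still blocks it.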
Step 3: Access control
This is the final step that turns the information acquired during authentication and authorization into action. Access control prevents a user from reaching resources not meant to be accessed. Several types of access management can be distinguished, such as RBAC (role-based access control), ABAC (attribute-based access control), DAC (discretionary access control), and MAC (mandatory access control).
Role-based access control (RBAC)
Role-based access control establishes permissions based on groups and roles. Individuals can perform any action that is assigned to their role and may be assigned multiple roles as necessary. Users cannot change the level of access control that has been assigned to their role.
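A minimal RBAC sketch, assuming hypothetical role names and permission strings, could look like this. A user’s permissions are simply the union of the permissions of all roles assigned to them.

```python
# Hypothetical role definitions: role -> permissions.
ROLE_PERMISSIONS = {
    "analyst": {"read:sales"},
    "engineer": {"read:sales", "write:sales"},
    "auditor": {"read:audit_log"},
}

# Hypothetical assignments: user -> roles. A user may hold several roles.
USER_ROLES = {
    "alice": {"analyst", "auditor"},
    "bob": {"engineer"},
}


def permissions_for(user: str) -> set:
    """Union of the permissions of every role assigned to the user."""
    permissions = set()
    for role in USER_ROLES.get(user, set()):
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions


def is_allowed(user: str, permission: str) -> bool:
    return permission in permissions_for(user)


print(sorted(permissions_for("alice")))
print(is_allowed("alice", "write:sales"))
```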
Attribute-based access control (ABAC)
Attribute-based access control is a logical access control model that controls access to objects by evaluating rules against the attributes assigned to users and resources. Attribute values may constitute a list, a hierarchy, or anything else that may help in determining access.
As an example, let’s presume that the company's area of operation is divided into regions and regions into districts. Each account manager has an attribute specifying the home district associated with their user account. In the same way, every single customer record contains information about the customer’s home district. Now, using ABAC, we can create a rule that each account manager is permitted to view transaction data related to all customers within the same region.
To enforce such a rule, access request workflows must resolve several pieces of information step by step:
Account manager (AM) → AM’s district → region → all districts in a region → all customers in these districts → customers’ transaction data.
Implementing such a rule with RBAC is extremely difficult: it requires creating dozens of groups per region and takes time and effort to carefully maintain assignments to these groups. ABAC, in contrast, allows it to be implemented with a single policy definition.
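The district-and-region rule described above can be sketched as a single policy function. The district codes, region names, and attribute keys are hypothetical. The rule itself never mentions a specific user, customer, or group.

```python
# Hypothetical hierarchy: district -> region.
DISTRICT_REGION = {"D1": "North", "D2": "North", "D3": "South"}


def same_region_rule(user_attrs: dict, resource_attrs: dict) -> bool:
    """Allow access when the account manager and the customer share a region."""
    user_region = DISTRICT_REGION.get(user_attrs.get("home_district"))
    customer_region = DISTRICT_REGION.get(resource_attrs.get("customer_district"))
    # Missing or unknown districts never match.
    return user_region is not None and user_region == customer_region


account_manager = {"home_district": "D1"}
transactions = [
    {"customer": "C-100", "customer_district": "D2", "amount": 120.0},
    {"customer": "C-200", "customer_district": "D3", "amount": 75.5},
]

visible = [t for t in transactions if same_region_rule(account_manager, t)]
print([t["customer"] for t in visible])
```

Adding a new district or moving an account manager only changes attribute values. The policy definition stays untouched.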
Discretionary access control (DAC)
Once a user is given permission to access an object (usually by a system administrator or through an existing access control list), they can grant access to other users on an as-needed basis. This may introduce security vulnerabilities, however, as users are able to determine security settings and share permissions without strict oversight from the system administrator.
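The sketch below illustrates the idea with a hypothetical `Resource` class and a `share` right. Once Alice passes that right to Bob, Bob can extend access to Carol without any administrator being involved.

```python
class Resource:
    """Hypothetical resource with a per-user access control list (ACL)."""

    def __init__(self, owner: str):
        # The owner starts with every right, including the right to share.
        self.acl = {owner: {"read", "write", "share"}}

    def can(self, user: str, right: str) -> bool:
        return right in self.acl.get(user, set())

    def grant(self, grantor: str, grantee: str, rights: set) -> None:
        """Let a user pass on rights they hold, at their own discretion."""
        held = self.acl.get(grantor, set())
        if "share" not in held:
            raise PermissionError(f"{grantor} is not allowed to share this resource")
        if not rights <= held:
            raise PermissionError(f"{grantor} cannot grant rights they do not hold")
        self.acl.setdefault(grantee, set()).update(rights)


report = Resource(owner="alice")
report.grant("alice", "bob", {"read", "share"})
report.grant("bob", "carol", {"read"})  # No administrator involved.
print(report.can("carol", "read"))
```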
Mandatory access control (MAC)
Mandatory access control establishes strict security policies for individual users and the resources, systems, or data they are allowed to access. These policies are controlled by an administrator; individual users are not given the authority to set, alter, or revoke permissions in a way that contradicts existing policies.
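One common way to realize MAC is with sensitivity labels and user clearances. The sketch below assumes that flavor, using a simple "no read up" rule. The levels, users, and resources are hypothetical. The mappings are exposed as read-only views to mimic the fact that only an administrator defines them.

```python
from enum import IntEnum
from types import MappingProxyType


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Read-only views: ordinary code cannot alter clearances or labels.
USER_CLEARANCE = MappingProxyType(
    {"alice": Level.CONFIDENTIAL, "bob": Level.INTERNAL}
)
RESOURCE_LABEL = MappingProxyType(
    {"salaries": Level.CONFIDENTIAL, "handbook": Level.INTERNAL}
)


def can_read(user: str, resource: str) -> bool:
    """Allow reading only when the user's clearance is at least the resource label."""
    label = RESOURCE_LABEL.get(resource)
    if label is None:
        return False  # Unlabeled resources are denied by default.
    return USER_CLEARANCE.get(user, Level.PUBLIC) >= label


print(can_read("alice", "salaries"))
print(can_read("bob", "salaries"))
```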
How does an entitlements engine plug into architecture?
The entitlements engine will ease the process of rule management, authorization, and access control. Authentication is not covered by entitlements engine features. Moreover, in most cases, such engines are able to integrate with existing SSO/IAM solutions.
When we look at a typical data platform reference architecture, data passes through several stages from left to right:
- Ingestion from source systems
- Data processing and enrichment (ETL/ELT)
- Data storage at several stages - from RAW to consumption-ready
- Data provisioning for consumption
- Data use/consumption performed by end users or downstream applications
- Underlying data management process (ensuring that data at every stage is well documented, secured, monitored, and of known quality)
From a data platform perspective, entitlements engines are included in the process of data provisioning.
Key functions of entitlements engines ensure that data is accessed and consumed by entitled users and applications in a lawful way. To achieve that, we can distinguish several features such tools should contain.
- Centralized policy management – a central repository of policies that spans across all data repositories and data consumption solutions with a convenient management function and an ability to abstract policies from physical data objects.
- Distributed policy execution – where defined policies are applied and executed against all instances of data providers irrespective of their location or technology.
- Self-service / automated entitlement provisioning – data access provisioning frameworks capable of evaluating users, so access to requested datasets is automatically granted to evaluated users in minutes. Such frameworks turn access controls into a data marketplace experience.
- Fine-grained data access – dynamic column- and row-level data filtering.
- Data anonymization, masking, and tokenization – dynamic data obfuscation functions.
- Geo-fencing – functions that implement GDPR-compliant cross-border data transfer policies that regulate the moving and sharing of personal data between different jurisdictions.
- Logging and monitoring – enable tracking of data access and triggering events upon non-standard activities.
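To make the column-level and obfuscation features more concrete, here is a minimal sketch of dynamic masking and tokenization. The column names, the masking format, and the hard-coded key are assumptions made for the example. A real deployment would keep the key in a secrets manager.

```python
import hashlib
import hmac

# Illustrative key only; never hard-code a real secret.
SECRET_KEY = b"demo-key-not-for-production"


def tokenize(value: str) -> str:
    """Deterministic token: the same input always yields the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_email(value: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


# Hypothetical obfuscation rules per sensitive column.
COLUMN_RULES = {"email": mask_email, "national_id": tokenize}


def apply_column_rules(row: dict, unmasked_columns: set) -> dict:
    """Obfuscate sensitive columns unless the user is entitled to see them in clear."""
    result = {}
    for column, value in row.items():
        rule = COLUMN_RULES.get(column)
        if rule is None or column in unmasked_columns:
            result[column] = value
        else:
            result[column] = rule(value)
    return result


row = {"name": "Jane", "email": "jane@example.com", "national_id": "AB123456"}
print(apply_column_rules(row, unmasked_columns=set()))
print(apply_column_rules(row, unmasked_columns={"email"}))
```

Because the token is deterministic, tokenized columns can still be used for joins and counts without revealing the underlying value.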
Entitlements engine building blocks and operating scenarios
The set of functions listed above is realized by several building blocks of an entitlement management system, in cooperation with other solutions and systems in our data ecosystem. Let’s look closer at what an entitlement engine is built of.
- Central policy repository – entitlement application and repository that maintains all policies.
- Physical connectivity to data – a set of plugin agents capable of interacting with specific data stores or applications.
- Logging and monitoring agents – plugins that allow collecting the activity information, making insights out of it, and applying event-triggering rules that track suspicious activity.
- Integration with company repositories – integration with data catalog and IAM systems.
Physical connectivity to data repositories and implementation of policy execution may be realized in a few scenarios.
Leveraging native governance features
Some database platforms already have built-in data governance features that allow the implementation of fine-grained filters and data obfuscation rules on available datasets. In such cases, the entitlement management platform may leverage existing functions and RDBMS computing power acting as a policy rule aggregator (single place of definition) for different database instances and technologies in use.
Physical query filtering solution
In this scenario, the entitlement engine is a tangible step between the data platform and the user. When a query is passed to a platform, a result sent in return goes through an entitlement engine filtering capability before it is shown to the user. Such an approach requires the entitlement engine to have a scalable computing capability.
SQL injection approach
In such a scenario, the entitlement management system contains a plugin applied to the data repository or a federated query solution. The plugin intercepts user query requests. Then, based on the query and the user’s entitlement definitions, it rewrites the query by adding extra SQL filtering conditions. Finally, the filtered result is returned to the user. Note that the name refers to this controlled query rewriting and should not be confused with the SQL injection attack technique.
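The interception step described above can be sketched with SQLite from the Python standard library. The `rewrite_query` helper, the `sales` table, and the `region` column are hypothetical. The sketch also assumes that the user’s query exposes a `region` column, and it passes the entitlement values as bound parameters rather than concatenating them into the SQL text.

```python
import sqlite3


def rewrite_query(user_sql: str, allowed_regions: list) -> tuple:
    """Wrap the user's query and append a row-level filter on the region column."""
    inner = user_sql.strip().rstrip(";")
    if not allowed_regions:
        # No entitlements: return a query that yields no rows.
        return f"SELECT * FROM ({inner}) AS q WHERE 1 = 0", []
    placeholders = ", ".join("?" for _ in allowed_regions)
    sql = f"SELECT * FROM ({inner}) AS q WHERE q.region IN ({placeholders})"
    return sql, list(allowed_regions)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 20.0), (3, "EU", 30.0)],
)

# The user submits a plain query; the plugin adds the entitlement filter.
sql, params = rewrite_query("SELECT id, region, amount FROM sales;", ["EU"])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The filtering itself runs inside the database, so the engine only needs to rewrite text and does not have to process the result set.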
SDK integration approach
This scenario is dedicated to reporting solutions and custom end-user applications where user credentials may differ from those in internal data repositories. In such scenarios, the application has to apply data filtering and obfuscation on its own - based on the information provided by an entitlement engine about what a specific user is entitled to do.
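As a rough sketch of this approach, the application below asks an entitlement client what a user may see and then filters the report itself. `EntitlementClient` and `Entitlement` are hypothetical stand-ins written for this example. They do not represent any vendor’s actual SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    allowed_columns: frozenset
    allowed_regions: frozenset


class EntitlementClient:
    """Hypothetical stand-in for a vendor SDK that answers entitlement questions."""

    def __init__(self, table: dict):
        self._table = table

    def get_entitlement(self, user: str, dataset: str) -> Entitlement:
        # Unknown users or datasets get an empty entitlement (deny by default).
        return self._table.get((user, dataset), Entitlement(frozenset(), frozenset()))


def render_report(client: EntitlementClient, user: str, dataset: str, rows: list) -> list:
    """The application applies row and column filtering on its own."""
    entitlement = client.get_entitlement(user, dataset)
    return [
        {col: val for col, val in row.items() if col in entitlement.allowed_columns}
        for row in rows
        if row.get("region") in entitlement.allowed_regions
    ]


client = EntitlementClient(
    {("daniel", "sales"): Entitlement(frozenset({"id", "region"}), frozenset({"EU"}))}
)
data = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.0},
]
print(render_report(client, "daniel", "sales", data))
```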
Entitlement management solutions implementation considerations
Let me share a few final thoughts on the key factors that will help you choose the optimal technology for an entitlement engine and define all integration points with the existing components and capabilities of your data platform.
What is the most prominent data consumption style in your organization?
The way end users and downstream applications consume data should determine the data repositories and applications on which the entitlement solution should be able to operate.
If users mainly operate on datasets stored in databases and object stores, the entitlement management solution should be able to control access to such data stores, either directly or via federated query engines. If users primarily operate via a reporting solution, the entitlement engine should be able to control what users see through such reporting solutions.
How is your data organized?
How do you describe your data? Is there a firm classification that will allow you to understand how policies should apply to given data? Is such metadata associated with your datasets directly or stored in a central repository or data catalog?
When a data repository contains hundreds of datasets, the implementation of an entitlement engine will face the scenario with hundreds or even thousands of policies to be maintained.
In such a scenario, it may help to classify datasets and attributes using abstract data categories and tags, and to refer to these categories when defining entitlement policies. Ideally, such classification and tagging should be part of a wider Data Management process, with the results stored in a central Data Catalog. Then, the entitlement engine can leverage existing classification and tagging definitions via integration with the Data Catalog.
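A minimal sketch of tag-based policies follows. The catalog entries, tag names, and roles are hypothetical. The point is that policies are written once per tag, so the number of policies grows with the number of data categories rather than with the number of columns.

```python
# Hypothetical data catalog extract: column -> classification tags.
CATALOG_TAGS = {
    "customers.name": {"PII"},
    "customers.email": {"PII"},
    "customers.segment": set(),
    "orders.card_number": {"PII", "PCI"},
}

# Hypothetical policies defined per tag: tag -> roles allowed to read tagged data.
TAG_POLICIES = {
    "PII": {"privacy_officer", "support"},
    "PCI": {"payments"},
}


def can_read_column(column: str, roles: set) -> bool:
    """A column is readable when the user holds an allowed role for each of its tags."""
    tags = CATALOG_TAGS.get(column)
    if tags is None:
        return False  # Unclassified columns are denied by default.
    return all(roles & TAG_POLICIES.get(tag, set()) for tag in tags)


print(can_read_column("customers.email", {"support"}))
print(can_read_column("orders.card_number", {"support"}))
print(can_read_column("orders.card_number", {"support", "payments"}))
```

Two tag policies cover all four columns here. Newly cataloged columns inherit the right policy as soon as they are tagged.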
Are your systems scalable enough?
Fine-grained data filtering, obfuscation, and geofencing require computing power. If fine-grained filtering is a key functionality for you, your entitlements solution should support an adequately scalable way of filtering data. Ideally, these functions are native to your database or utilize RDBMS computing power through techniques such as the SQL injection approach described earlier.
When a solution requires data to pass through a dedicated filtering engine, this component should leverage a scalable technology, for example, MPP (massively parallel processing). Otherwise, it will quickly become a bottleneck.
Is your environment ready for entitlement management systems?
The entitlement engine does not exist in a vacuum. Policy management will require you to understand what the data to be accessed is and who the users to be entitled are. Self-service capabilities may rely on other dedicated applications that your company uses for access approvals and data discovery. In all such cases, two things may become crucial success factors: seamless integration with company tools and processes via an open architecture, and a set of APIs that allow the entitlement management solution to be controlled programmatically.
Further reading on access management
For those interested in the topic of entitlement management solutions, I would recommend the following books:
- “Access Control and Identity Management, 3rd Edition” by Mike Chapple
- “IT Security Controls: A Guide to Corporate Standards and Frameworks” by Virgilio Viegas, Oben Kuyucu
- “Snowflake Access Control: Mastering the Features for Data Privacy and Regulatory Compliance” by Jessica Megan Larson
- “Data Management at Scale, 2nd Edition” by Piethein Strengholt