Apply Row and Column Level Access Control on Glue Data Catalog Tables with Lake Formation in CDK
Lake Formation is a service that provides access control with an RDBMS-like permission model (granting and revoking permissions to principals) for Glue Data Catalog metadata and S3 data. It transparently checks permissions when returning metadata to query engines, and vends temporary credentials for S3 access. Because those credentials still allow the underlying objects to be read in full, filtering rows and columns is the engine's responsibility. Care is therefore needed when the data is accessed from environments that are not fully managed by AWS, such as EMR.
Register the data lake's S3 location so that it is governed by Lake Formation, and create a service role.
new lakeformation.CfnResource(this, 'DataLakeLocation', {
resourceArn: `arn:aws:s3:::${dataBucket.bucketName}/employee_db/`,
useServiceLinkedRole: true,
});
const lakeFormationServiceRole = new iam.Role(this, 'LakeFormationServiceRole', {
assumedBy: new iam.ServicePrincipal('lakeformation.amazonaws.com'),
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('AWSLakeFormationDataAdmin'),
],
});
dataBucket.grantReadWrite(lakeFormationServiceRole);
Make this role a Lake Formation administrator, along with the role of the Lambda function that grants permissions (defined later) and the CDK CloudFormation execution role. Note that a principal without Lake Formation permissions cannot perform any operation on Data Catalog resources, not even DESCRIBE, even if it has IAM AdministratorAccess. With the settings below, newly created databases and tables get no default permissions, so they cannot be accessed through IAM policies alone. However, tables in databases created before this setting grant Super permission to the IAMAllowedPrincipals group, which includes every user and role allowed access by IAM policies. For Lake Formation access control to take effect on those tables, you need to uncheck “Use only IAM access control ~” and revoke these permissions. Also, if hybrid access mode is enabled for the location, only principals that have opted in to the database or table are subject to Lake Formation control.
const synthesizer = this.synthesizer as cdk.DefaultStackSynthesizer;
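// The default synthesizer's role ARN contains pseudo-parameter placeholders such as ${AWS::AccountId}; Fn.sub resolves them
const cdkExecRoleArn = cdk.Fn.sub(synthesizer.cloudFormationExecutionRoleArn);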
const dataLakeSettings = new lakeformation.CfnDataLakeSettings(this, 'DataLakeSettings', {
admins: [
{ dataLakePrincipalIdentifier: lakeFormationServiceRole.roleArn },
{ dataLakePrincipalIdentifier: grantPermissionsFunction.role!.roleArn }, // grantPermissionsFunction is defined further below and must be declared before this construct
{ dataLakePrincipalIdentifier: cdkExecRoleArn },
],
createDatabaseDefaultPermissions: [],
createTableDefaultPermissions: [],
});
database.addDependency(dataLakeSettings);
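How these settings interact can be summarized in a simplified model. The sketch below is an approximation written for illustration under stated assumptions, not Lake Formation's actual evaluation logic: the type and function names are hypothetical, it only considers SELECT on a single table, and it assumes IAM permissions are always needed in addition to any Lake Formation grant.

```typescript
// Simplified, hypothetical model of which permission system decides table access.
// Not Lake Formation's real evaluation logic; it only mirrors the rules described above.
type Decider = "IAM" | "LakeFormation";

interface TableAccessState {
  iamAllowedPrincipalsHasSuper: boolean; // legacy grant to IAMAllowedPrincipals still present
  hybridAccessEnabled: boolean;          // hybrid access mode on the registered location
  optedInPrincipals: Set<string>;        // principals opted in to Lake Formation for this table
  lfSelectGrants: Set<string>;           // principals holding a Lake Formation SELECT grant
}

function decidingSystem(state: TableAccessState, principal: string): Decider {
  if (state.hybridAccessEnabled) {
    // Hybrid mode: only opted-in principals are checked by Lake Formation.
    return state.optedInPrincipals.has(principal) ? "LakeFormation" : "IAM";
  }
  // A remaining IAMAllowedPrincipals grant leaves the decision to IAM policies.
  return state.iamAllowedPrincipalsHasSuper ? "IAM" : "LakeFormation";
}

function canSelect(state: TableAccessState, principal: string, iamPolicyAllows: boolean): boolean {
  // Assumption of this model: IAM permissions are always required.
  if (!iamPolicyAllows) return false;
  return decidingSystem(state, principal) === "IAM" || state.lfSelectGrants.has(principal);
}

// A table in a database created before the settings change.
const legacyTable: TableAccessState = {
  iamAllowedPrincipalsHasSuper: true,
  hybridAccessEnabled: false,
  optedInPrincipals: new Set<string>(),
  lfSelectGrants: new Set<string>(),
};
console.log(canSelect(legacyTable, "salesRole", true)); // true: IAM alone is enough

// After revoking the IAMAllowedPrincipals grant.
const revoked: TableAccessState = { ...legacyTable, iamAllowedPrincipalsHasSuper: false };
console.log(canSelect(revoked, "salesRole", true)); // false: no Lake Formation grant yet

// After granting SELECT through Lake Formation.
const granted: TableAccessState = { ...revoked, lfSelectGrants: new Set<string>(["salesRole"]) };
console.log(canSelect(granted, "salesRole", true)); // true
```

The model makes the ordering concern visible: revoking the IAMAllowedPrincipals grant before granting Lake Formation permissions cuts off access that previously worked through IAM policies.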
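Create a DataCellsFilter that limits which rows and columns of the table are visible. This one keeps only rows whose department is '営業部' (Sales) and exposes every column except salary and email.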
const salesFilter = new lakeformation.CfnDataCellsFilter(this, 'SalesFilter', {
tableCatalogId: this.account,
databaseName: database.ref,
tableName: table.ref,
name: 'sales_department_filter',
rowFilter: {
filterExpression: "department = '営業部'",
},
columnNames: ['employee_id', 'name', 'department', 'position', 'hire_date'],
});
salesFilter.addDependency(table);
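As described at the beginning, Lake Formation only hands this definition to the query engine, which does the actual filtering. Conceptually, the engine keeps the rows that match filterExpression and projects the listed columnNames, roughly as in the sketch below. The row predicate is hand-written here (a real engine parses the expression string itself), the type and function names are hypothetical, and the two sample rows are taken from the admin query results shown later, without the email column.

```typescript
// Conceptual sketch of applying sales_department_filter; not an actual engine implementation.
type Row = Record<string, string | number>;

interface CellsFilter {
  rowPredicate: (row: Row) => boolean; // stands in for rowFilter.filterExpression
  columnNames: string[];               // columns the grantee is allowed to see
}

function applyCellsFilter(rows: Row[], filter: CellsFilter): Row[] {
  return rows
    .filter(filter.rowPredicate) // row-level filtering
    .map((row) => {
      // Column-level filtering: keep only the listed columns, in the listed order.
      const visible: Row = {};
      for (const col of filter.columnNames) {
        if (col in row) visible[col] = row[col];
      }
      return visible;
    });
}

const salesDepartmentFilter: CellsFilter = {
  rowPredicate: (row) => row.department === "営業部", // department = '営業部'
  columnNames: ["employee_id", "name", "department", "position", "hire_date"],
};

// Two rows from the admin query results (email column omitted).
const sample: Row[] = [
  { employee_id: 1001, name: "山田太郎", department: "営業部", position: "部長", salary: 8000000, hire_date: "2015-04-01" },
  { employee_id: 1006, name: "伊藤由美", department: "人事部", position: "部長", salary: 7800000, hire_date: "2016-05-10" },
];

console.log(JSON.stringify(applyCellsFilter(sample, salesDepartmentFilter)));
// [{"employee_id":1001,"name":"山田太郎","department":"営業部","position":"部長","hire_date":"2015-04-01"}]
```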
Creating a PrincipalPermissions resource that grants SELECT through this DataCellsFilter made the stack update fail because the DataCellsFilter had not stabilized yet. Granting the permission from a custom resource that waits before calling GrantPermissions resolved the issue. Note that the Lake Formation APIs are throttled, so bulk updates take time.
Call AWS API with AwsCustomResource in CDK - sambaiz-net
// ../lambda/grant-lf-permissions/index.ts
import { LakeFormation } from '@aws-sdk/client-lakeformation';
const lakeformation = new LakeFormation({});
interface ResourceProperties {
RoleArn: string;
CatalogId: string;
DatabaseName: string;
TableName: string;
FilterName: string;
}
export const handler = async (event: any): Promise<any> => {
console.log('Event:', JSON.stringify(event, null, 2));
const props = event.ResourceProperties as ResourceProperties;
const { RoleArn, CatalogId, DatabaseName, TableName, FilterName } = props;
const resource = {
DataCellsFilter: {
TableCatalogId: CatalogId,
DatabaseName: DatabaseName,
TableName: TableName,
Name: FilterName,
},
};
if (event.RequestType === 'Delete') {
await lakeformation.revokePermissions({
Principal: { DataLakePrincipalIdentifier: RoleArn },
Resource: resource,
Permissions: ['SELECT'],
});
return {
PhysicalResourceId: event.PhysicalResourceId || 'SalesPermissions',
};
}
// Create or Update
// Wait 30 seconds so that the newly created DataCellsFilter has stabilized before granting
await sleep(30000);
await lakeformation.grantPermissions({
Principal: { DataLakePrincipalIdentifier: RoleArn },
Resource: resource,
Permissions: ['SELECT'],
});
console.log('Successfully granted permissions');
return {
PhysicalResourceId: 'SalesPermissions',
};
};
function sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
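The handler above simply waits a fixed 30 seconds. If that turns out to be too short or wastefully long, one alternative is to retry the grant with exponential backoff, which also helps with the API throttling mentioned above. The helper below is a hypothetical sketch, not part of the original setup; in particular, which errors count as retryable is an assumption you would base on the errors actually observed.

```typescript
// Hypothetical helper: retries an async call with exponential backoff.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  let attempt = 1;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Wait baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ... between attempts.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      attempt++;
    }
  }
}

// Demo: a fake call that fails twice (as if the filter were not ready) before succeeding.
let attempts = 0;
retryWithBackoff(
  async () => {
    attempts++;
    if (attempts < 3) throw new Error("filter not ready yet");
    return "granted";
  },
  () => true, // treat every error as retryable in this demo
  5,
  10,
).then((result) => console.log(result, attempts)); // granted 3
```

In the handler, the fixed sleep could then be dropped and the grantPermissions call wrapped in retryWithBackoff with the same Principal, Resource, and Permissions arguments. With the defaults, the total wait is at most 1 + 2 + 4 + 8 = 15 seconds, which fits within the function's 2-minute timeout.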
Grant salesRole SELECT through the DataCellsFilter sales_department_filter by calling this function from a custom resource.
const grantPermissionsFunction = new NodejsFunction(this, 'GrantPermissionsFunction', {
entry: path.join(__dirname, '../lambda/grant-lf-permissions/index.ts'),
handler: 'handler',
runtime: lambda.Runtime.NODEJS_22_X,
timeout: cdk.Duration.minutes(2),
});
grantPermissionsFunction.addToRolePolicy(new iam.PolicyStatement({
actions: [
'lakeformation:GrantPermissions',
'lakeformation:RevokePermissions',
'glue:GetDatabase',
'glue:GetTable',
],
resources: ['*'],
}));
const salesPermissionsProvider = new customResources.Provider(this, 'SalesPermissionsProvider', {
onEventHandler: grantPermissionsFunction,
});
const salesPermissions = new cdk.CustomResource(this, 'SalesPermissions', {
serviceToken: salesPermissionsProvider.serviceToken,
properties: {
RoleArn: salesRole.roleArn,
CatalogId: this.account,
DatabaseName: database.ref,
TableName: table.ref,
FilterName: 'sales_department_filter',
},
});
salesPermissions.node.addDependency(salesFilter);
Permissions on a database or table that do not go through a DataCellsFilter can be granted directly with CfnPrincipalPermissions. Here, salesRole is granted DESCRIBE on the database.
new lakeformation.CfnPrincipalPermissions(this, 'SalesDatabasePermissions', {
principal: {
dataLakePrincipalIdentifier: salesRole.roleArn,
},
resource: {
database: {
catalogId: this.account,
name: database.ref,
},
},
permissions: ['DESCRIBE'],
permissionsWithGrantOption: [],
});
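Running the same query with the admin profile and with the sales profile (salesRole) confirms that rows and columns are filtered: admin sees every row and column, while salesRole sees only the 営業部 rows, without the salary and email columns.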
$ QUERY="SELECT * FROM employee_db.employees"
$ QUERY_ID=$(aws athena start-query-execution \
--query-string "$QUERY" \
--profile admin \
--query QueryExecutionId \
--output text) && \
sleep 5 && \
aws athena get-query-results \
--query-execution-id "$QUERY_ID" \
--profile admin \
--query "ResultSet.Rows[*].Data[*].VarCharValue" \
--output table
+-------------+-------------+-------------+-----------+----------+-------------------------+--------------+
| employee_id| name | department | position | salary | email | hire_date |
| 1001 | 山田太郎 | 営業部 | 部長 | 8000000 | … | 2015-04-01 |
| 1002 | 佐藤花子 | 営業部 | 課長 | 6500000 | … | 2017-06-15 |
| 1003 | 鈴木一郎 | 営業部 | 主任 | 5500000 | … | 2019-08-20 |
| 1004 | 田中美咲 | 営業部 | 一般 | 4500000 | … | 2021-04-01 |
| 1005 | 高橋健太 | 営業部 | 一般 | 4200000 | … | 2022-03-15 |
| 1006 | 伊藤由美 | 人事部 | 部長 | 7800000 | … | 2016-05-10 |
| 1007 | 渡辺直樹 | 人事部 | 課長 | 6200000 | … | 2018-07-01 |
| 1008 | 中村恵 | 人事部 | 主任 | 5300000 | … | 2020-03-15 |
| 1009 | 小林雅人 | 人事部 | 一般 | 4700000 | … | 2021-09-01 |
| 1010 | 加藤さくら | 人事部 | 一般 | 4500000 | … | 2022-06-01 |
+-------------+-------------+-------------+-----------+----------+-------------------------+--------------+
$ QUERY_ID=$(aws athena start-query-execution \
--query-string "$QUERY" \
--profile sales \
--query QueryExecutionId \
--output text) && \
sleep 5 && \
aws athena get-query-results \
--query-execution-id "$QUERY_ID" \
--profile sales \
--query "ResultSet.Rows[*].Data[*].VarCharValue" \
--output table
---------------------------------------------------------------------
| GetQueryResults |
+-------------+-----------+-------------+------------+--------------+
| employee_id| name | department | position | hire_date |
| 1001 | 山田太郎 | 営業部 | 部長 | 2015-04-01 |
| 1002 | 佐藤花子 | 営業部 | 課長 | 2017-06-15 |
| 1003 | 鈴木一郎 | 営業部 | 主任 | 2019-08-20 |
| 1004 | 田中美咲 | 営業部 | 一般 | 2021-04-01 |
| 1005 | 高橋健太 | 営業部 | 一般 | 2022-03-15 |
+-------------+-----------+-------------+------------+--------------+