Aggregate metadata into OpenMetadata to view table descriptions, change history, data quality, and lineage all in one place


OpenMetadata is a platform that aggregates various metadata such as table schemas and pipelines.

Let's start it with Docker Compose.

$ wget https://github.com/open-metadata/OpenMetadata/releases/download/1.5.7-release/docker-compose.yml
$ docker compose -f docker-compose.yml up --detach
...
[+] Running 10/10
 ✔ Network openmetadata-docker_app_net                        Created    0.0s
 ✔ Volume "openmetadata-docker_ingestion-volume-dags"         Created    0.0s
 ✔ Volume "openmetadata-docker_ingestion-volume-tmp"          Created    0.0s
 ✔ Volume "openmetadata-docker_es-data"                       Created    0.0s
 ✔ Volume "openmetadata-docker_ingestion-volume-dag-airflow"  Created    0.0s
 ✔ Container openmetadata_elasticsearch                       Healthy   16.9s
 ✔ Container openmetadata_mysql                               Healthy   28.5s
 ✔ Container execute_migrate_all                              Exited    27.6s
 ✔ Container openmetadata_server                              Started   27.7s
 ✔ Container openmetadata_ingestion                           Started

You can log in at http://localhost:8585 with the default credentials admin@open-metadata.org / admin.
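
The server may not answer immediately after `docker compose up` returns, because the containers are still starting. As a minimal sketch, the helper below polls a URL with `curl` until it responds successfully. The function name `wait_for_url` and its defaults are hypothetical, and it assumes the UI root returns a success status once the server is ready.

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 5).
# wait_for_url is a hypothetical helper, not part of OpenMetadata.
wait_for_url() {
  wfu_url="$1"
  wfu_attempts="${2:-30}"
  wfu_interval="${3:-5}"
  wfu_i=1
  while [ "$wfu_i" -le "$wfu_attempts" ]; do
    # -f: treat HTTP errors as failures, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$wfu_url"; then
      echo "ready after $wfu_i attempt(s): $wfu_url"
      return 0
    fi
    sleep "$wfu_interval"
    wfu_i=$((wfu_i + 1))
  done
  echo "gave up after $wfu_attempts attempt(s): $wfu_url" >&2
  return 1
}

# Usage (assumes the port used in this article):
#   wait_for_url http://localhost:8585
```

If the helper gives up, `docker compose -f docker-compose.yml ps` shows which container is not running.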

Connecting to Data Sources

Select the data source from Settings > Services > Databases and set the connection information.

Create a Metadata Ingestion to import table metadata.

Once the ingestion has run, either manually or on a schedule, the tables are displayed.
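
To check from the command line that the ingestion brought tables in, you can query the REST API. This is a sketch under assumptions: `om_get`, `OM_URL`, and `OM_TOKEN` are hypothetical names, the token is a JWT you have obtained yourself (for example from a bot in the UI), and the response is assumed to list the tables under a `data` key.

```shell
# Send an authenticated GET request to an OpenMetadata server.
# OM_URL and OM_TOKEN are hypothetical variable names chosen for this sketch.
# $1 is an API path such as /api/v1/tables?limit=10
om_get() {
  curl -fsS \
    -H "Authorization: Bearer ${OM_TOKEN}" \
    -H "Accept: application/json" \
    "${OM_URL:-http://localhost:8585}$1"
}

# Usage (jq is assumed to be installed for extracting the names):
#   export OM_TOKEN='<your JWT token>'
#   om_get '/api/v1/tables?limit=10' | jq -r '.data[].fullyQualifiedName'
```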

If you change the schema in the Glue Data Catalog and re-run the ingestion, the change is reflected in OpenMetadata and you can review the diff.

There is also a feature similar to GitHub Issues that lets you request information that is missing.

In addition to tags, Glossary Terms can be attached to fields to foster a common understanding. Terms can be organized in a hierarchy to group similar concepts.

Checking Data Quality

When you run a Profiler Ingestion, sample data will be displayed.

You can add data quality tests to tables to check whether any unintended data is included.
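
To make "unintended data" concrete, here is a small stand-alone illustration of what a not-null style test asserts. It is not how OpenMetadata runs its tests. It just counts empty values in one column of a simple CSV (header row, no quoted commas). The function name `count_empty` and the sample data are made up for this sketch.

```shell
# Count data rows whose given column is empty in a simple CSV read from stdin.
# $1: 1-based column index. The header row is skipped.
# count_empty is a hypothetical helper used only for illustration.
count_empty() {
  awk -F',' -v col="$1" 'NR > 1 && $col == "" { n++ } END { print n + 0 }'
}

# Usage: a result of 0 would correspond to the test passing.
#   printf 'id,name\n1,a\n2,\n3,c\n' | count_empty 2
```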

Checking Lineage

When you connect Airflow or other sources, the lineage is displayed, showing where the data originated and how it was produced.

Run Apache Airflow with Docker Compose and execute a workflow - sambaiz-net
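
The lineage shown in the UI can also be fetched as JSON. The sketch below assumes the REST API exposes lineage by fully qualified name under `/api/v1/lineage/table/name/` with `upstreamDepth` and `downstreamDepth` query parameters. `get_table_lineage`, `OM_URL`, `OM_TOKEN`, and the table name in the usage line are hypothetical.

```shell
# Fetch lineage for one table from an OpenMetadata server.
# $1: fully qualified table name; $2: depth in each direction (default 1).
# OM_URL and OM_TOKEN are hypothetical variable names chosen for this sketch.
get_table_lineage() {
  curl -fsS \
    -H "Authorization: Bearer ${OM_TOKEN}" \
    "${OM_URL:-http://localhost:8585}/api/v1/lineage/table/name/$1?upstreamDepth=${2:-1}&downstreamDepth=${2:-1}"
}

# Usage (the table name is a made-up example):
#   export OM_TOKEN='<your JWT token>'
#   get_table_lineage my_service.my_db.my_schema.orders 2
```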