Aggregate metadata into OpenMetadata to view table descriptions, change history, data quality, and lineage all in one place
OpenMetadata is a platform that aggregates metadata such as table schemas and pipelines.
Let's run it with Docker Compose.
$ wget https://github.com/open-metadata/OpenMetadata/releases/download/1.5.7-release/docker-compose.yml
$ docker compose -f docker-compose.yml up --detach
...
[+] Running 10/10
✔ Network openmetadata-docker_app_net Created 0.0s
✔ Volume "openmetadata-docker_ingestion-volume-dags" Created 0.0s
✔ Volume "openmetadata-docker_ingestion-volume-tmp" Created 0.0s
✔ Volume "openmetadata-docker_es-data" Created 0.0s
✔ Volume "openmetadata-docker_ingestion-volume-dag-airflow" Created 0.0s
✔ Container openmetadata_elasticsearch Healthy 16.9s
✔ Container openmetadata_mysql Healthy 28.5s
✔ Container execute_migrate_all Exited 27.6s
✔ Container openmetadata_server Started 27.7s
✔ Container openmetadata_ingestion Started
You can log in at http://localhost:8585 with admin@open-metadata.org / admin.
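The server can take a little while to answer after the containers report Started. A minimal sketch of waiting for it before opening the browser; `wait_for_url` is a hypothetical helper written for this post, not something OpenMetadata ships, and the retry count and delay are arbitrary defaults.

```shell
# wait_for_url is a hypothetical helper (not part of OpenMetadata).
# It polls a URL until it answers with a successful HTTP status, or gives up.
wait_for_url() {
  url=$1
  attempts=${2:-30}   # maximum number of tries (assumed default)
  delay=${3:-5}       # seconds to sleep between tries (assumed default)
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard body
    if curl -fs -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out: $url" >&2
  return 1
}

# Usage, assuming the default port from the docker-compose.yml above:
#   wait_for_url http://localhost:8585
```

If the call times out, `docker compose -f docker-compose.yml logs` is the usual place to look next.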
Connecting to Data Sources
Select the data source from Settings > Services > Databases and set the connection information.
Create a Metadata Ingestion to import table metadata.
Once it runs, either manually or on a schedule, the tables are displayed.
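The imported tables can also be checked from the command line through the OpenMetadata REST API. This is a rough sketch under some assumptions: `om_get` is a hypothetical helper, `OM_HOST` and `OM_TOKEN` are environment variable names made up here, and the token is assumed to be a JWT copied from the UI (for example a bot token). Check the API documentation for your version before relying on the exact path and parameters.

```shell
# om_get is a hypothetical helper: GET a path under the REST API base /api/v1.
# OM_HOST and OM_TOKEN are assumed environment variables, not official names.
om_get() {
  curl -fsS \
    -H "Authorization: Bearer ${OM_TOKEN}" \
    -H "Accept: application/json" \
    "${OM_HOST:-http://localhost:8585}/api/v1/$1"
}

# Usage: list up to 10 tables known to OpenMetadata.
#   export OM_TOKEN=<JWT copied from the UI>
#   om_get "tables?limit=10"
```

Piping the output through a JSON formatter makes it easier to spot the table names.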
If you change the schema in Glue Data Catalog and re-run the ingestion, the change is reflected in OpenMetadata and you can review the diff.
There is also a feature similar to GitHub Issues that lets you request missing information.
In addition to tags, Glossary Terms can be attached to fields to foster a common understanding. Terms can be organized hierarchically to group similar concepts.
Checking Data Quality
When you run a Profiler Ingestion, sample data will be displayed.
You can add data quality tests to tables to check whether they contain unintended data.
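To give a feel for what such a test checks, here is a stand-alone sketch of a "column must not be empty" rule applied to a CSV file. It is only an illustration of the idea, not how OpenMetadata implements its tests; `count_empty` is a hypothetical helper, and the comma splitting is naive (quoted fields containing commas are not handled).

```shell
# count_empty is a hypothetical helper for illustration only.
# It prints how many data rows of a CSV file have an empty value in the
# given 1-based column. The first line is treated as a header and skipped.
count_empty() {
  awk -F',' -v col="$2" 'NR > 1 && $col == "" { n++ } END { print n + 0 }' "$1"
}

# Usage: the rule "passes" when the count is 0.
#   count_empty users.csv 2
```

A real test in OpenMetadata runs against the table in the data source, so this only mirrors the kind of condition being evaluated.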
Checking Lineage
When you connect Airflow or other sources, lineage is displayed, showing where the data originates and how it was produced.
Run Apache Airflow with Docker Compose and execute a workflow - sambaiz-net