Lineage and Metadata Classification Using Apache Atlas on Amazon EMR – Whitepaper

With the ever-expanding and ever-evolving role of data in today’s world, data governance is an essential aspect of effective data management. Many organizations use a data lake as a single repository to store data that is in multiple formats and that belongs to a business entity in the organization. Metadata, cataloging, and data lineage are key to using the lake effectively.

This whitepaper walks you through how Apache Atlas, installed on Amazon EMR, can provide these capabilities. You can use this setup to dynamically classify data and view its lineage as it moves through various processes. As part of this, you can use the domain-specific language (DSL) in Atlas to search the metadata.
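To give a sense of what a DSL lookup looks like, here is a minimal sketch in Python that builds the URL for a DSL search against the Atlas v2 REST endpoint. The host and port (http://localhost:21000), the helper name build_dsl_search_url, and the table name trip_details are assumptions made for illustration; they do not come from this whitepaper and will differ in your environment.

```python
from urllib.parse import urlencode


def build_dsl_search_url(base_url, dsl_query, limit=10):
    """Build a URL for the Atlas v2 DSL search endpoint.

    base_url is the Atlas server address (assumed here; adjust to your cluster).
    dsl_query is an Atlas DSL expression, for example a type name with a filter.
    """
    # urlencode escapes spaces, quotes, and '=' so the query is URL-safe.
    params = urlencode({"query": dsl_query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/dsl?{params}"


# Hypothetical example: look up a Hive table by name.
# 'trip_details' is an illustrative table name, not one defined in this document.
url = build_dsl_search_url(
    "http://localhost:21000",
    "hive_table where name = 'trip_details'",
)
print(url)
```

Sending an authenticated GET request to the resulting URL should return the matching entities as JSON. The sketch deliberately stops at building the URL so that it runs without a live Atlas server.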

