Data Governance and Security
Reduce security and privacy risks and meet compliance needs with a de-identified data lake
Today’s companies amass large amounts of consumer data, including personally identifiable information (PII). This data contains a wealth of information that analysts can use to improve business offerings, yet the sensitive data within it must be protected to retain customer trust and to comply with privacy regulations and mandates.
A de-identified data lake (DIDL) is an architectural approach designed to help organizations use data as a competitive differentiator while reducing the risks associated with managing all their data, particularly personally identifiable information. A DIDL helps solve the data privacy problem by de-identifying and protecting sensitive information before it even enters your data lake. By minimizing storage and use of PII, you can significantly reduce the risk of data breaches and misuse of data, and lower compliance costs, all without losing the ability to understand and use your data for competitive advantage.
DIDL solutions help enterprises get to the root cause of risk when it comes to data architectures and protecting PII. A DIDL can help you discover, identify, catalog, monitor, and protect your data. By removing personally identifiable information before it enters your data lake, you can continue to create value for yourself and your customers while greatly reducing that risk.
This work includes defining and enforcing policies; securing and managing personal information; creating data catalogs and glossaries; tracking data lineage; masking data; and more. We help customers meet their data governance and compliance requirements so they can get maximum value from their data.
Data governance is not a new concept, but new regulations such as the General Data Protection Regulation (GDPR), as well as the size, complexity, and scope of today’s data, require a completely new approach. Big data technologies have made it feasible to collect different types of data from myriad sources such as browsers, mobile applications, and connected devices, building huge repositories of data in the process. However, storing data is not enough.
In most cases, organizations don’t know what data they have, where it’s stored, or how sensitive and accurate it is. This prevents organizations from getting value from their data. It also creates significant risk—from data breaches and misuse of data to loss of corporate secrets and customer trust.
Organizations need to easily discover and understand all the data they manage. They need to automatically identify, categorize, and secure the data, including sensitive and personally identifiable information (PII). They need to make sure data is only accessed by the right people, and only for permitted uses. And they need to do it without hindering their ability to extract value from the data.
Advantages of AWS Data Governance for Data and Analytics
- Data Catalog: A data catalog management system that monitors every asset in the data lake and provides data stewards the ability to manage access to data assets.
- ETL: Extract, Transform, and Load services that integrate with policy-based masking services.
- Masking: A policy-based solution that extracts and masks sensitive PII data before it ever lands in a data lake.
- Matching and de-identified data transfer: Securely transfer second-party data using a decentralized trust model.
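As a rough illustration of the masking and matching ideas above, the following Python sketch applies a field-level policy to a record before it would be written to a data lake. It is not an AWS API: the `POLICY` table, the action names, and the `tokenize` and `mask_record` helpers are all hypothetical, and the key is hard-coded only for demonstration. Tokens are produced with a keyed hash (HMAC-SHA256), so two parties holding the same key could match records on the token without exchanging the raw value. Note that keyed hashing pseudonymizes data rather than fully anonymizing it.

```python
import hashlib
import hmac
import re

# Hypothetical masking policy: field name -> action. Illustrative only.
POLICY = {
    "email": "tokenize",  # replace with a deterministic keyed token
    "name": "redact",     # drop the value entirely
    "phone": "partial",   # keep only the last four digits
}


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic HMAC-SHA256 token for a normalized value."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict, key: bytes, policy: dict = POLICY) -> dict:
    """Apply the policy to one record; fields not in the policy pass through."""
    masked = {}
    for field, value in record.items():
        action = policy.get(field)
        if action is None or value is None:
            masked[field] = value
        elif action == "tokenize":
            masked[field] = tokenize(str(value), key)
        elif action == "redact":
            masked[field] = "[REDACTED]"
        elif action == "partial":
            digits = re.sub(r"\D", "", str(value))
            masked[field] = "*" * max(len(digits) - 4, 0) + digits[-4:]
        else:
            raise ValueError(f"unknown masking action: {action}")
    return masked


if __name__ == "__main__":
    # Assumption: in practice the key would come from a secrets manager.
    demo_key = b"demo-key-only"
    raw = {
        "email": "Ana@Example.com",
        "name": "Ana Silva",
        "phone": "+1 (555) 010-4477",
        "plan": "pro",
    }
    print(mask_record(raw, demo_key))
```

Because the token is deterministic for a given key, analysts can still join and count on the `email` column after masking. Rotating or withholding the key prevents anyone without it from recomputing tokens from known values.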
There are numerous AWS services that have particular significance for customers focusing on GDPR compliance, including:
- Amazon GuardDuty – a security service featuring intelligent threat detection and continuous monitoring
- Amazon Macie – a machine learning tool that helps discover and secure personal data stored in Amazon S3
- Amazon Inspector – an automated security assessment service that helps keep applications aligned with security best practices
- AWS Config Rules – a monitoring service that dynamically checks cloud resources for compliance with security rules