- Posted by redglue
- On June 4, 2017
- gdpr, machine learning, open source, security
GDPR (EU General Data Protection Regulation) is around the corner, and bigger companies are getting ready to comply, as they already know what kind of penalties come from non-compliance.
It replaces the Data Protection Directive 95/46/EC and was designed to harmonize data privacy laws across Europe. It is the biggest change in European data privacy regulation in 20 years.
While GDPR's main elements can be a little tricky to understand, one thing is clear: sensitive data discovery is mandatory, so that you can find the personal and sensitive information in your data repositories, which can be almost anything from databases to files.
Basically, we focus our data discovery on three main areas: column discovery, data discovery and file discovery.
Column discovery is easy to understand: based on specific keywords or phrases, we find column names in databases and match them with possible sensitive data.
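To make the idea concrete, here is a minimal sketch of keyword-based column discovery. It is not the tool we use: the keyword list, the category labels and the `find_sensitive_columns` helper are all hypothetical, and a real setup would read the column names from the database catalog instead of a hard-coded list.

```python
# Hypothetical keyword dictionary: keyword found in a column name -> category.
# A real deployment would maintain a much larger, language-specific list.
SENSITIVE_KEYWORDS = {
    "name": "Name",
    "address": "Address",
    "email": "Email",
    "phone": "Phone",
    "birth": "Date of birth",
}


def find_sensitive_columns(columns):
    """Return (table, column, category) for every column whose name
    contains one of the keywords (case-insensitive substring match)."""
    matches = []
    for table, column in columns:
        normalized = column.lower()
        for keyword, category in SENSITIVE_KEYWORDS.items():
            if keyword in normalized:
                matches.append((table, column, category))
                break  # report the first matching category only
    return matches


if __name__ == "__main__":
    # Example (table, column) pairs, made up for illustration.
    schema = [
        ("CUSTOMERS", "CUSTOMER_ADDRESS"),
        ("CUSTOMERS", "EMAIL"),
        ("CUSTOMERS", "X_DATA"),
    ]
    for table, column, category in find_sensitive_columns(schema):
        print(f"{table}.{column}: possible {category}")
```

Note that a column such as “X_DATA” is not flagged here, because nothing in its name hints at its content.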
Where the fun begins is with data discovery and file discovery.
We use a tool based on Apache OpenNLP, a machine learning based toolkit for the processing of natural language text. On top of that, we use pre-trained OpenNLP machine learning models (a few examples here that are public, but only for the English language) together with techniques like tokenization, sentence segmentation, named entity extraction and parsing to understand whether the data is sensitive or not.
As an example, if your column is called “X_DATA” but holds personal information like an address, column discovery will not help. In that case, we look into a sample of the data inside “X_DATA” and apply our pre-trained OpenNLP-based models to understand whether that sample contains any addresses.
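The sampling step can be sketched roughly as follows. This is only an illustration under stated assumptions: the `looks_like_address` detector is a crude regex stand-in, not an OpenNLP model, and `score_column`, its parameters and the example values are hypothetical. The score is simply the share of sampled values the detector flags, not a model probability.

```python
import random
import re

# Stand-in detector (assumption): a crude regex for street-like words.
# In the approach described above, a trained named entity model plays this role.
ADDRESS_PATTERN = re.compile(
    r"\b(street|st\.?|avenue|ave\.?|road|rd\.?|rua|avenida)\b", re.IGNORECASE
)


def looks_like_address(text):
    """Flag text that has a street-like word and at least one digit."""
    return bool(ADDRESS_PATTERN.search(text)) and any(ch.isdigit() for ch in text)


def score_column(values, detector, sample_size=100, seed=0):
    """Sample up to sample_size non-empty values and return the
    fraction of the sample that the detector flags."""
    values = [v for v in values if v]
    if not values:
        return 0.0
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = rng.sample(values, min(sample_size, len(values)))
    hits = sum(1 for v in sample if detector(v))
    return hits / len(sample)


if __name__ == "__main__":
    # Made-up content of the "X_DATA" column.
    x_data = [
        "12 Main Street, Springfield",
        "Rua das Flores 10, Lisboa",
        "n/a",
        "45 Oak Avenue",
    ]
    print(f"X_DATA address score: {score_column(x_data, looks_like_address):.2f}")
```

Sampling keeps the scan cheap on large tables; the trade-off is that rare sensitive values can be missed if the sample is too small.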
Our pre-trained ML models are applied to data in Portuguese and English, and the tool we use produces the following result (screenshot) with a probability column for each “suspect”.
In the end, instead of downloading reports, PDFs and PPTX files about GDPR, drop us an email (firstname.lastname@example.org) and we can help you define your GDPR technical challenges.