GDPR, why is it so hard?
Simply put, the General Data Protection Regulation (GDPR) means that all organizations need to know, and be able to show, what kind of information they hold on individuals and why they hold it. Individuals in the EU have the right to ask what information is stored about them and, in many cases, to have it deleted. This is more popularly known as the right to be forgotten or right to erasure (RTE). Organizations therefore need a way to detect and delete information on an individual. This means looking through multiple systems, which are often not suited for digging out “personally identifiable information” (PII). To achieve this, it is necessary to have clean, well-structured data and to handle a variety of different data formats.
Unstructured data – the biggest challenge with GDPR compliance
What does the organizational data landscape look like?
Part of the reason why GDPR is so difficult to manage is that a large part of the data in any organization is unstructured, which makes detecting and deleting information a time-consuming task. According to Gartner, as much as 80% of the total amount of data an organization possesses is unstructured and found in e-mail clients, file shares, intranets and so on. These sources usually hold enormous amounts of data, often with very limited findability. Have you tried searching your intranet for words regarding ethnicity or sexual orientation? Or for something as simple as a Social Security number?
At Findwise we use search technology to help organizations find and act on all types of data, and have done so for 15 years! Combining search technology with new and popular AI techniques gives you a powerful method for handling GDPR-critical data across all data sources and data types.
Findwise’s mission has always been to make relevant data easily accessible. When GDPR was announced, the data our customers considered relevant came to include PII. Relevant data sources are processed and indexed into a search engine. This processing consists of several steps: raw data is parsed, normalized, and enriched before it is indexed. Our processing pipeline is called Findwise i3 and can be used with most search technologies, including Elasticsearch and Solr. The strength of search technology is that it can handle all types of data and sources, including textual, numerical, geospatial, structured, and unstructured.
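To make the parse, normalize, enrich, and index steps concrete, here is a minimal Python sketch. It is not Findwise i3 and does not use Elasticsearch or Solr: every function name and the toy `InMemoryIndex` class are hypothetical stand-ins chosen for illustration, and a real pipeline would use format-specific parsers and a real search engine.

```python
import re
import unicodedata
from collections import defaultdict


def parse(raw: bytes) -> str:
    """Decode raw bytes to text (stand-in for format-specific parsers)."""
    return raw.decode("utf-8", errors="replace")


def normalize(text: str) -> str:
    """Apply Unicode normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def enrich(doc_id: str, text: str) -> dict:
    """Attach lowercase tokens; a real pipeline would add entity tags here."""
    return {"id": doc_id, "text": text, "tokens": re.findall(r"\w+", text.lower())}


class InMemoryIndex:
    """Toy inverted index standing in for a search engine (assumption)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc: dict) -> None:
        for token in doc["tokens"]:
            self.postings[token].add(doc["id"])

    def search(self, term: str) -> list:
        return sorted(self.postings.get(term.lower(), set()))


def process(index: InMemoryIndex, doc_id: str, raw: bytes) -> None:
    """Run one document through parse -> normalize -> enrich -> index."""
    index.add(enrich(doc_id, normalize(parse(raw))))


index = InMemoryIndex()
process(index, "mail-1", b"Contact   Anna\nAndersson about the invoice.")
process(index, "wiki-7", "Invoice routines for 2018".encode("utf-8"))
```

Once both documents are indexed, a single `index.search("invoice")` call finds hits from the e-mail and the wiki page alike, which is the "one point of entry" idea in miniature.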
In addition, we enrich and structure the content with natural language processing, machine learning, and other AI techniques. From a GDPR perspective these can enrich the unstructured content and tag it with entities like names, Social Security numbers, addresses, religion and other things that could be personally identifiable information (PII). By using machine learning we can look at the context in which these identifiers occur and identify patterns and combinations between them. Users therefore do not need to know how to navigate different systems or worry about which specific data type contains PII. Instead they get an overview and can handle enquiries from one point of entry.
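As a rough illustration of tagging unstructured text with PII entities, the sketch below uses plain regular expressions. This is a deliberately simple stand-in, not the machine learning approach described above: the patterns (a US-style Social Security number and an e-mail address), the `PII_PATTERNS` table, and the `tag_pii` function are all assumptions made for this example, and real detection of names, addresses, or religion would need context-aware models.

```python
import re

# Hypothetical patterns for illustration; real formats vary by country.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def tag_pii(text: str) -> list:
    """Return PII tags with type, matched value, and character offsets."""
    tags = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            tags.append(
                {
                    "type": label,
                    "value": match.group(),
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    # Sort by position so tags follow the order of appearance in the text.
    return sorted(tags, key=lambda tag: tag["start"])


sample = "Anna (anna@example.com) has SSN 123-45-6789 on file."
tags = tag_pii(sample)
```

Storing the offsets alongside each tag means the enriched document can later be searched by entity type, and the exact span can be highlighted or redacted when an erasure request comes in.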
What about Security?
The solution is prepared to run with encryption in transit and behind any certificate provided by the customer.
The multipurpose platform
These technologies don’t have to be used for GDPR alone!
With this method, you can start with the GDPR compliance use case and scale the solution to smarter, business-critical applications:
- Knowledge Management
- Enterprise Search
- Product Search
- 360-degree customer view
- And much more…