Removing sensitive personal data from documents is a substantial task. A manual approach is practically impossible when there are millions of documents and much of the information is hidden in free-text fields.
E-health SA presented us with a challenge of this kind when they asked us to analyse all the epicrises of Digital Story from a two-year period. Our task was to find a way to remove the sensitive data automatically, even though it occurred frequently and in many different forms.
As a solution, we developed anonymisation software that removes names, personal identification codes, phone numbers and email addresses from places where they should not appear.
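To give a rough idea of what this kind of removal involves, here is a minimal Python sketch. It is not the software described here. The patterns are assumptions: an 11-digit personal identification code starting with 1 to 8, phone numbers of 7 or 8 digits with an optional +372 prefix, and a simple email pattern. The `redact` function and the `known_names` list are hypothetical; finding names in free text generally requires more than a fixed list, for example dictionaries or named-entity recognition.

```python
import re

# Illustrative patterns only. The formats below are assumptions made for
# this sketch, not the rules used by the actual anonymisation software.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Assumed format: 11 digits, first digit between 1 and 8.
    ("ID_CODE", re.compile(r"\b[1-8]\d{10}\b")),
    # Assumed format: 7 or 8 digits, optionally prefixed with +372.
    ("PHONE", re.compile(r"(?<![\w+])(?:\+372\s?)?\d{7,8}\b")),
]


def redact(text, known_names=()):
    """Replace emails, ID codes, phone numbers and listed names with placeholders."""
    # Apply the patterns in order, so emails are replaced before anything else.
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    # Hypothetical name handling: exact, case-insensitive matches of a given list.
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text


sample = "Patient Mari Maasikas (38001085718), tel +372 5123456, mari.maasikas@example.com"
print(redact(sample, known_names=["Mari Maasikas"]))
```

Even in this toy version the order of the steps matters: emails are replaced first so that a name inside an address is not handled separately, and the longer identification codes are matched before the shorter phone numbers.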
To test how universally the software can be applied, we tried it on different document types. Today it is used on the data of the Estonian Genome Center and for removing personal data from judicial decisions.