GDPR Challenge 2: State of the art security of documents and metadata

Posted by Daniela Di Noi on 1/18/18 1:06 PM


Do you remember the first GDPR challenge? We talked about how to identify documents that contain personal data and label them appropriately.

Continuing our GDPR series, Bart van Bouwel and Jean-Luc Goedermans of CDI-Partner elaborate in this post on the second challenge: state-of-the-art security of the documents and the metadata.

When we store documents and other files, we not only store files that might contain personal data, we also create personal data by doing this.

When we run a classification algorithm, as discussed in the previous post, we extract personal data from a document and store it in metadata. As a result, databases and indexes are filled with personal data in a semi-structured way. This makes finding and retrieving documents far easier, but it also increases the risk associated with keeping personal data.

But there is more. Personal data is information related to an identified or identifiable natural person. For compliance reasons we will keep records of persons accessing, copying, and printing documents containing personal data. This audit trail links natural persons, for instance employees, to the documents they use. By definition, this information is also personal data.

And usage history is not harmless: it can lead to criminal convictions. In the UK there are examples of hospital staff being convicted for accessing medical files for non-professional reasons. Detecting such misuse is the very reason for keeping an audit trail, but unauthorized access to the audit trail itself is a data breach in its own right.

Security of processing is covered in Article 32 of the GDPR. We need a level of security appropriate to the risk to the rights and freedoms of natural persons. This is a very broad description that will lead to a lot of interpretation and discussion.


[Figure: GDPR, security of processing]


Typically, documents are stored on local or network drives or in the cloud, and distributed by email or by copying them onto USB keys. Protecting documents that are stored and shared this way is not easy, and the options are limited to granting read or write access.

When we store documents in Alfresco, we gain a lot of options. Next to the original document, we can generate and store previews. These can even be personalized with a watermark. It is far easier to address people leaving documents on the printer (a possible personal data breach!) when their name is on every page.

Instead of sending documents by email, we will send links to documents or links to PDF previews. For very sensitive documents a separate login process can be enforced. This reduces the risk of sending documents to the wrong person (the number one type of data breach). And when a person asks to be forgotten, we can automatically inform the receiving third party of this request.
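One way to picture "send a link, not the document" is a signed link that expires. The sketch below is only an illustration and not a description of how Alfresco or Alfred generate links: the base URL, the signing key, and the function names `make_link` and `verify_link` are all hypothetical. It shows that a link can carry its own expiry time and a signature, so a forwarded or altered link can be rejected by the server.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only; a real key belongs in a key vault.
LINK_KEY = b"example-link-signing-key"
BASE_URL = "https://dms.example.com/preview"

def _sign(doc_id: str, expires: int) -> str:
    """Keyed signature over the document id and the expiry timestamp."""
    message = f"{doc_id}|{expires}".encode("utf-8")
    return hmac.new(LINK_KEY, message, hashlib.sha256).hexdigest()

def make_link(doc_id: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Build a preview link that stops working after ttl_seconds."""
    issued = int(time.time()) if now is None else now
    expires = issued + ttl_seconds
    query = urlencode({"doc": doc_id, "expires": expires, "sig": _sign(doc_id, expires)})
    return f"{BASE_URL}?{query}"

def verify_link(url: str, now: Optional[int] = None) -> bool:
    """Accept the link only if it is unexpired and the signature matches."""
    current = int(time.time()) if now is None else now
    params = parse_qs(urlparse(url).query)
    try:
        doc_id = params["doc"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if current > expires:
        return False
    # Constant-time comparison on bytes avoids leaking how much of the signature matched.
    return hmac.compare_digest(sig.encode("utf-8"), _sign(doc_id, expires).encode("utf-8"))
```

A link built this way proves only that the server issued it. For very sensitive documents it would be combined with the separate login step mentioned above, not used in place of it.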

We will need to protect the documents, the database, the indexes, and the log files. For sensitive metadata, for instance the national number, we will use one-way encryption or cryptographic hashing. Only a user that knows the correct key can retrieve the documents related to the key, but the indexes can’t be used to retrieve all the existing keys.
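A minimal sketch of the hashing idea, assuming a keyed hash (HMAC-SHA256) as the one-way function. The secret `INDEX_KEY`, the toy in-memory `index`, and the function names are hypothetical and not part of any Alfresco or Alfred API. Note that "key" in the paragraph above means the lookup value (the national number), while `INDEX_KEY` here is a separate server-side secret. That secret matters because national numbers have a small, structured value space and an unkeyed hash could be reversed by brute force.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key vault, never from source code.
INDEX_KEY = b"example-secret-key-not-for-production"

def blind_index(national_number: str, key: bytes = INDEX_KEY) -> str:
    """Return a keyed one-way hash of a national number for use as an index value."""
    # Keep digits only, so differently formatted copies of a number map to the same token.
    digits = "".join(ch for ch in national_number if ch.isdigit())
    return hmac.new(key, digits.encode("utf-8"), hashlib.sha256).hexdigest()

# Toy index: token -> document ids. Raw national numbers are never stored.
index: dict[str, list[str]] = {}

def add_document(national_number: str, doc_id: str) -> None:
    """Register a document under the hashed national number."""
    index.setdefault(blind_index(national_number), []).append(doc_id)

def find_documents(national_number: str) -> list[str]:
    """Look up documents for someone who already knows the national number."""
    return index.get(blind_index(national_number), [])

# Made-up number in two formats; both land under the same token.
add_document("85.07.30-033.28", "doc-001")
add_document("85073003328", "doc-002")
```

Someone who knows the number can call `find_documents("85073003328")` and get both documents back, while a dump of `index` shows only opaque tokens.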

Documents stored in Alfred Archive are always protected and set up according to object storage best practices. There is no direct access at the file system level, which neutralizes 99% of known malware & intrusion risks. Furthermore, documents in the store are encrypted, with proper handling of the private and public key. Access to the document store is under detailed auditing. Life points can set the document to be 'immutable'. Moreover, a health processor continuously checks the binary integrity of all content, which can include the validity of a digital signature.
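The binary integrity check can be pictured as follows. This is a sketch under stated assumptions and not the actual health processor: it assumes a SHA-256 digest is recorded when a document is ingested and recomputed on every scan, and the dictionaries and function names are hypothetical stand-ins for the document store.

```python
import hashlib

def sha256_digest(content: bytes) -> str:
    """Digest recorded at ingest time and recomputed on every scan."""
    return hashlib.sha256(content).hexdigest()

def find_corrupted(store: dict[str, bytes], recorded: dict[str, str]) -> list[str]:
    """Return ids of documents whose current bytes no longer match the recorded digest."""
    corrupted = []
    for doc_id, content in store.items():
        # A missing digest is treated as a failure too, since integrity cannot be shown.
        if recorded.get(doc_id) != sha256_digest(content):
            corrupted.append(doc_id)
    return corrupted

# Toy store standing in for the archive; digests are captured at ingest.
store = {"doc-1": b"contract v1", "doc-2": b"medical report"}
recorded = {doc_id: sha256_digest(content) for doc_id, content in store.items()}
```

Changing a single byte of `store["doc-2"]` makes `find_corrupted(store, recorded)` report that document, which is the kind of signal a continuous health check would raise.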

As for metadata, the database that contains the metadata can be encrypted at the database level. Specific metadata fields with personal data can be encrypted and will not be exploitable through direct access to the database. Index information is protected with encryption as well. No system can directly access the indexes in our reference architecture, as we have a safety gateway in front of the archive, and we have the Alfresco content services in front of the document store.

[Figure: Alfred Archive architecture]


As you can see, storing files and documents in Alfresco with Alfred GDPR ensures adequate and appropriate security for your company’s most vulnerable digital assets.

Coming up next week: Challenge 3 “Monitor and control access to the documents and store the right metadata to prove legitimate purpose”.

The series is not legal advice for your company to use in complying with EU data privacy laws like the GDPR. Instead, it provides background information to help you better understand the GDPR. 


Topics: GDPR, Content Services, Alfresco, Handling Documents, Compliance, Edit Online, Alfred, Edit Offline, PRIVACY, PROCESSING, Security, breaches, sensitive data, governance, personal data, document, Storing data, securing of processing, Alfred Edge, Alfred Inflow, Alfred Object Storage, Alfred Finder, Alfred Desktop

About Xenit 


Xenit delivers Products and Solutions to create Return on Content, on top of Alfresco, the Digital Business Platform. 

Our platform, Alfred, is a blueprint content services architecture with prefabricated components, to unlock the value of Alfresco.

  • Alfred Desktop is a desktop application for Alfresco, that acts as Alfresco and looks like Microsoft Explorer
  • Alfred Finder is a web application to find and retrieve documents on Alfresco, preview them and edit metadata
  • Alfred Edge is an API Gateway, a single point of entry to Alfresco that simplifies and decouples your architecture
  • Alfred Archive is a secure, durable and extremely low cost storage service for data archiving and long-term backup.

