Last month, Ethias, one of the largest insurance companies in Belgium, reached the milestone of 100 million documents managed in the Alfresco Digital Business Platform. Reaching this milestone in our customer’s environment reaffirms our commitment to continuously re-evaluating the system, looking for optimisations that address current or future scalability limitations.
As we did in the lead-up to 100 million documents, we are running various scaling scenarios and tests in our own infrastructure with data volumes beyond the current milestone. Using pseudo-randomly generated data sets based on the statistical distributions and value ranges of real-life production data, we can reliably reproduce emerging concerns and issues, and test both our theories about root causes and the mid- to long-term solutions that address them.
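To illustrate the idea of generating test data from production statistics, here is a minimal sketch in Python. The document types, weights, and size ranges are invented for illustration and do not reflect the actual production data or the tooling we use. The point is only that a seeded generator driven by a distribution yields data sets that are realistic in shape and reproducible between runs.

```python
import random

# Hypothetical share of each document type, as it might be measured in production.
DOC_TYPE_WEIGHTS = {"claim": 0.5, "policy": 0.3, "correspondence": 0.2}

# Hypothetical value ranges for the content size (in bytes) per document type.
SIZE_RANGES = {
    "claim": (10_000, 5_000_000),
    "policy": (50_000, 2_000_000),
    "correspondence": (1_000, 500_000),
}


def generate_documents(count, seed=42):
    """Generate `count` synthetic document descriptors.

    A fixed seed makes the data set reproducible, so a scaling issue
    observed in one test run can be replayed in the next.
    """
    rng = random.Random(seed)
    types = list(DOC_TYPE_WEIGHTS)
    weights = list(DOC_TYPE_WEIGHTS.values())
    documents = []
    for index in range(count):
        # Pick a type according to the (assumed) production distribution.
        doc_type = rng.choices(types, weights=weights, k=1)[0]
        low, high = SIZE_RANGES[doc_type]
        documents.append(
            {"id": index, "type": doc_type, "size": rng.randint(low, high)}
        )
    return documents
```

Calling `generate_documents(1000, seed=7)` twice returns the same list, which is what makes a reported issue repeatable.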
Ethias is the third-largest insurer in Belgium, employing around 1,800 people, with two main headquarters in Liege and Hasselt and more than 40 regional offices across Belgium. With more than a million loyal customers, Ethias directly insures public authorities, companies and individuals without requiring them to use a broker. Ethias had a legacy instance of Documentum which couldn’t fulfill the need to bring new services to their external customers and couldn’t keep up with the standards their internal customers were requesting for collaboration between the branch offices.
Searching for and retrieving documents was a particular pain point for end users and led to poor productivity when using the system. In 2016, NRB & Xenit worked together to deliver, in just 6 months, a cloud-hosted Alfresco Content Services solution with 99.9% availability and limitless content scalability, integrated with object storage technology from Caringo and an Oracle database.
Successfully running 100M docs
Before scaling beyond 100M, we successfully ran Alfresco with up to 100M documents. This was made possible by a number of choices we made in software and application architecture.
We started off by defining a business API, modeling the client’s concepts in a clear REST API and hiding the implementation in Alfresco. The advantage over exposing one of Alfresco’s generic built-in APIs (REST API, CMIS) is that it defines a clear contract between the ECM application and the business applications. Since it is documented using Swagger, it is easy to develop against. Since this API offers exactly what is needed and nothing more, it is much easier to write automated tests that cover the full functionality. Changes in the implementation (document model, …) can be offered in a new version of that API while maintaining backward compatibility. This approach enabled us to decouple Alfresco deployments from the deployment of the applications using the APIs.
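The shape of such a business API can be sketched as a thin facade over the repository. The sketch below is in Python and is purely illustrative: `ClaimDocument`, `ClaimDocumentApi`, the `ins:claimNumber` property, and the in-memory repository are all hypothetical names, not the actual Ethias API or Alfresco classes. It only shows how callers work with business concepts while the storage model stays hidden behind the facade.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimDocument:
    """Business-level view of a document (hypothetical concept)."""
    claim_number: str
    title: str


class InMemoryRepository:
    """Stand-in for the content repository: generic nodes with properties."""

    def __init__(self):
        self._nodes = {}
        self._next_id = 0

    def create_node(self, properties):
        self._next_id += 1
        node_id = f"node-{self._next_id}"
        self._nodes[node_id] = dict(properties)
        return node_id

    def find_nodes(self, key, value):
        return [p for p in self._nodes.values() if p.get(key) == value]


class ClaimDocumentApi:
    """Facade exposing only the operations the business applications need."""

    def __init__(self, repository):
        self._repository = repository

    def add_document(self, document):
        # The mapping to repository properties lives here, so the document
        # model can change without affecting callers of this API.
        return self._repository.create_node(
            {"ins:claimNumber": document.claim_number, "cm:title": document.title}
        )

    def documents_for_claim(self, claim_number):
        nodes = self._repository.find_nodes("ins:claimNumber", claim_number)
        return [ClaimDocument(n["ins:claimNumber"], n["cm:title"]) for n in nodes]
```

Because the facade is small, a test can cover it completely: add a document for a claim, query by claim number, and check that only business objects come back.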
The permission requirements for the document repository did not really fit the standard Alfresco permission model. The permissions of a document can be strictly derived from the metadata of the document. Implementing this with folder-based inheritance of ACLs would have resulted in an explosion of ACLs and complex bookkeeping whenever metadata changed. That’s why we opted for a custom permission implementation, based on the metadata of the document.
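As a rough sketch of what deriving permissions from metadata means, consider the Python example below. The rule itself (one group per department, plus an extra group for confidential documents) and the field and group names are assumptions made up for illustration; the real rules and their implementation inside Alfresco are not shown here.

```python
def required_groups(metadata):
    """Derive the groups a user needs from the document metadata alone.

    Hypothetical rule: membership of the department group is always
    required; confidential documents additionally require a special group.
    """
    groups = {f"GROUP_{metadata['department'].upper()}"}
    if metadata.get("confidential", False):
        groups.add("GROUP_CONFIDENTIAL")
    return groups


def can_read(user_groups, metadata):
    """A user may read a document if they hold every required group."""
    return required_groups(metadata) <= set(user_groups)
```

With this approach, changing a document’s metadata changes who can read it immediately, because the decision is computed at access time. No per-document or per-folder ACL has to be created or updated.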
The whole project, with all customizations, is built using Docker. The images are tested on the build server and then published to our Docker registry. That way, it is easier to ship tested artifacts to client environments.
In the next blog post, Axel Faust, Alfresco Expert / IT Consultant / Software Architect, will talk about how to scale up beyond 100 million documents in Alfresco with sharding and cache pre-warming.