Introduction

At Aditi Consulting, we routinely develop mission-critical systems for enterprise and ISV customers. The systems are spread across embedded, mobile, web, desktop, and distributed channels. Over two decades, the Aditi software engineering team has gained insights into building robust, reliable, and resilient systems. The team would like to share these insights with the community through monthly and quarterly articles.

GDPR – An IT Survival Guide

Overview

GDPR, CCPA, and other data-related regulations grant people new rights related to privacy and data ownership. In this article, we will look at GDPR, its origins, and its impact on data collecting and processing entities. We will also offer guidance on building GDPR-compliant systems. CCPA and other regulations will be reviewed in subsequent articles.

GDPR stands for the General Data Protection Regulation (Regulation (EU) 2016/679). It is a regulation in EU law on data protection and privacy in the European Union and the European Economic Area. It also addresses the transfer of personal data outside the EU and EEA.

The most significant of the rights granted under GDPR are:

– the right to know, and
– the right to erasure

In other words, when an organization collects or stores data pertaining to an EU citizen, that person has the right to know what information was (or is being) collected and the authority to compel the organization to delete that information in its entirety.

The regulations cover data held by businesses collected in the course of providing services to those citizens. While GDPR originated in the EU, we need to consider the global nature of the Internet and the impact of entities outside the EU and their role in the genesis of GDPR.

Background

Historically, in the US, data has belonged to the entity that collected it. For example, while a consumer's credit history is associated with that individual, it has belonged to the maintaining organizations (the credit bureaus). Over time, individuals were granted limited access to verify the information or correct errors in it.

As information technology was adopted by various industries, vast amounts of data were collected, cataloged, and correlated, but the usage was largely commercial and sector-specific. In the political sphere, the notable data collection and cataloging exercise was the census. Here the respective governments took great pains to ensure and maintain the privacy of the subjects (even from other governmental agencies). In contrast, the commercial entities bought, sold, swapped, and licensed data while maintaining confidentiality and chain-of-custody across entities, but violating the privacy of the individuals who were represented in that data.

All that started to change in the mid-90s, just as GeoCities launched one of the first social networking sites and, in an unrelated move, the European Union introduced the Data Protection Directive, which reiterated an EU citizen’s right to, “private and family life, his home, and his correspondence.” In the US, however, the regulatory regime favored the rights of the data collector over those of the subject, thus fueling the proliferation and growth of social networking sites such as LinkedIn, Facebook, Twitter, etc. As these companies grew both in size and reach, they acquired EU-based users and opened offices across the continent, and thus became subject to EU jurisdiction.

Regulation

Fast forward to 2016, when the European Union, in response to cyber threats, technology advances, and concerns about data misuse in general, introduced the General Data Protection Regulation, which increased the individual’s control over their personal data and provided stiff penalties for violations by data collectors/processors.

GDPR requires all data controllers and data processors that handle personal data of data subjects to apply appropriate security and organizational measures in order to safeguard the confidentiality, integrity, and availability of processing services.

GDPR is considered by many in the industry to be one of the most significant information security and privacy laws of our time. Not surprisingly, its significance is matched by the technical challenges it poses.

Since the data storage and processing systems employed by organizations were designed only under commercial and technical constraints, retrofitting regulatory constraints has proven to be a significant challenge even to the brightest minds in the field. Google, for example, took its appeal to the highest court in the EU to obtain relief from some of GDPR’s most burdensome restrictions.

GDPR Compliance Guide

Information technology systems use abstract mechanisms to store and process data and to perform computations. The mechanisms are designed with specific, limited business and technology objectives in mind. One such example is the pipeline mechanism, which processes data in stages, each stage performing a distinct operation on the data. Another example is the incremental backup, which archives data by appending changes. While both mechanisms process data, regulatory concerns fall outside their scope. It is true that some features can be added after the fact. However, industry experts understand that retrofitting or extending mechanisms beyond their original scope is, fundamentally, an intractable problem. Therefore, technology leaders often let legacy systems degrade gracefully and implement new ones in their stead.
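To make the pipeline mechanism concrete, here is a minimal sketch in Python. The stage names, record fields, and country list are invented for illustration; the point is only that each stage performs one distinct operation, and that none of the stages knows anything about regulatory scope:

```python
# Minimal sketch of a staged data pipeline: each generator stage
# performs one distinct operation on the records flowing through it.
# Stage names and record fields are hypothetical.

def parse(lines):
    # Stage 1: turn raw CSV-like lines into records
    for line in lines:
        name, country = line.split(",")
        yield {"name": name.strip(), "country": country.strip()}

def filter_eu(records):
    # Stage 2: keep only records for EU countries (toy country list)
    eu = {"DE", "FR", "IE"}
    for rec in records:
        if rec["country"] in eu:
            yield rec

def anonymize(records):
    # Stage 3: strip the directly identifying field
    for rec in records:
        yield {"country": rec["country"]}

raw = ["Alice, DE", "Bob, US", "Carol, FR"]
result = list(anonymize(filter_eu(parse(raw))))
print(result)  # [{'country': 'DE'}, {'country': 'FR'}]
```

Note that a GDPR concern such as "erase everything about Alice" has no natural home in any of these stages, which is exactly the retrofitting problem described above.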

The directives under GDPR can be considered as requirements to retrofit or extend existing data processing systems, with one crucial distinction – they apply to a specific type of data, i.e., personal data.

The implication for system designers is that components must be context-aware in order to distinguish between personal and non-personal data. Most organizations have long been able to distinguish between confidential and public-domain data, so at first glance the requirement to discriminate between personal and non-personal data seems trivial. That viewpoint, however, conflates data confidentiality with data classification. We have found that complying with GDPR requires organizations to adopt data-classification schemes of the kind used by national defense agencies.
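The distinction between the two axes can be sketched as follows. This is an illustrative toy model, not a prescribed schema; the labels and class names are invented:

```python
# Sketch: classifying data elements along two independent axes --
# confidentiality (the axis most organizations already have) and
# personal/non-personal (the axis GDPR requires). Labels are illustrative.

from enum import Enum

class Confidentiality(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3

class DataKind(Enum):
    NON_PERSONAL = 1
    PERSONAL = 2

class DataElement:
    def __init__(self, value, confidentiality, kind):
        self.value = value
        self.confidentiality = confidentiality
        self.kind = kind

    def is_gdpr_relevant(self):
        # GDPR relevance depends on the personal/non-personal axis,
        # not on confidentiality: a public but personal datum is in scope.
        return self.kind is DataKind.PERSONAL

email = DataElement("alice@example.com", Confidentiality.PUBLIC, DataKind.PERSONAL)
revenue = DataElement(1_000_000, Confidentiality.CONFIDENTIAL, DataKind.NON_PERSONAL)
print(email.is_gdpr_relevant(), revenue.is_gdpr_relevant())  # True False
```

The publicly visible email address is in GDPR scope while the closely guarded revenue figure is not, which is why a confidentiality scheme alone cannot answer GDPR questions.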

With this in mind, let us look at the fundamental principle of GDPR compliance – the Canonical Addressability Principle (CAP).

Canonical Addressability Principle (CAP)

Ensure that every byte of data stored/processed is canonically addressable along four dimensions at all times: local, logical, logistical, and legal.

Local

This involves knowing the locality of the data, i.e., which disk, host, network endpoint, or cloud environment houses a data element, along with the ability to constrain data elements to certain geographies.

Logical

This involves knowing a data element’s classification, URL/path, integrity, inter-relationships, and access mechanisms.

Logistical

This involves being able to access selected versions of the same data element and to execute CRUD/archival operations.

Legal

This involves being able to place legal holds, perform audits, or constrain data to specific jurisdictions.

Ideally, the addressing mechanisms must be invertible in a fully automated fashion, i.e., in addition to knowing if a specific data element is subject to a regulatory rule, it must be possible, when given a regulatory rule, to automatically extract all applicable data elements across the organizational domain.
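One way to realize this invertibility is to maintain the element-to-rule bindings in both directions, so that each lookup is a direct index rather than a scan of the whole domain. The sketch below assumes invented rule names and address strings; it illustrates the shape of the index, not a specific product:

```python
# Sketch of an invertible addressing index: the forward lookup answers
# "which rules govern this element?", the inverse lookup answers "which
# elements does this rule reach?". Rule names and addresses are invented.

from collections import defaultdict

class AddressIndex:
    def __init__(self):
        self.rules_by_element = defaultdict(set)   # element id -> rule names
        self.elements_by_rule = defaultdict(set)   # rule name -> element ids

    def bind(self, element_id, rule):
        # Record the pairing in both directions so neither direction
        # of lookup requires scanning the other.
        self.rules_by_element[element_id].add(rule)
        self.elements_by_rule[rule].add(element_id)

    def rules_for(self, element_id):
        return self.rules_by_element[element_id]

    def elements_for(self, rule):
        return self.elements_by_rule[rule]

idx = AddressIndex()
idx.bind("eu-west/db1/users/42/email", "gdpr:right-to-erasure")
idx.bind("eu-west/db1/users/42/name", "gdpr:right-to-erasure")
idx.bind("us-east/db2/logs/7", "ccpa:right-to-know")

# Inverse query: all elements reachable by a given regulatory rule
print(idx.elements_for("gdpr:right-to-erasure"))
```

An erasure request then reduces to an inverse lookup followed by deletions at the returned addresses, which is the fully automated inversion the principle calls for.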

Guidelines

Our experience building data-storage engines has yielded some insights (stated below as guidelines). These may help you in reviewing existing systems for GDPR compliance or in designing new systems to be GDPR compliant:

1. Ensure every byte of data collected, cataloged, and processed by a system adheres to, and is validated by, a well-specified, versioned schema.

2. Annotate data using tags that adhere to a strict or adaptive taxonomy (as applicable).

3. Classify access to data using set membership, ACLs, or rule-based mechanisms (as applicable).

4. Use policies, processes, or procedures at the required level of granularity.

5. Store data in temporal order when possible. If data is not naturally in temporal order, consider applying a synthetic temporal order.

6. Employ data expiration, i.e., tag each data element with a timestamp and drop all data older than, say, eight years.

7. Store exactly one canonical copy of each data element, in a portable, industry-standard format. Additional copies may be kept for archival purposes; those copies, however, must be individually addressable and erasable via a well-specified or easily derivable reference.

8. Use well-specified processes to create, generate, access, protect, and destroy data.

9. Measure the lifetime cost of storing every single byte of data under an organization’s domain.

10. Supplement the schema+record-centric view of data with a field+specification-centric view.
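Guideline 6 is the easiest to illustrate in code. The sketch below assumes an eight-year retention window and invented record fields; a production sweep would also need to handle archival copies per guideline 7:

```python
# Sketch of guideline 6 (data expiration): every element carries a
# timestamp tag, and a periodic sweep drops everything older than the
# retention window. The eight-year window and fields are illustrative.

from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365 * 8)  # roughly eight years

def expire(records, now=None):
    # Keep only records whose timestamp falls inside the retention window.
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["collected_at"] <= RETENTION]

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "collected_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime(2010, 6, 1, tzinfo=timezone.utc)},
]
kept = expire(records, now=now)
print([r["id"] for r in kept])  # [1]
```

Expiration of this kind limits exposure by construction: data that no longer exists cannot be breached, subpoenaed, or subjected to an erasure request.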

Conclusion

Complying with GDPR poses a complex, multi-dimensional challenge for organizations. Practical solutions require new ways of thinking about designing, developing, and deploying information systems. Architects at Aditi Consulting have created mission-critical, GDPR-compliant systems for our clients ranging from large financial institutions to early-stage startups. If you would like to learn more about our services, please contact our customer success team for a complimentary consulting session on GDPR compliance.