Introduction

In a previous article, we looked at the next generation of privacy regulations, such as GDPR, and their impact on organizations that gather, store, and process data. We also looked at the Canonical Addressability Principle and guidelines for designing compliant data storage services.

In this article, we study the impact of regulations on system architecture and explore the solution space.

Background

Every modern organization depends upon one or more [Systems-of-Record (SOR)](https://en.wikipedia.org/wiki/System_of_record). They help organizations operate, learn, and improve. An SOR should, in theory, offer retrieval, access-control, audit, archival, and secure-destruction mechanisms. In practice, these capabilities are distributed across multiple sub-systems. For example, a query is executed against a database stored on a filesystem whose access-control list resides in a federated directory service.

Diffused capabilities reflect the evolution of information systems in response to ever-changing business and technical requirements. Over decades, the best ideas from these systems were formalized into architectural patterns. While the data may have resided in the SORs, the processing contexts proliferated as various departments used the same data for different purposes. Oftentimes, the context itself was persisted because computation was more expensive than persistence (caching and memoization).

Regulations & Implications

The indiscriminate persistence of data and context was acceptable until such systems were internetworked or cloud-enabled, at which point they became susceptible to remote data breaches and other cybersecurity threats. Given the impact of these threats on citizens, systems, and infrastructure, governments began introducing regulatory safeguards.

Which brings us to now. At first glance, data-privacy regulations appear to merely stack another layer onto existing constraints or requirements. It is only when we consider the combinatorial interaction of these constraints across various architectural patterns that we grasp the magnitude of the problem. From the standpoint of designing and implementing systems, regulations such as GDPR and CCPA inject residency, sovereignty, and privacy contexts into low-level data operations, effectively adding another dimension to the existing constraint vectors of business and technical requirements. This extra dimension introduces new challenges for system designers and new obligations for organizations.

For example, CCPA requires data processing systems to extend certain privileges to Californians. To accomplish this, a residency flag must be available in all contexts where user-data is processed. This problem is further compounded when another state introduces similar regulations. System designers are faced with questions such as:

  • How do we decompose regulatory clauses into code and data?
  • Are existing systems set up to accept new and arbitrary contexts?
  • How are exceptions handled in regulatory contexts?
  • Can third parties provide services while satisfying regulations?
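The residency example can be made concrete with a minimal sketch. It assumes the residency flag is carried on the record itself and that the mapping from residency to regulation lives in data rather than in code. All names here (`UserRecord`, `REGULATIONS_BY_RESIDENCY`, `handle_deletion_request`) and the residency codes are hypothetical, chosen only for illustration, and the mapping is not legal guidance.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical lookup table: residency code -> regulations that apply.
# Entries are illustrative only.
REGULATIONS_BY_RESIDENCY: Dict[str, FrozenSet[str]] = {
    "US-CA": frozenset({"CCPA"}),
    "EU": frozenset({"GDPR"}),
}


@dataclass(frozen=True)
class UserRecord:
    user_id: str
    email: str
    residency: str  # the residency flag travels with the record


def applicable_regulations(record: UserRecord) -> FrozenSet[str]:
    """Return the regulations that govern this record, based on residency."""
    return REGULATIONS_BY_RESIDENCY.get(record.residency, frozenset())


def handle_deletion_request(record: UserRecord) -> str:
    """Decide how a deletion request is treated in this processing context."""
    if applicable_regulations(record):
        return "honor"
    # Assumption: records with no known regulation fall back to manual review.
    return "review"
```

Under this arrangement, a new jurisdiction becomes a new entry in the table rather than a new branch in every processing context, which is one possible answer to the first two questions above.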

Since regulations carry both civil and criminal penalties, the liability involved in building compliant information systems approaches levels previously observed only in defense and aerospace industries.

Solution Space

Given this overview of the problem domain, let us look at the solution space. A rough, workable guideline that emerges from our deliberations thus far may be stated as follows:

Specified data fields must always be bounded by a regulatory envelope: at-rest, in-transit, and in-process.
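One way to read this guideline is that a regulated value is never handled bare: it is wrapped together with the purposes permitted in each of the three states. The sketch below is an assumption about how such an envelope might look, not a prescribed design. `State`, `Envelope`, and `EnvelopedField` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, FrozenSet, Mapping


class State(Enum):
    AT_REST = "at-rest"
    IN_TRANSIT = "in-transit"
    IN_PROCESS = "in-process"


@dataclass(frozen=True)
class Envelope:
    regulation: str
    # Purposes permitted in each state; a missing state permits nothing.
    allowed: Mapping[State, FrozenSet[str]]


class EnvelopedField:
    """A value that can only be read by naming a state and a purpose."""

    def __init__(self, value: Any, envelope: Envelope) -> None:
        self._value = value
        self._envelope = envelope

    def read(self, state: State, purpose: str) -> Any:
        permitted = self._envelope.allowed.get(state, frozenset())
        if purpose not in permitted:
            raise PermissionError(
                f"{purpose!r} is not permitted {state.value} "
                f"under {self._envelope.regulation}"
            )
        return self._value

    def __repr__(self) -> str:
        # Never leak the value through logs or debuggers.
        return f"<EnvelopedField regulation={self._envelope.regulation} value=REDACTED>"
```

The point of the sketch is that the in-process check uses the same envelope as the at-rest and in-transit checks, so the boundary does not disappear once the data is decrypted and loaded.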

Most organizations have achieved data security at-rest and in-transit using cryptographic constructs such as encryption keys, certificates, and algorithms. Achieving in-process security may involve broader, sweeping changes such as:

  • Developing an organization-wide, informational architecture model
  • Restructuring & reorganizing some departments
  • Emphasizing processes & specifications
  • Focusing on system architecture, design & documentation

Practical implementation would entail injecting the applicable regulatory context, comprising data, schema, and rules, into an existing data-processing pipeline. We refer to this as RCAP (Regulatory-Context Aware Pipelines). A schematic diagram is provided below:
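As a complement to the schematic, the RCAP idea can also be sketched in code. The sketch assumes that a regulatory context bundles data, schema, and rules, and that it is injected by wrapping an existing pipeline stage that knows nothing about regulations. Every name here (`RegulatoryContext`, `with_context`, `mask_personal_fields`, `tag_with_regulation`, `normalize`) is hypothetical, as is the "personal" classification label.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
# A rule takes a record and the context, and returns a (possibly new) record.
Rule = Callable[[Record, Any], Record]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


@dataclass(frozen=True)
class RegulatoryContext:
    data: Dict[str, Any]    # facts about the context, e.g. the regulation name
    schema: Dict[str, str]  # field name -> classification
    rules: List[Rule]       # applied in order to every record


def mask_personal_fields(record: Record, ctx: RegulatoryContext) -> Record:
    """Replace every field classified as personal with a mask."""
    return {
        key: ("***" if ctx.schema.get(key) == "personal" else value)
        for key, value in record.items()
    }


def tag_with_regulation(record: Record, ctx: RegulatoryContext) -> Record:
    """Annotate the record with the regulation it was processed under."""
    return {**record, "_regulation": ctx.data["regulation"]}


def with_context(stage: Stage, ctx: RegulatoryContext) -> Stage:
    """Wrap an existing stage so the context's rules run on its output."""
    def wrapped(records: Iterable[Record]) -> Iterator[Record]:
        for record in stage(records):
            for rule in ctx.rules:
                record = rule(record, ctx)
            yield record
    return wrapped


def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """An existing pipeline stage, unaware of any regulation."""
    for record in records:
        yield {**record, "email": record["email"].lower()}
```

Wrapping `normalize` with a context whose schema marks `email` as personal would yield records with the address masked and a `_regulation` tag attached, while `normalize` itself stays unchanged. Whether wrapping stages is preferable to building the context into each stage is a design choice that depends on the pipeline in question.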

In conclusion, GDPR and similar data-governance regulations will require organizations to become deliberate and methodical about data and operations, perhaps even becoming self-reflective.

In the next article, we will look at a canonical data journal with an embedded governance engine, designed from the ground up to enable financial, telecom, retail, automotive, and healthcare companies to store and retrieve mission-critical data.