Reference Data Management Best w/ Enterprise NoSQL
Posted by Amir Halfon on 12 March 2014 03:06 PM

[Note: Amir Halfon is a frequent blogger at FinExtra, a site devoted to the financial technology community. Following is an adaptation from his blog post of January 4.]

In an earlier post, I wrote about NoSQL: what it is, what it isn’t, and some of the misconceptions surrounding it. First I touched on the Operational Trade Store. Now I will focus on enterprise reference data management. While reference data has a very specific meaning in Financial Services – i.e. data about financial instruments and the legal entities associated with them – the term has a broader meaning and relevance to industries at large. And the challenges that financial institutions face are similar to those of any large organization.

Here’s a typical situation: numerous M&A activities lead to numerous reference data systems for different lines of business, geographies, etc. This proliferation causes data inconsistencies, which in turn lead to costly processing exceptions. (In the case of investment banks, trade exceptions substantially increase the cost per trade – a key profitability metric.)

In an effort to resolve this challenge, most firms will attempt to consolidate data from these disparate systems into a single “golden copy.” The problem with this approach is the same problem any data consolidation effort faces: It takes such a long time to come up with a “canonical” data model that combines all the sources, and also handles all the data consumers’ requirements, that by the time the modeling exercise is done it is no longer relevant. And so the holy grail of having a single source of truth never comes to fruition.

The alternative? Schema on read. This term refers to loading data directly into the database in its original form, without first creating a common schema. The data carries its own structure and is transformed in situ into whatever downstream format is required, so there is no need to predefine a structure that addresses every possible data consumer’s needs.

This is the core difference between NoSQL and relational technologies. In fact, that’s one of the main reasons NoSQL is gaining so much momentum – the agility that comes from schema on read means that changing business needs can be addressed without extensive data modeling exercises and without the expensive ETL middleware that feeds them.
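The schema-on-read idea described above can be illustrated with a minimal Python sketch. It is not tied to any particular database: the two source systems, their field names, and the "settlement view" are all hypothetical, invented only to show documents being stored as-is and shaped when they are read.

```python
import json

# Hypothetical records from two source systems that describe legal
# entities in different shapes (all field names are assumptions).
RAW_DOCUMENTS = [
    '{"source": "equities", "entity": {"lei": "LEI-0001", "name": "Acme Holdings", "country": "US"}}',
    '{"source": "fx", "counterparty_id": "LEI-0002", "legal_name": "Globex Ltd", "domicile": "GB"}',
]


def load_as_is(raw_documents):
    """Store each document in its original structure; no common schema is imposed."""
    return [json.loads(doc) for doc in raw_documents]


def read_as_settlement_view(doc):
    """Apply one consumer's schema at read time, not at load time."""
    if "entity" in doc:  # shape used by the hypothetical equities system
        entity = doc["entity"]
        return {"id": entity["lei"], "name": entity["name"], "country": entity["country"]}
    # shape used by the hypothetical FX system
    return {"id": doc["counterparty_id"], "name": doc["legal_name"], "country": doc["domicile"]}


store = load_as_is(RAW_DOCUMENTS)
settlement_view = [read_as_settlement_view(doc) for doc in store]
```

A new consumer with different needs would add its own read-time function; the stored documents and the load step stay unchanged.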

And now for the enterprise part: Enterprise reference data management means managing reference data at an enterprise-wide scale, across lines of business, applications and geographies. Enterprise NoSQL is about accommodating the needs of such enterprise data management, especially in terms of security, transactions, availability and scalability.

The need for enterprise-grade security (fine-grained, role-based authorization), availability (HA, DR, etc.) and scalability is fairly self-evident, but let’s take a closer look at transactions. A change in a critical legal entity attribute (called a “corporate action” in finance) has to be visible to all systems processing transactions related to this entity at the exact same time. Without ACID transactions that cannot be guaranteed, and processes will break (e.g. a confirmation will be sent to the wrong party, a trade will fail to clear, etc.). Those breakages then have to be fixed through human intervention, which carries exorbitant costs.
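To make the all-or-nothing requirement concrete, here is a small sketch of a cross-record transaction. It uses Python’s built-in sqlite3 module purely as a stand-in for any store that offers ACID transactions across records; it is not MarkLogic’s API, and the table names, columns and the rename_entity helper are hypothetical.

```python
import sqlite3

# In-memory database standing in for any store with cross-record ACID transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legal_entity (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE confirmation (trade_id TEXT PRIMARY KEY, entity_id TEXT, recipient TEXT)")
conn.execute("INSERT INTO legal_entity VALUES ('E1', 'Acme Holdings')")
conn.execute("INSERT INTO confirmation VALUES ('T1', 'E1', 'Acme Holdings')")
conn.commit()


def rename_entity(conn, entity_id, new_name, fail_midway=False):
    """Update the entity and every dependent record as a single atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE legal_entity SET name = ? WHERE id = ?", (new_name, entity_id))
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute("UPDATE confirmation SET recipient = ? WHERE entity_id = ?", (new_name, entity_id))
        return True
    except RuntimeError:
        return False


def snapshot(conn):
    """Return the entity name and the confirmation recipient as readers see them."""
    name = conn.execute("SELECT name FROM legal_entity WHERE id = 'E1'").fetchone()[0]
    recipient = conn.execute("SELECT recipient FROM confirmation WHERE trade_id = 'T1'").fetchone()[0]
    return name, recipient


failed = rename_entity(conn, "E1", "Acme Group", fail_midway=True)
after_failure = snapshot(conn)  # the partial update was rolled back
succeeded = rename_entity(conn, "E1", "Acme Group")
after_success = snapshot(conn)  # both records changed together
```

After the simulated failure, neither record carries the new name; after the successful call, both do. At no point does a reader see the entity renamed while the confirmation still points at the old name.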

There are many other examples of where transactions are essential, and the need for full (cross-record) ACID transactions would seem just as obvious as the need for security – if it weren’t for some of the confusion surrounding NoSQL. Much of this confusion resulted from claims that schema flexibility and full transactional consistency were somehow mutually exclusive. Nothing could be further from the truth. It might be hard to implement a schema-agnostic, horizontally scaling database that offers ACID transactions across records, but we’ve done it.

I’ve written more about it here: or better yet, come to the MarkLogic World conference in April. You’ll hear not only what we at MarkLogic think but, more importantly, what others do.

