Chicken Soup for the Event-Driven Soul
Posted by Stuart Moorhouse on 04 November 2019 08:00 AM
The title of the blog had been so appealing: “No More Silos: How to Integrate Your Databases with Apache Kafka and CDC“—but then I got hit with a power question:
Because, as a Solution Architect at MarkLogic, I have to say I’m quite partial to adding databases into architectures. And, the blog had already raised some very good points, like this one:
“Yes,” I agreed. But surely not the we-need-a-database-here assumption.
But it’s a good question, isn’t it? In an event-driven architecture, why should systems get data out of databases when they can get it straight from Kafka? Wouldn’t it be easier, quicker and cheaper to cut out the middleman? Do you really need a database if you’re already streaming data into Kafka?
This question posed a bit of a problem for me. And as they say, a problem shared is a problem two people have, so I decided to share that problem with some fellow MarkLogic folks.
David Gorbet, Engineering SVP at MarkLogic, wasn’t ruffled by it. Although he agreed and stated that “a message-/event-based system is a smart way to go for many problems, and I think Kafka is a good technology to use for this,” he made it clear that for many architectures, a database is essential. That’s because if a database is used to harmonize siloed data (including your event messages) then:
And it’s not just spotting errors that a database within an event-driven architecture can help with:
David left me with something to ponder:
Now, it turned out that Ken Krupa, MarkLogic’s VP of Global Solutions Engineering, had heard that question (“Do we need a database in an event-based architecture?”) before. He explained how he’d found that customers had been unable to get an agreed-to, trusted, comprehensive view of things from messages alone. In fact, one customer referred to it as being:
As Kafka becomes ever more popular, and more architectures that span the whole enterprise are built using event-based patterns, the question of why databases should be introduced at all looks as though it is set to be one that a lot of people are going to be asking.
However, I think the answer to this really boils down to one key factor: If you’ll ever need to get an answer about an entity as it exists across the business as a whole (and in a hurry), you’ll need a persistent, harmonized representation of it that you can easily and quickly retrieve (for example, via indexes).
And if you’re not sure if you’ll need a database? Just remember the chicken.