Are your streams noisy?


As event-driven platforms take off ✈️ with modern integration practices such as data-in-motion, organisations are integrating a significantly larger number of data and event streams 📈. Without deliberate, up-front design, these growing streams contribute to noise, complexity and coupling, leading to higher implementation (CAPEX) and operating (OPEX) costs. In this post we look at common patterns for converting stream noise into signal, and at cost-optimal choices that lead to a strategic outcome

What’s the issue?

Data and event streams originate from core systems, and new connectors are making it simpler to dump core-system entity information into streaming platforms like Apache Kafka. The goal is to accelerate legacy application and service modernisation through simple, quick connectors. This covers not just traditional messaging with a small event footprint, but also more data-oriented pipes that provide a stream of changes. (Note: if your organisation is not actively exploring event streaming and relies solely on messaging for integration, then let us chat!)

Now, while these system connectors are easy to set up and implement, without deliberate upfront design (i.e. accidental architecture) they can end up producing streams of system entity change events that are noise without appropriate context. Consumers are left to parse these events, and to do so they must first understand the internal structure of the source systems, then process the events with duplicated logic and additional context

Noisy streams push the onus of translation onto consumers, leading to extra effort in implementation and maintenance

Costly data swamps: this not only leaks internal system implementation, coupling systems and consumers, but also adds significant processing cost to convert noise into signal, diluting the business value delivered. Over time, these streams lead to high operational costs and data swamps

Are your streaming services building a data swamp?

State of data and event streaming

InfoQ’s annual architecture trends report places event-driven architecture in the late majority in both 2022 and 2023, implying slow adoption. My observation has been that the adoption rate has increased, and it will accelerate as organisations look to build more data-driven insights and modernise legacy applications

Messaging

Consider streaming (Kafka) over messaging (queues, topics) for larger volumes, ordering, replay-ability of messages and exactly-once semantics (EOS) to boost your integration offering. So, are you streaming?
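As a minimal sketch of those capabilities, assuming a local broker and the confluent-kafka Python client (the topic names and transactional id are hypothetical): idempotent, transactional producers give EOS, while the retained log lets any new consumer group replay history, which a classic queue (message deleted on acknowledgement) cannot offer.

```python
from confluent_kafka import Consumer, Producer

# Exactly-once production: idempotence plus a transactional id.
producer = Producer({
    "bootstrap.servers": "localhost:9092",     # assumed local broker
    "enable.idempotence": True,                # broker de-duplicates producer retries
    "transactional.id": "orders-publisher-1",  # hypothetical id; enables transactions
})
producer.init_transactions()
producer.begin_transaction()
producer.produce("orders", key="o-42", value=b'{"status":"PLACED"}')
producer.commit_transaction()                  # all-or-nothing visibility (EOS)

# Replay-ability: a fresh consumer group re-reads the retained log from the start.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "replay-demo",
    "auto.offset.reset": "earliest",           # start from the beginning of the stream
})
consumer.subscribe(["orders"])
```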

InfoQ architecture trends – 2022 vs 2023: “Event-driven” architecture sits in the late majority in both years

Are your streams noisy?

Change Data Capture (CDC) streams are where organisations start their streaming journey. CDC is often a quick way to plug straight into the data store of a core system and start publishing events; however, as this practice accelerates, the entity changes published from core systems become meaningless noise without business context. Business events carrying more coarse-grained information are more consumable, providing business-domain-based information rather than purely system-data-oriented signals

Business events vs system entity change events – if you are publishing pure CDC to your consumers, then you may be pushing out noise and encouraging model-based coupling
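To make the contrast concrete, here is a sketch of the same change expressed both ways. The CDC envelope follows Debezium-style conventions (op/before/after fields); the table, column and event names are hypothetical.

```python
# A raw CDC entity change event: column-level noise that exposes the source
# system's internal table model to every consumer.
cdc_event = {
    "op": "u",                           # cryptic operation code: update
    "source": {"table": "CUST_MASTER"},  # internal table name leaks out
    "before": {"cust_id": 981, "addr_ln_1": "12 High St", "upd_flg": "Y"},
    "after":  {"cust_id": 981, "addr_ln_1": "7 Bay Rd",   "upd_flg": "Y"},
    "ts_ms": 1700000000000,
}

# The same change as a coarse-grained business event: self-describing,
# expressed in the domain's published language, with no schema leakage.
business_event = {
    "eventType": "CustomerAddressChanged",
    "customerId": "981",
    "newAddress": "7 Bay Rd",
    "occurredAt": "2023-11-14T22:13:20Z",
}
```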

3 key patterns for turning noise into signal

There are 3 key patterns, based on where you convert noise to signal: 1. at the source, 2. in the middle, or 3. at the end consumer

When messages travel from provider to consumers, there are 3 places to transform noise into signal

Patterns #1 and #3 work well in a 1-1 ecosystem with a single provider and consumer; however, as this scales, the cost of providers and consumers doing the transformation into business-oriented messages increases, which leads to the broker approach of pattern #2. In implementing pattern #2, use something in the middle to transform raw system events into business events, allowing systems to plug in as providers or consumers

Noise-to-Signal Processing: Business Domain APIs and Events

Converting from noise to signal using pattern #2 can lead to a cleaner architecture, along with reusable business event streams that decouple stream providers and consumers. This has the added benefit of centralising operational costs in a single component. The broker in pattern #2 is a converter from “system” to “common vocabulary” (a published language, in DDD terms), aligned to business domains. It is a domain service which encapsulates the business domain capabilities with services and outbound events and data streams, all adhering to a common model aligned to the business language of the domain.

The domain service consumes system entity change event streams from core systems and publishes them to domain-aligned streams after transforming each message to a standard format. These services and the domain streams are maintained and operated by a domain-aligned team (in decentralised or federated models) or by an integration practice (in a centralised model)
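A minimal sketch of that translation step, reusing the Debezium-style CDC shape from the earlier example and the confluent-kafka Python client; the topic names and the business rule are illustrative assumptions, not a definitive implementation.

```python
# Pattern #2 in miniature: the domain service sits between the raw CDC stream
# and a domain-aligned stream, filtering noise and translating into the
# published language of the domain.
import json
from confluent_kafka import Consumer, Producer

def to_business_event(cdc: dict) -> dict | None:
    """Translate a system entity change into a domain event, or drop it as noise."""
    before, after = cdc.get("before") or {}, cdc.get("after") or {}
    if before.get("addr_ln_1") == after.get("addr_ln_1"):
        return None  # no business-meaningful change: filter the noise out
    return {
        "eventType": "CustomerAddressChanged",  # common vocabulary, not table names
        "customerId": str(after["cust_id"]),
        "newAddress": after["addr_ln_1"],
    }

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "customer-domain-service",
                     "auto.offset.reset": "earliest"})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["crm.cust_master.cdc"])     # raw system stream (noise)

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = to_business_event(json.loads(msg.value()))
    if event is not None:
        producer.produce("customer.domain.events",  # domain-aligned stream (signal)
                         key=event["customerId"],
                         value=json.dumps(event))
        producer.flush()  # fine for a sketch; batch in real deployments
```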

Noise-to-Signal Processing: business domain services produce Domain Events

How do you design Business Events?

With the business!

Getting to a domain service requires upfront domain analysis with business SMEs to understand what the business events are. This can be done through techniques such as event storming and domain storytelling, which are part of strategic domain-driven design (DDD)

Summary

We looked at messaging vs event streaming, and at how late-majority adoption of event streaming is giving way to faster data integration with core systems, with streams of entity data changes being published. This method creates more data noise for consumers, leading to greater IT spend on building and maintaining the processing logic that converts noise to signal, and raising duplication and coupling concerns architecturally.

If you are still into plain old messaging, then as integration practice owners, architects and engineers, consider adopting event streaming, as the data-in-motion practice provides broader capabilities, especially those needed today for data insights and AI models. And when implementing streaming, consider layering domain-oriented event streams over change data capture streams, to publish signal instead of noise to your consumers
