Messaging, Events and Data-in-Motion: Asynchronous communication patterns

<div class="cs-rating pd-rating" id="pd_rating_holder_1819065_post_1710"></div> <p class="wp-block-paragraph">When integrating systems we often end-up writing <strong>asynchronous</strong> messaging interfaces for mostly system-to-system communications. This conversation technique is great because it does not require the sender and receiver to stay connected to each other in a session at the same instance in time, it is non-blocking and often brokered (like the post office which brokers letters between a consumer and receiver)</p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2021/04/alok-mishra-sync-async2.png"><img src="https://alok-mishra.com/wp-content/uploads/2021/04/alok-mishra-sync-async2.png?w=666" alt="" class="wp-image-2725" width="304" height="273" /></a></figure> </div> <p class="wp-block-paragraph">This conversation technique can seem “slow” and “unreliable” compared to a synchronous real-time call, however this is a common fallacy because a good message broker and message persistence can make this communication style more reliable than real-time communications over a unreliable network</p> <h3 class="wp-block-heading">1. <strong>Asynchronous and Synchronous communications can happen within a broad context</strong></h3> <p class="wp-block-paragraph">Note cross-system asynchronous communication happens in the context of an end-to-end conversation and there may be synchronous real-time communication between the front end and back end (search, update etc) leading to asynchronous cross-system integration behind the scenes (sync of updated data)</p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-async-broader-context2.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-async-broader-context2.png?w=1024" alt="" class="wp-image-2727" width="473" height="433" /></a><figcaption class="wp-element-caption">Broader conversational context could consist of synchronous and asynchronous communication</figcaption></figure> </div> <p class="wp-block-paragraph">There may also be asynchronous client conversations like one-way form submission or outbound notification to client which trigger async communication or are triggered by async notifications. Knowing and recognising these wider contextual communication patterns makes integration solution robust beyond mere “sync” and “async” patterns</p> <p class="wp-block-paragraph">Lets examine next the patterns used by users and systems communicate asynchronously </p> <h3 class="wp-block-heading">2. There are 3 key asynchronous communication patterns using events and data</h3> <p class="wp-block-paragraph">There many ways in which users and systems communicate asynchronously and we can broadly classify these into the follow key patterns: </p> <ol class="wp-block-list"> <li><strong>State change</strong> notification</li> <li>State change with<strong> state snapshot data </strong>(lightweight or bulk )</li> <li>Events with <strong>commands </strong></li> </ol> <h4 class="wp-block-heading">2.1. State change events</h4> <p class="wp-block-paragraph">Events containing lightweight information about the state of an entity are state change events. These events are great for triggering actions in downstream consumers (event choreography) for de-coupled integrations. For example, state of a business entity is propagated via an event queue or topic e.g “Order Confirmed”, “Claim Submitted”, “Student Enrolled” etc. </p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-state-events2-1.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-state-events2-1.png?w=939" alt="" class="wp-image-2729" width="471" height="512" /></a><figcaption class="wp-element-caption">State Transitions</figcaption></figure> </div> <p class="wp-block-paragraph">2.1.1 <strong>Considerations</strong></p> <p class="wp-block-paragraph">Use timestamps and correlation identifiers to order and relate state change events as consumers may need ways to determine when and in which order these happened. With state change events consumers need a way to query the latest state and often these messages are accompanied by query APIs to get the latest state. The state change event pattern is safer than full state transfer event mechanism (pattern 2) because it does not explicit require the messages to be in order (i.e. the consumer can react to an event and query the latest state vs learning the latest state from a series of messages)</p> <p class="wp-block-paragraph">2.1.2 <strong>Delivery</strong></p> <p class="wp-block-paragraph">State change events are delivered by using a broker and a message topic in a publish/subscribe (pub-sub) model. The last mile delivery of these messages to the subscribers can be a queue but the central platform is often “durable” topic for multiple subscribers. A <strong>durable topic </strong>keeps the message for a while until all consumers have read the message (like a bulletin board)</p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-broker.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-broker.png?w=680" alt="" class="wp-image-2730" width="-368" height="-462" /></a><figcaption class="wp-element-caption">A Broker is the infrastructure component that facilitates message transfer in an asynchronous communication pattern</figcaption></figure> </div> <p class="wp-block-paragraph">Note a durable topic may or may not keep a series of messages and do not allow consumers to replay messages from a specific point in time, this is where<strong> streaming platforms </strong>shine. For example, Salesforce platform events provides the ability for each consumer to move a pointer back to a point up to 24-hours in the past and replay the messages</p> <p class="wp-block-paragraph">The state transitions happen on “saved”, “submitted” etc above and the events from master system to downstream subscribers can trigger reactions (or nothing). These reactions can be part of a Choreography solution</p> <h4 class="wp-block-heading">2.2. State Change with Snapshot (data)</h4> <p class="wp-block-paragraph">In this pattern, events contain full information about the entity. These events are typically used for data synchronisation with downstream consumers in enterprise application integration (EAI) scenarios. The message provider (source) either sends the entire snapshot or adapters (integration adapters) for the provider intercept an event and publish the full snapshot in a query-enrich pattern</p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-state-change-event-with-data2.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-state-change-event-with-data2.png?w=983" alt="" class="wp-image-2731" width="467" height="485" /></a><figcaption class="wp-element-caption">Data with the state change event</figcaption></figure> </div> <p class="wp-block-paragraph">2.2.1 <strong>Considerations</strong></p> <p class="wp-block-paragraph">This pattern of sending “chunks” of data vs the notification of state change requires more stability in the way the messages are sent for robust and reliable delivery. </p> <ol class="wp-block-list"> <li><strong>Order</strong>: First it requires the messages be in-order to avoid overwriting new updates with older ones</li> <li><strong>Durability</strong>: Secondly, it requires durability and persistence in messaging infrastructure so as to not lose any in-flight messages as dropped updates can have a big impact. </li> <li><strong>Observability</strong> & <strong>Reconciliation</strong>: Data synchronisation issues across integrated systems is a common challenge therefore this approach also requires observability and reconciliation processes orthogonal to the integrations</li> <li><strong>Lightweight vs Bulk data transfer</strong>:The size of the data synchronised across the systems needs to be considered as well given some of the messaging infrastructure is not suitable to handle large volumes of data streaming across systems. Traditionally these are handed over to bulk ETL jobs for large data synchronisation and more recently this is done through streaming data technologies like Apache Kafka </li> </ol> <p class="wp-block-paragraph">2.2.2 <strong>Delivery</strong></p> <p class="wp-block-paragraph">Events with state and data are delivered using similar messaging infrastructure however given the need for ordering and persistence there are first-in-first-out (FIFO) channels and durable stores to support the transport</p> <p class="wp-block-paragraph">2.2.2.1 <strong>Using Message Brokers or Event Bus</strong> : For light weight data synchronisation across systems a pub/sub model with a message topic (or event bus) is suitable</p> <div class="wp-block-image"> <figure class="aligncenter size-large"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-durable-store-or-esb-for-events.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-durable-store-or-esb-for-events.png?w=1024" alt="" class="wp-image-2735" /></a><figcaption class="wp-element-caption">FIFO queues and durable stores are often needed for events that require message ordering and persistence </figcaption></figure> </div> <p class="wp-block-paragraph">2.2.2.2 <strong>Using streaming platform for Events like Salesforce Platform Events: </strong>For more advanced consumption requirements (with longer persistence, message playback across a small window) an event streaming platform can be used. For example, we have seen an evolution of the messaging infrastructure into streaming platforms which can provide the consumers the ability to <em><strong>replay</strong></em> messages up to a certain period in time (e.g. <a href="https://trailhead.salesforce.com/content/learn/modules/platform_events_basics/platform_events_architecture?trail_id=architect-solutions-with-the-right-api">Salesforce platform events vs generic events</a>) using a consumer specific pointer. </p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-platform-events-store-1.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-platform-events-store-1.png?w=1024" alt="" class="wp-image-2737" width="542" height="232" /></a><figcaption class="wp-element-caption">Streaming platform for events +1 on the traditional event bus and message brokers by providing better protocol, durability and access</figcaption></figure> </div> <p class="wp-block-paragraph">2.2.2.3 <strong>Message Broker vs Event Streaming Platform</strong></p> <p class="wp-block-paragraph">A message broker and a platform for streaming events both have durability however the key difference is the in how they provide the data to the consumers. A platform for streaming events provides longer duration for storing messages (days vs hours), allows consumers to navigate the sequence of messages using their consumer “pointer” to replay from a point in time and allows for multiple reads (vs one time per subscriber)</p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-platform-durable-topic-vs-events-platform.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-platform-durable-topic-vs-events-platform.png?w=1024" alt="" class="wp-image-2738" width="573" height="228" /></a><figcaption class="wp-element-caption">Durable message topic vs a Streaming platform</figcaption></figure> </div> <p class="wp-block-paragraph">2.2.2.4 <strong>Data Streams vs Event Streams: </strong></p> <p class="wp-block-paragraph">A streaming platform for events as mentioned above has better features for client applications and more durability than traditional message brokers however these products can be limited in size and may not fit large data transfer use cases. For example, traditional extract transfer and load (ETL) workloads which were done using batch processes cannot be solved using an event streaming platform given the need for larger and longer storage, direct connection to database change logs and ability to do processing on streams of data</p> <p class="wp-block-paragraph">This is where tools like Apache Kafka (<a href="https://kafka.apache.org/)">[1])</a> shine. Kafka provides a highly scalable, durable and robust streaming platform for events and data with connectivity to traditional databases. Companies like Confluent lead the implementation space with connectors and query language over the streams (KSQL) and provide features which have made event and data driven solutions highly reliable, extensible and robust (<a href="https://developer.confluent.io/what-is-apache-kafka/">%5B2%5D</a>). </p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-streaming-platforms.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-streaming-platforms.png?w=1024" alt="" class="wp-image-2740" width="600" height="374" /></a><figcaption class="wp-element-caption">Event Streaming platforms use pattern 2 where changes in data is transmitted to the stream by the producer. Instead of an application sending this, the data changes can come from the database change logs (e.g. via Kafka connectors) </figcaption></figure> </div> <p class="wp-block-paragraph">This technique of <em><strong>using streaming platforms like Apache Kafka as the backbone for processing large streams of data to integrate providers and consumers in real-time</strong></em> is called <strong>Data in Motion</strong></p> <h4 class="wp-block-heading">2.3 Command Event </h4> <p class="wp-block-paragraph">Events used to create or update data or invoke some action in an end system (service provider) are called command events. These are an <strong>anti-pattern </strong>and should be avoided given they is no way the invoker/consumer of a service can know if the command was successful or not</p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-event-command-pattern.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-event-command-pattern.png?w=643" alt="" class="wp-image-2708" width="194" height="309" /></a><figcaption class="wp-element-caption">Command events </figcaption></figure> </div> <p class="wp-block-paragraph"><strong>But there are exceptions no?</strong> A <strong>buffered command</strong> pattern violates this and perhaps we <em>need</em> a command event pattern for scenarios like this example below. Imagine a scenario where a flood of requests are expected with one-way invocation, for example, application submission (voting) where the actor submitting the request needs an ack and the service needs to scale to an arbitrary number of actors. </p> <p class="wp-block-paragraph">In this scenario since the actor does not need a response from the core system (backend) right away (perhaps a notification is sent after), we can add a buffer in between in the form of a message queue and smooth the flood of submissions from the front door to a manageable set of requests to the backend</p> <div class="wp-block-image"> <figure class="aligncenter size-large is-resized"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-event-command-example.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-event-command-example.png?w=959" alt="" class="wp-image-2711" width="366" height="391" /></a><figcaption class="wp-element-caption">A buffered command pattern uses a command event with some data. It buffers the service providers from a flood of requests which are one-wa</figcaption></figure> </div> <h2 class="wp-block-heading">3. Building complex things with events</h2> <p class="wp-block-paragraph">After you recognise and understand the key event-driven patterns you can now build complex solutions using event choreography. Note this is a specific solution style and different from orchestration where a central orchestrator drives the outcome. An orchestration can use events between the orchestrator and the services </p> <div class="wp-block-image"> <figure class="aligncenter size-large"><a href="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-choreography-vs-orchestration.png"><img src="https://alok-mishra.com/wp-content/uploads/2023/05/alok-mishra-choreography-vs-orchestration.png?w=1024" alt="" class="wp-image-2742" /></a><figcaption class="wp-element-caption">Choreography and orchestration both use events but choreography specifically relies on events and rules to handle those events to drive an end-to-end outcome</figcaption></figure> </div> <h2 class="wp-block-heading">4. Wrap up</h2> <p class="wp-block-paragraph">Asynchronous event-driven communications are different from real-time synchronous communications and knowing the context, constraints and patterns can help engineers and architects select the right integration tools and design more robust integrations</p> <p class="wp-block-paragraph">We all (integration engineers) have horror stories about event-driven solutions gone wrong (especially in production) so I want to wrap up this post with emphasis on tracking, correlation and observability as key quality attributes of an event-driven solution</p> <p class="wp-block-paragraph">Hope you can now distinguish between the event patterns depending on the communication content and classify them into state change, command or data events</p>

When integrating systems we often end-up writing asynchronous messaging interfaces for mostly system-to-system communications. This conversation technique is great because it does not require the sender and receiver to stay connected to each other in a session at the same instance in time, it is non-blocking and often brokered (like the post office which brokers letters between a consumer and receiver)

This conversation technique can seem “slow” and “unreliable” compared to a synchronous real-time call, however this is a common fallacy because a good message broker and message persistence can make this communication style more reliable than real-time communications over a unreliable network

1. Asynchronous and Synchronous communications can happen within a broad context

Note cross-system asynchronous communication happens in the context of an end-to-end conversation and there may be synchronous real-time communication between the front end and back end (search, update etc) leading to asynchronous cross-system integration behind the scenes (sync of updated data)

Broader conversational context could consist of synchronous and asynchronous communication

There may also be asynchronous client conversations like one-way form submission or outbound notification to client which trigger async communication or are triggered by async notifications. Knowing and recognising these wider contextual communication patterns makes integration solution robust beyond mere “sync” and “async” patterns

Lets examine next the patterns used by users and systems communicate asynchronously

2. There are 3 key asynchronous communication patterns using events and data

There many ways in which users and systems communicate asynchronously and we can broadly classify these into the follow key patterns:

State change notification
State change with state snapshot data (lightweight or bulk )
Events with commands

2.1. State change events

Events containing lightweight information about the state of an entity are state change events. These events are great for triggering actions in downstream consumers (event choreography) for de-coupled integrations. For example, state of a business entity is propagated via an event queue or topic e.g “Order Confirmed”, “Claim Submitted”, “Student Enrolled” etc.

2.1.1 Considerations

Use timestamps and correlation identifiers to order and relate state change events as consumers may need ways to determine when and in which order these happened. With state change events consumers need a way to query the latest state and often these messages are accompanied by query APIs to get the latest state. The state change event pattern is safer than full state transfer event mechanism (pattern 2) because it does not explicit require the messages to be in order (i.e. the consumer can react to an event and query the latest state vs learning the latest state from a series of messages)

2.1.2 Delivery

State change events are delivered by using a broker and a message topic in a publish/subscribe (pub-sub) model. The last mile delivery of these messages to the subscribers can be a queue but the central platform is often “durable” topic for multiple subscribers. A durable topic keeps the message for a while until all consumers have read the message (like a bulletin board)

A Broker is the infrastructure component that facilitates message transfer in an asynchronous communication pattern

Note a durable topic may or may not keep a series of messages and do not allow consumers to replay messages from a specific point in time, this is where streaming platforms shine. For example, Salesforce platform events provides the ability for each consumer to move a pointer back to a point up to 24-hours in the past and replay the messages

The state transitions happen on “saved”, “submitted” etc above and the events from master system to downstream subscribers can trigger reactions (or nothing). These reactions can be part of a Choreography solution

2.2. State Change with Snapshot (data)

In this pattern, events contain full information about the entity. These events are typically used for data synchronisation with downstream consumers in enterprise application integration (EAI) scenarios. The message provider (source) either sends the entire snapshot or adapters (integration adapters) for the provider intercept an event and publish the full snapshot in a query-enrich pattern

2.2.1 Considerations

This pattern of sending “chunks” of data vs the notification of state change requires more stability in the way the messages are sent for robust and reliable delivery.

Order: First it requires the messages be in-order to avoid overwriting new updates with older ones
Durability: Secondly, it requires durability and persistence in messaging infrastructure so as to not lose any in-flight messages as dropped updates can have a big impact.
Observability & Reconciliation: Data synchronisation issues across integrated systems is a common challenge therefore this approach also requires observability and reconciliation processes orthogonal to the integrations
Lightweight vs Bulk data transfer:The size of the data synchronised across the systems needs to be considered as well given some of the messaging infrastructure is not suitable to handle large volumes of data streaming across systems. Traditionally these are handed over to bulk ETL jobs for large data synchronisation and more recently this is done through streaming data technologies like Apache Kafka

2.2.2 Delivery

Events with state and data are delivered using similar messaging infrastructure however given the need for ordering and persistence there are first-in-first-out (FIFO) channels and durable stores to support the transport

2.2.2.1 Using Message Brokers or Event Bus : For light weight data synchronisation across systems a pub/sub model with a message topic (or event bus) is suitable

FIFO queues and durable stores are often needed for events that require message ordering and persistence

2.2.2.2 Using streaming platform for Events like Salesforce Platform Events: For more advanced consumption requirements (with longer persistence, message playback across a small window) an event streaming platform can be used. For example, we have seen an evolution of the messaging infrastructure into streaming platforms which can provide the consumers the ability to replay messages up to a certain period in time (e.g. Salesforce platform events vs generic events) using a consumer specific pointer.

Streaming platform for events +1 on the traditional event bus and message brokers by providing better protocol, durability and access

2.2.2.3 Message Broker vs Event Streaming Platform

A message broker and a platform for streaming events both have durability however the key difference is the in how they provide the data to the consumers. A platform for streaming events provides longer duration for storing messages (days vs hours), allows consumers to navigate the sequence of messages using their consumer “pointer” to replay from a point in time and allows for multiple reads (vs one time per subscriber)

Durable message topic vs a Streaming platform

2.2.2.4 Data Streams vs Event Streams:

A streaming platform for events as mentioned above has better features for client applications and more durability than traditional message brokers however these products can be limited in size and may not fit large data transfer use cases. For example, traditional extract transfer and load (ETL) workloads which were done using batch processes cannot be solved using an event streaming platform given the need for larger and longer storage, direct connection to database change logs and ability to do processing on streams of data

This is where tools like Apache Kafka ([1]) shine. Kafka provides a highly scalable, durable and robust streaming platform for events and data with connectivity to traditional databases. Companies like Confluent lead the implementation space with connectors and query language over the streams (KSQL) and provide features which have made event and data driven solutions highly reliable, extensible and robust ([2]).

Event Streaming platforms use pattern 2 where changes in data is transmitted to the stream by the producer. Instead of an application sending this, the data changes can come from the database change logs (e.g. via Kafka connectors)

This technique of using streaming platforms like Apache Kafka as the backbone for processing large streams of data to integrate providers and consumers in real-time is called Data in Motion

2.3 Command Event

Events used to create or update data or invoke some action in an end system (service provider) are called command events. These are an anti-pattern and should be avoided given they is no way the invoker/consumer of a service can know if the command was successful or not

But there are exceptions no? A buffered command pattern violates this and perhaps we need a command event pattern for scenarios like this example below. Imagine a scenario where a flood of requests are expected with one-way invocation, for example, application submission (voting) where the actor submitting the request needs an ack and the service needs to scale to an arbitrary number of actors.

In this scenario since the actor does not need a response from the core system (backend) right away (perhaps a notification is sent after), we can add a buffer in between in the form of a message queue and smooth the flood of submissions from the front door to a manageable set of requests to the backend

A buffered command pattern uses a command event with some data. It buffers the service providers from a flood of requests which are one-wa

3. Building complex things with events

After you recognise and understand the key event-driven patterns you can now build complex solutions using event choreography. Note this is a specific solution style and different from orchestration where a central orchestrator drives the outcome. An orchestration can use events between the orchestrator and the services

Choreography and orchestration both use events but choreography specifically relies on events and rules to handle those events to drive an end-to-end outcome

4. Wrap up

Asynchronous event-driven communications are different from real-time synchronous communications and knowing the context, constraints and patterns can help engineers and architects select the right integration tools and design more robust integrations

We all (integration engineers) have horror stories about event-driven solutions gone wrong (especially in production) so I want to wrap up this post with emphasis on tracking, correlation and observability as key quality attributes of an event-driven solution

Hope you can now distinguish between the event patterns depending on the communication content and classify them into state change, command or data events

Messaging, Events and Data-in-Motion: Asynchronous communication patterns

1. Asynchronous and Synchronous communications can happen within a broad context