When integrating systems we often end-up writing asynchronous messaging interfaces for mostly system-to-system communications. This conversation technique is great because it does not require the sender and receiver to stay connected to each other in a session at the same instance in time, it is non-blocking and often brokered (like the post office which brokers letters between a consumer and receiver)
This conversation technique can seem “slow” and “unreliable” compared to a synchronous real-time call, however this is a common fallacy because a good message broker and message persistence can make this communication style more reliable than real-time communications over a unreliable network
1. Asynchronous and Synchronous communications can happen within a broad context
Note cross-system asynchronous communication happens in the context of an end-to-end conversation and there may be synchronous real-time communication between the front end and back end (search, update etc) leading to asynchronous cross-system integration behind the scenes (sync of updated data)
There may also be asynchronous client conversations like one-way form submission or outbound notification to client which trigger async communication or are triggered by async notifications. Knowing and recognising these wider contextual communication patterns makes integration solution robust beyond mere “sync” and “async” patterns
Lets examine next the patterns used by users and systems communicate asynchronously
2. There are 3 key asynchronous communication patterns using events and data
There many ways in which users and systems communicate asynchronously and we can broadly classify these into the follow key patterns:
- State change notification
- State change with state snapshot data (lightweight or bulk )
- Events with commands
2.1. State change events
Events containing lightweight information about the state of an entity are state change events. These events are great for triggering actions in downstream consumers (event choreography) for de-coupled integrations. For example, state of a business entity is propagated via an event queue or topic e.g “Order Confirmed”, “Claim Submitted”, “Student Enrolled” etc.
Use timestamps and correlation identifiers to order and relate state change events as consumers may need ways to determine when and in which order these happened. With state change events consumers need a way to query the latest state and often these messages are accompanied by query APIs to get the latest state. The state change event pattern is safer than full state transfer event mechanism (pattern 2) because it does not explicit require the messages to be in order (i.e. the consumer can react to an event and query the latest state vs learning the latest state from a series of messages)
State change events are delivered by using a broker and a message topic in a publish/subscribe (pub-sub) model. The last mile delivery of these messages to the subscribers can be a queue but the central platform is often “durable” topic for multiple subscribers. A durable topic keeps the message for a while until all consumers have read the message (like a bulletin board)
Note a durable topic may or may not keep a series of messages and do not allow consumers to replay messages from a specific point in time, this is where streaming platforms shine. For example, Salesforce platform events provides the ability for each consumer to move a pointer back to a point up to 24-hours in the past and replay the messages
The state transitions happen on “saved”, “submitted” etc above and the events from master system to downstream subscribers can trigger reactions (or nothing). These reactions can be part of a Choreography solution
2.2. State Change with Snapshot (data)
In this pattern, events contain full information about the entity. These events are typically used for data synchronisation with downstream consumers in enterprise application integration (EAI) scenarios. The message provider (source) either sends the entire snapshot or adapters (integration adapters) for the provider intercept an event and publish the full snapshot in a query-enrich pattern
This pattern of sending “chunks” of data vs the notification of state change requires more stability in the way the messages are sent for robust and reliable delivery.
- Order: First it requires the messages be in-order to avoid overwriting new updates with older ones
- Durability: Secondly, it requires durability and persistence in messaging infrastructure so as to not lose any in-flight messages as dropped updates can have a big impact.
- Observability & Reconciliation: Data synchronisation issues across integrated systems is a common challenge therefore this approach also requires observability and reconciliation processes orthogonal to the integrations
- Lightweight vs Bulk data transfer:The size of the data synchronised across the systems needs to be considered as well given some of the messaging infrastructure is not suitable to handle large volumes of data streaming across systems. Traditionally these are handed over to bulk ETL jobs for large data synchronisation and more recently this is done through streaming data technologies like Apache Kafka
Events with state and data are delivered using similar messaging infrastructure however given the need for ordering and persistence there are first-in-first-out (FIFO) channels and durable stores to support the transport
184.108.40.206 Using Message Brokers or Event Bus : For light weight data synchronisation across systems a pub/sub model with a message topic (or event bus) is suitable
220.127.116.11 Using streaming platform for Events like Salesforce Platform Events: For more advanced consumption requirements (with longer persistence, message playback across a small window) an event streaming platform can be used. For example, we have seen an evolution of the messaging infrastructure into streaming platforms which can provide the consumers the ability to replay messages up to a certain period in time (e.g. Salesforce platform events vs generic events) using a consumer specific pointer.
18.104.22.168 Message Broker vs Event Streaming Platform
A message broker and a platform for streaming events both have durability however the key difference is the in how they provide the data to the consumers. A platform for streaming events provides longer duration for storing messages (days vs hours), allows consumers to navigate the sequence of messages using their consumer “pointer” to replay from a point in time and allows for multiple reads (vs one time per subscriber)
22.214.171.124 Data Streams vs Event Streams:
A streaming platform for events as mentioned above has better features for client applications and more durability than traditional message brokers however these products can be limited in size and may not fit large data transfer use cases. For example, traditional extract transfer and load (ETL) workloads which were done using batch processes cannot be solved using an event streaming platform given the need for larger and longer storage, direct connection to database change logs and ability to do processing on streams of data
This is where tools like Apache Kafka () shine. Kafka provides a highly scalable, durable and robust streaming platform for events and data with connectivity to traditional databases. Companies like Confluent lead the implementation space with connectors and query language over the streams (KSQL) and provide features which have made event and data driven solutions highly reliable, extensible and robust ().
This technique of using streaming platforms like Apache Kafka as the backbone for processing large streams of data to integrate providers and consumers in real-time is called Data in Motion
2.3 Command Event
Events used to create or update data or invoke some action in an end system (service provider) are called command events. These are an anti-pattern and should be avoided given they is no way the invoker/consumer of a service can know if the command was successful or not
But there are exceptions no? A buffered command pattern violates this and perhaps we need a command event pattern for scenarios like this example below. Imagine a scenario where a flood of requests are expected with one-way invocation, for example, application submission (voting) where the actor submitting the request needs an ack and the service needs to scale to an arbitrary number of actors.
In this scenario since the actor does not need a response from the core system (backend) right away (perhaps a notification is sent after), we can add a buffer in between in the form of a message queue and smooth the flood of submissions from the front door to a manageable set of requests to the backend
3. Building complex things with events
After you recognise and understand the key event-driven patterns you can now build complex solutions using event choreography. Note this is a specific solution style and different from orchestration where a central orchestrator drives the outcome. An orchestration can use events between the orchestrator and the services
4. Wrap up
Asynchronous event-driven communications are different from real-time synchronous communications and knowing the context, constraints and patterns can help engineers and architects select the right integration tools and design more robust integrations
We all (integration engineers) have horror stories about event-driven solutions gone wrong (especially in production) so I want to wrap up this post with emphasis on tracking, correlation and observability as key quality attributes of an event-driven solution
Hope you can now distinguish between the event patterns depending on the communication content and classify them into state change, command or data events