Journal of Distributed Software Engineering, Architecture and Design
Integration Patterns 2026: Data Integration + Messaging (Bulk, Delta, Notifications)
<p class="wp-block-paragraph">I started the year using Gemini Pro to convert our integration patterns into a “playing card” style pattern library so that we can print and play! The content is mine; I used AI to generate the images, which still have a few issues with image text – oh well, maybe something to refine later in 2026 or next year.</p>
<p class="wp-block-paragraph">As we noted in our last post, not every integration is real-time, and bulk/batch patterns take an integration engineer closer to data/ETL engineering. These patterns cover bulk/delta movement and notification-style messaging that avoids rebuilding whole state; they are aimed at integration engineers learning about data patterns.</p>
<h2 class="wp-block-heading">Included patterns</h2>
<ul><li><a href="#dsp-1">DSP-1 — Stream Processing to Materialized View</a></li><li><a href="#dsp-2">DSP-2 — Backfill and Replay</a></li><li><a href="#msg-1">MSG-1 — Claim Check for Large Payloads</a></li><li><a href="#kaf-1">KAF-1 — Transactional Outbox + CDC</a></li><li><a href="#kaf-2">KAF-2 — Idempotent Consumer</a></li><li><a href="#kaf-3">KAF-3 — Retry with Backoff + Dead Letter Queue</a></li></ul>
<h2 class="wp-block-heading" id="dsp-1">DSP-1: Stream Processing to Materialized View</h2>
<figure class="wp-block-image size-large"><img src="https://alok-mishra.com/wp-content/uploads/2026/01/dsp-1_stream_processing_to_materialized_view.png?w=1024" alt="" class="wp-image-3132" /></figure>
<p class="wp-block-paragraph">Use stream processing (Kafka Streams/Flink) to build a queryable materialized view for fast reads.</p>
<h3 class="wp-block-heading">When to use</h3>
<ul><li>High-volume event streams need queryable state.</li><li>You want low-latency reads without hitting SoR.</li><li>Derived views (counts, status, joins) are needed.</li></ul>
<h3 class="wp-block-heading">Pros</h3>
<ul><li>Fast reads.</li><li>Scales with partitions.</li><li>Decouples read model from write model.</li></ul>
<h3 class="wp-block-heading">Cons</h3>
<ul><li>Eventual consistency.</li><li>State store management.</li><li>Reprocessing complexity on schema changes.</li></ul>
<h3 class="wp-block-heading">PlantUML</h3>
<pre class="wp-block-code"><code>@startuml
title Stream Processing to Materialized View
participant "Kafka Topic" as Kafka_Topic
participant "Stream Processor" as Stream_Processor
participant "State Store/DB" as State_Store_DB
participant "API" as API
Kafka_Topic -> Stream_Processor: Consume events
Stream_Processor -> State_Store_DB: Update materialized view
API -> State_Store_DB: Query view
API -> API: Return result
@enduml</code></pre>
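<p class="wp-block-paragraph">The fold from event stream to queryable view can be sketched in a few lines. This is a minimal illustration, not a Kafka Streams or Flink API: the event shape (<code>order_id</code>/<code>status</code>) and the in-memory dict standing in for the state store are my own assumptions.</p>

```python
def apply_event(view: dict, event: dict) -> dict:
    """Fold one event into the materialized view (event count + latest status)."""
    entry = view.setdefault(event["order_id"], {"events": 0, "status": None})
    entry["events"] += 1
    entry["status"] = event["status"]
    return view

def build_view(events) -> dict:
    """Consume the stream and maintain the derived read model."""
    view = {}
    for event in events:
        apply_event(view, event)
    return view

events = [
    {"order_id": "o1", "status": "created"},
    {"order_id": "o1", "status": "paid"},
    {"order_id": "o2", "status": "created"},
]
view = build_view(events)
# The API layer now serves reads from `view` instead of hitting the SoR.
```

<p class="wp-block-paragraph">A real state store would be RocksDB, a changelog-backed table, or a database updated by the processor; the point is that reads never touch the system of record.</p>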
<h2 class="wp-block-heading" id="dsp-2">DSP-2: Backfill and Replay</h2>
<figure class="wp-block-image size-large"><img src="https://alok-mishra.com/wp-content/uploads/2026/01/dsp-2_backfill_and_replay.png?w=1024" alt="" class="wp-image-3131" /></figure>
<p class="wp-block-paragraph">Rebuild downstream state by replaying historical events or re-exporting bulk snapshots, then switching back to streaming.</p>
<h3 class="wp-block-heading">When to use</h3>
<ul><li>New consumer onboarding needs historical data.</li><li>State store corruption or schema evolution requires rebuild.</li><li>You are migrating systems.</li></ul>
<h3 class="wp-block-heading">Pros</h3>
<ul><li>Repeatable recovery.</li><li>Supports new consumers.</li><li>Improves resilience to data loss.</li></ul>
<h3 class="wp-block-heading">Cons</h3>
<ul><li>Operationally heavy.</li><li>Needs deterministic processing.</li><li>Requires retention and access controls.</li></ul>
<h3 class="wp-block-heading">PlantUML</h3>
<pre class="wp-block-code"><code>@startuml
title Backfill and Replay
actor "Ops" as Ops
participant "Bulk Snapshot Export" as Bulk_Snapshot_Export
participant "Kafka Topic" as Kafka_Topic
participant "Consumer" as Consumer
Ops -> Bulk_Snapshot_Export: Generate snapshot
Bulk_Snapshot_Export -> Consumer: Load snapshot (seed)
Kafka_Topic -> Consumer: Replay events from point-in-time
Consumer -> Consumer: Switch to live consumption
@enduml</code></pre>
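<p class="wp-block-paragraph">The seed-then-replay sequence can be sketched as follows. The snapshot/log shapes and the <code>snapshot_offset</code> cut-over point are illustrative assumptions; the essential idea is skipping events the snapshot already contains.</p>

```python
def rebuild(snapshot: dict, snapshot_offset: int, log: list) -> dict:
    """Seed state from a bulk snapshot, then replay only newer events."""
    state = dict(snapshot)                 # 1. load snapshot (seed)
    for offset, (key, value) in enumerate(log):
        if offset <= snapshot_offset:      # 2. skip events already in the snapshot
            continue
        state[key] = value                 # 3. replay from point-in-time
    return state                           # 4. caller now switches to live consumption

log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
snapshot = {"a": 1, "b": 2}               # captured after offset 1
state = rebuild(snapshot, snapshot_offset=1, log=log)
```

<p class="wp-block-paragraph">Note the "needs deterministic processing" con above: replaying the same log slice must produce the same state, or the rebuilt consumer diverges from one that consumed live.</p>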
<h2 class="wp-block-heading" id="msg-1">MSG-1: Claim Check for Large Payloads</h2>
<figure class="wp-block-image size-large"><img src="https://alok-mishra.com/wp-content/uploads/2026/01/msg-1_claim_check_for_large_payloads.png?w=1024" alt="" class="wp-image-3130" /></figure>
<p class="wp-block-paragraph">Store large payloads in object storage and send only a reference (URL/key) through Kafka or APIs.</p>
<h3 class="wp-block-heading">When to use</h3>
<ul><li>Payloads exceed broker/message size limits.</li><li>Multiple consumers need the same large document.</li><li>You need to minimise broker load.</li></ul>
<h3 class="wp-block-heading">Pros</h3>
<ul><li>Keeps Kafka lean.</li><li>Avoids broker memory/throughput issues.</li><li>Enables parallel downloads.</li></ul>
<h3 class="wp-block-heading">Cons</h3>
<ul><li>External storage becomes dependency.</li><li>Reference security and expiry management.</li><li>Two-phase access semantics.</li></ul>
<h3 class="wp-block-heading">PlantUML</h3>
<pre class="wp-block-code"><code>@startuml
title Claim Check for Large Payloads
participant "Producer" as Producer
participant "Object Store" as Object_Store
participant "Kafka Topic" as Kafka_Topic
participant "Consumer" as Consumer
Producer -> Object_Store: PUT large payload -> objectKey
Producer -> Kafka_Topic: Publish event with objectKey
Kafka_Topic -> Consumer: Deliver event
Consumer -> Object_Store: GET objectKey
@enduml</code></pre>
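<p class="wp-block-paragraph">A minimal sketch of the claim-check flow, with an in-memory object store and a list standing in for the Kafka topic. The <code>ObjectStore</code> class and event field names are my own assumptions, not a real client API.</p>

```python
import uuid

class ObjectStore:
    """Stand-in for S3/blob storage: put returns a claim-check key."""
    def __init__(self):
        self._blobs = {}
    def put(self, payload: bytes) -> str:
        key = str(uuid.uuid4())
        self._blobs[key] = payload
        return key
    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ObjectStore()
topic = []

# Producer: PUT the large payload, publish only the small reference.
payload = b"x" * 10_000_000
object_key = store.put(payload)
topic.append({"type": "DocumentReady", "objectKey": object_key})

# Consumer: read the lightweight event, then GET the payload on demand.
event = topic[0]
fetched = store.get(event["objectKey"])
```

<p class="wp-block-paragraph">In production the reference would typically be a pre-signed, expiring URL, which is where the "reference security and expiry management" con comes in.</p>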
<h2 class="wp-block-heading">Included patterns (messaging)</h2>
<ul><li><a href="#kaf-1">KAF-1 — Transactional Outbox + CDC</a></li><li><a href="#kaf-2">KAF-2 — Idempotent Consumer</a></li><li><a href="#kaf-3">KAF-3 — Retry with Backoff + Dead Letter Queue</a></li></ul>
<h2 class="wp-block-heading" id="kaf-1">KAF-1: Transactional Outbox + CDC</h2>
<figure class="wp-block-image size-large"><img src="https://alok-mishra.com/wp-content/uploads/2026/01/kaf-1_transactional_outbox__cdc.png?w=1024" alt="" class="wp-image-3138" /></figure>
<p class="wp-block-paragraph">Write business state and an outbox event in the same DB transaction; a CDC/outbox publisher reliably emits events to Kafka.</p>
<h3 class="wp-block-heading">When to use</h3>
<ul><li>You need reliable events that match committed DB state.</li><li>At-least-once event delivery is acceptable with idempotent consumers.</li><li>You want to avoid dual-write inconsistency.</li></ul>
<h3 class="wp-block-heading">Pros</h3>
<ul><li>Eliminates classic dual-write race.</li><li>Supports replay from DB log.</li><li>Clear producer reliability model.</li></ul>
<h3 class="wp-block-heading">Cons</h3>
<ul><li>Requires outbox table and CDC tooling.</li><li>Event ordering/partitioning still needs design.</li><li>Operational overhead (connectors, monitoring).</li></ul>
<h3 class="wp-block-heading">PlantUML</h3>
<pre class="wp-block-code"><code>@startuml
title Transactional Outbox + CDC
participant "Service" as Service
participant "DB" as DB
participant "CDC/Outbox Publisher" as CDC_Outbox_Publisher
participant "Kafka Topic" as Kafka_Topic
participant "Consumers" as Consumers
Service -> DB: TX: update state + insert OutboxEvent
CDC_Outbox_Publisher -> DB: Read new outbox rows / log
CDC_Outbox_Publisher -> Kafka_Topic: Produce event
Kafka_Topic -> Consumers: Deliver event
@enduml</code></pre>
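<p class="wp-block-paragraph">The key move is that the business write and the outbox insert share one transaction. A minimal sketch, where an in-memory dict plays the DB and a single function body stands in for transactional atomicity (assumptions of mine, not Debezium or any specific CDC tool):</p>

```python
db = {"orders": {}, "outbox": []}
kafka_topic = []

def place_order(order_id: str, amount: int) -> None:
    # In a real DB these two writes share one transaction, so the event
    # exists if and only if the state change committed -- no dual-write race.
    db["orders"][order_id] = {"amount": amount, "status": "placed"}
    db["outbox"].append({"type": "OrderPlaced", "order_id": order_id})

def publish_outbox() -> None:
    # CDC/outbox publisher: drain pending rows and produce them to Kafka.
    # Crash-and-retry here gives at-least-once delivery, hence the need
    # for idempotent consumers (see KAF-2).
    while db["outbox"]:
        kafka_topic.append(db["outbox"].pop(0))

place_order("o1", 100)
publish_outbox()
```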
<h2 class="wp-block-heading" id="kaf-2">KAF-2: Idempotent Consumer</h2>
<figure class="wp-block-image size-large"><img src="https://alok-mishra.com/wp-content/uploads/2026/01/kaf-2_idempotent_consumer.png?w=1024" alt="" class="wp-image-3137" /></figure>
<p class="wp-block-paragraph">Consumer deduplicates and safely reprocesses events using idempotency keys and/or processed-offset checkpoints.</p>
<h3 class="wp-block-heading">When to use</h3>
<ul><li>At-least-once delivery is used (default for Kafka consumers).</li><li>Retries and replays are expected.</li><li>Downstream side-effects must not duplicate.</li></ul>
<h3 class="wp-block-heading">Pros</h3>
<ul><li>Safe retries and replays.</li><li>Enables robust recovery.</li><li>Improves correctness under failures.</li></ul>
<h3 class="wp-block-heading">Cons</h3>
<ul><li>Requires idempotency keys and storage.</li><li>Edge cases for out-of-order updates.</li><li>Needs careful transactional boundaries.</li></ul>
<h3 class="wp-block-heading">PlantUML</h3>
<pre class="wp-block-code"><code>@startuml
title Idempotent Consumer
participant "Kafka Topic" as Kafka_Topic
participant "Consumer" as Consumer
participant "Idempotency Store" as Idempotency_Store
participant "Downstream System" as Downstream_System
Kafka_Topic -> Consumer: Event(key, idempotencyId)
Consumer -> Idempotency_Store: Check/record idempotencyId
Consumer -> Downstream_System: Apply side-effect
Consumer -> Kafka_Topic: Commit offset
@enduml</code></pre>
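<p class="wp-block-paragraph">Consumer-side deduplication reduces to: check the idempotency store before the side-effect, record after it. A minimal sketch; the event shape and the in-memory set are illustrative, and a real deployment would persist the store with transactional boundaries around the side-effect.</p>

```python
processed_ids = set()   # idempotency store
applied = []            # downstream side-effects actually applied

def handle(event: dict) -> bool:
    """Apply the event's side-effect at most once; return True if applied."""
    idem_id = event["idempotencyId"]
    if idem_id in processed_ids:       # duplicate delivery: skip side-effect
        return False
    applied.append(event["payload"])   # apply side-effect
    processed_ids.add(idem_id)         # record only after success
    return True

# At-least-once delivery means the same event can arrive twice.
handle({"idempotencyId": "e1", "payload": "charge $10"})
handle({"idempotencyId": "e1", "payload": "charge $10"})  # redelivery
```

<p class="wp-block-paragraph">The ordering matters: recording the ID before the side-effect would turn a crash between the two steps into a lost event rather than a duplicate.</p>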
<h2 class="wp-block-heading" id="kaf-3">KAF-3: Retry with Backoff + Dead Letter Queue</h2>
<figure class="wp-block-image size-large"><img src="https://alok-mishra.com/wp-content/uploads/2026/01/kaf-3_retry_with_backoff__dead_letter_queue.png?w=1024" alt="" class="wp-image-3136" /></figure>
<p class="wp-block-paragraph">Transient failures are retried with backoff; poisoned messages are quarantined to a DLQ with context for remediation.</p>
<h3 class="wp-block-heading">When to use</h3>
<ul><li>Downstream dependencies can fail transiently.</li><li>You need controlled retries and isolation of bad records.</li></ul>
<h3 class="wp-block-heading">Pros</h3>
<ul><li>Prevents consumer stalls.</li><li>Supports operational remediation.</li><li>Improves MTTR.</li></ul>
<h3 class="wp-block-heading">Cons</h3>
<ul><li>Requires retry topic strategy and tooling.</li><li>Risk of silent DLQ accumulation.</li><li>Needs clear ownership for DLQ handling.</li></ul>
<h3 class="wp-block-heading">PlantUML</h3>
<pre class="wp-block-code"><code>@startuml
title Retry with Backoff + Dead Letter Queue
participant "Consumer" as Consumer
participant "Retry Topic(s)" as Retry_Topic_s_
participant "DLQ" as DLQ
participant "Ops/Remediation" as Ops_Remediation
Consumer -> Retry_Topic_s_: Publish for delayed retry
Consumer -> DLQ: Publish poisoned message + error
Ops_Remediation -> DLQ: Investigate and reprocess/fix
@enduml</code></pre>
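<p class="wp-block-paragraph">The retry-then-quarantine decision can be sketched as below. The <code>max_attempts</code> value and the exponential delay schedule are illustrative choices, not fixed by the pattern, and the delay is only noted in a comment rather than actually slept or routed through retry topics.</p>

```python
def process_with_retry(message, side_effect, max_attempts=3, base_delay=1.0):
    """Return ("ok", result) on success, or ("dlq", context) after exhausting retries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return "ok", side_effect(message)
        except Exception as exc:
            if attempt == max_attempts:
                # Quarantine with enough context for Ops to remediate.
                return "dlq", {"message": message, "error": str(exc),
                               "attempts": attempt}
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            # A real consumer publishes to a delayed retry topic here
            # (or sleeps) instead of retrying inline.

calls = {"n": 0}
def flaky(msg):
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return f"processed {msg}"

status, result = process_with_retry("m1", flaky)
```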
<h2 class="wp-block-heading">Summary</h2>
<p class="wp-block-paragraph">This is an evolving post: more data patterns are to come, and I will look to fix the images! I am reading the Data Intensive Patterns book and working with some amazing data engineers to bring key patterns to life.</p>
<p class="wp-block-paragraph">Alok brings over 20 years of experience engineering and architecting distributed software systems across industry and consulting. His posts focus on systems integration, API design, microservices and event-driven systems, modern enterprise architecture, and related topics.</p>