Change Data Capture
Published on: 18 September 2025
High-Level Overview of CDC
```mermaid
graph TD
    subgraph Source System
        B{"Transaction<br/>INSERT, UPDATE, DELETE"}
        A[Source Database]
    end
    subgraph CDC Process
        C["Change Data Capture<br/>(Captures row-level changes)"]
    end
    subgraph Downstream Systems
        D["Target System<br/>(Data Warehouse, Analytics Platform, etc.)"]
    end
    B -- writes to --> A
    A -- streams changes to --> C
    C -- delivers events to --> D
```
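The row-level changes captured above are usually delivered downstream as structured events carrying the row's state before and after the change. A minimal sketch in Python of what such an event might look like and how a consumer could replay it (the field names and `apply_to_replica` helper are illustrative, loosely following the common before/after envelope convention, not any specific tool's schema):

```python
from dataclasses import dataclass
from typing import Optional

# A hypothetical row-level change event, loosely modeled on the
# before/after envelope convention used by many CDC tools.
@dataclass
class ChangeEvent:
    op: str                 # "c" = insert, "u" = update, "d" = delete
    table: str              # source table the change came from
    before: Optional[dict]  # row state before the change (None for inserts)
    after: Optional[dict]   # row state after the change (None for deletes)

def apply_to_replica(replica: dict, event: ChangeEvent) -> None:
    """Replay one change event against an in-memory 'replica' keyed by id."""
    if event.op == "d":
        replica.pop(event.before["id"], None)
    else:  # insert or update: upsert the new row image
        replica[event.after["id"]] = event.after

replica: dict = {}
apply_to_replica(replica, ChangeEvent("c", "users", None, {"id": 1, "name": "Ada"}))
apply_to_replica(replica, ChangeEvent("u", "users", {"id": 1, "name": "Ada"},
                                      {"id": 1, "name": "Ada L."}))
apply_to_replica(replica, ChangeEvent("d", "users", {"id": 1, "name": "Ada L."}, None))
print(replica)  # {} — the delete removed the only row
```

Replaying events in order like this is how a target system stays consistent with the source without re-reading whole tables.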
Comparison of CDC Methodologies
```mermaid
graph TD
    subgraph "Log-Based CDC (Most Efficient)"
        direction TB
        A1[Source Database]
        A2[Transaction Log]
        A3[Log-Parsing Process]
        A4[Change Events Stream]
        A1 --> A2
        A2 -- is read by --> A3
        A3 --> A4
    end
    subgraph "Trigger-Based CDC (Adds DB Overhead)"
        direction TB
        B1[Application]
        B2[Source Table]
        B3{Database Trigger}
        B4[Change Table]
        B5[CDC Process]
        B6[Change Events Stream]
        B1 -- writes to --> B2
        B2 -- fires --> B3
        B3 -- inserts copy into --> B4
        B4 -- is read by --> B5
        B5 --> B6
    end
    subgraph "Polling-Based CDC (Query-Based)"
        direction TB
        C2{"Scheduler<br/>(e.g., Cron Job)"}
        C3["Polling Query<br/>SELECT * FROM ...<br/>WHERE last_updated > ?"]
        C1["Source Table<br/>(with 'last_updated' column)"]
        C4[Change Events Stream]
        C2 -- triggers --> C3
        C3 -- runs against --> C1
        C3 --> C4
    end
```
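The polling-based approach in the last subgraph can be sketched end to end with Python's built-in `sqlite3` module. The table, column names, and timestamps below are illustrative; a real deployment would run the poll on a schedule and persist the checkpoint durably:

```python
import sqlite3

# Illustrative polling-based CDC against an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT,
        last_updated INTEGER  -- epoch seconds, maintained by the application
    )
""")
conn.executemany(
    "INSERT INTO orders (id, status, last_updated) VALUES (?, ?, ?)",
    [(1, "new", 100), (2, "shipped", 200), (3, "new", 300)],
)

def poll_changes(conn, checkpoint):
    """Return rows modified after `checkpoint`, plus the advanced checkpoint."""
    rows = conn.execute(
        "SELECT id, status, last_updated FROM orders "
        "WHERE last_updated > ? ORDER BY last_updated",
        (checkpoint,),
    ).fetchall()
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint

changes, checkpoint = poll_changes(conn, 150)         # first poll picks up rows 2 and 3
print(changes)
changes, checkpoint = poll_changes(conn, checkpoint)  # nothing new since then
print(changes)  # []
```

Note the inherent limitations this sketch shares with any query-based approach: hard deletes never show up in the result set, and any intermediate states between two polls are lost. Both are reasons log-based CDC is usually preferred when the database supports it.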
CDC in a Modern Data Architecture
```mermaid
graph LR
    subgraph Source Databases
        db1[(PostgreSQL)]
        db2[(MySQL)]
        db3[(MongoDB)]
    end
    subgraph CDC and Streaming Platform
        cdc["CDC Tool<br/>(e.g., Debezium)"]
        broker["Message Broker<br/>(e.g., Apache Kafka)"]
        cdc -- publishes changes to --> broker
    end
    subgraph Consumers
        consumer1["Stream Processor<br/>(e.g., Apache Flink)"]
        consumer2["Data Warehouse<br/>(e.g., Snowflake)"]
        consumer3[Microservices]
    end
    db1 -- captured by --> cdc
    db2 -- captured by --> cdc
    db3 -- captured by --> cdc
    broker -- consumed by --> consumer1
    broker -- consumed by --> consumer3
    broker -- loaded into --> consumer2
```
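On the consumer side of this architecture, each message holds a change envelope. Debezium, for example, wraps the change in a payload carrying the before and after row images, an op code (`c`/`u`/`d` for create/update/delete, `r` for snapshot reads), and source metadata; the exact shape depends on connector and converter configuration. A simplified decoding sketch (the sample record and `decode` helper are illustrative, not an exact Debezium schema):

```python
import json

# A simplified Debezium-style message value, hand-built for illustration.
raw = json.dumps({
    "payload": {
        "op": "u",  # c=create, u=update, d=delete, r=snapshot read
        "before": {"id": 42, "email": "old@example.com"},
        "after":  {"id": 42, "email": "new@example.com"},
        "source": {"db": "shop", "table": "customers"},
        "ts_ms": 1726650000000,
    }
})

def decode(message: str):
    """Extract (table, op, new row image) from a Debezium-style value."""
    payload = json.loads(message)["payload"]
    return payload["source"]["table"], payload["op"], payload["after"]

table, op, row = decode(raw)
print(table, op, row["email"])  # customers u new@example.com
```

In a real pipeline this decoding would run inside a Kafka consumer or a Flink job subscribed to the connector's topic, with the extracted row image merged into the warehouse or handed to the relevant microservice.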