For decades, Change Data Capture (CDC) has been a popular method for synchronizing data across platforms: the system is configured to detect and log updates, and CDC products then read the transaction journals and replay the changes downstream. Elegant in theory; often brittle in practice.
The Universal CDC Pain Points
Most CDC tools depend on transaction logging to detect changes. That design choice creates several real-world issues in most environments:
- Fragility: Log readers are sensitive to configuration details and depend on complex, proprietary interfaces that are highly specific to each platform and resource. They also add operational complexity, especially the need to define and manage the log streams the CDC software depends on, a common source of operational error at many sites.
- Overhead & throughput cliffs: Logging overhead can slow critical applications dramatically. A simple update that would normally require one read and one write can balloon into multiple log writes, catalog updates, and I/O operations once CDC logging is enabled — sometimes increasing resource usage by 5x or more. Large batch jobs or database reorganizations add to the problem, flooding pipelines and triggering lag or outright failures.
- Cost pressure: Continuous log scraping drives up CPU/MIPS and related licensing costs. Transaction logs require human attention and management, as well as the expense of physical resources.
- Coverage gaps: Many platforms and datatypes don’t support transactional change logging — simple QSAM sequential files are perfect examples. Heterogeneous estates having mixes of DB2, VSAM, IMS, IDMS, Datacom, Adabas, etc. also find that a single CDC product can’t handle all their data sources.
- Integrity issues: CDC would be far more attractive if it could guarantee transactional integrity, but it often cannot. Most systems either apply changes as log entries are created (risking unexpected errors if the application later rolls back) or wait until a transaction commits before propagating changes, in which case there is still no guarantee that the remote update will succeed. You pay all the overhead of real-time synchronization, yet it remains unsafe to run truly distributed transactions that access source and target concurrently.
Duplicate Records and Unkeyed Data
Many environments contain data with duplicate records, or data that is completely unkeyed, and this also presents a challenge for most CDC implementations.
For instance, in many IMS systems, the parent-child relationship of the records means that child segments don’t require unique key values. When an update occurs, the CDC product might see that there’s an update for “Record X,” but there might be thousands of “Record X” instances on the target database. If you’re the CDC software, which of these instances do you update?
IMS is a great example, but this situation exists in any network or hierarchical database, or other environments where duplicate records might exist. In practice, any system where data isn’t stored as rows and columns with unique keys makes it challenging to understand how to apply changes when the update might apply to many possible candidate records. Whether an IMS database, a VSAM ESDS, or even a simple sequential file, this is a severe limitation.
Consider a simple example. Suppose a database or file holds five records that are all duplicates, and an application updates just one of these instances. The CDC product sees a log record indicating that a record changed from “A” to “B,” but the transaction log generally contains nothing to tell it exactly which instance needs to be updated. With keyed data, the CDC product could build a database query to update a particular record based on a key contained in the record data, but when the records are not unique, this becomes impossible. Which records containing “A” need to be changed? This is not an easily solved shortcoming.
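The ambiguity can be sketched in a few lines of Python. This is a hypothetical illustration of what a log-based CDC tool sees, not any particular product's behavior: the log carries only before/after images, so every duplicate is an equally valid candidate.

```python
# Target copy of an unkeyed file: five identical records.
target = ["A", "A", "A", "A", "A"]

# The transaction log reports only before/after images of the record,
# with no key or position to identify which instance changed.
log_entry = {"before": "A", "after": "B"}

# Every record matching the before-image is an equally valid candidate.
candidates = [i for i, rec in enumerate(target) if rec == log_entry["before"]]
print(candidates)  # all five positions match; the log cannot disambiguate
```

With a unique key in the log record, `candidates` would contain exactly one position; without one, the CDC tool has no correct choice.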
PropelZ™: Replication Without Log Scraping or Keys
PropelZ takes a different path. Rather than scraping logs and inferring change sets, PropelZ performs asynchronous, incremental replication directly from the source data — no dependency on keys and no parsing of transaction logs.
How It Works (Technical Deep Dive)
Think of PropelZ as a three-stage pipeline that you can mix, match, or extend:
1) Input Stage (Pluggable Sources)
- Connect to any type of data, from VSAM or sequential files to DB2, IMS, Adabas, IDMS, Datacom, remote JDBC databases, FTP sites, and more.
- The input stage is open-ended: you can even pre-process the data externally and feed records into PropelZ if you prefer. This is especially useful for specialized extracts, joining data from multiple sources, or for consuming the output of software rather than processing simple files.
- Goal: access records predictably — not their logs. And by operating in this mode, PropelZ can understand structure as well as content, key to supporting duplicate or unkeyed records.
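The "pluggable source" idea can be sketched as any iterable that yields records. This is a minimal illustration under an assumed iterator contract; the function names and the 80-byte record length are hypothetical, not PropelZ's actual API.

```python
from typing import Iterator

def sequential_file_source(path: str) -> Iterator[bytes]:
    """Yield fixed-length records from a sequential file (80-byte records assumed)."""
    with open(path, "rb") as f:
        while (rec := f.read(80)):
            yield rec

def pre_processed_source(records: list[bytes]) -> Iterator[bytes]:
    """Feed externally prepared records (e.g., a specialized extract or a join
    of multiple sources) into the pipeline instead of reading a file directly."""
    yield from records

# Any iterable of records can drive the rest of the pipeline.
for record in pre_processed_source([b"REC1", b"REC2"]):
    pass  # hand each record off to the transformation stage
```

Because each source satisfies the same contract, swapping a VSAM reader for a JDBC reader or an external extract changes nothing downstream.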
2) Transformation Engine (Smart, Lightweight ELT)
- Digital Signature-based Incrementals: PropelZ computes lightweight, per-record “digital signatures” (hashes) to determine what actually changed between runs, independent of keys or log formats. These signatures persist from one run to the next, so it’s easy for PropelZ to spot new, deleted, and changed records from any input.
- Key or Record-aware Incrementals: When your data does have reliable keys or record identifiers, PropelZ can use them for efficient incremental processing. As an example, consider a time-series dataset where the timestamp is available as a key field in the record. PropelZ can be set up to periodically scan this file, starting from whatever timestamp it saw on its last run and continuing to the end of the file. The same happens on each subsequent run, keeping the target up to date without needing to compute and store the digital signatures described above.
- Schema Awareness: Provide a simple metadata descriptor (e.g., a COBOL copybook or XML layout) and PropelZ understands your records at a field or column level. Whether the field is character data, binary, packed decimal, floating point, or any other format, the values are automatically transformed into the correct formats as the target is written. If you like, you can even decide what datatypes are used based on the type of target database.
- Flexible Shaping:
- Filter/routing: Process subsets of data or send different records to different tables/targets based on filtering criteria.
- Structure choice: Land as normalized rows & columns or as BLOBs that are bit-for-bit copies of the source data. You can even convert your data to JSON, XML, or CSV if you are feeding it to a custom output stage.
- Automatic “subtable” creation for arrays: If your source data has arrays or exists in unnormalized form, these can be optionally broken out into separate subtables, ensuring easier relational processing.
- De-Duping/Uniqueness: PropelZ stores information about the source data’s structure, eliminating problems of duplicate or unkeyed records. For data having native keys, PropelZ includes these keys as separate columns on the target, making it easy to fetch records by their original keys or by record numbers. For data not having native keys (or having duplicates), PropelZ also generates unique IDs (e.g., row numbers, UUID/GUID, and original RBA/RRNs) for reliable target identity — no source redesign required. When using network or hierarchical databases like IMS, this also allows for including things like the parent record identity in child record tables, preserving the original parent-child relationships.
- Customizable indexing and queries: PropelZ allows you to tailor the SQL queries it uses to target databases. If you want secondary columns to be indexed, that’s easy to do. If you want extra columns added to include record insertion timestamps or other data (for example), that’s easily done too.
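The signature-based incremental approach can be illustrated with a short sketch. This assumes one SHA-256 hash per record and a persisted signature-to-count map between runs; it shows the idea, not PropelZ's implementation. Counting signatures (rather than just storing them in a set) is what lets duplicates be handled correctly.

```python
import hashlib

def signature(record: bytes) -> str:
    # One lightweight hash per record stands in for the "digital signature".
    return hashlib.sha256(record).hexdigest()

def diff_runs(previous: dict[str, int], current_records: list[bytes]):
    """Compare this run's records against signatures persisted from the last run.

    `previous` maps signature -> count, so duplicate records are handled:
    five identical records share one signature with a count of 5."""
    current: dict[str, int] = {}
    for rec in current_records:
        sig = signature(rec)
        current[sig] = current.get(sig, 0) + 1

    inserts = {s: n - previous.get(s, 0) for s, n in current.items()
               if n > previous.get(s, 0)}
    deletes = {s: n - current.get(s, 0) for s, n in previous.items()
               if n > current.get(s, 0)}
    return inserts, deletes, current  # persist `current` for the next run

# First run: every record is new.
ins, dels, state = diff_runs({}, [b"A", b"A", b"B"])

# Second run: one "A" changed to "C" -> one delete and one insert are detected.
ins2, dels2, _ = diff_runs(state, [b"A", b"C", b"B"])
```

A changed record simply appears as one delete of its old signature plus one insert of its new one, with no key and no transaction log involved.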
3) Output Stage (Targets & APIs)
- Direct JDBC writes to your targets (e.g., Microsoft SQL Server, Snowflake, Databricks, BigQuery, Postgres, Oracle, MySQL, and other JDBC-compatible systems). PropelZ supports precompiled query execution, bulk insert, deferred commits, and includes the industry’s fastest JDBC drivers for the best possible performance with JDBC targets.
- Or bypass databases entirely and call APIs — e.g., ship transformed events into observability or security platforms via their ingestion endpoints.
- Output stage is also replaceable — if you have a custom sink, plug it in. Rather write your data to a Publish/Subscribe queue or use some other approach? Just replace the output stage with something that does whatever you like.
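The output-stage pattern of prepared statements, bulk inserts, and deferred commits can be sketched with Python's built-in sqlite3 module standing in for a JDBC target; the table and rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a JDBC target connection
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")

rows = [(1, "ACME"), (2, "GLOBEX"), (3, "INITECH")]

# executemany reuses one prepared statement for the whole batch,
# and the commit is deferred until all rows are staged.
conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(count)  # 3
```

Batching rows through one prepared statement and committing once avoids a network round trip and a transaction per record, which is where most of the throughput gain comes from.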
Near Real-Time Without the CDC Headaches
- Micro-batch cadence: Configure runs to check for changes as frequently as your use case demands (e.g., every 30 seconds, every 15 minutes, hourly, nightly). With PropelZ, it’s also not “one size fits all”: if you need different schedules for different data, that’s easy to accomplish.
- Parallelism where it counts: Run multiple jobs to match different workloads (heavy batch vs. chatty transactional). Also, if you need to synchronize data to multiple different targets (including to different types of databases), PropelZ can do this as well.
- Resilience and fault tolerance: In the event of a target or network outage, PropelZ automatically detects what needs to be done to bring the systems back in sync. As soon as service is restored, the next PropelZ run brings you back in proper synchronization with no special intervention.
- Operationally simple: No tuning of log readers; no brittle dependencies on journal internals; no need for logging at all unless you require it for other reasons.
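A micro-batch cadence amounts to a simple run-then-wait loop per dataset, each with its own interval. This is an illustrative sketch; the function names are hypothetical, and `run_sync_job` stands in for one incremental replication run.

```python
import time

def run_sync_job(name: str) -> None:
    """Placeholder for one incremental replication run against a dataset."""
    print(f"syncing {name}")

def micro_batch_loop(name: str, interval_seconds: float, max_runs: int) -> int:
    """Run the sync job repeatedly, sleeping between runs."""
    runs = 0
    while runs < max_runs:
        run_sync_job(name)
        runs += 1
        if runs < max_runs:
            time.sleep(interval_seconds)
    return runs

# Different datasets can run on different schedules in separate jobs.
completed = micro_batch_loop("orders", interval_seconds=0, max_runs=3)
```

In practice each dataset's loop runs as its own job, which is how different schedules and parallel workloads coexist.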
Designed for Mainframe Economics
- zIIP-eligible workload: PropelZ’s processing is zIIP-eligible, minimizing chargeable CPU and helping keep mainframe software costs in check — even when you scale parallel jobs.
- Minimal footprint: Because PropelZ looks at the data directly and uses signatures for deltas, you avoid the constant log churn and parsing overhead typical of CDC.
Security & Access
- Honors z/OS security policy: PropelZ is on-prem software that runs on z/OS and inherits the security context of the user running PropelZ. There’s no cloud provider accessing your mainframe data, and PropelZ only has access to the data that user would normally have access to. For sensitive data, you can point PropelZ to on-prem database targets and then your data never leaves your network.
- Supports database target credentials: The credentials needed to update your target databases are those that the target database system normally supports. Whatever options your target database supports for credentials can be used with PropelZ.
- Network posture: The JDBC network flows that PropelZ uses respect your TCP/IP network security. Most database targets support various SSL/TLS configurations for network data encryption, and any of these may be used with PropelZ. If you leverage VPNs or other network security technology, PropelZ JDBC network flows can be protected as you see fit. The same is true if you are using custom PropelZ output stages to invoke APIs or other transactions.
- Separation of duties: Keep support and operators in your time zone; PropelZ doesn’t require vendor personnel to log into your environments.
Typical Deployment Patterns
- Ad hoc: PropelZ is easy to use, even for non-technical staff. A simple command-line interface lets any user with appropriate permissions use PropelZ whenever needed on an ad hoc basis.
- Batch-aligned sync: Trigger PropelZ at the end of a batch job to push updated files to targets as soon as the data is created. PropelZ is designed to be easy to integrate with any job scheduling software you might have, so these tasks can be automated and triggered in many different ways depending on your needs. Even without job scheduling software, it’s easy to just add a simple processing step to run PropelZ once an application is complete.
- Bulk processing: However you run PropelZ, you always have the option of doing full file synchronization (that is, loading all records to the target). This is useful for smaller files or in cases where a file might be substantially changed, say after being generated by a batch process.
- Incremental processing: Poll for changes as frequently as you like to capture incremental changes that boost interactive experiences (e.g., avoid read traffic on Z by serving from SQL/Snowflake while keeping data fresh).
- Hybrids: Combine each of the above to meet different needs — bulk sync during batch processing, scheduled incremental updates for capturing ongoing changes.
What You Get on Day One
- Out-of-the-box target support (e.g., Microsoft SQL Server) with automatic schema creation from metadata (copybook → tables).
- Fast time to value: Installation is straightforward on z/OS; configuration focuses on your metadata and target connection. No log mining to wrangle.
- Future-proofing: As sources/targets evolve, swap input/output adapters without re-architecting the core pipeline.
The Payoff
- Unblock IMS and other legacy sources where CDC stalls — especially in no-key scenarios.
- Reduce mainframe read traffic by serving operational reads from a cloud/SQL lakehouse while keeping freshness windows tight.
- Feed AI/ML, BI, search, and observability stacks reliably — without brittle CDC pipelines.
The Result
Reliable, repeatable replication from any mainframe data source to any mix of targets — without invasive schema surgery, application code changes, or costly operational changes.
If your goal is read-optimized integration for analytics, AI/ML, BI, search, observability, or offloading read traffic from the mainframe, PropelZ’s incremental replication is typically simpler, faster to stand up, and more resilient than traditional CDC.
Next Steps
- Schedule a briefing.
- Watch a demo of PropelZ.
- Try PropelZ.
Learn More
- Visit our Customer Briefing Center.
- Review all VirtualZ Use Cases, Thought Leadership papers, and Solution Briefs.
- Explore our YouTube channel, podcasts, and blog.
- Still have questions? Contact us.