How Lozen™ Stands Apart from Other IBM zSystems Data Access Approaches
- The IBM zSystems platform stores an estimated 80% of the world’s critical business data. As customers migrate applications to the cloud and distributed platforms, mainframe data access demands intensify.
- There are a variety of mainframe data access approaches, but many fall short. Legacy data access approaches can be complicated, wasteful, and cost-intensive. Customers spend a significant amount of time and money writing and maintaining custom code to manually extract static, out-of-sync data using ETL, FTP, and APIs.
- Our flagship product, Lozen, revolutionizes mainframe data access, providing robust real-time, read-write access for the first time. With Lozen, your data remains in place, avoiding the need to fully understand and reconcile application-to-data dependencies for shared data. You can safely migrate your high-impact applications to the cloud and distributed platforms without risking other applications that depend on that same data. The result is faster, safer, lower-cost migrations and earlier success in your application migration projects.
In this blog, we explore several IBM zSystems data access methods, what differentiates Lozen from them, and how Lozen can work in tandem with existing methods to transform data access in a way that best suits your organization’s business needs.
What is Data Access?
Data access refers to the ability to examine and manipulate data stored within a database or other repository. Most organizations authorize users to access company data in a manner that meets security, privacy, and compliance requirements.
Determining Data Access Needs
Before choosing a data access method, it’s important to identify your organization’s data access needs and the tools that will best meet them.
One important consideration in determining data access needs is to distinguish between cases where the data are “read only” and those where the data need to be “read-write.” For example, if a copy of the data is needed for decision-making, there may be no need to generate direct updates to the original data sources. But if the goal is to perform data cleansing or to support other data quality initiatives, this usually involves some form of update to the original data sources.
Another key issue is data currency. It’s important to determine how well each use case tolerates data that are minutes, hours, or days old and only partly consistent with the original data source. There are cases where data currency is not as important. If the use case is decision-making based solely on past transactions or events and there is no need to generate notifications in close to real time, then it may not be as important to have access to the most current data.
It is also increasingly common for new customer-facing applications and services to be reliant on mainframe data that must continue to reside on the mainframe. Customer self-service applications commonly allow customers to update their data. New services may rely on the aggregation of multiple different mainframe and non-mainframe data sources.
Legacy Data Access Approaches and Tools
Data access tools are used in a variety of contexts, including data and application integration, data migration, hybrid and/or multicloud data access, and Master Data Management.
Such tools generally fall into a few familiar categories:
- Bulk data transfer, including ETL, ELT, and simple file transfer methods such as FTP
- Data replication
- Data virtualization
- Message- and stream-based integration, retrieval, and/or update
- API-based data access
With the possible exception of data virtualization, all of the legacy data access approaches require some form of “change data capture” to maintain synchronization between replicated data and the original data source.
Change Data Capture: A Real-World Example
Let’s look at a simple scenario to illustrate the necessity of change data capture with most IBM zSystems data access methods, including ETL.
What is Change Data Capture?
First let’s define what we mean by “change data capture.” Simply stated, when the original master data source changes, we need to capture that change and send the changed data (and only the changed data) to the replicated dataset(s) so that the copies are kept in sync with the original. Even if the data are read-only, the needs of data currency may require frequent or continuous updates to the replicas.
Now, consider the case where batch and transaction processing systems store customer data in VSAM files on an IBM zSystems mainframe. The marketing department requires customer data in a cloud-based business intelligence (BI) tool. They ask the IT department to provide a nightly extract of the customer file delivered to an import process provided by the BI tool.
An IT analyst is assigned to understand the requirements, including what format the data need to be in for ingestion by the BI tool. They quickly run into many of the typical issues that arise in this scenario. The VSAM file may hold gigabytes of data. The data must be extracted from the file using a nightly batch job, then converted from the mainframe data format to the text file format needed by the BI tool. Not only does this require translation from EBCDIC to ASCII, but various numeric and date fields need to be unpacked and converted to simple text representations.
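To make the transform step concrete, here is a minimal Python sketch of converting one mainframe record to text. The record layout (a 10-byte name, a 4-byte packed-decimal balance with two decimal places, and an 8-byte YYYYMMDD date) and the choice of EBCDIC code page 037 are assumptions made for illustration; a real extract would take both from the copybook that describes the file.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two decimal digits, except the last byte, which holds
    one digit followed by a sign nibble (0xD means negative).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F

    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    # Shift the implied decimal point left by `scale` places.
    return Decimal(value).scaleb(-scale)


def convert_record(record: bytes) -> dict:
    """Convert one record using the hypothetical layout described above."""
    name = record[0:10].decode("cp037").rstrip()      # EBCDIC text
    balance = unpack_comp3(record[10:14], scale=2)    # packed decimal
    raw_date = record[14:22].decode("cp037")          # EBCDIC YYYYMMDD
    as_of = f"{raw_date[0:4]}-{raw_date[4:6]}-{raw_date[6:8]}"
    return {"name": name, "balance": str(balance), "as_of": as_of}
```

Every field type in the layout needs its own conversion rule like these, which is part of why extract-and-transform jobs take real effort to write, test, and maintain.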
Once the data is extracted and transformed (the “ET” of “ETL”), it must then be transferred over a network to the cloud target. Depending on the size of the extract, this can be a time-consuming and costly process.
Once transferred, the data now must be loaded into the BI tool. In the worst case, this is a manual process that requires an end user of the BI tool to tell the tool where the extract resides in the cloud file system and to then kick off the load process. As with the data transfer, depending on the size of the data set, this could be another time-consuming process that must be monitored for successful completion, which presupposes that the extract and transform were done correctly.
Only now can the marketing department get to work. But because they need the data to be current every day, the entire process starts over again tomorrow, and the day after that, and so on.
Regardless of how frequently the data gets refreshed, it is by definition a point-in-time snapshot that grows increasingly out of sync with the source data from the moment the ETL process is started. Because the extract is a copy of the original data at a point in time, it begins to diverge from the source as soon as the source is updated, which is why the marketing department needs regular updates, such as a nightly extract. This is the “brute force” approach to change data capture: copying all the data (much of which may not have changed) on a regular basis.
But the costs in this scenario are constant and can be large. The cost includes the development and testing of the extract and transform process; the processing time of the extract and transform job on the mainframe; the data transfer across the network; the cloud storage used to receive the data; and the additional storage and processing time consumed by the BI application.
Some tools have been developed and refined that provide scripting, automation, and ease of use (such as a graphical user interface for development of ETL jobs by end users). Some of the better tools on the market also provide an automated change data capture process, so that the data can be continuously or at least periodically updated or resynchronized, without moving the entire dataset each time. But these features are essentially refinements and efficiencies, not a fundamental change to the nature and costs of the process of ETL.
Data Transfer / ETL and Data Replication Approaches
We’ve seen that one of the refinements we need to make in the data transfer or replication/synchronization approach is change data capture.
We could just run a job on some regular schedule (e.g. hourly, nightly, weekly) to determine what data changed, extract just the changed records or fields (in database terms rows and columns), then transfer those changes to wherever the copies are kept. On the receiving side, there would need to be a process (on demand or on a schedule) to read the change set and apply those changes to the replicated dataset.
Change detection in a database can be reasonably straightforward. But for VSAM datasets and sequential files, change detection may involve multiple, repeated passes of a file to compare before and after images of the data. At the very least it requires specialized “agents” that are hooked into the I/O stream on the mainframe, or that inspect change logs for the VSAM files.
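The before-and-after comparison can be sketched in a few lines of Python. This is an illustration only: it assumes each snapshot has already been loaded into a dictionary keyed by record key, and the function and field names are hypothetical rather than taken from any particular tool.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two keyed snapshots and return only what changed."""
    # Keys present only in the newer image are inserts.
    inserts = {key: after[key] for key in after.keys() - before.keys()}
    # Keys present only in the older image are deletes.
    deletes = sorted(before.keys() - after.keys())
    # Keys in both images whose contents differ are updates.
    updates = {
        key: after[key]
        for key in after.keys() & before.keys()
        if after[key] != before[key]
    }
    return {"inserts": inserts, "updates": updates, "deletes": deletes}
```

Note that even this simple comparison needs both complete images in hand, so the cost of reading the whole file is still paid on the source side even though only the differences cross the network.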
This is a substantial amount of processing and infrastructure that must be created or provisioned, installed, configured, operated, scheduled, and monitored. What may have been saved in processing, network, and human cost over the scheduled bulk data transfer of the original ETL process comes at the cost of increased complexity. Not to mention the IT project(s) that must be created, managed, and executed to create all the necessary processing and to move it into production.
The good news is many of the vendors in the legacy ETL category have created tools specifically for change data capture. These same tools can also be applied to the broad exercise of Master Data Management. They provide an out-of-box implementation and varying degrees of automation of the steps described earlier that make up a robust and effective change data capture process.
But the whole change data capture process raises some other questions:
- How is notification of data change delivered and managed?
- Who receives the notification?
- How timely is that notification?
- How timely should it be?
- What happens when the notification is delivered and received?
Some tools answer and handle some or all of this. Part of the tool is a process running at the source (the mainframe) that is configured to watch relevant data sources (files, database tables, etc.), record changes, and capture a snapshot of what changed. This change set is then delivered to a companion process running near each of the replica datasets, and the changes are applied.
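The receiving side of such a tool can be pictured with a small Python sketch. The `ChangeEvent` shape, the sequence numbers, and the "upsert"/"delete" operation names are assumptions made for illustration, not the format of any specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    seq: int                     # position in the source's change log
    op: str                      # "upsert" or "delete"
    key: str                     # record key
    value: Optional[str] = None  # new record contents for an upsert


def apply_changes(replica: dict, events: list, last_applied: int = 0) -> int:
    """Apply captured changes to a replica in source order.

    Returns the highest sequence number applied so that a redelivered
    change set can be skipped instead of being applied twice.
    """
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq <= last_applied:
            continue  # already applied on an earlier delivery
        if event.op == "upsert":
            replica[event.key] = event.value
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_applied = event.seq
    return last_applied
```

Tracking the last applied sequence number is one common way to address the delivery questions listed above: it lets the companion process tolerate late, duplicated, or out-of-order change sets.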
Data Virtualization Approaches
Other approaches have emerged in response to this very common situation in modern data processing. One of these is data virtualization. This is a category of tool that allows a data specialist to construct a virtual database that can be accessed by SQL queries or other APIs (the same way data in a standard relational database are accessed). Under the covers, the tool can connect to other data sources and send queries to them, then compose the returned data into a single view in response to the query issued against the virtual database.
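As a rough illustration of the idea, the Python sketch below uses SQLite from the standard library as a stand-in. Two separate in-memory databases play the role of independent data sources, and a temporary view composes them into a single result. Real data virtualization tools connect to remote, heterogeneous systems; the table and column names here are invented for the example.

```python
import sqlite3


def query_customer_totals() -> list:
    """Query one 'virtual' view that is backed by two separate databases."""
    conn = sqlite3.connect(":memory:")
    # A second, independent database attached under the name "billing".
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # The view hides which source each column comes from.
    conn.execute(
        """
        CREATE TEMP VIEW customer_totals AS
        SELECT c.name AS name, SUM(i.amount) AS total
        FROM main.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        """
    )
    rows = conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()
    conn.close()
    return rows
```

The caller issues one query against `customer_totals` and never sees that the rows were assembled from two sources, which is the property a virtual database provides at much larger scale.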
The virtual database approach has eliminated the batch-oriented extract job. The data sources are accessed in near real-time. But a few pitfalls remain.
For example, someone must be skilled in the use of the tool, which manages the transformation between data formats. The tool requires data source configuration, query testing, and troubleshooting when a query fails. It also requires that both the virtual database and the underlying data sources be accessible via SQL, which typically limits the data sources to relational databases.
There are some approaches that allow SQL access to VSAM datasets. But this is usually supported by a substantial investment in software infrastructure in addition to the virtual database tool. Some allow updates to be written to the original VSAM file via SQL; some do not.
Message- and API-Based Approaches
Another approach that many organizations have adopted is message- and/or API-based data access. Given the presence of messaging systems such as Enterprise Service Bus, Kafka, and others in many large enterprises, it makes sense that these would be leveraged to provide change notification on data sources. Data streaming services leverage the messaging systems to provide direct delivery of changed data. These methods have similar challenges to those mentioned in other approaches.
On the mainframe, message queueing systems are mature and robust but not inexpensive. Again, if the investment has already been made, it is tempting to use them. But most shops are painfully aware of IBM’s consumption-based pricing models. Customers have the goal of reducing — not driving up — mainframe costs.
Furthermore, processes must be created or procured to provide the actual change data capture function, both on the source and the targets. Messaging systems provide notification, but to provide full read/write capability, new message-based applications must be created.
This brings us to the API-based approach. An API-first approach to integration and indeed application development has become increasingly popular and for good reason. By providing data access APIs on legacy data sources (coupled with the message-based notification mechanisms), companies can create relatively lightweight processes on most target systems to leverage those APIs to retrieve necessary data from legacy and master data sources. If the API infrastructure is defined and deployed appropriately, this can also extend to full read-write capability.
But these processes are typically purpose-built. They involve programming teams to build, and DevOps teams to deploy and manage on both the source and target platforms.
This raises another interesting consideration—one of application architecture. Messaging and stream-based approaches are often used to support what could be considered “event driven” architecture. That is, as data change events occur, applications can be notified and processing is triggered.
Many applications could be considered “data driven,” and legacy applications were often designed this way. In this case, an application is started or scheduled, and it consumes the available data until the process finishes. This approach may not be well suited to an event-based or streaming data delivery mechanism without some mitigation to make the data appear as a sequential or keyed data file.
The majority of mainframe-based applications considered for migrating to cloud or distributed platforms use this file-based data access method. In the case where the data must remain on the mainframe, companies are faced with the dilemma of attempting to reduce mainframe costs by moving applications, but then incurring additional costs due to the need for moving the data to a location that can be accessed by the migrated applications.
The Lozen Approach
Lozen brings a revolutionary approach to the IBM zSystems data access methods we’ve discussed so far. Lozen was carefully designed to provide the most robust real-time data access, with the least amount of investment in additional software infrastructure.
First, because Lozen provides real-time access to original master data sources with full data integrity and security, there is no need to replicate entire datasets to one or more target systems. Because there is no data replication, there is no need for change data capture and notification, with the attendant resynchronization of target datasets, and no programming of purpose-built processes need occur on the source or target systems.
Second, we wanted to provide simple, yet complete network-based access to IBM VSAM and other datasets without significantly impacting processing cost. Most of the tools and approaches we’ve been discussing consume general-purpose processing capacity, which is the driver of cost for IBM z/OS customers. Lozen runs on specialty processors (zIIPs). Because IBM imposes no additional software cost for zIIP usage, Lozen is not a significant driver of increased cost on the IBM zSystems platform. Read more about our zIIP architecture here.
Third, we did not want to require any software installation on the client side (e.g. Linux and Windows). Because we leverage NFS[i], a simple, scriptable configuration on the target system makes a VSAM file appear as if it were present in the local filesystem, where it is accessed with the same I/O commands that are used against any ordinary Linux or Windows file. By directly accessing VSAM files from an application running on a non-IBM platform, such as a Windows server or Linux VM running on Microsoft Azure, we have eliminated many of the costs and challenges discussed in all the other approaches.
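From the application's point of view, this kind of access might look like the Python sketch below, which reads fixed-length records with ordinary file I/O. The file name, the 8-byte record length, and the helper names are hypothetical, and the demo writes a local temporary file to stand in for a mounted dataset. How Lozen presents record boundaries, keys, and character encoding over NFS is not covered in this post, so treat those details as assumptions and check the product documentation.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def read_fixed_records(path: Path, record_length: int) -> Iterator[bytes]:
    """Yield fixed-length records from a file using plain file I/O."""
    with open(path, "rb") as handle:
        while True:
            record = handle.read(record_length)
            if not record:
                break  # end of file
            if len(record) != record_length:
                raise ValueError("file ends with a partial record")
            yield record


def demo() -> list:
    """Read a stand-in file exactly as a mounted dataset would be read."""
    with tempfile.TemporaryDirectory() as tmp:
        # In practice this path would sit under an NFS mount point.
        path = Path(tmp) / "CUSTOMER.MASTER"
        path.write_bytes(b"C001ADA C002GRAC")
        return list(read_fixed_records(path, record_length=8))
```

Nothing in `read_fixed_records` is specific to the mainframe, which is the point: the same code runs against a local file or a mounted one.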
Lozen offers several other data access advantages:
- A robust, secure, and performant data server that resides on the IBM zSystems mainframe
- Full access to VSAM data sources of all types: KSDS, ESDS, and RRDS with full VSAM semantics
- Access to sequential files, including data that is kept as UNIX System Services files, as well as HFS/zFS files
- The ability to make requests through NFS, a mature and widespread industry-standard file protocol that was designed and built specifically to extend POSIX-standard file I/O across networks
Here is a summary of how Lozen differs from other data access methods.
Our research and understanding of many other products and approaches convince us that Lozen is unique in the market today.
With Lozen, your data remains in place, avoiding the need to fully understand and reconcile application-to-data dependencies for shared data. You can safely migrate your high-impact applications to the cloud and distributed platforms without risking other applications that depend on that same data. The result is faster, safer, lower-cost migrations and earlier success in your application migration projects.
Furthermore, Lozen can complement many of the other approaches; there is no requirement for a “rip and replace” of existing expensive software investments. Indeed, Lozen can expand the reach of many existing tools in Data Integration and Master Data Management suites by providing simple, transparent access to VSAM data.
To learn more about how to unlock the power of real-time, read-write IBM zSystems data access with Lozen:
- Try Lozen for free.
- Explore our demos page and YouTube channel.
- Watch webinars and read the blog.
- Still have questions? Contact us.