What are the data considerations?

Mar 6, 2020

Naturally, if a job is redirected from one system to another, the data that job needs in order to execute properly must be available to it. Depending on the capacity of your network and your service level agreements, this may mean that products requiring smaller amounts of data are most appropriate for VirtualZ redirection.

However, there are several exceptions. For instance, if you redirect a job in an environment where the required data is accessible from both the source and target, no transmission of the data is required, and thus there is very little redirection overhead. This would be the case when redirecting work within a single datacenter, where your devices are shared between the source and target (or in other cases where some type of disk replication technology is employed). In these cases, workload redirection imposes little performance penalty, and larger datasets can be used with no added delays.

For maximum performance, the source and target need to share physical access to any data needed within redirected jobs. The source and target don’t need to be part of the same JES2 MAS environment or otherwise connected, and they don’t have to be part of the same SYSPLEX or GRS ring: as long as the target has physical access to the same disks as the source system, there’s no I/O overhead when redirecting work.

In a cloud scenario where data is not accessible from the target, applications that process relatively small amounts of data are more appropriate, since execution normally requires movement of inputs and outputs over the network. VirtualZ includes sophisticated algorithms to minimize the amount of data that must be moved, but you should carefully consider the impact of moving applications that process high data volumes.

When redirecting work in cloud or other environments where sharing data isn’t possible, VirtualZ includes several other features that can dramatically reduce data movement overhead:

  • Many times, certain input files are identical on all of your systems, so the local file can be used instead of fetching data from the source system. An example might be a C/C++ compiler’s system header files. These vendor-supplied files are part of the compiler, and thus they tend to be identical on all systems. For files like this, VirtualZ can be configured to use the “local” file on the target system instead of moving data across the network, greatly reducing the need for data movement, especially for products like compilers and others that include system macros, language files, and so on.
  • All data moved over the network can be cached at the target so it can be reused. An example might be an application-specific COBOL copybook library: if a job has many compile steps, the first step causes some of these library elements to be copied to the target, and once there they persist, so subsequent steps don’t require additional copies.
  • VirtualZ supports “sparse” copying. In other words, if you have a library of copybooks containing a thousand PDS members, we only need to bring the ones that are actually referenced — not the entire library.
  • If you’ve implemented a remote disk replication technology such as IBM’s PPRC and your data has been adequately synchronized, VirtualZ behaves just as though it is in a shared DASD environment, using local replicated data instead of moving data across your network.
  • In some cases, VirtualZ allows you to “pre-stage” remote files so that they are available before any redirected work begins. This is an open-ended feature that lets you use whatever high-performance file transfer application you like outside of VirtualZ redirection. If you stage important files on the target before a redirected job runs there, VirtualZ uses that data without further network transfer.
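Taken together, the features above amount to a decision procedure for each input dataset: use a local copy when one exists (identical vendor files, replicated volumes, pre-staged files), reuse the target-side cache when an earlier step already moved the data, and otherwise fetch only the members actually referenced. The Python sketch below illustrates that ordering; the names (`DatasetInfo`, `resolve_input`) and the structure are invented for illustration and are not the VirtualZ API.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetInfo:
    """Hypothetical description of one input dataset for a redirected job."""
    name: str
    identical_on_target: bool = False       # e.g. vendor-supplied header libraries
    replicated: bool = False                # e.g. synchronized via PPRC
    prestaged: bool = False                 # moved ahead of time by the site
    members_referenced: set = field(default_factory=set)

def resolve_input(ds, target_cache):
    """Return (where the data comes from, which members must move)."""
    if ds.identical_on_target or ds.replicated or ds.prestaged:
        return ("local", set())             # no network transfer needed
    if ds.name in target_cache:
        return ("cache", set())             # reused from an earlier step
    target_cache.add(ds.name)
    # "Sparse" copy: fetch only the PDS members the job actually references.
    return ("network", ds.members_referenced)

cache = set()
headers = DatasetInfo("CEE.SCEEH.H", identical_on_target=True)
copybooks = DatasetInfo("APPL.COPYLIB", members_referenced={"CUSTREC", "ORDREC"})

resolve_input(headers, cache)     # local file, nothing moved
resolve_input(copybooks, cache)   # first compile step: only two members moved
resolve_input(copybooks, cache)   # later steps: served from the target cache
```

The key point of the sketch is the ordering: every branch before the final `return` avoids a network transfer entirely, and even the fallback moves only the referenced members rather than the whole library.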

Note that spooled output files (SYSOUT) are treated differently than input files. All of the SYSOUT files created by redirected work are transparently returned to the source system and written to the original source job’s output. This enables the user that originally submitted the work to find the SYSOUT exactly where they expect it to be.

This approach is also used for messages written to the JES2 JOBLOG datasets, although there is a VirtualZ option to suppress returning JOBLOG messages to the source system if desired. If the option to return target system JOBLOG messages is activated, the user will see the redirected job’s JOBLOG output in the output from the job they submitted, complete with all the allocation messages, return codes, errors, and so forth as they were generated on the target system. If you don’t want your users to see this level of detail, it can be suppressed.

Conventional output files are also treated slightly differently than input files, to avoid conflicts within the user’s JCL. An example will help illustrate why this matters.

Let’s say the user submits a two-step job: step one executes the IBM assembler, and step two executes the IBM binder. (The example uses the IBM High Level Assembler as the redirected product, but that isn’t important; the same principles apply to all products.) The JCL might look like this:

//STEP1    EXEC PGM=ASMA90
//SYSIN    DD DSN=source-code,DISP=SHR
//SYSLIN   DD DSN=&&OBJ,DISP=(NEW,PASS),
//         UNIT=VIO,SPACE=(CYL,10)
//STEP2    EXEC PGM=IEWBLINK
//SYSLIN   DD DSN=&&OBJ,DISP=(OLD,DELETE)
//         DD *
 NAME      TEST(R)

Note that the user has set up their job to write the assembler’s object code output (the SYSLIN DD statement above) to a temporary file that is passed as input to the second step, and by specifying UNIT=VIO the user is requesting that this file be created in memory rather than as a conventional disk file. Since this file may be “virtual,” it exists only on the source system, even if all of the disks are otherwise shared. For this reason, VirtualZ simply transfers the records from the target back to the source, where they are used to populate the temporary file. This solves a variety of issues, although at the expense of moving that data across the network from target to source.

To summarize, VirtualZ provides a lot of flexibility when it comes to handling data for redirected workloads, but you’ll want to weigh your site’s network capacity, workload volume, and tolerance for delays. Applications that require only small amounts of data (e.g., compilers) will generally perform with little perceptible delay. However, applications requiring large-scale data movement should be carefully evaluated to ensure you’ll be happy with the results.