Recently, a major bank in North America began developing a new application which would extract data from a SQL database running on Linux, perform some transformations and analysis on the data, and then upload the data to IBM Z datasets for use by other business units.
Initially, the team developing the application planned to use traditional FTP to perform the data movement from Linux to IBM Z, but quickly found that this approach had a few major disadvantages:
- Data generated by the tool had to first be written to disk on Linux before it could be picked up by FTP and copied to IBM Z. This added significant storage and runtime overhead to each ETL job.
- Ideally, the whole process would be initiated via jobs run on IBM Z. This worked for starting the FTP transfer, but there was no way to coordinate the data extraction on Linux with the job on IBM Z or to confirm that the overall process completed successfully.
- Overall, FTP was consuming too much CPU time on IBM Z. ASCII to EBCDIC translation was being performed on a general-purpose Z processor, and BSAM was used to write the output datasets. This was impacting other jobs and increasing the costs associated with this process.
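To make the character translation step concrete, here is a minimal Python sketch of converting text to EBCDIC and back. It only illustrates what the translation does, not how FTP or any mainframe product implements it. The code page `cp037` and the sample record are assumptions chosen for illustration; the code page used in the bank's transfers is not stated.

```python
# Illustrative only: cp037 is one common EBCDIC code page. The code page
# actually used in the transfers described here is not stated.
SAMPLE_RECORD = "ACCT0001,125.50\n"  # hypothetical record, not real data


def to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode text into EBCDIC bytes using Python's built-in codec."""
    return text.encode(codepage)


def from_ebcdic(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes back into text."""
    return data.decode(codepage)


ebcdic_bytes = to_ebcdic(SAMPLE_RECORD)
round_tripped = from_ebcdic(ebcdic_bytes)
```

Every byte of every record passes through a mapping like this, which is why the choice of processor for the work affects cost at high volumes.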
Luckily for the bank, there is a robust alternative to FTP available: Alebra’s Parallel Data Mover (PDM). Using PDM, the team developing the application was able to solve all the problems presented by FTP:
- PDM is able to read data directly from the output of a Linux application and stream that data directly to the destination dataset on IBM Z. No interim stop on disk on the Linux system is necessary. This drastically reduced the runtime of the ETL process and eliminated storage requirements on the Linux server.
- PDM jobs on IBM Z are able to start processes on Linux systems and check for successful completion. This way, the entire ETL process could be driven from IBM Z and managed by Workload Manager.
- PDM takes advantage of zIIP processors on IBM Z for processes like character translation and DASD I/O. In certain cases, PDM can bypass BSAM and read and write datasets directly using zHPF. By taking advantage of these features, the team reduced the CPU consumption of the process, improved its overall performance, and dramatically lowered the total cost of the project.
- In addition, PDM makes use of IBM Z features such as zEDC compression and encryption assistance, further reducing the CPU consumption of the process.
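The general pattern behind the first two points, reading a program's output as it is produced and then checking that the program finished cleanly, can be sketched in a few lines of Python. This is not PDM's interface, which is not described here. The function name `stream_process_output`, the `sink` callback, and the chunk size are all hypothetical and stand in for whatever actually moves the data to the destination.

```python
import subprocess
import sys
from typing import Callable, Sequence

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


def stream_process_output(
    cmd: Sequence[str],
    sink: Callable[[bytes], object],
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Run cmd, hand its stdout to sink chunk by chunk, and return bytes forwarded.

    No intermediate file is written: data moves from the pipe straight to
    the sink. A non-zero exit status raises RuntimeError so the caller can
    treat the whole transfer as failed.
    """
    total = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # empty read means the producer closed its output
                break
            sink(chunk)
            total += len(chunk)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"producer exited with status {proc.returncode}")
    return total


if __name__ == "__main__":
    # Hypothetical stand-in for the extraction step: a child process that
    # writes a few rows to stdout.
    producer = [sys.executable, "-c", "print('row 1'); print('row 2')"]
    received = []
    stream_process_output(producer, received.append)
```

In a real transfer the sink would be a network connection to the destination system rather than a list. The point of the sketch is only that the producer's exit status and the data flow are handled by the same controlling process.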
Thanks to Alebra PDM, this development project was a success, and the application now streams many gigabytes of data from Linux to IBM Z each day.