DTF: An I/O Arbitration Framework for Multi-Component Data Processing Workflows
Big Data Analytics
System Software & Runtime Systems
Time: Tuesday, June 26th, 2:45pm - 3:15pm
Description: Multi-component workflows, in which each component applies a particular transformation to the data and passes the result on to the next component, are a common way of performing complex computations. Using components as building blocks, we can apply sophisticated data processing algorithms to large volumes of data.
Because the components may be developed independently, they often pass data to one another through file I/O on the parallel file system. However, as the data volume increases, file I/O quickly becomes the bottleneck in such workflows. In this work we propose an I/O arbitration framework called DTF, designed to alleviate this problem by silently replacing file I/O with direct data transfer between the components. DTF treats file I/O calls as I/O requests and uses I/O request matching to drive the data movement. Currently, the framework works with PnetCDF-based multi-component workflows. It requires minimal modifications to the application and allows the user to easily control the I/O flow via the framework's configuration file.
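The idea of I/O request matching can be illustrated with a small Python sketch. This is not DTF's implementation or API: the `IORequest` type, the `match_requests` function, and the one-dimensional (start, count) layout are all hypothetical, chosen only to show how write requests from one component might be paired with overlapping read requests from another to decide what data moves where.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IORequest:
    """Hypothetical record of one intercepted file I/O call (1-D for simplicity)."""
    rank: int   # process that issued the call
    var: str    # name of the variable being accessed
    start: int  # index of the first element
    count: int  # number of elements


def match_requests(writes, reads):
    """Pair each read request with the write requests that overlap it.

    Returns a list of (writer_rank, reader_rank, var, start, count) tuples,
    one per overlapping region, describing a direct transfer that could
    stand in for the file write followed by the file read.
    """
    transfers = []
    for r in reads:
        for w in writes:
            if w.var != r.var:
                continue
            lo = max(w.start, r.start)
            hi = min(w.start + w.count, r.start + r.count)
            if lo < hi:  # the two requests overlap on [lo, hi)
                transfers.append((w.rank, r.rank, r.var, lo, hi - lo))
    return transfers


# Two writer processes each hold half of "temp"; one reader wants the middle.
writes = [IORequest(0, "temp", 0, 50), IORequest(1, "temp", 50, 50)]
reads = [IORequest(0, "temp", 25, 50)]
for transfer in match_requests(writes, reads):
    print(transfer)
```

In this toy case the reader's request spans the boundary between the two writers, so matching yields two transfers, one from each writer. A real framework would also have to handle multi-dimensional variables, requests that arrive in any order, and the actual data movement between processes.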