Tutorial on Machine Learning & Data Analytics
AI/Machine Learning/Deep Learning
Big Data Analytics
Time: Wednesday, June 27th, 11:30am - 12:30pm
Description: The tutorial offers the basics of analyzing data with machine learning and data analytics algorithms in order to understand the foundations of learning from large quantities of data. This tutorial requires no previous knowledge of machine learning or data analytics techniques. It covers general methods of data analysis for clustering, classification, and regression using High Performance Computing (HPC). This includes a short discussion of the training, validation, and test datasets required to learn from data with high accuracy. Simple application examples reinforce the theoretical elements of the tutorial and illustrate problems such as overfitting, followed by mechanisms such as validation and regularization that prevent such problems.
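The following Python sketch is not taken from the tutorial; it only illustrates, under simple assumptions, how training, validation, and test sets divide the work. It fits a polynomial regression with an L2 (ridge) penalty to hypothetical noisy samples of sin(3x), chooses the regularization strength `lam` on the validation set, and evaluates on the held-out test set only once. The function names, the degree-9 model, the candidate `lam` values, and the 60/20/20 split are illustrative assumptions.

```python
import math
import random

def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(train_frac * len(items))
    n_val = int(val_frac * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def features(x, degree):
    """Polynomial feature map: [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]

def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def fit_ridge(points, degree, lam):
    """Minimize sum of squared errors + lam * ||w||^2 via the normal equations."""
    d = degree + 1
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for x, y in points:
        phi = features(x, degree)
        for i in range(d):
            b[i] += phi[i] * y
            for j in range(d):
                a[i][j] += phi[i] * phi[j]
    for i in range(d):
        a[i][i] += lam  # L2 penalty (the intercept is penalized too, for simplicity)
    return solve(a, b)

def mse(w, points, degree):
    """Mean squared error of the polynomial model w on the given points."""
    errors = [(sum(wi * fi for wi, fi in zip(w, features(x, degree))) - y) ** 2
              for x, y in points]
    return sum(errors) / len(errors)

def select_lambda(train, val, degree, lams):
    """Fit one model per candidate lam on train; keep the lam with lowest validation error."""
    return min(lams, key=lambda lam: mse(fit_ridge(train, degree, lam), val, degree))

# Hypothetical data: noisy samples of sin(3x) on [-1, 1].
rng = random.Random(1)
data = [(x, math.sin(3 * x) + rng.gauss(0, 0.2))
        for x in (rng.uniform(-1, 1) for _ in range(30))]
train, val, test = split(data)

DEGREE = 9
LAMS = [0.0, 1e-4, 1e-2, 1.0, 10.0]
best = select_lambda(train, val, DEGREE, LAMS)
w = fit_ridge(train, DEGREE, best)
# The test set is touched exactly once, after model selection is finished.
print(f"lam={best}: train MSE={mse(w, train, DEGREE):.3f}, test MSE={mse(w, test, DEGREE):.3f}")
```

Because the unregularized fit (`lam = 0`) minimizes training error by construction, a low training error alone says nothing about generalization; it is the validation error that exposes overfitting. Practical ridge implementations usually leave the intercept unpenalized, which this sketch skips for brevity.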
The tutorial will start from a very simple application example in order to teach foundations such as the role of features in data, linear separability, and the decision boundaries of machine learning models. In particular, the tutorial will point out key challenges in analyzing large quantities of data (aka ‘big data’) in order to motivate the use of parallel and scalable machine learning algorithms and advanced data analytics techniques. It thus targets datasets that are too large to be analyzed with the traditional serial methods provided by tools such as R, SAS, or Matlab. These challenges arise in the machine learning algorithms themselves, in the distribution of data, and in the process of performing validation. The tutorial will introduce selected solutions to these challenges using parallel and scalable computing techniques based on the Message Passing Interface (MPI) and OpenMP that run on massively parallel HPC platforms. The tutorial ends with a short introduction to deep learning, which has emerged as a promising disruptive approach that uses GPGPUs to enable knowledge discovery from large datasets with unprecedented effectiveness and efficiency.
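To make linear separability, decision boundaries, and the role of features concrete, here is a minimal Python sketch of the classic perceptron learning rule; it is an illustration, not necessarily the example used in the tutorial. The perceptron stops as soon as its linear decision boundary w·x + b = 0 separates the labeled points, which happens for logical AND but never for XOR, because no single line separates XOR's two classes. Appending a product feature x1*x2 makes XOR linearly separable in the lifted feature space. The datasets, the epoch limit, and the function names are assumptions chosen for illustration.

```python
def train_perceptron(samples, epochs=1000, lr=1.0):
    """Perceptron rule on (features, label) pairs with labels +1/-1.

    Returns (weights, bias, converged); converged is True once an entire
    pass over the data produces no mistakes.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the decision boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False

def predict(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
# Feature engineering: append the product x1*x2 as a third feature.
XOR_LIFTED = [((x1, x2, x1 * x2), y) for (x1, x2), y in XOR]

for name, data in [("AND", AND), ("XOR", XOR), ("XOR + x1*x2", XOR_LIFTED)]:
    w, b, ok = train_perceptron(data)
    print(f"{name}: converged={ok}, w={w}, b={b}")
```

The perceptron convergence theorem guarantees termination on linearly separable data, so the epoch cap only matters for non-separable inputs such as plain XOR.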