Doctoral Short Course “Statistics and Data Analysis”, July 9 and 10, ISU, H104


The course gives an introduction to several aspects of Data Science, starting with data visualization, continuing with established statistical methods (univariate and multivariate analysis), and ending with machine learning and data mining. For each subject we will introduce the main concepts, show how to perform example analyses using different tools, and provide links to additional study material.

The course consists of four theme-specific individual sessions that are independent of each other. It is therefore possible to register for and attend only selected session(s).

At the end of the short course, students should (a) know the basics of these subjects, (b) be able to perform simple analyses on their own, and (c) be able to deepen their knowledge using the additional material if they wish.


1) Data Visualization → July 9, 10:00 – 13:00, H104
A. Presentation of the course and its objectives (15 minutes)
B. Working with tabular data: Excel and Comma-Separated Values (CSV) files (15 minutes)
C. First example: scatter plots (1 hour 30 minutes)
   – Introduction to the problem: differential gene expression (biology)
   – From tabular data to graphs: how to display multiple pieces of information in the same plot
   – Building the scatter plot
D. Second example: line plots (30 minutes)
   – Introduction to the problem: apple sales over the years (retail)
   – Building the line plot
E. Software tools, additional material, questions (30 minutes)

2) Univariate Analysis → July 9, 14:00 – 17:00, H104
A. Statistical significance (30 minutes)
B. The t-test: assumptions and meaning (30 minutes)
C. Example: a single t-test (30 minutes)
D. Multiple t-tests and p-value correction (1 hour)
   – Extending the previous example to multiple tests
   – The problem with multiple testing: false positive results
   – The Bonferroni correction
   – The False Discovery Rate (FDR)
E. Software tools, additional material, questions (30 minutes)

3) Multivariate Analysis → July 10, 10:00 – 13:00, H104
A. Linear regression, univariate (30 minutes)
B. Linear regression, multivariate (30 minutes)
C. Example analysis (30 minutes)
D. Extension: generalized linear models (1 hour)
   – Logistic regression
E. Software tools, additional material, questions (30 minutes)

4) Machine Learning and Data Mining → July 10, 14:00 – 17:00, H104
A. Introduction: from statistics to machine learning and AI (30 minutes)
B. First example of a machine learning method: decision trees (45 minutes)
C. Second example of a machine learning method: neural networks (45 minutes)
D. Hands-on: the Iris and Digits datasets
E. Software tools, additional material, questions (30 minutes)

Instructors: Prof. Vincenzo Lagani (Bioinformatics, School of Natural Sciences and Medicine), Prof. Erekle Magradze (Computer Engineering, School of Business, Technology and Education)

Language of Instruction: English

Dates and Times:
Session 1: July 9, 10:00-13:00
Session 2: July 9, 14:00-17:00
Session 3: July 10, 10:00-13:00
Session 4: July 10, 14:00-17:00

Venue: Ilia State University, H-Building, Room H104

Registration: Each session is limited to 18 attendees. Prior registration is therefore required, and places are allocated on a first-come, first-served basis. The registration deadline is July 5, 2019, 18:00. To register, please use the online registration form: