MA7BY020: Exploratory Data Analysis

Course MA7BY020: Exploratory Data Analysis is an introduction to the principles and practice of data exploration and visualization using the R programming language. With an eye toward Machine Learning and Data Science, the course covers the following topics:

  • Tabular data manipulation
  • Data visualization
  • Univariate and bivariate analysis for qualitative and quantitative data
  • Multivariate analysis starting with multiple linear regression
  • Matrix methods based on Singular Value Decomposition: PCA, CA, CCA, …
  • Clustering methods
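
To give a flavour of the matrix methods listed above, here is a minimal sketch of PCA computed directly from a Singular Value Decomposition. It uses only base R and the built-in iris data set. The choice of data set and the variable names are assumptions made for illustration, not course material.

```r
# Sketch: PCA of the four numeric columns of the built-in `iris` data via SVD.
X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)   # center each column

s <- svd(Xc)                                   # Xc = U D V^T

scores        <- s$u %*% diag(s$d)             # coordinates of rows on the principal axes
loadings      <- s$v                           # principal axes (one column per component)
var_explained <- s$d^2 / sum(s$d^2)            # share of variance per component

# Cross-check against the base R implementation.
p <- prcomp(X, center = TRUE, scale. = FALSE)
print(round(var_explained, 3))
```

The singular values relate to the standard deviations reported by prcomp through a division by sqrt(n - 1), where n is the number of rows.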

After this course, you will be able to:

  • Handle tabular data using R's version of relational algebra (dplyr)
  • Visualize data using ggplot2 and plotly
  • Perform univariate and bivariate analysis, compute and assess statistical summaries
  • Perform multivariate analysis using matrix methods
  • Diagnose and validate the results of multivariate analysis
  • Perform, discuss and communicate the results of SVD factorization methods
  • Perform, discuss and communicate the results of clustering methods
  • Communicate the results of the analysis in a clear and concise manner using Quarto reports, presentations, and dashboards

The course is based on the R programming language, the RStudio IDE, and the VS Code IDE. We will rely on the tidyverse packages and attempt to take advantage of R's tidy evaluation mechanisms to write expressive and efficient code.
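
As a small illustration of what tidy evaluation allows, the sketch below wraps a dplyr pipeline in a function whose column arguments are passed unquoted, using the embrace operator `{{ }}`. The helper name mean_by and the use of the built-in mtcars data are assumptions made for this example. It requires the dplyr package to be installed.

```r
library(dplyr)

# Hypothetical helper: mean of `value` within each level of `group`.
# `{{ }}` forwards the unquoted column names to dplyr verbs.
mean_by <- function(data, group, value) {
  data %>%
    group_by({{ group }}) %>%
    summarise(mean = mean({{ value }}, na.rm = TRUE), .groups = "drop")
}

# Average fuel consumption (mpg) per number of cylinders (cyl).
res <- mean_by(mtcars, cyl, mpg)
print(res)
```

The same function works for any data frame and any pair of columns, which is the point of writing it with tidy evaluation rather than hard-coding column names.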

We will use the quarto package for reproducible research.