In exploratory analysis of tabular data, univariate analysis is the first step. It consists of exploring, summarizing, and visualizing the individual columns of a dataset.
In most cases, table wrangling is a prerequisite.
The appropriate univariate techniques then depend on the type of column at hand.
For numerical samples/columns, to name a few:
Boxplots
Histograms
Density plots
Cumulative distribution functions (CDF)
Quantile functions
Miscellanea
For categorical samples/columns, we have:
Bar plots
Column plots
Dataset
Since 1948, the US Census Bureau has carried out a monthly Current Population Survey, collecting data concerning residents aged above 15 from \(150\,000\) households. This survey is one of the most important sources of information concerning the American workforce. The data reported in file Recensement.txt originate from the 2012 census.
In this lab, we investigate the numerical columns of the dataset.
After downloading, the Recensement dataset can be found in file Recensement.csv.
Choose a loading function suited to the file format. The RStudio IDE provides a valuable helper.
Load the data into the session environment and call it df.
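A minimal sketch of the loading step, assuming the file is comma-separated with a header row. The separator is not stated here, so check the file first: if fields are separated by semicolons, read.csv2 is the usual alternative. To keep the sketch runnable without the real file, it first writes a tiny stand-in CSV to a temporary file. The column names age, sex and hourly_wage are hypothetical and are not taken from Recensement.csv.

```r
# Stand-in for Recensement.csv: hypothetical columns, illustrative values only
path <- tempfile(fileext = ".csv")
writeLines(
  c("age,sex,hourly_wage",
    "58,F,13.25",
    "40,M,12.5",
    "29,M,14"),
  path
)

# Load the table into the session environment and call it df
# (assumption: comma separator and a header line)
df <- read.csv(path)

# Inspect the column types that the loader inferred
str(df)
```

In the lab itself, replace path with the location of the downloaded file.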
Table wrangling
Question
Which columns should be considered as categorical/factor?
Coerce the relevant columns to factors.
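One possible way to perform the coercion in base R is sketched below on a toy data frame. The column names (age, sex, region) and the choice of which ones are categorical are assumptions made for illustration. For the real data, decide which columns are categorical after inspecting them.

```r
# Toy data frame with hypothetical columns (not the real Recensement columns)
df <- data.frame(
  age    = c(58, 40, 29),
  sex    = c("F", "M", "M"),
  region = c("NE", "W", "S"),
  stringsAsFactors = FALSE
)

# Names of the columns we decide to treat as categorical (an assumption here)
cat_cols <- c("sex", "region")

# Coerce each selected column to a factor, leaving the other columns untouched
df[cat_cols] <- lapply(df[cat_cols], as.factor)

# Check the result: sex and region are now factors, age is still numeric
str(df)
```

Listing the column names in a vector keeps the choice explicit and makes it easy to revise once the columns have been examined more closely.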
Search for missing data (optional)
Question
Check whether some columns contain missing data (use is.na).
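A short sketch of such a check, again on a toy data frame with hypothetical columns: is.na returns a logical matrix of the same shape as the data frame, and colSums counts the TRUE entries in each column.

```r
# Toy data frame with a few missing values (hypothetical columns)
df <- data.frame(
  age         = c(58, NA, 29),
  sex         = c("F", "M", NA),
  hourly_wage = c(13.25, 12.5, 14)
)

# Number of missing values in each column
na_counts <- colSums(is.na(df))
na_counts

# Names of the columns that contain at least one missing value
cols_with_na <- names(na_counts)[na_counts > 0]
cols_with_na

# Quick yes/no answer for the whole table
anyNA(df)
```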