One Dataset Visualized 25 Ways
We will only scratch the surface of data visualization in this class. Visualization is a rich field in its own right, and some of you may want to consider it as a career option.
Data Visualizations
are graphical representations of data
use different colors, shapes, and the coordinate system to summarize data
can tell a story or can be useful for exploring data
Rows: 312
Columns: 13
$ employment <fct> Part Time, Full Time, Part Time, Part Time, NA, Part…
$ age <dbl> 19, 23, 22, 21, 26, 25, 27, 36, 30, 20, 18, 20, 25, …
$ enrollment <fct> Part Time, Full Time, Full Time, Full Time, Full Tim…
$ weekly_earnings <dbl> 400.00, 1476.92, 561.25, 100.00, NA, 300.00, 1076.92…
$ household_size <dbl> 6, 2, 2, 4, 2, 3, 3, 1, 3, 2, 4, 4, 2, 2, 4, 5, 4, 2…
$ time_alone <dbl> 326, 150, 357, 22, 0, 455, 90, 340, 326, 120, 285, 5…
$ sleep_time <dbl> 680, 180, 470, 660, 875, 765, 630, 300, 445, 630, 66…
$ work_time <dbl> 315, 0, 0, 0, 0, 0, 0, 645, 555, 0, 0, 0, 520, 615, …
$ degree_class_time <dbl> 0, 0, 238, 0, 0, 0, 0, 0, 0, 0, 0, 228, 0, 0, 0, 0, …
$ shopping_time <dbl> 14, 0, 0, 0, 0, 0, 0, 5, 20, 345, 0, 0, 0, 0, 0, 0, …
$ lunch_break_time <dbl> 66, 60, 20, 115, 35, 50, 25, 30, 75, 60, 15, 15, 60,…
$ sports_time <dbl> 0, 60, 0, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0…
$ religious_time <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0…
Figure 1: A bar plot of employment status
Figure 2: Blank coordinate system
Figure 3: Mapping employment variable to the x-axis
Figure 4: Creating the bars by adding the geometric layer of a barplot
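Figures 2 to 4 build the bar plot from Figure 1 one layer at a time. A minimal sketch of that sequence follows. The data frame name time_use_sample is a hypothetical stand-in, holding only the first five employment values visible in the glimpse output above, so the counts will not match the full 312-row data.

```r
library(ggplot2)

# Hypothetical stand-in for the lecture data: the first five employment
# values shown in the glimpse output (the real data has 312 rows).
time_use_sample <- data.frame(
  employment = factor(c("Part Time", "Full Time", "Part Time", "Part Time", NA))
)

# Blank coordinate system: ggplot() on the data alone draws an empty panel.
p_blank <- ggplot(data = time_use_sample)

# Map the employment variable to the x-axis with aes().
p_mapped <- ggplot(data = time_use_sample, mapping = aes(x = employment))

# Add the geometric layer: geom_bar() counts the rows in each category.
p_bars <- p_mapped +
  geom_bar()

p_bars
```

Printing p_blank or p_mapped at the console shows the intermediate stages.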
The three main steps to make a plot are:
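Call the ggplot() function on the data we want to plot.
Map variables to aesthetics, such as the x-axis, with the aes() function.
Add a geometric layer, such as the bars of a bar plot.
The tidyverse style guide has the following convention for writing ggplot2 code.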
The plus sign for adding layers + always has a space before it and is followed by a new line.
The new line is indented by two spaces. RStudio does this automatically for you.
Both the above and below code are correct styles of writing ggplot code.
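As an illustration of the convention (not the exact code from the slides), the sketch below writes a small plot in that style. The data frame time_use_sample and the axis labels are assumptions made for this example.

```r
library(ggplot2)

# Hypothetical stand-in data: first five employment values from the glimpse output.
time_use_sample <- data.frame(
  employment = factor(c("Part Time", "Full Time", "Part Time", "Part Time", NA))
)

# Each + has a space before it and ends its line;
# every following layer starts on a new line indented by two spaces.
styled_plot <- ggplot(time_use_sample, aes(x = employment)) +
  geom_bar() +
  labs(x = "Employment status", y = "Count")

styled_plot
```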
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Figure 5: A histogram of weekly earnings
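The stat_bin() message above appears when geom_histogram() is called without choosing a bin width. A sketch of setting one explicitly is given below. The data frame time_use_sample holds only the first seven weekly_earnings values from the glimpse output, and binwidth = 250 is an arbitrary choice for illustration, not a recommended value.

```r
library(ggplot2)

# Hypothetical stand-in data: first seven weekly_earnings values from the glimpse output.
time_use_sample <- data.frame(
  weekly_earnings = c(400.00, 1476.92, 561.25, 100.00, NA, 300.00, 1076.92)
)

# Setting binwidth (in the units of the x variable) avoids the default of
# 30 bins; na.rm = TRUE drops the missing value without a warning.
earnings_hist <- ggplot(time_use_sample, aes(x = weekly_earnings)) +
  geom_histogram(binwidth = 250, na.rm = TRUE)

earnings_hist
```

Trying a few different binwidth values is a quick way to see how much the apparent shape of the distribution depends on that choice.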
Figure 6: Understanding skewness of a histogram
weekly_earnings is
weekly_earnings which of the following can be concluded?
Figure 7: Boxplot of weekly earnings
Figure 8: Annotated boxplot
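Assuming the annotations mark the usual boxplot landmarks (minimum, quartiles, median, maximum), the sketch below draws a boxplot and computes those numbers with base R. The data frame time_use_sample is a hypothetical stand-in with the first seven weekly_earnings values from the glimpse output.

```r
library(ggplot2)

# Hypothetical stand-in data: first seven weekly_earnings values from the glimpse output.
time_use_sample <- data.frame(
  weekly_earnings = c(400.00, 1476.92, 561.25, 100.00, NA, 300.00, 1076.92)
)

# Horizontal boxplot of a single numeric variable.
earnings_box <- ggplot(time_use_sample, aes(x = weekly_earnings)) +
  geom_boxplot(na.rm = TRUE)

# Minimum, first quartile, median, third quartile, maximum.
five_num <- quantile(
  time_use_sample$weekly_earnings,
  probs = c(0, 0.25, 0.5, 0.75, 1),
  na.rm = TRUE
)

# Interquartile range: the width of the box.
iqr <- IQR(time_use_sample$weekly_earnings, na.rm = TRUE)

earnings_box
five_num
iqr
```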
Figure 9: Boxplot overlaid with individual observation points
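One way to produce an overlay like this is to add a second geometric layer on top of the boxplot. The sketch below uses geom_jitter() for the points; whether the original figure used jittering is an assumption, as are the data frame time_use_sample and the width and alpha values.

```r
library(ggplot2)

# Hypothetical stand-in data: first seven weekly_earnings values from the glimpse output.
time_use_sample <- data.frame(
  weekly_earnings = c(400.00, 1476.92, 561.25, 100.00, NA, 300.00, 1076.92)
)

# x = "" places everything in a single unnamed category.
overlay_plot <- ggplot(time_use_sample, aes(x = "", y = weekly_earnings)) +
  # outlier.shape = NA hides the boxplot's own outlier points so that
  # no observation is drawn twice.
  geom_boxplot(outlier.shape = NA, na.rm = TRUE) +
  # Spread the points horizontally only, so their y values stay exact.
  geom_jitter(width = 0.1, height = 0, alpha = 0.6, na.rm = TRUE)

overlay_plot
```

Because the jitter is random, the horizontal positions of the points change each time the plot is drawn.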
If you are color-blind, then depending on the type, you may not be able to distinguish these colors, so the next slide will make much more sense.
Using the penguins data frame, ask a question that you are interested in answering. Visualize the data to get a visual answer to the question. What is the visual telling you? Note all of this down in your lecture notes.