Based on Chapter 8 of ModernDive. Code for Quiz 12.
library(tidyverse)
library(moderndive) #install before loading
library(infer) #install before loading
library(fivethirtyeight) #install before loading
-Replace all instances of ???. These are the answers to your Moodle quiz.
-Run each code chunk individually to make sure the answers in this file match your quiz answers
-Once every chunk runs, knit the file. It won’t knit until the ??? are replaced
-Save a plot to be your preview plot
-Look at the variable definitions in congress_age
What is the average age of members who have served in Congress?
-Set the random seed to 123 (the chunk below shows 4346, the seed used for the rerun requested at the end of this file)
-Take a sample of 100 from the dataset congress_age and assign it to congress_age_100
set.seed(4346)
congress_age_100 <- congress_age %>%
rep_sample_n(size=100)
#18,635 rows representing members of Congress
-congress_age is the population and congress_age_100 is the sample
-18,635 is the number of observations in the population and 100 is the number of observations in your sample
Construct the confidence interval
congress_age_100 %>%
specify(response = age)
Response: age (numeric)
# A tibble: 100 x 1
age
<dbl>
1 58
2 27.3
3 59.4
4 47.8
5 36.4
6 62.3
7 52.5
8 55.5
9 44
10 48
# ... with 90 more rows
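-Generate 1000 bootstrap replicates from the sample
congress_age_100 %>%
specify(response = age) %>%
generate(reps = 1000, type = "bootstrap")
Response: age (numeric)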
# A tibble: 100,000 x 2
# Groups: replicate [1,000]
replicate age
<int> <dbl>
1 1 55.2
2 1 40.8
3 1 55.7
4 1 52.5
5 1 54.5
6 1 35.8
7 1 44.5
8 1 47.9
9 1 40.8
10 1 37.4
# ... with 99,990 more rows
The output has 100,000 rows (1,000 replicates of 100 observations each)
-Assign to bootstrap_distribution_mean_age
-Display bootstrap_distribution_mean_age
bootstrap_distribution_mean_age <- congress_age_100 %>%
specify(response = age) %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "mean")
bootstrap_distribution_mean_age
# A tibble: 1,000 x 2
replicate stat
* <int> <dbl>
1 1 51.3
2 2 48.2
3 3 49.7
4 4 50.5
5 5 51.6
6 6 47.9
7 7 49.5
8 8 50.0
9 9 51.0
10 10 51.0
# ... with 990 more rows
The bootstrap_distribution_mean_age has 1000 means
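To see roughly what generate() and calculate() are doing, here is a minimal base R sketch of the same resampling idea. It does not use infer or congress_age_100: ages_sample is a simulated stand-in, and the names ages_sample and boot_means are hypothetical.

```r
# Hypothetical stand-in for the 100 sampled ages (simulated, not the real data)
set.seed(123)
ages_sample <- rnorm(100, mean = 50, sd = 10)

# One bootstrap replicate = resample 100 values WITH replacement and take the mean.
# Repeating this 1000 times gives 1000 bootstrap means.
boot_means <- replicate(
  1000,
  mean(sample(ages_sample, size = length(ages_sample), replace = TRUE))
)

length(boot_means)  # one mean per replicate
```

The key point is replace = TRUE: each replicate has the same size as the original sample, but some observations appear more than once and others not at all.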
visualize(bootstrap_distribution_mean_age)
Calculate the 95% confidence interval using the percentile method
-Assign the output to congress_ci_percentile
-Display congress_ci_percentile
congress_ci_percentile <- bootstrap_distribution_mean_age %>%
get_confidence_interval(type = "percentile", level = 0.95)
congress_ci_percentile
# A tibble: 1 x 2
lower_ci upper_ci
<dbl> <dbl>
1 48.5 52.7
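The percentile method takes the middle 95% of the bootstrap means. A minimal base R sketch of that idea follows, using quantile() on simulated data rather than get_confidence_interval(). The names ages_sample, boot_means, lower_ci, and upper_ci are hypothetical, and the numbers will not match the quiz output.

```r
# Hypothetical stand-in data and bootstrap means (simulated, not the real data)
set.seed(123)
ages_sample <- rnorm(100, mean = 50, sd = 10)
boot_means <- replicate(1000, mean(sample(ages_sample, replace = TRUE)))

# 95% percentile interval: cut off 2.5% in each tail of the bootstrap means
ci <- quantile(boot_means, probs = c(0.025, 0.975))
lower_ci <- ci[[1]]
upper_ci <- ci[[2]]

c(lower_ci = lower_ci, upper_ci = upper_ci)
```

For a different confidence level, only the probs change, for example c(0.05, 0.95) for a 90% interval.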
-Calculate the observed point estimate of the mean and assign it to obs_mean_age
-Display obs_mean_age
obs_mean_age <- congress_age_100 %>%
specify(response = age) %>%
calculate(stat = "mean") %>%
pull()
obs_mean_age
[1] 50.533
-Shade the confidence interval
-Add a line at the observed mean, obs_mean_age, to your visualization and color it “hotpink”
#endpoints = the percentile confidence interval, congress_ci_percentile
visualize(bootstrap_distribution_mean_age) +
shade_confidence_interval(endpoints = congress_ci_percentile) +
geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1)
-Calculate the population mean to see if it is in the 95% confidence interval
-Assign the output to pop_mean_age
-Display pop_mean_age
#compute the mean from the original (population) data and assign it to pop_mean_age
pop_mean_age <- congress_age %>%
summarize(pop_mean = mean(age)) %>% pull()
pop_mean_age
[1] 53.31373
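The question of whether the population mean falls inside the interval can also be checked numerically instead of by eye. This is a small sketch that hard-codes the rounded values printed above; the name in_interval is hypothetical.

```r
# Values copied from the printed output above (interval endpoints are rounded)
lower_ci <- 48.5
upper_ci <- 52.7
pop_mean_age <- 53.31373

# TRUE only if the population mean lies between the two endpoints
in_interval <- pop_mean_age >= lower_ci & pop_mean_age <= upper_ci
in_interval  # FALSE: 53.31 is above the upper endpoint 52.7
```

In the quiz file itself, the same comparison could use congress_ci_percentile$lower_ci and congress_ci_percentile$upper_ci in place of the hard-coded numbers.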
-Add a line to the visualization at the population mean, pop_mean_age, and color it “purple”
visualize(bootstrap_distribution_mean_age) +
shade_confidence_interval(endpoints = congress_ci_percentile) +
geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1) +
#the hotpink line at the observed mean is the same as in the previous plot
#add a purple line at the population mean, pop_mean_age
geom_vline(xintercept = pop_mean_age , color = "purple", size = 3)
-Is the population mean in the 95% confidence interval constructed using the bootstrap distribution? yes
-Change set.seed(123) to set.seed(4346). Rerun all the code.
-When you change the seed, is the population mean in the 95% confidence interval constructed using the bootstrap distribution? no
-If you construct 100 95% confidence intervals, approximately how many do you expect to contain the population mean? 95
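The "about 95 out of 100" answer can be explored with a small simulation. The sketch below is base R only and uses a simulated population instead of congress_age; the names population, pop_mean, covers, and n_covering are hypothetical. The count it produces depends on the seed and will usually be close to 95, not exactly 95.

```r
# Simulated population of 18,635 ages (an assumption, not the real congress_age data)
set.seed(2024)
population <- rnorm(18635, mean = 53, sd = 11)
pop_mean <- mean(population)

# Repeat 100 times: draw a sample of 100, bootstrap it, build a 95% percentile
# interval, and record whether the interval contains the population mean
covers <- vapply(seq_len(100), function(i) {
  smp <- sample(population, size = 100)
  boot_means <- replicate(1000, mean(sample(smp, replace = TRUE)))
  ci <- quantile(boot_means, probs = c(0.025, 0.975))
  pop_mean >= ci[[1]] && pop_mean <= ci[[2]]
}, logical(1))

n_covering <- sum(covers)  # number of the 100 intervals that contain pop_mean
n_covering
```

This is why one seed can give an interval that contains the population mean and another seed can give one that misses it: each sample produces a different interval, and roughly 5 in 100 are expected to miss.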