1. Using a for loop, write a function to calculate the number of zeroes in a numeric vector. Before entering the loop, set up a counter variable counter <- 0. Inside the loop, add 1 to counter each time you have a zero in the vector. Finally, use return(counter) for the output.
counter = 0
# runif() uses the length of 0:100, so this draws 101 uniform values
pre_vec = runif(0:100)
my_vec = ifelse(pre_vec>0.5,1,0)
print(my_vec)
## [1] 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 1 1
## [38] 1 1 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0
## [75] 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 0 1 1
count_myvec = function(counter=0){
  for(i in 1:length(my_vec)){
    if(my_vec[i]==0){
      counter = counter + 1
    }
  }
  return(counter)
}
count_myvec()
## [1] 52
2. Use subsetting instead of a loop to rewrite the function as a single line of code.
count_myvec2 = function(counter=0){
  length(my_vec[my_vec == 0])
}
count_myvec2()
## [1] 52
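Both functions above read my_vec from the global environment. As a minimal sketch, the same two approaches can take the vector as an argument instead, so they work on any numeric vector. The names count_zeroes_loop, count_zeroes_vec, and test_vec are hypothetical and not part of the original answer.

```r
# Hypothetical helpers: the vector is passed in as an argument
# rather than looked up in the global environment.
count_zeroes_loop <- function(vec) {
  counter <- 0
  for (i in seq_along(vec)) {  # seq_along() also handles empty vectors
    if (vec[i] == 0) {
      counter <- counter + 1
    }
  }
  return(counter)
}

# Vectorized version: TRUE counts as 1 when summed
count_zeroes_vec <- function(vec) sum(vec == 0)

test_vec <- c(0, 1, 0, 0, 2, 5)  # made-up example data
count_zeroes_loop(test_vec)      # 3
count_zeroes_vec(test_vec)       # 3
count_zeroes_loop(numeric(0))    # 0
```

Using seq_along(vec) rather than 1:length(vec) matters for an empty vector, where 1:0 would iterate over 1 and 0 instead of skipping the loop.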
3. Write a function that takes as input two integers representing the number of rows and columns in a matrix. The output is a matrix of these dimensions in which each element is the product of its row number and its column number.
timestable = function(x=9,y=10){
  makeTT = matrix(nrow=y,ncol=x)
  for(i in 1:nrow(makeTT)){
    for(j in 1:ncol(makeTT)){
      makeTT[i,j] = i*j
    }
  }
  return(makeTT)
}
ninebyten = timestable()
print(ninebyten)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    1    2    3    4    5    6    7    8    9
##  [2,]    2    4    6    8   10   12   14   16   18
##  [3,]    3    6    9   12   15   18   21   24   27
##  [4,]    4    8   12   16   20   24   28   32   36
##  [5,]    5   10   15   20   25   30   35   40   45
##  [6,]    6   12   18   24   30   36   42   48   54
##  [7,]    7   14   21   28   35   42   49   56   63
##  [8,]    8   16   24   32   40   48   56   64   72
##  [9,]    9   18   27   36   45   54   63   72   81
## [10,]   10   20   30   40   50   60   70   80   90
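The nested loop can also be expressed without loops. The sketch below uses base R's outer(), which by default multiplies every pair of elements from its two arguments. The function name timestable_outer and its argument names are hypothetical, and unlike the answer above it takes rows first and columns second.

```r
# Hypothetical loop-free variant: outer() multiplies each row index
# by each column index (its default FUN is "*").
timestable_outer <- function(n_rows = 10, n_cols = 9) {
  outer(seq_len(n_rows), seq_len(n_cols))
}

tt <- timestable_outer(10, 9)
dim(tt)    # 10 9
tt[7, 8]   # 56
```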
4. In the next few lectures, you will learn how to do a randomization test on your data. We will complete some of the steps today to practice calling custom functions within a for loop. Use the code from the March 31st lecture (Randomization Tests) to complete the following steps:
a. Simulate a dataset with 3 groups of data, each group drawn from a distribution with a different mean. The final data frame should have 1 column for group and 1 column for the response variable.
group1 = rnorm(n=20,mean=5,sd=0.1)
group2 = rnorm(n=20,mean=10,sd=0.1)
group3 = rnorm(n=20,mean=15,sd=0.1)

group = c(rep("group1",20),rep("group2",20),rep("group3",20))
response = c(group1,group2,group3)

my_df = data.frame(group,response)

print(my_df)
## group response
## 1 group1 4.917592
## 2 group1 5.148940
## 3 group1 4.911602
## 4 group1 4.936133
## 5 group1 5.031800
## 6 group1 5.016486
## 7 group1 4.893159
## 8 group1 5.039505
## 9 group1 5.048681
## 10 group1 5.134871
## 11 group1 4.931540
## 12 group1 4.891423
## 13 group1 5.058913
## 14 group1 4.983532
## 15 group1 5.118631
## 16 group1 4.874312
## 17 group1 5.095622
## 18 group1 5.142073
## 19 group1 4.948372
## 20 group1 4.971372
## 21 group2 9.860150
## 22 group2 9.976863
## 23 group2 10.087848
## 24 group2 9.964550
## 25 group2 10.029693
## 26 group2 10.084950
## 27 group2 9.920666
## 28 group2 9.856344
## 29 group2 10.021449
## 30 group2 10.039757
## 31 group2 9.973460
## 32 group2 10.151909
## 33 group2 10.082908
## 34 group2 9.976163
## 35 group2 10.065276
## 36 group2 9.940563
## 37 group2 9.893560
## 38 group2 10.268610
## 39 group2 9.929510
## 40 group2 10.009485
## 41 group3 14.908657
## 42 group3 15.022520
## 43 group3 14.843476
## 44 group3 15.200429
## 45 group3 15.120887
## 46 group3 15.321163
## 47 group3 15.014699
## 48 group3 14.951508
## 49 group3 14.918244
## 50 group3 14.991522
## 51 group3 15.040520
## 52 group3 14.847455
## 53 group3 14.895838
## 54 group3 14.976620
## 55 group3 14.909054
## 56 group3 15.157215
## 57 group3 14.998971
## 58 group3 15.110812
## 59 group3 15.112512
## 60 group3 14.920778
b. Write a custom function that 1) reshuffles the response variable, and 2) calculates the mean of each group in the reshuffled data. Store the means in a vector of length 3.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
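shuffle_calcmeans = function(df=my_df){
  # reshuffle the response values across all rows
  shuff_resp = sample(df$response)
  # pair the shuffled responses with the group labels from df
  shuff_df = data.frame(group=df$group,shuff_resp)
  # mean of the shuffled response within each group
  group_means = aggregate(shuff_df$shuff_resp,list(shuff_df$group),FUN=mean)
  vec_means = group_means$x

  return(vec_means)
}

shuff_means = shuffle_calcmeans()
print(shuff_means)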
## [1] 10.276991 9.257257 10.490310
c. Use a for loop to repeat the function in b 100 times. Store the results in a data frame that has 1 column indicating the replicate number and 1 column for each new group mean, for a total of 4 columns.
hun_shuff_df = data.frame(replicate=NA,group1_mean=NA,group2_mean=NA,group3_mean=NA)

for(i in 1:100){
  hun_shuff_df[i,] = c(i,shuffle_calcmeans())
}
print(hun_shuff_df)
## replicate group1_mean group2_mean group3_mean
## 1 1 10.777158 9.499201 9.748198
## 2 2 10.008137 9.257162 10.759259
## 3 3 9.256905 10.257508 10.510144
## 4 4 9.252882 11.275017 9.496660
## 5 5 10.022146 9.769780 10.232632
## 6 6 10.511696 8.757700 10.755162
## 7 7 10.747908 9.757068 9.519581
## 8 8 9.731820 10.539387 9.753351
## 9 9 10.972761 9.015553 10.036245
## 10 10 10.772353 8.757419 10.494785
## 11 11 10.255411 9.736462 10.032685
## 12 12 10.514819 9.262364 10.247375
## 13 13 8.769344 11.011461 10.243753
## 14 14 9.983617 11.285327 8.755614
## 15 15 10.034522 10.253876 9.736160
## 16 16 9.767875 10.518189 9.738494
## 17 17 10.506643 9.269779 10.248135
## 18 18 11.028891 10.024600 8.971067
## 19 19 10.478422 9.540977 10.005159
## 20 20 9.786177 9.754658 10.483723
## 21 21 10.250848 9.742781 10.030929
## 22 22 9.006050 10.268038 10.750469
## 23 23 10.033970 8.232939 11.757649
## 24 24 9.470016 9.557880 10.996661
## 25 25 10.023156 11.012041 8.989361
## 26 26 9.521060 10.510344 9.993154
## 27 27 10.768732 9.493418 9.762407
## 28 28 9.491365 11.253634 9.279558
## 29 29 10.784016 10.502790 8.737752
## 30 30 11.232168 9.276748 9.515642
## 31 31 10.795203 8.997365 10.231990
## 32 32 10.011285 9.979769 10.033504
## 33 33 9.251858 11.012579 9.760121
## 34 34 10.270220 10.504373 9.249965
## 35 35 8.750749 10.802049 10.471760
## 36 36 10.735962 9.487426 9.801169
## 37 37 9.735751 10.522182 9.766625
## 38 38 10.718625 9.526638 9.779295
## 39 39 11.008706 9.024918 9.990934
## 40 40 9.247790 9.523708 11.253060
## 41 41 10.290359 10.496257 9.237942
## 42 42 9.044129 10.267441 10.712987
## 43 43 9.497272 10.742127 9.785160
## 44 44 10.751333 10.020119 9.253107
## 45 45 8.769274 11.492455 9.762829
## 46 46 8.763727 10.277867 10.982964
## 47 47 10.005551 8.765424 11.253583
## 48 48 10.522139 8.971365 10.531053
## 49 49 9.778260 9.531998 10.714300
## 50 50 11.017187 9.251032 9.756339
## 51 51 9.992530 10.247702 9.784325
## 52 52 10.000544 9.017578 11.006436
## 53 53 9.295728 9.977489 10.751341
## 54 54 9.751705 9.744932 10.527920
## 55 55 10.239788 9.280827 10.503943
## 56 56 10.246698 10.036184 9.741676
## 57 57 10.736272 10.279465 9.008822
## 58 58 9.268998 10.229465 10.526096
## 59 59 8.997943 10.257686 10.768929
## 60 60 10.013269 8.227509 11.783780
## 61 61 9.013959 11.017940 9.992659
## 62 62 10.010759 9.484200 10.529599
## 63 63 9.503249 11.223530 9.297779
## 64 64 10.511335 12.014078 7.499145
## 65 65 9.265610 9.485394 11.273554
## 66 66 10.511786 9.281374 10.231398
## 67 67 10.747950 9.017396 10.259212
## 68 68 10.234124 9.257613 10.532821
## 69 69 9.271651 10.234326 10.518581
## 70 70 10.231141 10.036791 9.756626
## 71 71 9.240569 11.005053 9.778936
## 72 72 9.260864 10.724352 10.039342
## 73 73 10.013936 10.016744 9.993878
## 74 74 9.479796 9.766669 10.778093
## 75 75 10.525685 8.481500 11.017373
## 76 76 10.273792 8.245303 11.505463
## 77 77 10.489259 9.772610 9.762689
## 78 78 8.769531 11.261781 9.993246
## 79 79 9.243037 10.050922 10.730599
## 80 80 9.520720 10.539437 9.964401
## 81 81 9.000461 10.744668 10.279429
## 82 82 10.774303 9.497136 9.753120
## 83 83 9.984155 9.475495 10.564908
## 84 84 9.736058 10.273469 10.015030
## 85 85 9.251491 10.278071 10.494996
## 86 86 10.018504 9.271872 10.734182
## 87 87 9.495214 10.012202 10.517142
## 88 88 9.482050 9.511307 11.031201
## 89 89 10.751731 9.749365 9.523462
## 90 90 9.785442 9.984990 10.254126
## 91 91 8.753946 10.507236 10.763375
## 92 92 9.504293 10.509170 10.011094
## 93 93 9.525552 10.253465 10.245541
## 94 94 10.225230 10.258962 9.540366
## 95 95 8.975701 9.796937 11.251920
## 96 96 8.999874 10.755151 10.269533
## 97 97 9.742420 10.760931 9.521207
## 98 98 11.030628 9.274248 9.719682
## 99 99 11.282022 8.983479 9.759057
## 100 100 10.011613 10.215266 9.797679
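The loop above grows the data frame one row at a time. A possible alternative, sketched here under stated assumptions, preallocates all 100 rows first and fills only the three mean columns. The names toy_df, shuffle_means, n_reps, and results are hypothetical stand-ins, the seed is arbitrary, and tapply() is swapped in for aggregate() to get the group means as a plain vector.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Stand-in data with the same structure as the simulated data frame
toy_df <- data.frame(
  group = rep(c("group1", "group2", "group3"), each = 20),
  response = c(rnorm(20, 5, 0.1), rnorm(20, 10, 0.1), rnorm(20, 15, 0.1))
)

# Hypothetical helper: shuffle the response, then average within each group
shuffle_means <- function(df) {
  shuffled <- sample(df$response)
  as.vector(tapply(shuffled, df$group, mean))
}

n_reps <- 100
# Preallocate every row up front; the replicate column is filled immediately
results <- data.frame(
  replicate = seq_len(n_reps),
  group1_mean = NA_real_,
  group2_mean = NA_real_,
  group3_mean = NA_real_
)
for (i in seq_len(n_reps)) {
  results[i, 2:4] <- shuffle_means(toy_df)
}
head(results)
```

Because the groups are equal in size, the three reshuffled means in each row always average to the overall mean of the response, which is a quick sanity check on the output.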
d. Use qplot() to create a histogram of the means for each reshuffled group. Or, if you want a challenge, use ggplot() to overlay all 3 histograms in the same figure. How do the distributions of reshuffled means compare to the original means?
library(ggplot2)
library(tidyr)
final_df = pivot_longer(hun_shuff_df,c("group1_mean","group2_mean","group3_mean"),names_to="group",values_to="means")
final_df = select(final_df,-"replicate")

hun_shuff_plot = ggplot(final_df,aes(x=means,fill=group)) +
  theme_bw(12) +
  geom_histogram(color='#e9ecef',alpha=0.4,position='identity',bins=30)
hun_shuff_plot
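The reshuffled means for all three groups cluster around 10, the overall mean of the pooled data, whereas the original group means were clearly distinct (about 5, 10, and 15). Reshuffling breaks the link between group label and response, so each group's reshuffled mean falls near the overall mean.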
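The comparison can also be made numerically. The following is only a sketch under stated assumptions, not the lecture's method: it rebuilds a stand-in data set (toy_df), computes the observed group means, and asks how often a reshuffled mean for the third group is at least as large as the observed one. All names, the seed, and the choice of 1000 reshuffles are hypothetical.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Stand-in data with the same structure as the simulated data frame
toy_df <- data.frame(
  group = rep(c("group1", "group2", "group3"), each = 20),
  response = c(rnorm(20, 5, 0.1), rnorm(20, 10, 0.1), rnorm(20, 15, 0.1))
)

# Observed mean of each group (order: group1, group2, group3)
observed <- as.vector(tapply(toy_df$response, toy_df$group, mean))

# 1000 reshuffles; each row of `shuffled` holds the three reshuffled means
shuffled <- t(replicate(
  1000,
  as.vector(tapply(sample(toy_df$response), toy_df$group, mean))
))

observed                  # original group means
apply(shuffled, 2, range) # spread of reshuffled means, one column per group

# Proportion of reshuffled group3 means at least as large as the observed one
prop_extreme <- mean(shuffled[, 3] >= observed[3])
prop_extreme
```

A proportion near zero indicates that the observed group mean lies far outside what random reshuffling produces.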