Homework 10

1. Using a for loop, write a function to calculate the number of zeroes in a numeric vector. Before entering the loop, set up a counter variable counter <- 0. Inside the loop, add 1 to counter each time you have a zero in the vector. Finally, use return(counter) for the output.

# Draw 101 uniform values and convert them to a 0/1 vector
pre_vec = runif(101)
my_vec = ifelse(pre_vec>0.5,1,0)
print(my_vec)

##   [1] 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 1 1
##  [38] 1 1 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0
##  [75] 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 0 1 1

# Count the zeroes in a numeric vector with a for loop
count_myvec = function(vec){
  counter = 0
  for(i in seq_along(vec)){
    if(vec[i]==0){
      counter = counter + 1
    }
  }
  return(counter)
}
count_myvec(my_vec)

## [1] 52

2. Use subsetting instead of a loop to rewrite the function as a single line of code.

# Subset the zero elements and count them
count_myvec2 = function(vec) length(vec[vec == 0])
count_myvec2(my_vec)

## [1] 52
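As a cross-check on questions 1 and 2, the loop can be compared with a fully vectorized count. This is a minimal sketch, not part of the assignment: `test_vec`, `count_zeros_loop`, and `count_zeros_sum` are hypothetical names, and the seed is an arbitrary choice made only for repeatability.

```r
# Hypothetical test vector (not the homework's my_vec); seed is an assumption
set.seed(1)
test_vec = ifelse(runif(101) > 0.5, 1, 0)

# Loop version, taking the vector as an argument
count_zeros_loop = function(vec) {
  counter = 0
  for (i in seq_along(vec)) {
    if (vec[i] == 0) counter = counter + 1
  }
  return(counter)
}

# Vectorized version: each TRUE counts as 1 when summed
count_zeros_sum = function(vec) sum(vec == 0)

# Both approaches should agree on the same input
count_zeros_loop(test_vec) == count_zeros_sum(test_vec)

# seq_along() makes the loop skip an empty vector instead of indexing 1:0
count_zeros_loop(numeric(0))
```

Using `seq_along(vec)` rather than `1:length(vec)` matters only for empty input, where `1:0` would iterate over the indices 1 and 0.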

3. Write a function that takes as input two integers representing the number of rows and columns in a matrix. The output is a matrix of these dimensions in which each element is the product of the row number x the column number.

# n_row and n_col are the number of rows and columns of the output matrix
timestable = function(n_row=10,n_col=9){
  makeTT = matrix(nrow=n_row,ncol=n_col)
  for(i in seq_len(n_row)){
    for(j in seq_len(n_col)){
      makeTT[i,j] = i*j
    }
  }
  return(makeTT)
}
ninebyten = timestable()
print(ninebyten)

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    1    2    3    4    5    6    7    8    9
##  [2,]    2    4    6    8   10   12   14   16   18
##  [3,]    3    6    9   12   15   18   21   24   27
##  [4,]    4    8   12   16   20   24   28   32   36
##  [5,]    5   10   15   20   25   30   35   40   45
##  [6,]    6   12   18   24   30   36   42   48   54
##  [7,]    7   14   21   28   35   42   49   56   63
##  [8,]    8   16   24   32   40   48   56   64   72
##  [9,]    9   18   27   36   45   54   63   72   81
## [10,]   10   20   30   40   50   60   70   80   90
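The same table can probably be built without explicit loops. The sketch below swaps the nested for loops for base R's `outer()`, whose default function is multiplication; `timestable_outer`, `n_row`, and `n_col` are hypothetical names used only for illustration.

```r
# Loop-free variant (an alternative technique, not the assigned approach)
timestable_outer = function(n_row = 10, n_col = 9) {
  # outer() multiplies every row index by every column index
  outer(seq_len(n_row), seq_len(n_col))
}

tt = timestable_outer()
dim(tt)      # rows, then columns
tt[10, 9]    # element in the last row and last column
```

The nested loop is still what the question asks for; `outer()` is just a convenient way to verify its result.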

4. In the next few lectures, you will learn how to do a randomization test on your data. We will complete some of the steps today to practice calling custom functions within a for loop. Use the code from the March 31st lecture (Randomization Tests) to complete the following steps:

a. Simulate a dataset with 3 groups of data, each group drawn from a distribution with a different mean. The final data frame should have 1 column for group and 1 column for the response variable.

group1 = rnorm(n=20,mean=5,sd=0.1)
group2 = rnorm(n=20,mean=10,sd=0.1)
group3 = rnorm(n=20,mean=15,sd=0.1)

group = c(rep("group1",20),rep("group2",20),rep("group3",20))
response = c(group1,group2,group3)

my_df = data.frame(group,response)

print(my_df)

##     group  response
## 1  group1  4.917592
## 2  group1  5.148940
## 3  group1  4.911602
## 4  group1  4.936133
## 5  group1  5.031800
## 6  group1  5.016486
## 7  group1  4.893159
## 8  group1  5.039505
## 9  group1  5.048681
## 10 group1  5.134871
## 11 group1  4.931540
## 12 group1  4.891423
## 13 group1  5.058913
## 14 group1  4.983532
## 15 group1  5.118631
## 16 group1  4.874312
## 17 group1  5.095622
## 18 group1  5.142073
## 19 group1  4.948372
## 20 group1  4.971372
## 21 group2  9.860150
## 22 group2  9.976863
## 23 group2 10.087848
## 24 group2  9.964550
## 25 group2 10.029693
## 26 group2 10.084950
## 27 group2  9.920666
## 28 group2  9.856344
## 29 group2 10.021449
## 30 group2 10.039757
## 31 group2  9.973460
## 32 group2 10.151909
## 33 group2 10.082908
## 34 group2  9.976163
## 35 group2 10.065276
## 36 group2  9.940563
## 37 group2  9.893560
## 38 group2 10.268610
## 39 group2  9.929510
## 40 group2 10.009485
## 41 group3 14.908657
## 42 group3 15.022520
## 43 group3 14.843476
## 44 group3 15.200429
## 45 group3 15.120887
## 46 group3 15.321163
## 47 group3 15.014699
## 48 group3 14.951508
## 49 group3 14.918244
## 50 group3 14.991522
## 51 group3 15.040520
## 52 group3 14.847455
## 53 group3 14.895838
## 54 group3 14.976620
## 55 group3 14.909054
## 56 group3 15.157215
## 57 group3 14.998971
## 58 group3 15.110812
## 59 group3 15.112512
## 60 group3 14.920778

b. Write a custom function that 1) reshuffles the response variable, and 2) calculates the mean of each group in the reshuffled data. Store the means in a vector of length 3.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

shuffle_calcmeans = function(df=my_df){
  # Shuffle the response while keeping the group labels from df in place
  shuff_resp = sample(df$response)
  shuff_df = data.frame(group=df$group,shuff_resp)
  group_means = aggregate(shuff_df$shuff_resp,list(shuff_df$group),FUN=mean)
  vec_means = group_means$x

  return(vec_means)
}

shuff_means = shuffle_calcmeans()   
print(shuff_means)

## [1] 10.276991  9.257257 10.490310

c. Use a for loop to repeat the function in b 100 times. Store the results in a data frame that has 1 column indicating the replicate number and 1 column for each new group mean, for a total of 4 columns.

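# Preallocate one row per replicate, then fill each row inside the loop
n_reps = 100
hun_shuff_df = data.frame(replicate=seq_len(n_reps),group1_mean=NA_real_,group2_mean=NA_real_,group3_mean=NA_real_)

for(i in seq_len(n_reps)){
  hun_shuff_df[i,-1] = shuffle_calcmeans()
}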
print(hun_shuff_df)

##     replicate group1_mean group2_mean group3_mean
## 1           1   10.777158    9.499201    9.748198
## 2           2   10.008137    9.257162   10.759259
## 3           3    9.256905   10.257508   10.510144
## 4           4    9.252882   11.275017    9.496660
## 5           5   10.022146    9.769780   10.232632
## 6           6   10.511696    8.757700   10.755162
## 7           7   10.747908    9.757068    9.519581
## 8           8    9.731820   10.539387    9.753351
## 9           9   10.972761    9.015553   10.036245
## 10         10   10.772353    8.757419   10.494785
## 11         11   10.255411    9.736462   10.032685
## 12         12   10.514819    9.262364   10.247375
## 13         13    8.769344   11.011461   10.243753
## 14         14    9.983617   11.285327    8.755614
## 15         15   10.034522   10.253876    9.736160
## 16         16    9.767875   10.518189    9.738494
## 17         17   10.506643    9.269779   10.248135
## 18         18   11.028891   10.024600    8.971067
## 19         19   10.478422    9.540977   10.005159
## 20         20    9.786177    9.754658   10.483723
## 21         21   10.250848    9.742781   10.030929
## 22         22    9.006050   10.268038   10.750469
## 23         23   10.033970    8.232939   11.757649
## 24         24    9.470016    9.557880   10.996661
## 25         25   10.023156   11.012041    8.989361
## 26         26    9.521060   10.510344    9.993154
## 27         27   10.768732    9.493418    9.762407
## 28         28    9.491365   11.253634    9.279558
## 29         29   10.784016   10.502790    8.737752
## 30         30   11.232168    9.276748    9.515642
## 31         31   10.795203    8.997365   10.231990
## 32         32   10.011285    9.979769   10.033504
## 33         33    9.251858   11.012579    9.760121
## 34         34   10.270220   10.504373    9.249965
## 35         35    8.750749   10.802049   10.471760
## 36         36   10.735962    9.487426    9.801169
## 37         37    9.735751   10.522182    9.766625
## 38         38   10.718625    9.526638    9.779295
## 39         39   11.008706    9.024918    9.990934
## 40         40    9.247790    9.523708   11.253060
## 41         41   10.290359   10.496257    9.237942
## 42         42    9.044129   10.267441   10.712987
## 43         43    9.497272   10.742127    9.785160
## 44         44   10.751333   10.020119    9.253107
## 45         45    8.769274   11.492455    9.762829
## 46         46    8.763727   10.277867   10.982964
## 47         47   10.005551    8.765424   11.253583
## 48         48   10.522139    8.971365   10.531053
## 49         49    9.778260    9.531998   10.714300
## 50         50   11.017187    9.251032    9.756339
## 51         51    9.992530   10.247702    9.784325
## 52         52   10.000544    9.017578   11.006436
## 53         53    9.295728    9.977489   10.751341
## 54         54    9.751705    9.744932   10.527920
## 55         55   10.239788    9.280827   10.503943
## 56         56   10.246698   10.036184    9.741676
## 57         57   10.736272   10.279465    9.008822
## 58         58    9.268998   10.229465   10.526096
## 59         59    8.997943   10.257686   10.768929
## 60         60   10.013269    8.227509   11.783780
## 61         61    9.013959   11.017940    9.992659
## 62         62   10.010759    9.484200   10.529599
## 63         63    9.503249   11.223530    9.297779
## 64         64   10.511335   12.014078    7.499145
## 65         65    9.265610    9.485394   11.273554
## 66         66   10.511786    9.281374   10.231398
## 67         67   10.747950    9.017396   10.259212
## 68         68   10.234124    9.257613   10.532821
## 69         69    9.271651   10.234326   10.518581
## 70         70   10.231141   10.036791    9.756626
## 71         71    9.240569   11.005053    9.778936
## 72         72    9.260864   10.724352   10.039342
## 73         73   10.013936   10.016744    9.993878
## 74         74    9.479796    9.766669   10.778093
## 75         75   10.525685    8.481500   11.017373
## 76         76   10.273792    8.245303   11.505463
## 77         77   10.489259    9.772610    9.762689
## 78         78    8.769531   11.261781    9.993246
## 79         79    9.243037   10.050922   10.730599
## 80         80    9.520720   10.539437    9.964401
## 81         81    9.000461   10.744668   10.279429
## 82         82   10.774303    9.497136    9.753120
## 83         83    9.984155    9.475495   10.564908
## 84         84    9.736058   10.273469   10.015030
## 85         85    9.251491   10.278071   10.494996
## 86         86   10.018504    9.271872   10.734182
## 87         87    9.495214   10.012202   10.517142
## 88         88    9.482050    9.511307   11.031201
## 89         89   10.751731    9.749365    9.523462
## 90         90    9.785442    9.984990   10.254126
## 91         91    8.753946   10.507236   10.763375
## 92         92    9.504293   10.509170   10.011094
## 93         93    9.525552   10.253465   10.245541
## 94         94   10.225230   10.258962    9.540366
## 95         95    8.975701    9.796937   11.251920
## 96         96    8.999874   10.755151   10.269533
## 97         97    9.742420   10.760931    9.521207
## 98         98   11.030628    9.274248    9.719682
## 99         99   11.282022    8.983479    9.759057
## 100       100   10.011613   10.215266    9.797679

d. Use qplot() to create a histogram of the means for each reshuffled group. Or, if you want a challenge, use ggplot() to overlay all 3 histograms in the same figure. How do the distributions of reshuffled means compare to the original means?

library(ggplot2)
library(tidyr)

# Reshape to long format: one row per replicate and group
final_df = pivot_longer(hun_shuff_df,c("group1_mean","group2_mean","group3_mean"),names_to="group",values_to="means")
final_df = select(final_df,-replicate)

hun_shuff_plot = ggplot(final_df,aes(x=means,fill=group)) +
  theme_bw(12) +
  geom_histogram(color='#e9ecef',alpha=0.4,position='identity',bins=30)
  

hun_shuff_plot

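The reshuffled means for all three groups cluster around 10, the grand mean of the pooled data, because shuffling breaks the link between group label and response. The original group means, by contrast, were clearly distinct (about 5, 10, and 15).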
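The comparison can also be checked numerically. The sketch below re-simulates the data under an assumed seed, so its numbers will not match the output printed above, and it uses `tapply()` and `replicate()` in place of the homework's `aggregate()` and for loop. The names `sim_df`, `obs_means`, and `shuff_mat` are hypothetical.

```r
set.seed(42)  # assumed seed, for repeatability only

# Re-simulate three groups with means 5, 10, and 15
sim_df = data.frame(
  group = rep(c("group1", "group2", "group3"), each = 20),
  response = c(rnorm(20, 5, 0.1), rnorm(20, 10, 0.1), rnorm(20, 15, 0.1))
)

# Observed group means
obs_means = tapply(sim_df$response, sim_df$group, mean)

# 100 reshuffles: one row per replicate, one column per group
shuff_mat = t(replicate(100, tapply(sample(sim_df$response), sim_df$group, mean)))

obs_means                    # close to 5, 10, and 15
apply(shuff_mat, 2, range)   # min and max reshuffled mean for each group
mean(sim_df$response)        # grand mean that the reshuffled means center on
```

If the observed mean for a group falls outside the range of its reshuffled means, that is the pattern a randomization test would flag as unlikely under the null hypothesis of no group differences.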