---
title: "Random Samples"
author: "Jeff Kinne"
date: "7/3/2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This Rmd file contains code for generating simulated gene expression data.

# Parameters

Pick some parameters (number of genes, replicates per group, and group names) and set up vectors of identifiers for the samples.

```{r}
set.seed(17) # so we get the same numbers each time

n <- 10000 # number of "genes" or "features"
replicates <- 3 # replicates per group
groups <- c("A", "B", "C", "D") # names of our groups

# create a vector of sample ids: A.1, A.2, A.3, B.1, ..., D.3
sample.ids <- paste(rep(groups, each = replicates), seq_len(replicates), sep = ".")

# create a vector giving each sample's group: A, A, A, B, ..., D
group <- rep(groups, each = replicates)

# mean value for our features
base.mean <- 10
mean.increase.by <- 3 # some features will have higher mean, increased by this amount
```

# Rows/features generated from the same distribution

For these rows/features, the values in every sample are drawn from the same normal distribution (mean `base.mean`, standard deviation 1, which is the `rnorm` default).

```{r}
# the half of the features that are generated using the same distribution and mean;
# sapply returns an (n/2) x (number of samples) matrix, one column per sample
same.part <- sapply(seq_along(sample.ids), function(i) {
  rnorm(n/2, mean = base.mean)
})
colnames(same.part) <- sample.ids
boxplot(same.part)
```

# Rows/features generated from different distributions/means

For these rows/features, each group's values are drawn from a normal distribution with its own mean: group A uses `base.mean`, and each subsequent group's mean is `mean.increase.by` higher (10, 13, 16 and 19 with the parameters above).
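The mean assigned to each sample column can also be listed directly. The chunk below is a small sketch, not part of the original analysis, built around a hypothetical helper `expected.column.means()` that applies the same `(i - 1) %/% replicates` group index used in the simulation. It draws no random numbers, so it does not change the values simulated afterwards.

```{r}
# Hypothetical helper (not part of the original analysis): the mean used for
# each sample column in the "different means" half of the data.
# Column i belongs to group (i - 1) %/% replicates (0 for A, 1 for B, ...),
# and each group's mean is mean.increase.by higher than the previous group's.
expected.column.means <- function(n.samples, replicates, base.mean, mean.increase.by) {
  i <- seq_len(n.samples)
  base.mean + mean.increase.by * ((i - 1) %/% replicates)
}

# 4 groups x 3 replicates = 12 columns, base mean 10, increase 3
expected.column.means(12, replicates = 3, base.mean = 10, mean.increase.by = 3)
# [1] 10 10 10 13 13 13 16 16 16 19 19 19
```

With the parameters defined above, `expected.column.means(length(sample.ids), replicates, base.mean, mean.increase.by)` gives the same vector.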
```{r}
# the half of the features that are generated using a different mean for
# each of the groups; (i - 1) %/% replicates is 0 for group A, 1 for B, etc.
diff.part <- sapply(seq_along(sample.ids), function(i) {
  rnorm(n/2, mean = base.mean + mean.increase.by * ((i - 1) %/% replicates))
})
colnames(diff.part) <- sample.ids
boxplot(diff.part)
```

# Combine both parts for our dataset

```{r}
# put together both halves for our set of features/genes to look at:
# rows 1 to n/2 come from same.part, rows n/2 + 1 to n from diff.part
values <- rbind(same.part, diff.part)
colnames(values) <- sample.ids
boxplot(values)
```
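One way to check that the combined data behave as intended is to average each group's columns over a block of rows. The helper `group.means()` below is hypothetical (not part of the original file); it is shown on a tiny hand-made matrix so the arithmetic is easy to follow, and it uses no random numbers.

```{r}
# Hypothetical helper (not part of the original analysis): the average value
# of each group's columns over a chosen set of rows.
# `values` is a features x samples matrix; `group` gives each column's group.
group.means <- function(values, group, rows = seq_len(nrow(values))) {
  # split column indices by group, then average the selected block
  sapply(split(seq_along(group), group), function(cols) {
    mean(values[rows, cols, drop = FALSE])
  })
}

# Tiny example: 2 features, groups A and B with 2 columns each
toy <- matrix(c(1, 2,   3, 4,   10, 20,   30, 40), nrow = 2)
group.means(toy, c("A", "A", "B", "B"))
# A = 2.5, B = 25
```

Applied to the simulated data, `group.means(values, group, rows = 1:(n/2))` should give values close to 10 for all four groups, while `group.means(values, group, rows = (n/2 + 1):n)` should give values close to 10, 13, 16 and 19.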