---
title: "Random Samples"
author: "Jeff Kinne"
date: "7/3/2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This Rmd file contains code for generating simulated gene expression data.

# Parameters

Choose the simulation parameters (number of features, replicates per group, and group names), then build vectors of sample identifiers and group labels.

```{r}
set.seed(17) # so we get the same numbers each time

n <- 10000      # number of "genes" or "features"
replicates <- 3 # replicates per group
groups <- c("A", "B", "C", "D") # names of our groups

# create a vector that will be something like A.1, A.2, A.3, B.1, etc.
sample.ids <- c(sapply(groups, function(ch) {
  sapply(1:replicates, function(i) {paste0(ch,".",i)})
  }))

# create a vector that will be something like A, A, A, B, ...
group <- c(sapply(groups, function(ch) {rep(ch, replicates)}))

# mean value for our features
base.mean <- 10
mean.increase.by <- 3  # some features will have higher mean, increased by this amount
```
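
As a minimal sketch of an alternative, the same identifiers and labels can be built with vectorized `rep()` and `paste0()` and collected into a small design table. The `toy.*` names are hypothetical and are used only so that the variables defined above are not overwritten.

```{r}
# Sketch with hypothetical toy.* names; values mirror the parameters above
toy.groups <- c("A", "B", "C", "D")
toy.replicates <- 3

# each = repeats every group name toy.replicates times: A, A, A, B, ...
toy.group <- rep(toy.groups, each = toy.replicates)

# paste0() recycles 1:3 along the 12 group labels: A.1, A.2, A.3, B.1, ...
toy.ids <- paste0(toy.group, ".", seq_len(toy.replicates))

# one row per sample, so the sample-to-group mapping is easy to inspect
toy.design <- data.frame(sample = toy.ids, group = toy.group)
head(toy.design, 4)
```

Keeping the mapping in one data frame is a design choice that makes it harder for the identifiers and the group labels to drift out of sync.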

# Rows/features generated from the same distribution

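For these rows/features, every sample's values are drawn from the same normal distribution, with mean `base.mean` and the default standard deviation of 1, so no group differs from the others.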

```{r}
# the half of the features that are generated using the same distribution and mean
# (sapply returns one column per sample; rnorm's sd defaults to 1)
same.part <- sapply(seq_along(sample.ids), function(i) {
  rnorm(n/2, mean=base.mean)
})
colnames(same.part) <- sample.ids
boxplot(same.part)
```

# Rows/features generated from different distributions/means

For these rows/features, each group's values are drawn with a different mean: group A keeps `base.mean`, and each later group's mean is raised by a further `mean.increase.by`.

```{r}
# the half of the features that are generated using different distribution/means for
# each of the groups; (i-1) %/% replicates is 0 for group A, 1 for B, and so on
diff.part <- sapply(seq_along(sample.ids), function(i) {
  rnorm(n/2, mean=(base.mean + mean.increase.by*((i-1) %/% replicates)))
})
colnames(diff.part) <- sample.ids
boxplot(diff.part)
```
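
The mean used for each column follows from integer division of the column index by the number of replicates. The sketch below works through that arithmetic without drawing any random numbers. The `mu.*` names are hypothetical, and their values are copied from the Parameters chunk.

```{r}
# Hypothetical mu.* names; values copied from the Parameters chunk
mu.base <- 10
mu.step <- 3
mu.replicates <- 3
mu.groups <- c("A", "B", "C", "D")

col.index <- seq_len(length(mu.groups) * mu.replicates)  # 1, 2, ..., 12
group.index <- (col.index - 1) %/% mu.replicates          # 0,0,0,1,1,1,2,...
col.means <- mu.base + mu.step * group.index

# mean used for each group's columns: A = 10, B = 13, C = 16, D = 19
tapply(col.means, rep(mu.groups, each = mu.replicates), unique)
```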

# Combine both parts for our dataset

```{r}
# put together both halves for our set of features/genes to look at
values <- rbind(same.part, diff.part)
colnames(values) <- sample.ids
boxplot(values)
```
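
If the combined matrix is later used to evaluate a method, it may help to record which rows were simulated with group differences. This is a minimal sketch, assuming the row order produced by `rbind(same.part, diff.part)`. The names `n.features` and `is.diff` are hypothetical, and `n.features` mirrors `n` from the Parameters chunk.

```{r}
# Hypothetical names; n.features mirrors n from the Parameters chunk
n.features <- 10000

# rbind() stacked same.part first, so the first half of the rows has no
# group differences and the second half does
is.diff <- rep(c(FALSE, TRUE), each = n.features / 2)
table(is.diff)
```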