R Programming - Getting Started
Revision as of 13:37, 24 May 2019
So you want to learn R programming. Good for you. This page will hopefully walk you through getting into R.
Reading
There are many good tutorials and getting-started guides for R, and reading through just about any of them will help. Here are a few you can try, but feel free to pick your own as well.
- R Tutorial on TutorialsPoint - mostly a high level overview, many parts suitable for people with very limited programming experience.
- R Manuals - from the official R website, these tend to be more in depth and aimed at an audience who already has some programming experience.
- R Getting Started - summary / intro by Jeff Kinne
Software Setup
R is free to use, and many free packages are available for it. We recommend the RStudio IDE because it is widely used and has some very nice features.
Install on Your Computer
- Download and install the latest version of R from https://cloud.r-project.org
- Download and install RStudio Desktop (the free version) from https://www.rstudio.com/products/rstudio/download/
Use on ISU CS Systems
To use R on the ISU CS systems, you can either use RStudio when you are in one of the labs or run R from the terminal when you are logged in remotely. To run RStudio on one of the CS lab computers, run the rstudio command (either from a terminal or via the graphical menu). To run R from a terminal, run the R command.
Packages
Note - before trying to install a package, first try to load it with the library command. If that fails because the package isn't installed, install it as described next.
One of the best features of R is the large number of very good packages that are easy to install and use. Once you have installed R and RStudio and opened RStudio, you can download and install packages using the install.packages command. For example, to install openxlsx, you would run the following.
install.packages("openxlsx")
You only need to run this once on your computer. Once it is installed, you use the library command to load the package so it is available for use.
library("openxlsx")
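The load-first, install-only-if-missing advice from the note above can be sketched as a small helper. The name load_or_install is hypothetical (it is not part of R or of any package), and the demonstration uses the built-in stats package so that nothing is downloaded.

```r
# Hypothetical helper: load a package, installing it first only if it is missing.
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE (instead of stopping) when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept a name stored in a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never needs to install anything
loaded <- load_or_install("stats")
```

With this helper, load_or_install("openxlsx") would install openxlsx on the first run only and simply load it afterwards.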
Many packages related to biology and medicine are installed a little differently, using a system called Bioconductor. You must first install the BiocManager package by running the following.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
Once this is complete, you can install a Bioconductor package (here the package edgeR) as follows.
BiocManager::install("edgeR")
The Basics Quizzes
Look through the R Getting Started slides again to refresh your memory about the basics, and then download and open the following "quiz" R files. For each file, look at the first line, decide what you think will happen when it is run, then run it and see what actually happens. Proceed one line at a time in the same way: predict first, then run the line and compare.
- arithmetic_quiz.R
- variable_quiz.R
- boolean_quiz.R
- vector_quiz.R
- function_quiz.R
- dataframe_quiz.R
- string_quiz.R
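As a rough idea of the predict-then-run exercise, here are a few lines written in the same spirit. They are made up for illustration and are not taken from the quiz files listed above. Try to predict each value before running the line.

```r
# Illustrative quiz-style lines: predict each result, then run the line to check.
a <- 7 %/% 2                          # integer division
b <- 7 %% 2                           # remainder after division
v <- c(1, 2, 3) * 2                   # arithmetic applies to every element of a vector
s <- paste("R", "quiz", sep = "-")    # join strings with a separator
flag <- (a > b) & (length(v) == 3)    # combine two comparisons

print(a)
print(b)
print(v)
print(s)
print(flag)
```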
Case Studies
Read through one of the tutorials, and then start looking at each of the following case studies. Each case study is an R file that analyzes some interesting data. Our first goal is just to understand what the data is and how the code works. Once we understand how the code works, we can ask more questions about the data.
Gene Expression in Developing Heart Cells
For this example we look at some of the data from a scientific study by researchers looking at heart cell development. The data was downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69618, and the authors' research was published at https://www.ncbi.nlm.nih.gov/pubmed/26485529. You can also view ISU posters related to this data at http://cs.indstate.edu/info/posters/
Let's take a look at the data and some R code to begin looking at it. Log in to one of the CS systems, and run the following commands.
cd ~
mkdir heart-genes
cd heart-genes
cp /u1/junk/bd4isu/GSE69618_data.csv .
cp /u1/junk/bd4isu/GSE69618.R .
You can also download the files from http://cs.indstate.edu/~jkinne/bd4isu-summer-2019/code/. Open the GSE69618.R file in RStudio and run each line in the file. Note - your instructor will show you how to do this and explain the different parts of RStudio that you are seeing.
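Before opening the real files, it may help to see the general shape of a first look at a table of expression values. The sketch below uses a tiny made-up table (the gene names, sample names, and numbers are invented and are not from GSE69618), and it is not the contents of GSE69618.R.

```r
# Made-up stand-in for a gene expression table (NOT the real GSE69618 data)
csv_text <- "gene,sample1,sample2,sample3
geneA,10,12,11
geneB,0,1,0
geneC,250,240,260"

# read.csv() reads from a file path, or from a string via the text argument
expr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(dim(expr))      # number of rows and columns
print(head(expr))     # first few rows
print(summary(expr))  # per-column summary statistics

# Average expression of each gene across the three sample columns
gene_means <- rowMeans(expr[, c("sample1", "sample2", "sample3")])
names(gene_means) <- expr$gene
print(gene_means)
```

For a real file, the same calls would start from read.csv("GSE69618_data.csv") after using setwd to move to the heart-genes directory, assuming the file has a header row like the one above.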
References
Cheat Sheets
- Base R - http://github.com/rstudio/cheatsheets/raw/master/base-r.pdf
- RStudio IDE - https://www.rstudio.com/resources/cheatsheets/#ide
- Advanced R - https://www.rstudio.com/wp-content/uploads/2016/02/advancedR.pdf
R Language Definition
Every programming language has a list of "reserved" words that have special meaning and cannot be used for variable or function names. For R's list, see R reserved words.
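One way to see reserved words in action is to check whether a string would work as an ordinary variable name. The helper is_valid_name below is a hypothetical name chosen for this sketch. It relies on the base function make.names, which rewrites anything that is not a valid name (including reserved words).

```r
# In an interactive session, ?Reserved opens the help page listing reserved words.

# Hypothetical helper: TRUE when s can be used as a plain variable name.
# make.names() changes invalid names, so an unchanged result means s is fine.
is_valid_name <- function(s) {
  make.names(s) == s
}

ok_name  <- is_valid_name("heart_genes")  # ordinary name
reserved <- is_valid_name("if")           # reserved word
digit    <- is_valid_name("2cool")        # names cannot start with a digit

# Trying to assign to a reserved word is rejected by the parser itself
attempt <- tryCatch({
  parse(text = "if <- 3")
  "parsed"
}, error = function(e) "syntax error")

print(c(ok_name, reserved, digit))
print(attempt)
```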
Every programming language also assigns special meaning to punctuation - normally parentheses () are used for enforcing order of operations and for defining and calling functions. The rules differ slightly from language to language. For R, this is all listed in the specification of the R parser (that is a bit of a boring read, but there you go).
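A few lines are enough to show why parentheses matter in R. This is only a small illustration of operator precedence, not a summary of the parser specification, and the variable names are arbitrary.

```r
# ^ binds tighter than unary minus, so this is -(2^2)
a <- -2^2
# parentheses force the negation to happen first
b <- (-2)^2

# : binds tighter than *, so this is (1:3) * 2
c1 <- 1:3 * 2
# parentheses change which range is built
d <- 1:(3 * 2)

# parentheses are also how a function is called
e <- sum(1:3)

print(a)
print(b)
print(c1)
print(d)
print(e)
```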
And the complete R language specification is at https://cran.r-project.org/doc/manuals/r-release/R-lang.html. This is aimed at "mature" programmers, so view at your own risk.
R Functions
The following are R functions that we commonly use. You can find examples by searching online. You can also look up help on any of them in RStudio, for example by running ?mean in the console.
- statistics - min, max, mean, var, cor, cov, sd
- I/O - print, View, read.csv, write.csv, setwd
- data frames / matrices / arrays - summary, table, ncol, nrow, dim, tapply, sapply, cbind, rowMeans
- vectors - c, length
- math - log2, sum, prod
- plotting - plot, boxplot, hist
- strings - grep, substring, gsub
- sets - unique, intersect, union, setdiff
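Here is a short sketch that exercises a handful of the functions listed above on made-up values. The vectors and strings are invented for illustration only.

```r
values <- c(2, 4, 4, 8)

# statistics and math
avg    <- mean(values)
spread <- sd(values)
total  <- sum(values)
prd    <- prod(values)
lg     <- log2(8)

# sets
uniq <- unique(values)
both <- intersect(c(1, 2, 3), c(2, 3, 4))
only <- setdiff(c(1, 2, 3), c(2, 3, 4))

# strings (the gene-like names are just example strings)
genes   <- c("Myh6", "Myh7", "Actb")
hits    <- grep("^Myh", genes)                 # positions of names starting with "Myh"
prefix  <- substring("GSE69618", 1, 3)         # characters 1 through 3
renamed <- gsub("_", "-", "heart_genes_data")  # replace every underscore

print(avg)
print(spread)
print(uniq)
print(genes[hits])
print(renamed)
```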