CS 459 - Topics in CS, Bioinformatics Programming

Spring 2019


Note: Do not, under any circumstances, leave your home directory group- or world-accessible in a way that lets others view your assignment work.

  1. HW 1 - unix and linux
    Log in to the lab machine or CS server using your cs459 account, and then do the following.

    • 	     cd ~
      	     mkdir public_html
      	     setfacl -m user:apache:x .
      	     setfacl -m user:apache:rx public_html
      	     setfacl -m default:user:apache:rx public_html
      	     cd public_html
      	     echo "hello world" > hello.txt
      	 
      Then browse to http://cs.indstate.edu/~cs459XY/hello.txt (replacing XY with your cs459 number). Congrats - you have set the permissions on your public_html directory so that the web server on the CS server can access those files!
    • 	     cd ~
      	     ln -s ~jkinne/public_html/cs459-bd4isu-s2019/code jkinne-cs459
      	     ls -l
      	     
      	     cd jkinne-cs459
      	     ls -l
      	     pwd
      	 
      Congrats - you have created a link to the directory in Jeff's account where he keeps code and files for this class!
    • Watch videos 23 and 24 from Jeff's CS 151 videos (link in the links page), and update your rc file so that rm, mv, etc. ask for confirmation before removing or overwriting files.

    Assigned 18-Jan-2019

  2. HW 2 - installing R and making sure it's working

    • Log in to one of the lab machines and then do the following. Open a terminal (aka shell). Run the following command -
      rstudio &
      If rstudio did not open properly, see the GA during his lab hours. If rstudio did open properly, congrats!
    • Install R on your home computer or laptop. Download from - https://cloud.r-project.org/
      Install rstudio on your home computer or laptop. Download from - https://www.rstudio.com/products/rstudio/download/#download

    Assigned 18-Jan-2019

  3. HW 3 - Weather part 1

    1. Download the files weather.R and the .csv file that starts with "Indianapolis" from the In Class Code page. Put these files on whatever computer you plan to use for R, and put them wherever you want to keep such files.
    2. Open up Rstudio and open the weather.R file. Comment out the setwd commands at the top of the file, and put in a setwd command so the directory is where you have stored your weather.R and csv file.
    3. Run the code one line at a time from the top (use ctrl-enter on Windows or Linux, cmd-enter on Mac).
    4. In your weather.R file, put comments that answer the following questions.
      1. To answer the following, use View(data) to view the data in rstudio, and you should be able to answer all of these questions by clicking on the column headings to sort based on them.
        What is the earliest date in the file? What is the latest date? What is the highest SNOW in the file, and on what date? What is the highest PRCP, and on what date? What is the highest TMAX, and on what date? What is the lowest TMIN, and on what date? What are the highest and lowest MEAN, and on what dates?
      2. To answer the following, look at the printout from print(summary(data)).
        What is the mean precipitation (I think this will be for all days when there was measurable precip)? What is the mean SNOW? What is the highest SNWD? What is the mean TMAX? What is the mean TMIN? What is the mean MEAN?
      3. To answer the following, look at the printout from print(summary(yearSummaries)).
        What are the lowest and highest tmax (note that this column is the average tmax for the year)? What are the lowest and highest tmin? What are the lowest and highest mean? What are the lowest and highest prcp? What are the lowest and highest snow?
    5. Add new lines to the weather.R file to do the following.
      1. Print out the annual mean temperatures from the last 10 years. This information is in the mean column of the yearSummaries data frame. Note that you can get the number of rows in yearSummaries with either dim or nrow.
      2. Print out the annual mean temperatures from the first 10 years in yearSummaries.
      3. Copy/paste the code that creates fit_max to make a new line that creates a variable fit_min that fits to the tmin column rather than the tmax column. Copy/paste the predicted line and points line so that you display the fitted tmin line on the plot. Note that you should have 3 new lines of code - creating fit_min, creating predicted_min, and points.
      4. Change the options to the plot, points, and legend commands to use different symbols than the ones that are in there now. Try to find some that you think look better.
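
The step 5 tasks above can be sketched as follows. This is only an illustration on made-up data: the column names year, tmax, tmin, and mean are assumptions about yearSummaries, the rows are assumed to be sorted by year, and lm is used as a plausible stand-in for however weather.R actually creates fit_max.

```r
# Synthetic stand-in for yearSummaries; the column names (year, tmax, tmin,
# mean) are assumptions and may differ from those in weather.R.
set.seed(1)
years <- 1950:2018
yearSummaries <- data.frame(
  year = years,
  tmax = 62 + 0.02 * (years - 1950) + rnorm(length(years)),
  tmin = 42 + 0.02 * (years - 1950) + rnorm(length(years))
)
yearSummaries$mean <- (yearSummaries$tmax + yearSummaries$tmin) / 2

n <- nrow(yearSummaries)            # same value as dim(yearSummaries)[1]
first10 <- yearSummaries$mean[1:10]
last10  <- yearSummaries$mean[(n - 9):n]
print(first10)
print(last10)

# Linear fit to tmin (assumed to mirror the existing fit to tmax).
fit_min <- lm(tmin ~ year, data = yearSummaries)
predicted_min <- predict(fit_min)

if (!interactive()) pdf(NULL)       # no plot file when run with Rscript
plot(yearSummaries$year, yearSummaries$tmin, pch = 17, col = "blue",
     xlab = "Year", ylab = "Average tmin")
points(yearSummaries$year, predicted_min, type = "l", col = "darkblue")
legend("topleft", legend = c("tmin", "fitted tmin"),
       pch = c(17, NA), lty = c(NA, 1), col = c("blue", "darkblue"))
```

Run ?points in R to see the available pch symbol codes when trying out different plot symbols.
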
  4. HW 4 - Weather Part 2

    1. Download the csv file for the Champaign weather from the in class code directory. Modify your weather.R file so it loads both the Champaign weather and the Indy weather. You probably want to have two data frame variables; you could call one data_Indy and the other data_Champ.
    2. Copy/paste/modify lines as needed so that you come up with a plot showing the mean annual temperature for both Indy and Champaign. Use colors, symbols, or other options in your plot to make it easy to see both.
    3. Do you see any other trends besides the fact that the mean temperature is slightly increasing? For interesting reading see https://en.wikipedia.org/wiki/Solar_activity_and_climate
    4. Check the in-class-code directory to see which cities we already have weather data for (right now just Indy and Champaign, but we'll be adding more to that list). I want you to add to that list by using the Midwest Regional Climate Center's download page.

      Go to https://mrcc.illinois.edu/CLIMATE/, create an account, and find another weather station with data from 1950-2018 that we can use. Right after you log in, click on Select Daily Station. On the right, click on Go to Map Selector. Use the mouse to move around on the map to find a location you are interested in.

      Click on one of the dots in the map and click More Info in the box that comes up. On the page that opens click See Station Inventory. If there are large gaps in the charts that come up, this is not a station we want to use.

      When you find a station that looks good, click on it on the map and then click Select Station. Click Go, and that takes you back to the main page.

      In the menu on the left, hover over Daily-Observed Data, then Daily, then click Between Two Dates. Use the form on that page to select the dates from Jan 1 1950 to Dec 31 2018; also select to download the Mean Temperature in addition to the items checked by default. Click Get Tabular Data. After it loads, click CSV Version to download the csv version.

      After you have the csv file, send it to Jeff unmodified. You can also create a copy for yourself that has the "metadata" removed from the beginning and end of the file, so you can import the file into R. You can edit the file by opening it in Rstudio (or some other text editor).

    5. In your weather.R file, create a new data frame called difference with columns Date, Year, and MeanDiff. The Date and Year columns should be from data_Indy. The MeanDiff should be data_Indy$MEAN - data_Champ$MEAN. That is the difference in mean temperature for the day between the two cities. Now, copy/paste/modify the appropriate lines so that you can add a column to the yearSummaries data frame that has the average MeanDiff per year. Plot the yearSummaries$MeanDiff. How does it look? One would think it should be roughly 0, and not generally more than 5 or so.
    6. Any other interesting things you notice about any of this data, or interesting questions you can ask and then come up with R code for the answers?
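
A minimal sketch of step 5, using made-up data in place of the real csv files. The column names Date, Year, and MEAN follow the assignment text. The year column of yearSummaries and the helper name diffByYear are assumptions made for illustration.

```r
# Synthetic daily data for two cities; the values are made up.
dates <- seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
set.seed(2)
data_Indy  <- data.frame(Date = dates,
                         Year = as.integer(format(dates, "%Y")),
                         MEAN = rnorm(length(dates), mean = 53, sd = 18))
data_Champ <- data.frame(Date = dates,
                         Year = as.integer(format(dates, "%Y")),
                         MEAN = data_Indy$MEAN - rnorm(length(dates), 1, 2))

# Row-by-row subtraction is only valid if both frames cover the same dates
# in the same order, so check that first.
stopifnot(identical(data_Indy$Date, data_Champ$Date))

difference <- data.frame(Date = data_Indy$Date,
                         Year = data_Indy$Year,
                         MeanDiff = data_Indy$MEAN - data_Champ$MEAN)

# Average MeanDiff per year, then attach it to a (hypothetical) yearSummaries.
diffByYear <- aggregate(MeanDiff ~ Year, data = difference, FUN = mean)
yearSummaries <- data.frame(year = c(2017, 2018))
yearSummaries$MeanDiff <- diffByYear$MeanDiff[match(yearSummaries$year,
                                                    diffByYear$Year)]
print(yearSummaries)
```

If the two stations do not cover exactly the same dates, merging the two data frames on Date before subtracting is one way to keep the rows aligned.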

    Assigned 1-Feb-2019

  5. HW 5 - Weather Part 3

    1. Pick some other question to ask or plots to try to plot with the weather data. Make a first attempt and then come discuss with Muhammad or Jeff. Some possible suggestions - compute (and plot) the average (min/max/mean) temperature for each day over the 69-year period, compute (and plot) the record (min/max) temperature for each day, compute (and plot) the average daily temperature range (max-min) for each day, plot the time points (1 to 25202) where a record max/min temperature occurred (either daily or "all time"), plot the years that were "hottest of all time" at the time.
    2. Also, complete your version of HW 4, and check with Muhammad or Jeff for help.
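
Some of the suggestions in step 1 can be sketched as follows, again on made-up data. The TMAX and TMIN column names are assumed to match the weather csv files, and MonthDay is a helper column introduced here. "Each day" is read as each calendar day (Jan 1, Jan 2, ...) across all years, which is one possible interpretation.

```r
# Synthetic daily highs/lows for 1950-2018; the values are made up.
dates <- seq(as.Date("1950-01-01"), as.Date("2018-12-31"), by = "day")
set.seed(3)
doy <- as.integer(format(dates, "%j"))               # day of year, 1..366
seasonal <- 52 - 25 * cos(2 * pi * (doy - 15) / 365.25)
tmax <- seasonal + 10 + rnorm(length(dates), sd = 8)
data <- data.frame(Date = dates,
                   TMAX = tmax,
                   TMIN = tmax - runif(length(dates), 10, 30))

# Group by calendar day ("01-01" ... "12-31"), ignoring the year.
data$MonthDay <- format(data$Date, "%m-%d")
avgMaxByDay    <- tapply(data$TMAX, data$MonthDay, mean)
recordMaxByDay <- tapply(data$TMAX, data$MonthDay, max)
avgRangeByDay  <- tapply(data$TMAX - data$TMIN, data$MonthDay, mean)

# Time points (row numbers) at which a new all-time record high was set:
# the running maximum increases exactly at those rows.
runningMax <- cummax(data$TMAX)
recordRows <- which(c(TRUE, diff(runningMax) > 0))

print(head(avgMaxByDay))
print(data$Date[recordRows])
```

The same pattern with min and cummin gives the record lows.
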
  6. HW 6 - Math/CS Data

    1. Take the Math/CS enrollment data in here, download the csv files, and have them loaded into R. Use summary to make sure it read the columns as numeric. Note that you might use the skip, header, and nrows options in the read.csv call.
    2. Plot the # of CS degrees in the 4 year timeframe. Include on the plot the # of BS degrees, # of minors, and # of MS degrees.
    3. Plot the # of CS SCHs in the 4 year timeframe.
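
A small sketch of how skip, header, and nrows can be combined to read only the table part of a csv file that has extra lines before and after it. The file contents, column names, and numbers below are entirely made up. With a real file, pass its path as the first argument instead of text =.

```r
# Made-up stand-in for a downloaded csv file: two metadata lines, a header
# row, four data rows, and a trailing note.
csvText <- "Math/CS data export
(illustrative numbers only)
Year,BS,Minors,MS
2015,20,5,30
2016,24,6,41
2017,27,9,38
2018,31,8,45
Source: made-up example"

# Skip the two metadata lines, treat the next line as the header, and stop
# after four data rows so the trailing note is never read.
degrees <- read.csv(text = csvText, skip = 2, header = TRUE, nrows = 4)
print(summary(degrees))             # numeric columns show min/mean/max

if (!interactive()) pdf(NULL)       # no plot file when run with Rscript
matplot(degrees$Year, as.matrix(degrees[, c("BS", "Minors", "MS")]),
        type = "b", pch = 1:3, lty = 1, col = 1:3,
        xlab = "Year", ylab = "Count")
legend("topleft", legend = c("BS", "Minors", "MS"), pch = 1:3, col = 1:3)
```

If summary shows a column as character rather than numeric, a stray non-numeric line was probably read in, and skip or nrows needs adjusting.
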
  7. HW 7 - Rmd

    1. In rstudio, create a new Rmarkdown file, save it as first.Rmd, put it into your cs459 account's public_html directory, and verify that you can browse to it in your web browser.
    2. Now get whatever you have been working on into an Rmd file as well.
  8. HW 8 - gene expression challenge

    1. Take GSE69618_gene_fun.Rmd as a starting point to work on the gene expression data. Make sure you can load it and it works. Make sure you are keeping enough of the file so you are looking at the log2 expression values.
    2. We want to compare the WT samples at days 0, 2, 6, and 10. Create a data frame that has the two replicates for each of those averaged, and still has the RefSeqID column. So you should have 5 columns.
    3. Let's focus on days 0 and 2. Create a new column (or a new data frame) that has day2 - day0. Give a boxplot of this (so we can see the range on this over all genes).
    4. Do a similar thing for day6-day2, day10-day6.
    5. Pick out the top 10 genes that have the highest difference for each of these comparisons. Note - you can do this by using View and clicking on the column header to sort; you can/should also do it in your script automatically.
    6. Look up your top 10 genes online to see what is known about them. Do these genes make sense as being different at the time points we are looking at?
    7. Check the posters linked below, and think of any other kind of analysis you can do with what you know already to understand and analyze these datasets.
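
Steps 2 through 5 can be sketched on synthetic data as follows. Only the RefSeqID column name comes from the assignment. The sample column names (WT_d0_r1 and so on), the gene IDs, and the values are hypothetical, and the real GSE69618 table will need its own column names substituted.

```r
# Synthetic log2 expression table with two WT replicates per day.
set.seed(4)
nGenes <- 200
days <- c(0, 2, 6, 10)
expr <- data.frame(RefSeqID = sprintf("GENE_%03d", 1:nGenes))
for (d in days) {
  for (r in 1:2) {
    expr[[sprintf("WT_d%d_r%d", d, r)]] <- rnorm(nGenes, mean = 8, sd = 2)
  }
}

# Average the two replicates per day: RefSeqID plus four day columns.
avg <- data.frame(RefSeqID = expr$RefSeqID)
for (d in days) {
  reps <- sprintf("WT_d%d_r%d", d, 1:2)
  avg[[paste0("day", d)]] <- rowMeans(expr[, reps])
}

# Differences between consecutive time points (log2 scale, so a difference
# of 1 corresponds to a two-fold change).
diffs <- data.frame(RefSeqID = avg$RefSeqID,
                    d2_d0  = avg$day2  - avg$day0,
                    d6_d2  = avg$day6  - avg$day2,
                    d10_d6 = avg$day10 - avg$day6)

if (!interactive()) pdf(NULL)       # no plot file when run with Rscript
boxplot(diffs[, -1], ylab = "log2 difference")

# Ten genes with the largest increase from day 0 to day 2.
top10_d2_d0 <- head(diffs[order(diffs$d2_d0, decreasing = TRUE), ], 10)
print(top10_d2_d0[, c("RefSeqID", "d2_d0")])
```

This ranks by the largest increase. Ordering by abs(diffs$d2_d0) instead would pick the largest changes in either direction, if that is the intended meaning of "highest difference".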

    Note - see 2018 posters in http://cs.indstate.edu/info/posters/ for reference.

Note: course website layout/code/template from Steve Baker. Anything horrible is not his fault.