Difference between revisions of "Bioinformatics"


Revision as of 19:03, 20 January 2022

Additional Programs to Install

  • Compression - for those using Windows, install 7-Zip. macOS and Linux natively support most of the compression formats that we will need.
  • R - first install R and then install RStudio Desktop (free).
  • Teams - ISU faculty, staff, and students can install MS Teams for free along with other parts of Office 365. Log into https://portal.office.com with your ISU credentials and look for Teams (you may have to click "All Apps" or something similar). Once you are in Teams, look for a link to download the desktop application (available for Windows and macOS, not for Linux). You can also use Teams in the browser.

Reading

Potentially good things to read / tutorials, etc. ...

Gene Expression

We start with the dataset at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85331 and the publication associated with it at https://pubmed.ncbi.nlm.nih.gov/28663367/

On your own computer, download https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE85331&format=file&file=GSE85331%5Fall%2Egene%2EFPKM%2Eoutput%2Ereplicates%2Etxt%2Egz and extract (uncompress) the file. On macOS or Linux you can usually just double-click it; on Windows use 7-Zip or something similar.
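If double-clicking does not work on your system, the extraction step can also be scripted. The following is a minimal sketch in Python using only the standard library. The helper name gunzip is made up for this illustration, and the local file name is an assumption: it is simply the file name from the download URL with %5F and %2E decoded, so adjust it if your browser saved the file under a different name.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the extracted file."""
    src = Path(src)
    # Default output name: the input name with its final suffix (".gz") removed.
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    # Assumed local file name, decoded from the download URL.
    archive = Path("GSE85331_all.gene.FPKM.output.replicates.txt.gz")
    if archive.exists():
        print("Extracted to", gunzip(archive))
    else:
        print("Download the archive first:", archive)
```

Run the script from the folder that contains the downloaded archive; the extracted .txt file is written next to it.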

Spreadsheet

After extracting, you can open the file in Excel, Google Sheets, or LibreOffice. Note that it is a tab-separated values (TSV) file, so if you double-click it, your OS may not know which program to use. Instead, start your spreadsheet program first and then open the file from there. Some things are not too painful to do in a spreadsheet. For example, you should verify that the following are all correct:

  • Genes with highest H1_day0_0 values: SNORD97, SNHG25, EEF1A1, RPL38, RPS27.
  • Genes with highest H1_CM_0 values: H19, MYL7, RPL31, SNORD9, RPS27.
  • Number of genes (#rows - 1): 26257
  • Median value for H1_day0_0: 0.539942
  • Median value for H1_CM_0: 1.246015
  • Average value for H1_day0_0: 15.86772859
  • Average value for H1_CM_0: 16.4574767
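The same checks can be cross-checked outside the spreadsheet. Below is a rough sketch in Python (standard library only), not part of the original exercise. The column names H1_day0_0 and H1_CM_0 come from the list above; everything else is an assumption: that the first column of the file holds the gene name, that every cell in a sample column parses as a number, and that the extracted file has the name shown. The function names load_rows and column_summary are made up for this illustration.

```python
import csv
import io
import statistics
from pathlib import Path


def load_rows(handle):
    """Read a tab-separated table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def column_summary(rows, column, gene_column, top_n=5):
    """Return gene count, median, mean, and the top_n genes for one sample column."""
    # Pair each numeric value with its gene name (assumes every cell is numeric).
    pairs = [(float(row[column]), row[gene_column]) for row in rows]
    values = [value for value, _ in pairs]
    top = sorted(pairs, key=lambda pair: pair[0], reverse=True)[:top_n]
    return {
        "genes": len(rows),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "top": [gene for _, gene in top],
    }


if __name__ == "__main__":
    # Assumed name of the extracted file; adjust to match your download.
    path = Path("GSE85331_all.gene.FPKM.output.replicates.txt")
    if path.exists():
        with open(path, newline="") as handle:
            rows = load_rows(handle)
        gene_column = next(iter(rows[0]))  # assumption: first column is the gene name
        for sample in ("H1_day0_0", "H1_CM_0"):
            print(sample, column_summary(rows, sample, gene_column))
```

If the numbers printed here differ from the spreadsheet, first check which column the script treated as the gene name and whether any cells are blank.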

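The two averages are close (about 15.9 and 16.5) even though the medians differ by more than a factor of two. So it seems that this dataset might be normalized so that the average values for each column (sample) are similar.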
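One way to probe this guess is to compute the average of every sample column, not just two of them. The sketch below is an illustration in Python (standard library only), under the assumptions that the file is tab-separated with a header row and that any column whose cells do not all parse as numbers (such as gene names) can be skipped. The function name column_means and the local file name are made up for the example.

```python
import csv
import io
import statistics
from pathlib import Path


def column_means(handle):
    """Return {column name: mean} for every column whose cells all parse as numbers."""
    reader = csv.DictReader(handle, delimiter="\t")
    names = reader.fieldnames or []
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            columns[name].append(row[name])
    means = {}
    for name, cells in columns.items():
        try:
            means[name] = statistics.fmean([float(cell) for cell in cells])
        except (TypeError, ValueError):
            # Non-numeric, incomplete, or empty column: skip it.
            continue
    return means


if __name__ == "__main__":
    # Assumed name of the extracted file; adjust to match your download.
    path = Path("GSE85331_all.gene.FPKM.output.replicates.txt")
    if path.exists():
        with open(path, newline="") as handle:
            for name, mean in column_means(handle).items():
                print(f"{name}\t{mean:.4f}")
```

Similar means across all samples would be consistent with the normalization guess, although they would not prove it.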

And that is about all we want to do in the spreadsheet right now. You can save it as an xlsx file or import it into Google Sheets in case we want to do anything else manually with it.

R and RStudio

Let's see what we can do with the same file in R and RStudio. First, install R and RStudio on your computer (see the links above).