CS 618

Revision as of 11:15, 23 May 2024 by Jkinne

This page will contain the course syllabus and plan for CS 618 Computational Biology (also offered as CS 459 Topics in CS for undergraduates) for summer 2024. For now, it contains a list of topics that we will look at. The basic plan is to look at a number of projects that I have worked on in the past, and to study key tools and algorithms used in computational biology and bioinformatics.

The top three goals for the course are (1) being able to use programming, tools, etc. to work on biology-related projects and data, (2) understanding some of the key algorithms, statistics, etc. used in this area, and (3) understanding as much of the biology as we can, in particular where the data comes from and what it means. Working on projects will tie all of these together.

Note - this page will be added to as I prepare for the course, but it is hopefully enough for you to decide whether to take the course.

Programming/Tools

R programming, including commonly used packages.

Python programming, including commonly used packages.

Other programming - JavaScript/Node.js, bash.

Software/tools - BLAST, NCBI.
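NCBI's databases can also be queried from a script through the Entrez E-utilities web API. The sketch below only builds request URLs for the ESearch and EFetch endpoints using Python's standard library; the function names `esearch_url` and `efetch_url` and the example query are illustrative choices, and actually sending the requests (for example with `urllib.request.urlopen`) is left out.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for ESearch: IDs of records in `db` matching `term`, returned as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """URL for EFetch: the listed record IDs as plain text (FASTA by default)."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Example: search the Gene database (the query term is only an illustration).
print(esearch_url("gene", "TP53[gene] AND human[orgn]"))
```

The IDs returned by an ESearch request can then be passed to `efetch_url` to download the sequences. NCBI asks scripts to identify themselves (its `tool` and `email` parameters) and to stay within its published rate limits, so check the E-utilities documentation before running large batches.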

Algorithms/Statistics

Statistics

Clustering techniques

Sequence alignment algorithms
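As one example of a sequence alignment algorithm, here is a minimal Python sketch of Needleman-Wunsch global alignment by dynamic programming. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption chosen for simplicity; real tools typically use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("GAT", "GT")` returns `(1, "GAT", "G-T")`. The table takes O(nm) time and memory, which is why tools such as BLAST use heuristics instead of filling the full table against every database sequence.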

Biology

Central dogma of genetics

Biological data - different types of assays, etc. - how the data is produced, what the data looks like, etc.
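A toy illustration of the central dogma (DNA is transcribed to mRNA, which is translated to protein), using the standard genetic code. It assumes the input is the DNA coding strand written 5' to 3', starts translation at the first AUG, and ignores splicing and other real-world details; the function names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), codons ordered UUU, UUC, UUA, ...
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon.

    Expects only the letters A, C, G, U; anything else raises KeyError.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("CCATGGCCTTTTAA"))` returns `"MAF"` (AUG GCC UUU, then the UAA stop codon).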

Projects

Gene expression - determining key genes from gene expression datasets. The project is written in R and uses Shiny, DataTables, ShinyProxy, and Docker. Poster - https://cs.indstate.edu/info/posters/bd4isu2022-bartlett.pdf

Protein topology prediction - finding potential transmembrane proteins in genomes. The project is written in Python and uses JavaScript, NCBI, and BLAST. Poster - https://cs.indstate.edu/info/posters/bd4isu2022-hoffman.pdf
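One classic, simple way to flag candidate transmembrane segments is a Kyte-Doolittle hydropathy scan: average each residue's hydropathy over a sliding window and report stretches whose average is high. The sketch below uses the window of 19 residues and threshold of 1.6 commonly cited from Kyte and Doolittle's paper; it is an illustration of that technique, not necessarily the method used in the project above, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Average hydropathy of every length-`window` stretch, by start position.

    Only the 20 standard one-letter codes are supported (others raise KeyError).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Merge overlapping high-scoring windows into 0-based, half-open (start, end) spans."""
    segments = []
    for start, avg in enumerate(hydropathy_profile(protein, window)):
        if avg > threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the current span
            else:
                segments.append((start, end))
    return segments
```

Hydropathy alone gives false positives (for example, hydrophobic signal peptides), so dedicated topology predictors combine it with other evidence.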

Transcription factors - finding mutations to disable a transcription factor while still preserving others. The project uses Python, R, and/or C. Poster - https://cs.indstate.edu/info/posters/bd4isu2020-bennett.pdf

Gene expression - determining key genes in a particular dataset from fish. Poster - https://cs.indstate.edu/info/posters/bd4isu2021-gosnell.pdf

Mass spectrometry data - keeping a database of mass spec data and searching through databases for new samples. Potential new project.

Genome sequencing - whole-genome sequencing of species that have not yet been sequenced. A new project with one of the students in the course.

Sequencing - different sequencing technologies (RNA-seq, ChIP-seq, single-cell RNA-seq, etc.), their pros, cons, costs, and typical uses, for a study looking at cancer in a model organism.

Other requested topics...

Drug discovery/modeling - modeling/simulating drug interactions with the body and the drug discovery process.

Resources

Bioinformatics

Watch list

Watch starting from the bottom.

Data files

  • GSE85331 Liu et al. - see file GSE85331_all.gene.FPKM.output.replicates.txt.gz at the bottom.
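A minimal sketch for loading a gzip-compressed expression table like the one above and ranking genes by how much they vary across samples. The exact column layout of the GSE85331 file is not described here, so the code ASSUMES a tab-delimited file with one header row, one gene-ID column, and numeric FPKM values in every other column; adjust `gene_column` or drop annotation columns if the real file differs. Ranking by variance of log2(FPKM + 1) is only a rough first filter for "key genes", not a substitute for a proper differential expression test with replicates.

```python
import csv
import gzip
import math
from statistics import pvariance


def parse_expression_table(lines, gene_column=0):
    """Parse tab-delimited text into (sample_names, {gene_id: [values]}).

    ASSUMED layout: one header row, one gene-ID column, and numeric
    expression values (e.g. FPKM) in every other column.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = [name for i, name in enumerate(header) if i != gene_column]
    table = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = [float(v) for i, v in enumerate(row) if i != gene_column]
        table[row[gene_column]] = values  # a repeated gene ID overwrites the earlier row
    return samples, table


def load_expression_table(path, gene_column=0):
    """Read a gzip-compressed table such as the .txt.gz file listed above."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_expression_table(handle, gene_column)


def most_variable_genes(table, n=10):
    """Rank genes by variance of log2(value + 1) across all samples, highest first."""
    scores = {
        gene: pvariance([math.log2(v + 1) for v in values])
        for gene, values in table.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Usage, once the file is downloaded: `samples, table = load_expression_table("GSE85331_all.gene.FPKM.output.replicates.txt.gz")`, then `most_variable_genes(table, 20)`. The log2(x + 1) transform keeps a few very highly expressed genes from dominating the ranking and leaves zero values at zero.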

Programs to install

  • 7-Zip - for extracting compressed files (such as the .gz file above) if your OS cannot extract them already (e.g., Windows 10).