CS 618

Revision as of 18:14, 26 May 2024 by Jkinne (talk | contribs) (Programs to install)

This page contains the course syllabus and plan for CS 618 Computational Biology (also run as CS 459 Topics in CS for undergrads) for the summer of 2024. The basic plan is to look at a number of projects that I have worked on in the past, along with key tools and algorithms used in computational biology and bioinformatics.

The top three goals for the course are (1) being able to use programming and tools to work on biology-related projects and data, (2) understanding some of the key algorithms and statistics used in this area, and (3) understanding as much of the biology as we can, in particular where the data comes from and what it means. Working on projects will tie all of these together.

General Information

Course website - https://cs.indstate.edu/wiki/index.php/CS_618

Your Instructor

Jeff Kinne, jkinne@cs.indstate.edu
Office: Root Hall A-142 and in Microsoft Teams, phone 812-237-2126
Instructor Office Hours: MTWR 11am-noon
Meeting: https://cs.indstate.edu/jkinne-meeting

Lecture, Exam

Lecture: The course is being run asynchronously, so the regular lecture hour is used as office hours. If you want to meet, you can schedule a meeting with the meeting link or join the Zoom meeting that is in the course (if I am available, I will join as well).
Mid-term exam: TBA
Final exam: TBA

Prerequisites - no prerequisite.

CRN numbers - 30693 for the 001 face to face section, 30695 for the 301 online section

Required text - We will use only online sources. This section will be updated as we go through them.

Class notes - Notes during class will mostly be kept in the documents in this OneDrive folder. Note that you will need to authenticate with your ISU account to view the notebook.

Programming/Tools

R programming, including commonly used packages.

Python programming, including commonly used packages.

Other programming - JavaScript/Node, bash.

Software/tools - BLAST, NCBI.

Algorithms/Statistics

Statistics

Clustering techniques

Sequence alignment algorithms

Biology

Central dogma of genetics

Biological data - different types of assays, etc. - how the data is produced, what the data looks like, etc.

Projects

Gene expression - determining key genes from gene expression datasets. Project is in R, uses Shiny, Datatables, ShinyProxy, Docker. Poster - https://cs.indstate.edu/info/posters/bd4isu2022-bartlett.pdf

Protein topology prediction - finding potential transmembrane proteins in genomes. Project is in Python, uses JavaScript, NCBI, BLAST. Poster - https://cs.indstate.edu/info/posters/bd4isu2022-hoffman.pdf

Transcription factors - finding mutations to disable a transcription factor while still preserving others. Project is in Python, R, and/or C. Poster - https://cs.indstate.edu/info/posters/bd4isu2020-bennett.pdf

Gene expression - determining key genes in a particular dataset from fish. Poster - https://cs.indstate.edu/info/posters/bd4isu2021-gosnell.pdf

Mass spectrometry data - keeping a database of mass spec data and searching through databases for new samples. Potential new project.

Genome sequencing - doing whole genome sequencing for species that have not yet had this done. New project with one of the students in the course.

Sequencing - different sequencing technologies (RNA-seq, ChIP-seq, single-cell RNA-seq, etc.), their pros, cons, costs, and uses, for a study looking at cancer in a model organism.

Other requested topics...

Drug discovery/modeling - modeling/simulating drug interactions with the body and the drug discovery process.

Resources

Bioinformatics

Watch list

Watch starting from the bottom.

Data files

  • GSE85331 Liu et al - see file GSE85331_all.gene.FPKM.output.replicates.txt.gz at the bottom.
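
The data file above is gzip-compressed, and Python can read such files directly without extracting them first. The following is a minimal sketch only: it assumes the file is tab-delimited text with a single header row, which has not been checked against the actual file, and the function name read_gzipped_table is made up for illustration.

```python
import csv
import gzip


def read_gzipped_table(path, delimiter="\t"):
    """Return (header, rows) from a gzip-compressed delimited text file.

    Assumption: the first line is a header row and every later line is
    one record. All values are returned as strings.
    """
    # mode="rt" decompresses and decodes to text on the fly
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])  # empty list if the file has no lines
        rows = list(reader)
    return header, rows
```

With the file downloaded into the working directory, calling read_gzipped_table("GSE85331_all.gene.FPKM.output.replicates.txt.gz") would return the column names and the data rows. Numeric columns would still need to be converted from strings before analysis.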

Programs to install

  • 7-zip - for extracting zip files, if your OS cannot unzip them already (e.g., Windows 10).

Course Description and Content

Course Description

The catalog description for this course is: "An introduction to computational biology. Topics may include principles and methods used for sequence alignment, motif finding, structural modeling, structure prediction and network modeling, as well as currently emerging research areas. A focus is placed on the computational cost of solving problems - in terms of CPU time, memory, and disk space. Study of the core algorithms used to solve problems."

Course Outline

Learning Outcomes