Bioinformatics and CS510: Difference between pages

From Computer Science at Indiana State University
=Background=
This page contains the syllabus for CS 510 and is used to keep track of assignments, etc. <br>
For each video that is listed, vocab terms are given that are either explained within the video or that the viewer is assumed to already know.
(This syllabus was largely authored by Jeff Kinne and updated by Xavier Saunders.)<br>
==Biology==
<br>
Definitions are from [https://www.ncbi.nlm.nih.gov/books/NBK21052/ NCBI] or Wikipedia.
CS 510 is Fast Track Introduction to Programming. The course has no prerequisites (it can be taken by those with no prior CS or programming experience) and is meant to (a) get you programming (in Python), and (b) get you ready to pass the admissions interview (Python programming and basic algorithms / data structures). CS 510 counts as elective credit towards the MS degree. The course is meant for current ISU students in non-CS programs and for potential incoming CS MS students who need the course to prepare for the CS MS.


A video giving an overview and explaining some of the key points is here - [https://indstate-edu.zoom.us/rec/share/JiK-lnHhi4_8JhVukKSqActW8jYImxW0-Lwi7TN9XYfflglVe1WiomZdqQqSzPIV.F8fORdZrR6f5DXc1?startTime=1716559144000 zoom recording explaining Bio background].
For more information on applying to the CS MS program, see http://cs.indstate.edu/info/apply.html


===Cell===
=Who Should Register=
[https://www.youtube.com/watch?v=jsDxw63QqK0&list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB&index=24 Crash Course Biology Video]
'''Current ISU Students: '''
For ISU grad students who are in non-CS programs, if you do not have much programming experience, CS 510 is your best starting point in CS coursework. If you are already a competent programmer in some language, then you will likely want to start with either [[CS 500]] or [https://cs.indstate.edu/~lmay1/courses/#/courses/cs401/home CS 501]. CS 500 covers C, data structures, and algorithms, while CS 501 covers Python and data science.


Things to understand...
'''Non-ISU Students: '''
* Cell diagram - be able to identify different parts of the cell in a diagram and what each part does (roughly). See diagrams on [https://en.wikipedia.org/wiki/Cell_(biology) Wikipedia].
''Domestic students who are not quite ready to start the CS MS (or just aren't sure if they want to do a full MS) can [https://www.indstate.edu/cgps/graduate/apply/non-degree-application apply to ISU as a Guest/Unclassified student]'' and take the course either face to face or online. To do this, click on the Apply button, create an account if this is your first time starting an application at ISU, choose Graduate when asked to choose between Graduate and Undergraduate, choose "Guest Admission / Unclassified" for the Field of Interest, and choose "Guest Admission (One Semester Only)" for the Program of Study. Choose the term in which you plan to take the course, and complete the remaining required fields.


Vocab...
''International students outside of the US'' are generally not allowed to take online courses offered in the US without being enrolled in a degree program. International persons who are in the US on a visa of some type might be allowed to take the course (in particular, those on F2 or H4 visas likely would be allowed). Those who cannot take courses at ISU can check back here for course info (sample quizzes, reading assignments, programming assignments, tutorials, etc.) that will be posted publicly throughout the fall 2023 term.
* ''prokaryote'' - Single-celled microorganism whose cells lack a well-defined, membrane-enclosed nucleus.  Comprises two of the major domains of living organisms—the Bacteria and the Archaea.
* ''eukaryote'' - Organism composed of one or more cells with a distinct nucleus and cytoplasm. Includes all forms of life except viruses and prokaryotes (bacteria and archaea).
* ''DNA (deoxyribonucleic acid)'' - Polynucleotide formed from covalently linked deoxyribonucleotide units. It serves as the store of hereditary information within a cell and the carrier of this information from generation to generation.
* ''nucleus'' - Prominent membrane-bounded organelle in a eukaryotic cell, containing DNA organized into chromosomes.
* ''cytoplasm'' - Contents of a cell that are contained within its plasma membrane but, in the case of eukaryotic cells, outside the nucleus.
* ''cell membrane (plasma membrane)'' - Membrane that surrounds a living cell (all types of cells).
* ''cell wall'' - Mechanically strong extracellular matrix deposited by a cell outside its plasma membrane. It is prominent in most plants, bacteria, algae, and fungi. Not present in most animal cells.
* ''vacuole'' - Very large fluid-filled vesicle found in most plant and fungal cells, typically occupying more than a third of the cell volume.
* ''chloroplast'' - Organelle in green algae and plants that contains chlorophyll and carries out photosynthesis. It is a specialized form of plastid.
* ''organelle'' - Membrane-enclosed compartment in a eukaryotic cell that has a distinct structure, macromolecular composition, and function. Examples are nucleus, mitochondrion, chloroplast, Golgi apparatus.
* ''lipid'' - Organic molecule that is insoluble in water but tends to dissolve in nonpolar organic solvents. A special class, the phospholipids, forms the structural basis of biological membranes.
* ''protein'' - The major macromolecular constituent of cells. A linear polymer of amino acids linked together by peptide bonds in a specific sequence.
* ''cytoskeleton'' - System of protein filaments in the cytoplasm of a eukaryotic cell that gives the cell shape and the capacity for directed movement. Its most abundant components are actin filaments, microtubules, and intermediate filaments.
* ''RNA (ribonucleic acid)'' - Polymer formed from covalently linked ribonucleotide monomers (which are represented by the letters A, U, C, G).
* ''ribosome'' - Particle composed of ribosomal RNAs and ribosomal proteins that associates with messenger RNA and catalyzes the synthesis of protein.
* ''endoplasmic reticulum (ER)'' - Labyrinthine membrane-bounded compartment in the cytoplasm of eukaryotic cells, where lipids are synthesized and membrane-bound proteins and secretory proteins are made.
* ''rough ER'' - Endoplasmic reticulum with ribosomes on its cytosolic surface. Involved in the synthesis of secreted and membrane-bound proteins.
* ''smooth ER'' - Region of the endoplasmic reticulum not associated with ribosomes. It is involved in lipid synthesis.
* ''vesicle'' - Small, membrane-bounded, spherical organelle in the cytoplasm of a eukaryotic cell.
* ''Golgi apparatus (Golgi complex)'' - Membrane-bounded organelle in eukaryotic cells in which proteins and lipids transferred from the endoplasmic reticulum are modified and sorted. It is the site of synthesis of many cell wall polysaccharides in plants and extracellular matrix glycosaminoglycans in animal cells.
* ''mitochondria'' - Membrane-bounded organelle, about the size of a bacterium, that carries out oxidative phosphorylation and produces most of the ATP in eukaryotic cells. The "powerhouse of the cell".
* ''symbiosis'' - Intimate association between two organisms of different species from which both derive a long-term selective advantage.
* ''surface area to volume ratio'' - The physics of a system is different at different SA to Vol ratios (e.g., for a flying insect, flapping its wings is more like what flying through water would be for a human). The reason is that the mass of an object is proportional to its volume (which is a cubed measurement), while its interaction with the environment happens through its surface area (which is a squared measurement). The larger an object, the smaller its surface area to volume ratio will be.
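To see why the ratio shrinks as an object grows, here is a minimal Python sketch that models the object as a cube with side length s (an assumption for illustration only; real cells are not cubes). Surface area grows as 6s² and volume as s³, so the ratio works out to 6/s.

```python
def surface_to_volume_ratio(side: float) -> float:
    """Surface area to volume ratio of a cube with the given side length."""
    surface_area = 6 * side ** 2  # six square faces
    volume = side ** 3
    return surface_area / volume


for side in [1, 2, 10]:
    print(side, surface_to_volume_ratio(side))
# prints "1 6.0", then "2 3.0", then "10 0.6"
```

Making the cube 10 times wider cuts the ratio by a factor of 10, which is the same trend the definition describes for cells and organisms.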


===Genetics===
If you have little or no prior programming or CS experience, you are strongly encouraged to take this course to build your programming skills (and, if you are interested in a CS MS, to get ready to apply to the CS MS program).
[https://www.youtube.com/watch?v=9zwq8N4Ufd8&list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB&index=33 Crash Course Biology Video]
* ''allele'' - One of a set of alternative forms of a gene (the DNA letters of the gene). In a diploid cell each gene has two alleles, each occupying the same position (locus) on homologous chromosomes.
* ''dominant'' - In genetics, refers to the member of a pair of alleles that is expressed in the phenotype of the organism while the other allele is not, even though both alleles are present. Opposite of recessive.
* ''recessive'' - In genetics, refers to the member of a pair of alleles that fails to be expressed in the phenotype of the organism when the dominant allele is present. Also refers to the phenotype of an individual that has only the recessive allele.
* ''gene'' - Region of DNA that controls a discrete hereditary characteristic, usually corresponding to a single protein or RNA. This definition includes the entire functional unit, encompassing coding DNA sequences, noncoding regulatory DNA sequences, and introns.
* ''epigenetics'' - The study of heritable traits, or a stable change of cell function, that happen without changes to the DNA sequence.
* ''genome'' - The totality of genetic information belonging to a cell or an organism; in particular, the DNA that carries this information.
* ''model organism'' - A species, such as Drosophila melanogaster (fruit fly) or Escherichia coli (E. coli), that has been studied intensively over a long period and thus serves as a "model" of the biology of a particular type of organism. Other such prominent organisms include: Mus musculus (house mouse), Saccharomyces cerevisiae (baker's yeast), Arabidopsis thaliana (thale cress, a plant).
* ''methylation'' - Addition of a methyl group to DNA. Extensive methylation of the cytosine base in CG sequences is used in vertebrates to keep genes in an inactive state.
* ''methyl group'' - A hydrophobic chemical group (-CH3) derived from methane (CH4).
* ''genotype'' - Genetic constitution (that is, what the letters are in the DNA) of an individual cell or organism, as opposed to the observed characteristics of the organism.
* ''phenotype'' - The observable character of a cell or an organism.
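The terms allele, dominant, recessive, genotype, and phenotype can be tied together with a small Python sketch of a single-gene cross between two parents (a Punnett square). It assumes the simplest case: one gene with two alleles and complete dominance, with the dominant allele written in uppercase ("A") and the recessive allele in lowercase ("a"). Many real traits do not follow this simple pattern.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a cross of two-letter genotypes like "Aa"."""
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):  # one allele from each parent
        # Sorting puts uppercase first, so "aA" and "Aa" count as the same genotype
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotype(genotype: str) -> str:
    """Complete dominance: one dominant (uppercase) allele is enough to show the trait."""
    return "dominant" if any(c.isupper() for c in genotype) else "recessive"


genotypes = cross("Aa", "Aa")
print(dict(genotypes))  # {'AA': 1, 'Aa': 2, 'aa': 1}

phenotypes = Counter()
for genotype, count in genotypes.items():
    phenotypes[phenotype(genotype)] += count
print(dict(phenotypes))  # {'dominant': 3, 'recessive': 1}
```

Three different genotypes produce only two phenotypes here, which is exactly the genotype/phenotype distinction in the definitions above.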


===DNA Structure===
=General Course Information=
[https://www.youtube.com/watch?v=4YNDB_zSzfE&list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB&index=34 Crash Course Biology Video]
'''Course website''' - https://cs.indstate.edu/wiki/index.php/CS_510


Some facts...
'''Your Instructor'''
* Size of human genome: about 3 billion nucleotides.
* DNA error rate in humans: around 1 error per 10 billion nucleotides copied (after proofreading and fixing mistakes).
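Combining these two facts gives a rough sense of scale. This Python sketch assumes the error rate applies per nucleotide copied, and also shows the diploid case, since human cells carry two copies of the genome (about 6 billion nucleotides in total).

```python
GENOME_NUCLEOTIDES = 3_000_000_000  # one copy of the human genome
ERROR_ONE_IN = 10_000_000_000       # about 1 error per 10 billion nucleotides copied

errors_per_genome_copy = GENOME_NUCLEOTIDES / ERROR_ONE_IN
errors_per_diploid_cell = 2 * GENOME_NUCLEOTIDES / ERROR_ONE_IN  # two copies per cell

print(errors_per_genome_copy)   # 0.3
print(errors_per_diploid_cell)  # 0.6
```

So, on average, fewer than one new uncorrected mistake is introduced each time a cell copies all of its DNA.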


Some things to understand...
[https://cs.indstate.edu/xsaunders/ Xavier Saunders], [mailto:xavier.saunders@indstate.edu xavier.saunders@indstate.edu] <br>
* DNA structure - the basic structure of the double helix, the sugar-phosphate backbone, and complementary base pairs.
''Office:'' In Person TBA and in Microsoft Teams, no phone number <br>
* DNA replication - roughly how it works: helicase unzips a portion of the double helix, and DNA polymerase attaches new complementary nucleotides along each separated strand.
''Instructor Office Hours:'' TBA, TBA <br>
'''Lecture, Exam'''


Vocab...
''Lecture:'' Tuesday and Thursday 12:30-1:45, in person and on Zoom (in Canvas, see below), and recorded<br>
* ''nucleotide'' - Nucleoside with one or more phosphate groups joined in ester linkages to the sugar moiety. DNA and RNA are polymers of nucleotides.
''Mid-term exam:'' TBA <br>
* ''sugar'' - Small carbohydrates with a monomer unit of general formula (CH2O)n. Examples are the monosaccharides glucose, fructose and mannose, and the disaccharide sucrose (composed of a molecule of glucose and one of fructose linked together).
''Final exam:'' TBA<br>
* ''carbohydrate'' - General term for sugars and related compounds containing carbon, hydrogen, and oxygen, usually with the empirical formula (CH2O)n.
''Asynchronous students:'' For students who will be mostly participating asynchronously even though the course is being offered synchronously, it is best if you are able to watch the most recent lecture ''before'' the next one occurs. You should make note of any questions or comments and send them to me by email or Teams. I will start the next lecture by answering any questions/comments that came via email or Teams.  
* ''base'' - A substance that can accept a proton in solution. The purines and pyrimidines in DNA and RNA are organic nitrogenous bases and are often referred to simply as bases.
* ''base pair'' - Two nucleotides in an RNA or DNA molecule that are held together by hydrogen bonds—for example, G pairs with C, and A with T or U (remember - AT, GC).
* ''double helix'' - The three-dimensional structure of DNA, in which two DNA chains held together by hydrogen bonding between the bases are wound into a helix.
* ''antiparallel strands'' - Describes the relative orientation of the two strands in a DNA double helix; the polarity of one strand is oriented in the opposite direction to that of the other.
* ''adenine, guanine, cytosine, thymine'' - nucleotide bases of DNA.
* ''adenine, guanine, cytosine, uracil'' - nucleotide bases of RNA.
* ''hydrogen bond'' - An electrostatic force of attraction between a hydrogen (H) atom which is covalently bonded to a more electronegative "donor" atom or group (Dn), and another electronegative atom bearing a lone pair of electrons—the hydrogen bond acceptor (Ac). Such an interacting system is generally denoted Dn−H···Ac, where the solid line denotes a polar covalent bond, and the dotted or dashed line indicates the hydrogen bond.
* ''chromosome'' - Structure composed of a very long DNA molecule and associated proteins that carries part (or all) of the hereditary information of an organism. Especially evident in plant and animal cells undergoing mitosis or meiosis, where each chromosome becomes condensed into a compact rodlike structure visible under the light microscope.
* ''enzyme'' - Protein that catalyzes a specific chemical reaction (normally a biological reaction).
* ''DNA polymerase'' - Enzyme that synthesizes DNA by joining nucleotides together using a DNA template as a guide.
* ''DNA helicase'' - Enzyme that is involved in opening the DNA helix into its single strands for DNA replication.
* ''mutation'' - Heritable change in the nucleotide sequence of a chromosome.
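Complementary base pairing and antiparallel strands can be illustrated with a short Python sketch. Given one DNA strand written 5' to 3', the partner strand is found by swapping each base for its pair (A with T, C with G) and then reversing the result so that it is also read 5' to 3' (the "reverse complement"). This is the strand DNA polymerase would build against the given one during replication. The sketch assumes the input contains only uppercase A, T, C, and G.

```python
# Watson-Crick base pairs in DNA: A-T and C-G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Pair each base with its partner, keeping the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand read 5' to 3' (the strands are antiparallel, so reverse it)."""
    return complement(strand)[::-1]


print(complement("ATGCGT"))          # TACGCA
print(reverse_complement("ATGCGT"))  # ACGCAT
```

Taking the reverse complement twice gives back the original strand, matching the idea that each strand fully determines the other.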


===DNA Transcription===
'''Prerequisites''' - none.
[https://www.youtube.com/watch?v=j6YaOqKORYY&list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB&index=35 Crash Course Biology Video]


Process to understand...
'''CRN numbers''' - 52482, 52483
* The process by which a gene in DNA is made into a protein (DNA -> pre-mRNA -> mRNA -> protein). You should be able to draw a picture of each step and explain what happens.


Vocab...
'''Required text'''
* ''mRNA (messenger RNA)'' - RNA molecule that specifies the amino acid sequence of a protein. Produced by RNA splicing (in eukaryotes) from a larger RNA molecule made by RNA polymerase as a complementary copy of DNA. It is translated into protein in a process catalyzed by ribosomes.
* We will use the following free online sources.
* ''transcription (DNA transcription)'' - Copying of one strand of DNA into a complementary RNA sequence by the enzyme RNA polymerase.
* For Python - '''[https://runestone.academy/ns/books/published/thinkcspy/index.html How to Think Like a Computer Scientist] (aka "Think CS")''', [https://docs.python.org/3/ Python Official Documentation], [https://pandas.pydata.org/docs/ Pandas Official Documentation] ([https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html tutorial], [https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf cheat sheet])
* ''RNA polymerase'' - Enzyme that catalyzes the synthesis of an RNA molecule on a DNA template from nucleoside triphosphate precursors.
* For data structures and algorithms - '''[https://www.tutorialspoint.com/data_structures_algorithms/index.htm TutorialsPoint Data Structures and Algorithms]''', and we might supplement with the texts used in [[CS 500]].
* ''promoter'' - Nucleotide sequence in DNA to which RNA polymerase binds to begin transcription.
* For math content - [https://mfleck.cs.illinois.edu/building-blocks/index-sp2020.html Building Blocks for Theoretical Computer Science] by Margaret M. Fleck, [https://courses.csail.mit.edu/6.042/spring18/mcs.pdf Mathematics for Computer Science] by Eric Lehman, F. Thomson Leighton, and Albert R. Meyer
* ''pre-mRNA'' - RNA that was directly copied from a strand of DNA within the cell nucleus, and has not yet been spliced.
* Additional sources - as needed.
* ''poly-A tail'' - Multiple adenosine monophosphates; in other words, it is a stretch of RNA that has only adenine bases (the letter A). In eukaryotes, polyadenylation is part of the process that produces mature mRNA for translation. In many bacteria, the poly(A) tail promotes degradation of the mRNA. It, therefore, forms part of the larger process of gene expression.
** Datasets - [https://www.ncei.noaa.gov/cdo-web/datasets NOAA Weather Data] (specifically Daily Summaries / FTP). [https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_station/USW00093814.csv.gz CVG]
* ''RNA splicing'' - Process in which intron sequences are excised from RNA transcripts in the nucleus during formation of messenger and other RNAs.
** [https://biopython.org/wiki/SeqIO BioPython SeqIO]
* ''intron'' - Noncoding region of a eukaryotic gene that is transcribed into an RNA molecule but is then excised by RNA splicing during production of the messenger RNA or other functional structural RNA. Remember - "in between".
** Jupyter - [https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet markdown cheat sheet], [https://code.visualstudio.com/docs/datascience/jupyter-notebooks Jupyter in VS Code], [https://code.visualstudio.com/docs/datascience/data-science-tutorial Tutorial with VS Code and Data Science]
* ''exon'' - Segment of a eukaryotic gene that consists of a sequence of nucleotides that will be represented in messenger RNA or the final transfer RNA or ribosomal RNA. In protein-coding genes, exons encode amino acids in the protein. An exon is usually adjacent to a noncoding DNA segment called an intron.
** Anaconda - install from [https://www.anaconda.com/products/distribution here]
* ''ribosome'' - Particle composed of ribosomal RNAs and ribosomal proteins that associates with messenger RNA and catalyzes the synthesis of protein.
* ''alternative splicing'' - The production of different proteins from the same RNA transcript by splicing it in different ways.
* ''central dogma (of molecular biology)'' - DNA makes RNA, and RNA makes protein. This is generally true.
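The DNA -> pre-mRNA -> mRNA steps can be sketched in Python as string operations. RNA polymerase reads the template strand, so the pre-mRNA has the same sequence as the other (coding) strand with U in place of T; this sketch therefore takes the coding strand as input. Splicing then cuts out the introns. The gene sequence, the intron coordinates, and the short poly-A tail below are made up for illustration; with real data, intron positions would come from a gene annotation.

```python
def transcribe(coding_strand: str) -> str:
    """Return the pre-mRNA for a gene, given its coding (non-template) DNA strand."""
    # The RNA matches the coding strand, except uracil (U) replaces thymine (T)
    return coding_strand.replace("T", "U")


def splice(pre_mrna: str, introns: list) -> str:
    """Remove introns, given as non-overlapping 0-based [start, end) intervals."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])  # keep the exon before this intron
        pos = end                          # skip over the intron
    exons.append(pre_mrna[pos:])           # keep the final exon
    return "".join(exons)


# Hypothetical gene: exon "ATGAAA", intron "GTCCCAG", exon "TTTTAA"
gene = "ATGAAAGTCCCAGTTTTAA"
pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, [(6, 13)]) + "A" * 5  # short poly-A tail for illustration

print(pre_mrna)  # AUGAAAGUCCCAGUUUUAA
print(mrna)      # AUGAAAUUUUAAAAAAA
```

Passing a different list of intron intervals for the same pre-mRNA is one simple way to picture alternative splicing.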


===RNA Translation===
'''Class notes''' - I may try to keep notes in OneNote, and they may be available as a PDF. A link will be here as soon as it is in working order. Note that you will need to authenticate with your ISU account to view the notebook.
[https://www.youtube.com/watch?v=6ulXau2HyHg&list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB&index=36 Crash Course Biology Video]


Process to understand...
'''Code from lectures''' - Some code from lectures will be at https://cs.indstate.edu/xsaunders/cs510/
* The process by which a gene in DNA is made into a protein


Vocab...
'''Assignments and Announcements'''
* ''peptide'' - Short chains of amino acids linked by peptide bonds.
Current assignments, announcements, and notes are available on the class's Canvas page.
* ''protein'' - The major macromolecular constituent of cells. A linear polymer of amino acids linked together by peptide bonds in a specific sequence.
* ''translation (RNA translation)'' - Process by which the sequence of nucleotides in a messenger RNA molecule directs the incorporation of amino acids into protein. It occurs on a ribosome.
* ''amino acids'' - Organic molecule containing both an amino group and a carboxyl group. Those that serve as the building blocks of proteins are alpha amino acids, having both the amino and carboxyl groups linked to the same carbon atom.
* ''codon'' - Sequence of three nucleotides in a DNA or messenger RNA molecule that represents the instruction for incorporation of a specific amino acid into a growing polypeptide chain.
* ''stop codon'' - A codon that signals the termination of the translation process of the current protein.
* ''ribosome'' - Particle composed of ribosomal RNAs and ribosomal proteins that associates with messenger RNA and catalyzes the synthesis of protein.
* ''ribosomal RNA (rRNA)'' - Any one of a number of specific RNA molecules that form part of the structure of a ribosome and participate in the synthesis of proteins. Often distinguished by their sedimentation coefficient, such as 28S rRNA or 5S rRNA.
* ''start codon'' - The first codon of a messenger RNA (mRNA) transcript translated by a ribosome. Normally codes for methionine in eukaryotes (letter M).
* ''methionine'' - Normally the first amino acid in a peptide sequence in eukaryotes.
* ''transfer RNA (tRNA)'' - Set of small RNA molecules used in protein synthesis as an interface (adaptor) between messenger RNA and amino acids. Each type of tRNA molecule is covalently linked to a particular amino acid.
* ''anticodon'' - Sequence of three nucleotides in a transfer RNA molecule that is complementary to a three-nucleotide codon in a messenger RNA molecule.
* ''polypeptide'' - A longer linear polymer composed of many amino acids. The term is often used interchangeably with "protein".
* ''protein folding'' - The physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure.
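Translation can be sketched in Python: find the start codon (AUG), read the mRNA three letters at a time, look up the amino acid for each codon, and stop at a stop codon. The table below contains only a handful of the 64 codons (using one-letter amino acid codes, with M for methionine and * for stop), so this is an illustration rather than a full translator; a real program would use the complete standard genetic code.

```python
# Partial standard codon table (only a few of the 64 codons); "*" marks a stop codon
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "AAA": "K", "AAG": "K",
    "GGC": "G", "GCU": "A", "UGG": "W",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (or the end of the mRNA)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # step through complete codons
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if the codon is not in this partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGAAAUUUUAAGC"))  # MKF
```

The bases before the AUG are skipped and the bases after the stop codon are never read, mirroring how the ribosome starts at the start codon and releases the chain at the stop codon.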


===Gene Expression===
[https://www.youtube.com/watch?v=NeeaP8pp9HI&list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB&index=37 Crash Course Biology Video]


Some facts...
=Course Description and Content=
* How many genes are there? About 20,000 in humans.
* How long or large are genes? They can be as small as a few hundred DNA bases, or more than 2 million bases. The average size of a protein-coding gene in humans is around 62 kilobases (kb), and the median length is about 24 kb. Note that of the roughly 62 kb in an average gene, about 60 kb is introns (regions that are spliced out before translation to protein), so on average only about 2 kb is actually translated to protein.


Vocab...
'''Course Description'''  
* ''gene regulation'' - Includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products (i.e., proteins).
* ''gene'' - Region of DNA that controls a discrete hereditary characteristic, usually corresponding to a single protein or RNA. This definition includes the entire functional unit, encompassing coding DNA sequences, noncoding regulatory DNA sequences, and introns.
* ''differential gene expression'' - The process where different genes are activated in a cell, giving that cell a specific purpose that defines its function.
* ''transcriptional regulation'' - The means by which a cell regulates the conversion of DNA to RNA, thereby orchestrating gene activity.
* ''non-coding DNA'' - Components of an organism's DNA that do not encode protein sequences. Some is transcribed into functional non-coding RNA molecules.
* ''transcription factor'' - Term loosely applied to any protein required to initiate or regulate transcription in eukaryotes. Includes both gene regulatory proteins and general transcription factors.
* ''micro RNA (miRNA)'' - Small, single-stranded, non-coding RNA molecules containing 21 to 23 nucleotides. Found in plants, animals and some viruses, these are involved in RNA silencing and post-transcriptional regulation of gene expression.
* ''small interfering RNA'' (aka short interfering RNA, or silencing RNA) - A class of double-stranded, non-coding RNA molecules, typically 20–24 base pairs in length, similar to miRNA, and operating within the RNA interference pathway.
* ''epigenetic mechanisms'' - Heritable traits, or a stable change of cell function, that happen without changes to the DNA sequence. One example is DNA methylation.
* ''histone'' - One of a group of small abundant proteins, rich in arginine and lysine, four of which form the nucleosome on the DNA in eukaryotic chromosomes.
* ''DNA methylation'' - Addition of a methyl group to DNA. Extensive methylation of the cytosine base in CG sequences is used in vertebrates to keep genes in an inactive state.
* ''post-transcriptional regulation'' - The control of gene expression at the RNA level. It occurs once the RNA polymerase has been attached to the gene's promoter and is synthesizing the nucleotide sequence.
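Since DNA methylation in vertebrates mostly targets cytosines in CG sequences, a natural first step when looking at methylation is simply locating those CG sites. A minimal Python sketch (the sequence is made up for illustration):

```python
def cg_sites(dna: str) -> list:
    """Return 0-based positions of each C that is immediately followed by a G."""
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "CG"]


sequence = "ACGTTCGACG"
print(cg_sites(sequence))  # [1, 5, 8]
```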


===Genetic Mutations===
The catalog description for this course is: "This is a first course in programming and computer science that is aimed at those with little to no previous experience in these areas. The main learning outcomes are proficiency in a useful and modern programming language and proficiency in basic data structures and algorithms. The course also prepares students to be ready to apply to the computer science master’s program. Notes: Not available to students with 9 or more CS graduate coursework."
[https://www.youtube.com/watch?v=8HfzUgxumVE&list=PL8dPuuaLjXtPW_ofbxdHNciuLoTRLPMgB&index=38 Crash Course Biology Video]


Vocab...
The following is the compressed plan of the entire semester.
* ''mutation'' - Heritable change in the nucleotide sequence of a chromosome.
* ''DNA replication error'' - An error made while copying DNA during replication.
* ''mutagen'' - A physical or chemical agent that permanently changes genetic material, usually DNA, in an organism and thus increases the frequency of mutations above the natural background level.
* ''somatic mutation'' - Mutation in any cell of a plant or animal other than a germ cell (reproductive cell) or germ-cell precursor.
* ''substitution'' - A type of mutation that replaces one nucleotide or amino acid with another.
* ''missense mutation'' - When a single nucleotide base in a DNA sequence is swapped for another one, resulting in a different codon and, therefore, a different amino acid.
* ''nonsense mutation'' - A genetic alteration that causes the premature termination of a protein.
* ''silent mutation'' - Base substitutions that result in no change of the amino acid or amino acid functionality when the altered messenger RNA (mRNA) is translated.
* ''frameshift mutation'' - A genetic mutation caused by insertions or deletions of a number of nucleotides in a DNA sequence that is not divisible by three.
* ''gene therapy'' - A medical technology that aims to produce a therapeutic effect through the manipulation of gene expression or through altering the biological properties of living cells.
* ''CRISPR/Cas9'' - Edits genes by precisely cutting DNA and then harnessing natural DNA repair processes to modify the gene in the desired manner. The system has two components: the Cas9 enzyme and a guide RNA.
* ''pharmacogenomics'' - The study of the role of the genome in drug response.
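The different kinds of mutation can be compared with a small Python sketch. A substitution is classified by translating the original and the mutated codon (silent if the amino acid is unchanged, nonsense if the new codon is a stop codon, missense otherwise), and a frameshift is shown by re-reading a sequence in groups of three after inserting one base. The codon table is deliberately partial (only the codons used in the example), and the sequences are hypothetical.

```python
# Partial standard codon table: only the codons used below ("*" = stop)
CODON_TABLE = {"AAA": "K", "AAG": "K", "GAA": "E", "UAA": "*"}


def classify_substitution(codon: str, mutated_codon: str) -> str:
    """Classify a single-codon substitution by its effect on the amino acid."""
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated_codon]
    if after == before:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


def codons(rna: str) -> list:
    """Split an RNA sequence into complete codons (leftover bases are ignored)."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


print(classify_substitution("AAA", "AAG"))  # silent   (K -> K)
print(classify_substitution("AAA", "GAA"))  # missense (K -> E)
print(classify_substitution("AAA", "UAA"))  # nonsense (K -> stop)

original = "AUGAAAUUUGGC"
inserted = original[:3] + "C" + original[3:]  # insert one base after the first codon
print(codons(original))  # ['AUG', 'AAA', 'UUU', 'GGC']
print(codons(inserted))  # ['AUG', 'CAA', 'AUU', 'UGG']
```

A single inserted base changes every codon after it, which is why an insertion or deletion whose length is not a multiple of three usually does far more damage than a single substitution.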


===Review...===
'''Course Outline'''
Note that there are ''a lot'' of terms so far, and many of the definitions sound complicated (since they are from a textbook or Wikipedia). The following are the most important terms and concepts you need to know.
* Getting started - system setup, linux, math background, development on your personal computer.
* Parts of the cell - be able to identify most of them in a picture or diagram.
* Python programming basics - operators, reserved words, data types, base systems, overflow.
* DNA and RNA are made of nucleotides ("bases"). DNA is double-stranded, RNA is single-stranded. DNA letters are A, T, C, G. RNA letters are A, U, C, G. A pairs with T or U, and C pairs with G. DNA is stable for long periods of time. RNA is not stable for long periods of time; it reflects what is happening in the cell right now.
* Python programming containers - strings, tuples, lists, dictionaries, sets.
* Gene - region of DNA that is transcribed to mRNA and then normally to protein.
* Python programming - object-oriented programming.
* Genes are a small part of the human genome. The parts of the genome that are not genes are NOT simply "junk DNA"; regions that are not genes often have an impact on gene expression.
* Python programming style - good programming style for reliability, readability, extensibility, security.
* Gene expression - a gene is expressed when it is actively being transcribed. At any given time in a cell, only some of the genes are active.
* Data structures - understanding/use of most important data structures - arrays, linked lists, binary search trees, hash tables, heaps. Implementation of some of these in Python.
* Basic steps going from DNA to protein: pre-mRNA, mRNA, protein. Be able to draw a picture of the basic steps.
* Algorithms - understanding/use of some basic algorithms - sorting (various), binary/linear search (and uses) - including some algorithms that are each of - greedy, heuristic, randomized, brute force / backtracking.
* All cells in an organism have the same DNA (except for mutations that have happened during DNA replication). Cells in general do ''NOT'' all have the same genes being expressed. Cells in the same tissue will tend to have mostly the same genes being expressed.
* Vocab - additional terms, algorithms, concepts at a shallow level.
* DNA in chromosomes is normally stored in a condensed form (twisted, balled up); this is called chromatin. Only the genes that are accessible are able to be transcribed.
* In eukaryotes, a gene contains introns and exons. Introns are regions that are part of the gene but are spliced out before the gene is translated to protein. Exons are the regions that ''are'' translated to protein. In prokaryotes this is different (most genes do not contain introns).
* Human genome is about 3 billion base pairs of DNA. There are around 20,000 genes that code for proteins, with each gene being a median size of about 24,000 letters of DNA, and an average size of about 62,000 letters of DNA. Of the 62,000 letters of DNA, on average about 60,000 are spliced out before translating to protein (these are the introns), and on average about 2,000 are translated to protein (the exons). So about 1/3 of the human genome is contained within a gene region, but only 1-2% of the genome are letters that get translated to proteins.
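The percentages in the last point can be checked with back-of-the-envelope arithmetic. This Python sketch uses only the averages given above. Multiplying averages gives about 41% of the genome inside gene regions, somewhat above the "about 1/3" figure; estimates built from products of averages like this are rough (for example, some genes overlap), but the fraction that is translated comes out in the stated 1-2% range.

```python
GENOME_BP = 3_000_000_000   # human genome, in base pairs
NUM_GENES = 20_000          # protein-coding genes
AVG_GENE_BP = 62_000        # average gene length
AVG_INTRON_BP = 60_000      # average amount spliced out (introns)
AVG_EXON_BP = AVG_GENE_BP - AVG_INTRON_BP  # about 2,000 letters translated

gene_fraction = NUM_GENES * AVG_GENE_BP / GENOME_BP
exon_fraction = NUM_GENES * AVG_EXON_BP / GENOME_BP
intron_share_of_gene = AVG_INTRON_BP / AVG_GENE_BP

print(f"genome inside gene regions: {gene_fraction:.0%}")                      # 41%
print(f"genome translated to protein: {exon_fraction:.1%}")                    # 1.3%
print(f"share of an average gene that is intron: {intron_share_of_gene:.0%}")  # 97%
```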


==Sequencing==
'''Learning Outcomes'''
* System setup - personal computer setup for both remote (connecting to CS server with terminal, sftp, X windows) and local development (editor, compiler/interpreter).
* Linux - proficient using the Linux terminal for development.
* Math background - proficient in math background needed for data structures and algorithms.
* Personal computer - is set up for development so you can do coursework from your home computer as well.
* Python programming - understanding of most language features, proficient in writing code using the most common ones, and writes code using good programming style.
* Data structures - understanding of operations, efficiency, use cases, can use built-in Python data structures and write Python code for some data structures that are not included in Python.
* Algorithms - understanding of basic algorithms, arguments for correctness and efficiency, can use the algorithms to solve problems efficiently.


===RNA Sequencing===
=Grading and Assignments=
Video to watch - [https://www.youtube.com/watch?v=tlf6wYJrwKY&list=PLblh5JKOoLUJo2Q6xK4tZElbIvAACEykp high throughput RNA seq (StatQuest)]


Processes to understand...
We will be doing what I am calling "achievements-based" grading. There are a series of skills, knowledge, and experiences that I want you to achieve. Your final letter grades will be based strictly on which of these you have completed. For each achievement, you can earn a rating of incomplete, pass-, pass, or pass+. The following will be our starting point for how letter grades will be assigned. I will reevaluate this throughout the term to make sure we are on track. I will also be setting the standards for pass-, pass, and pass+ for each of the achievements as we get to them in the course.
* Steps going from a biological experiment all the way through to having RNA seq counts data ready to analyze.  
** main steps for RNA-seq - prepare library (isolate RNA, break RNA into fragments, convert into double stranded DNA, add sequencing adapters, PCR amplify, quality control), sequence library, analyze results
* [https://en.wikipedia.org/wiki/FASTQ_format FastQ file] - basic structure, typical properties, quality scores.   
* Typical size of RNA sequencing data - millions of reads for one sequencing run, 50-1000 bp per read.
* Difficulties in sequence alignment - a given read might not align exactly to the reference genome (due to mutations or differences between individuals), and a read might align to multiple locations.
* Best practices in sequencing - be as consistent as possible with the samples through the entire process. This includes saving up the sequence libraries until all experiments are complete so the samples can be sequenced in one sequencing run if possible (due to "batch effects" where each sequencing run will have slightly different biases or tendencies for mistakes).
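Below is a minimal sketch of reading a FastQ file in Python. It assumes the common layout of four lines per read: a header starting with @, the sequence, a + separator line, and a quality string with one character per base. It also assumes the Phred+33 quality encoding used by current Illumina data. The function names and the "garbage read" thresholds are illustrative only and are not part of any tool used in this course. In practice, FastQC and cutadapt do this kind of work.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # header line without the leading "@"
    sequence: str
    quality: str     # one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records, assuming 4 lines per record and no line wrapping."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FastQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode quality characters into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P(error))."""
    return 10 ** (-q / 10)


def passes_filter(record: FastqRecord, min_length: int = 30,
                  min_mean_quality: float = 20.0) -> bool:
    """Hypothetical garbage-read filter; the default thresholds are illustrative."""
    scores = phred_scores(record.quality)
    if len(record.sequence) < min_length or not scores:
        return False
    return sum(scores) / len(scores) >= min_mean_quality


example = io.StringIO("@read1\nGATTACA\n+\nIIIII##\n")
records = list(read_fastq(example))
print(records[0].identifier, phred_scores(records[0].quality))
```

In this toy record, "I" decodes to quality 40 (a 1 in 10,000 chance of error) and "#" decodes to 2. With the default thresholds, the 7-base read is rejected as too short.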


'''C - lowest passing grade in a grad course'''
* Pass or higher achievement for ''all'' of the following
* Terminal text editors - basic use
* Linux terminal commands, files - basic use
* Math for CS basics (base systems, rules of exponents, logs, logic)
* Python - basic development in the terminal on the CS server
* Python - basic development on your personal computer
* Text editor - on your personal computer
* File transfers - between personal computer and CS server
* Python programming basics - knowledge of keywords, concepts, operators, evaluation of expressions
* Python programming containers - knowledge of basic operations on tuples, lists, sets, dictionaries


Vocab...
* ''RNA-seq'' - RNA sequencing, which is determining the sequences of the mRNA that is currently in a sample.
* ''gene expression'' - the degree to which a gene is actively being transcribed (from DNA to mRNA).
* ''mutated cell/sample'' - a cell or sample that has some mutation (usually induced by the researchers in order to see what difference this will make in the organism).
* ''wild type'' - the typical, unmodified form of an organism. Experiments using model organisms often have a "control group" of unmodified (wild-type) individuals to see what difference the "treatment group" has from the control group.
* ''chromosome'' - Structure composed of a very long DNA molecule and associated proteins that carries part (or all) of the hereditary information of an organism. Especially evident in plant and animal cells undergoing mitosis or meiosis, where each chromosome becomes condensed into a compact rodlike structure visible under the light microscope.
* ''gene'' - Region of DNA that controls a discrete hereditary characteristic, usually corresponding to a single protein or RNA. This definition includes the entire functional unit, encompassing coding DNA sequences, noncoding regulatory DNA sequences, and introns.
* ''mRNA transcript'' - RNA product of DNA transcription (the RNA that was produced by copying a gene from DNA).
* ''high throughput sequencing'' - the comprehensive term used to describe technologies that sequence DNA and RNA in a rapid and cost-effective manner.
* ''sequencing library'' - a biological sample that is composed of the RNA or DNA that is ready to be sequenced. If sequencing is performed at a later date or offsite, the sequencing library will be made and stored (typically in a -80 °C freezer) until needed.
* ''flow cell'' - an optical cell through which a sample is passed for detection before being measured or counted by electrometric or optical means.
* ''fluorescent probes'' - molecules that absorb light of a specific wavelength and emit light of a different, typically longer, wavelength (a process known as fluorescence), and are used to study biological samples (i.e., will show up in an image or to a device to indicate the presence of whatever the researcher is measuring).
* ''quality score'' - a value indicating the confidence the sequencer had that a given nucleotide is correct.
* ''fastq file'' - a text file format for storing sequences (usually nucleotide sequences) together with their identifiers and per-base quality scores.
* ''garbage reads'' - low quality or shorter than expected reads that should be removed before further analysis. If there are too many of these reads, then there may be some issue with the sequencing or sample preparation.
* ''sequence alignment'' - performing string matching to determine where in a genome a given sequence matches, possibly allowing some differences in the sequence (i.e., if there is no exact match).
* ''read counts per gene'' - for each gene, the number of individual RNA short reads that aligned to the DNA sequence of that gene.
* ''bulk RNA sequencing'' - term used to indicate sequencing of a sample of tissue that will generally have at least millions of cells, including some of different cell types.
* ''single cell RNA-seq'' - term used to indicate sequencing that attempts to sequence the RNA or DNA in individual cells.
* ''normalization'' - a transformation done to read counts so that counts from different samples can be compared, even though different samples may have a different total number of counts.
* ''PCA (principal components analysis)'' - a mathematical transformation of a matrix that re-expresses the data along the directions of greatest variance, which allows graphing the rows or columns of the matrix in two dimensions; often used to visually check how samples cluster.
* ''CPM (counts per million)'' - in a read counts file, indicates that each sample's counts have been divided by that sample's total number of counts and multiplied by one million, to account for some samples having a higher total number of reads than others.
* ''logCPM (log counts per million)'' - logarithm of the CPM.
* ''logFC (log fold change)'' - difference in the logarithm of two values, which equals the logarithm of their ratio (e.g., difference in logarithm between read counts for a gene in two samples).
* ''PCR (polymerase chain reaction)'' - Technique for amplifying specific regions of DNA by the use of sequence-specific primers and multiple cycles of DNA synthesis, each cycle being followed by a brief heat treatment to separate complementary strands. This can also be used to amplify an entire sample of DNA or RNA (by attaching primers to all segments in a sample) and is a standard part of the procedure for sequencing RNA or DNA.
* ''reference genome'' - a consensus genome created for a particular organism that is meant to represent a "normal" genome. Note that individual organisms of the same species (e.g., two different people) do not have identical genomes, so a reference genome is necessarily an approximation of what the "normal" genome is.
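The CPM, logCPM, and logFC definitions above, and the reason normalization matters, can be shown in a few lines of Python. This is a minimal sketch with made-up counts. The pseudocount of 1 added before taking log2 is an assumption to avoid log2(0). Packages used for real RNA-seq analysis (such as edgeR) normalize more carefully than dividing by the raw total.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: divide by the sample's total count, multiply by 1e6."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count / total * 1_000_000 for gene, count in counts.items()}


def log_cpm(counts: dict[str, int], pseudocount: float = 1.0) -> dict[str, float]:
    """log2(CPM + pseudocount); the pseudocount is an assumption that avoids log2(0)."""
    return {gene: math.log2(value + pseudocount) for gene, value in cpm(counts).items()}


def log_fold_change(counts_a: dict[str, int], counts_b: dict[str, int],
                    pseudocount: float = 1.0) -> dict[str, float]:
    """logFC of sample b relative to sample a: difference of their logCPM values."""
    log_a = log_cpm(counts_a, pseudocount)
    log_b = log_cpm(counts_b, pseudocount)
    return {gene: log_b[gene] - log_a[gene] for gene in log_a if gene in log_b}


# Made-up counts: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"GENE1": 100, "GENE2": 900}
sample_b = {"GENE1": 400, "GENE2": 1600}
lfc = log_fold_change(sample_a, sample_b)
print({gene: round(value, 2) for gene, value in lfc.items()})
```

In this example, GENE1's raw count goes up 4x (from 100 to 400). However, sample_b has twice as many total reads, so GENE1's logFC comes out at about 1, which is a 2x increase after normalization. GENE2's logFC is slightly negative.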


===CHIP Sequencing===
Video to watch - [https://www.youtube.com/watch?v=nkWGmaYRues&list=PLblh5JKOoLUJo2Q6xK4tZElbIvAACEykp&index=2 CHIP Seq (StatQuest)]


'''B - satisfactory'''
* In addition to the above...
* Object-oriented programming in python
* Python programming style - good programming style for reliability, readability, extensibility, security.
* Data structures - good understanding of how operations are implemented for - arrays, linked lists, binary search trees, hash tables, heaps. Able to properly "play computer" with these.
* Algorithms - good understanding of linear and binary search, and of several sorting algorithms (including one efficient one).
* Vocab - some additional terms, algorithms, concepts at a shallow level.
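For the B-level data structures item, "playing computer" with a binary search tree means tracing how each insert or search walks left or right from the root. Below is a minimal Python sketch with integer keys, in which duplicate keys are ignored. It is one common way to write a binary search tree, not a reference implementation for the course.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional[Node] = None   # subtree with smaller keys
    right: Optional[Node] = None  # subtree with larger keys


class BinarySearchTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: int) -> None:
        """Walk down from the root and attach a new leaf; duplicates are ignored."""
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while True:
            if key < node.key:
                if node.left is None:
                    node.left = Node(key)
                    return
                node = node.left
            elif key > node.key:
                if node.right is None:
                    node.right = Node(key)
                    return
                node = node.right
            else:
                return  # key already present

    def contains(self, key: int) -> bool:
        """Each comparison moves one level down, so cost grows with the tree's height."""
        node = self.root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False

    def in_order(self) -> list[int]:
        """Visit left subtree, node, right subtree: the keys come out sorted."""
        result: list[int] = []

        def visit(node: Optional[Node]) -> None:
            if node is not None:
                visit(node.left)
                result.append(node.key)
                visit(node.right)

        visit(self.root)
        return result


tree = BinarySearchTree()
for key in [50, 30, 70, 20, 40]:
    tree.insert(key)
print(tree.in_order())                       # [20, 30, 40, 50, 70]
print(tree.contains(40), tree.contains(60))  # True False
```

Inserting keys in already-sorted order would produce a tree shaped like a linked list. This is why plain binary search trees are only fast when they stay reasonably balanced.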


===Single Cell Sequencing===
Video to watch - [https://www.youtube.com/watch?v=k9VFNLLQP8c single cell RNA seq (StatQuest)]


'''A - good/excellent'''
* In addition to the above...
* Pass+ rating on most of the above
* Pass or higher achievement for ''all'' of the following
* Basic data structures from B level - can write python code to implement the data structures.
* Algorithms - can write python code to implement linear and binary search, several sorting algorithms (including one efficient one).
* Algorithms - good understanding of some algorithms for each of - greedy, heuristic, randomized, brute force / backtracking.
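Below is a minimal sketch of the kind of code the A-level algorithm items ask for. It covers linear search, binary search (which requires a sorted list), and merge sort as one efficient O(n log n) sorting algorithm. The choice of merge sort is just an example; other efficient sorts would serve as well.

```python
def linear_search(items: list[int], target: int) -> int:
    """Check every element in order; O(n). Returns the index, or -1 if absent."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items: list[int], target: int) -> int:
    """Halve the search range each step; O(log n). Requires sorted input."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


def merge_sort(items: list[int]) -> list[int]:
    """Split in half, sort each half recursively, then merge; O(n log n)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged: list[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in original order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


data = [38, 27, 43, 3, 9, 82, 10]
ordered = merge_sort(data)
print(ordered)                     # [3, 9, 10, 27, 38, 43, 82]
print(linear_search(data, 9))      # 4
print(binary_search(ordered, 43))  # 5
print(binary_search(ordered, 5))   # -1
```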


===ATAC Sequencing===
Video to watch - [https://www.youtube.com/watch?v=L2Kxaq9yRE4 ATAC seq (Activ Motif)]


Achievements can be earned based on quizzes, assignments, in-class work, and exams. Rather than having numerical scores for these, I will use them to mark off your achievements. Note that achievements can be "lost" if you demonstrate a skill early in the term and then demonstrate a lack of the skill later in the term. I expect this will not normally be the case, but I will continue to evaluate you based on all of the skills throughout the term.


===Other===
* CITE seq
* Flow cytometry
* Western blot
* Northern blot
* Gel electrophoresis
* transcription
* reverse transcription
* cDNA
* polyA tail
* lyse
* cDNA library
* 96 well plate
* aliquot
* mass cytometry
* mass spec


'''Late Work''' -
Assignments can generally still be handed in for around a week after their due date. Once the solutions are posted and discussed, late submissions will no longer be graded. Quizzes will normally need to be taken on the day they are due, or perhaps within a few days of when they are due. Solutions will normally be discussed or posted within a week of their due date. Late work that is more than about a week old is not accepted, partly because it takes much longer to grade quizzes/assignments that are no longer fresh in the instructor's head, and partly to keep everyone in the class working on the same material.


'''Start Assignments and Quiz Studying Early''' -
I suggest attempting an assignment the day it is given, or the day after, so that if you have a problem you can ask early. If you continue to have problems in trying to complete the assignment, you will have time to ask again. Many of the assignments require thought and problem solving, which takes "time on the calendar" not just "time on the clock". By that I mean that spending an hour on 3 consecutive days is likely to be more productive than trying to spend 3 hours at once on the assignment.


===Review===
After putting all of that together, I will summarize here what is most important to remember and understand.


'''Expected Amount of Work''' -
My expectation is that an average student will spend about 4-8 hours OUTSIDE of class each week (in addition to class time or viewing lecture videos) WORKING PRODUCTIVELY/EFFICIENTLY (not just staring at the computer) to complete their coursework for this class. Some students may spend less time than this, and some students will spend more. If you find yourself at the upper end of, or exceeding, these hours, contact me and bring your notes and study schedule.


This is the foundation for the rest of CS, so it definitely pays off to do your best here.
Note - please find a way to spend enough time on this class (the investment will pay off in terms of skills, being able to get a job, etc.).


=Programs to Install=
See [https://indstate-edu.zoom.us/rec/share/4agpiLQFQiSDoM2JDrmbqwvjssOy-_R_HD7RqjR4LVxw_w6YmwOFJhQzrIp4TeiM.50WPEpLq8bUZOlKZ?startTime=1720539064000 this video] for installing a bunch of these programs on Windows 10 in the Windows Subsystem for Linux (WSL).


'''On Windows''' - If using Windows, enable the Windows Subsystem for Linux (WSL): https://learn.microsoft.com/en-us/windows/wsl/install and then install Ubuntu (free from the Microsoft Store). This is so we can run programs that are only available on Linux/Mac. After you have Ubuntu installed, start it up and run the following command: sudo apt-get update. That will get an updated list of packages. You can then install packages like this: sudo apt-get install emacs. If you download programs and want them to be in your terminal path, edit your .bashrc (or .tcshrc, .zshrc, or whichever rc file your shell uses) and add a line that sets the PATH to include the directory where you installed the new files. If you have an installation file to download, you can do this: wget https://some_link. You can then extract it and put it where you want.


'''Compression''' - if your OS does not unzip certain compressed files (e.g., .gz and .tar, which are not natively supported in some Windows versions), install [https://www.7-zip.org/ 7-Zip]. MacOS and Linux natively support most compression formats that we will need.
'''Grade Meanings''' -  
The letter grades are intended to have the following rough meaning. The list of achievements needed for each was chosen with this in mind.
* A+/A: You understand everything and probably could teach the course yourself.
* B+/A-: You understand nearly everything, and should be all set to use this knowledge in other courses or in a job.
* C/C+/B-/B: Some things you understand very well and others you don't (more towards the former for a B and more towards the latter for a C).
* D-/D+/C-: You did put some effort in, and understand many things at a high level, but you haven't mastered the details well enough to be able to use this knowledge in the future. '''Note that the lowest passing grade for grad courses is a C, so if you fall in the range below C then your letter grade will be an F.'''
* F: Normally, students that get an F simply stopped doing the required work at some point.


'''R''' - first install [https://cloud.r-project.org/ R], and then install [https://www.rstudio.com/products/rstudio/ RStudio Desktop (free)] (the IDE I normally use for working in R).
=CS-Specific Items=
This section contains items that are generally the same for all CS courses (and in particular those taught by this instructor).


'''MS Office''' - ISU faculty/staff/students can install MS Office 365 for free. Start by logging into https://portal.office.com with your ISU credentials, and click on the button "Install and more". You can download and install it for Windows and Mac OS (not available for Linux). You can also use MS Office programs in the browser on any OS. You should install Excel on your computer to look at .csv and .tsv files.
==CS Course Policies==
Note that this course follows all standard CS course policies. In particular, (a) cheating/plagiarism by graduate students results in an F in the course, and (b) there will be no makeup exams. See http://cs.indstate.edu/info/policies.html for details.


'''Gitlab at ISU''' - Login to https://git.indstate.edu so you can be added onto projects there.
==Lab Help==
We have a few lab assistants who are available to help students in beginning computer science courses. Please see https://cs.indstate.edu/wiki/index.php/Unix_Lab_and_Help for details. The lab hours are in a calendar on the CS homepage, at http://cs.indstate.edu/info/index.php#lab_hours. You can join the lab when working on your programs. You can ask the lab assistants to look at your programs, and you can work with any other CS students that are there (you could use the lab as a regular meeting place to work with your classmates).


'''SRA Toolkit''' - needed for downloading sequence files from NCBI. [https://github.com/ncbi/sra-tools/wiki SRA Toolkit]. Note for Mac - you will likely need to approve each program within the toolkit the first time you run it (like with many programs not security signed by Apple, it doesn't let you run it, then you go to Settings / Security, and click an appropriate button to allow the program to run, and then try running again). Note - you will want to update the PATH in your shell rc file (.zshrc, .tcshrc, etc.) so that it is set automatically when you start a terminal.
==Course Announcements==
Announcements regarding the course will be made both during class and via email to your @sycamores.indstate.edu email address. You should regularly check this email account or have it forwarded to an account that you check regularly. You can set the account to forward by logging into your indstate.edu email online (if you aren't able to find the option, try a different browser or search online for things like - outlook online forward email setting).


'''Short reads sequence aligner''' - [https://daehwankimlab.github.io/hisat2/download/ hisat2]. This is only available for Linux and Mac, so if using Windows then it will be run under WSL.
==Classroom conduct==
You may not use cell phones, iPods/music players, etc. during class. You should be civil and respectful to both the instructor and your classmates, and you should arrive to class a few minutes before the scheduled lecture so you are ready for lecture to begin on time. You may use your computer during class if you are using it to follow along with the examples that are being discussed. You should avoid spending time on email, Facebook, work on other courses, etc. during the lecture for this class (be fully present wherever you are, make the most of each experience).


'''Adapter trimming''' - [https://cutadapt.readthedocs.io/en/stable/ cutadapt] - for removing adapters from RNA/DNA sequence files.
==Academic Integrity==
Please follow these guidelines to avoid problems with academic misconduct in this course:


'''Quality check''' - [http://www.bioinformatics.babraham.ac.uk/projects/fastqc FastQC] - for checking quality of FastQ files. Note that you need to have a Java runtime environment installed for this to work.
''Homework:'' You may discuss the homework assignments, but should solve and finish them on your own. To make sure you are not violating this, if you discuss with someone, you should DESTROY any work or evidence of the discussion, go your separate ways, SPEND at least an hour doing something completely unrelated to the assignment, and then you should be able to RECREATE the program/solution on your own, then turn that in. If you cannot recreate the solution on your own, then it is not your work, and you should not turn it in.


'''SAM/BAM files''' - [https://www.htslib.org/ samtools] - for dealing with SAM and BAM sequence files.
''Note on sources:'' if you use some other source, the web or whatever, you had better cite it! Not doing so is plagiarism.


'''reads counting''' - [https://htseq.readthedocs.io/en/release_0.11.1/index.html htseq-count] - for taking an alignment of a reads file and producing a counts matrix.
''Exams:'' It should be clear that there is no cheating during exams. Each instructor has different rules for what is allowed on exams in terms of notes, etc. If not noted otherwise, you should assume that a quiz or exam is closed notes, no computer, no calculator.


'''downloading metadata''' - [https://www.ncbi.nlm.nih.gov/books/NBK179288/ Entrez Direct] - can be used for pulling information from Entrez databases.
''Projects:'' You should not copy from the Internet or anywhere else. The project should be your own work. It will be fairly obvious to me if you do copy code from the Internet, and the consequences will be at the least a 0 on the project.
If cheating is observed, you will at the least receive a 0 for the assignment (and may receive an F for the course), and I will file a Notification of Academic Integrity Violation Report with Student Judicial Programs, as required by the university's policy on Academic Integrity. A student who is caught cheating twice (whether in a single course or different courses) is likely to be brought before the All University Court hearing panel, which can impose sanctions up to and including suspension/expulsion. See http://www.indstate.edu/sjp/docs/code.pdf and http://www.indstate.edu/academicintegrity/ for more information.


Please ask the instructor if you have doubts about what is considered cheating in this course.


=Data Files=
This section contains links to sample data files. See the following video walking through some of these data files - [https://indstate-edu.zoom.us/rec/play/SqPfjW92rA1BezrA1Nnv_sFtkQbFFsmKc6Gbjx6wSa5f14hTPObsUyqLwy-DglhbINLcn-uH6k0fkQ09._Sg8jZPwlHxRMFjF video looking at GSE85331 gene expression and RNA-seq files].


'''Gene expression (from RNA-seq)''' - [https://ftp.ncbi.nlm.nih.gov/geo/series/GSE85nnn/GSE85331/suppl/GSE85331%5Fall.gene.FPKM.output.replicates.txt.gz GSE85331 gene expression file], which comes from [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85331 GSE85331] at NCBI Gene Expression Omnibus (GEO).
==Office hours (using Teams)==
Office hours will be through Microsoft Teams by default. If you would like to meet in person, you should reserve an appointment using Outlook calendar. I am normally in my office during my listed office hours, but by making an appointment you can be more certain. For meeting through Teams, you should start Teams in your browser or start the application. You should be logged in using your ISU credentials. Once you have Teams open you can message me to ask me questions or to ask to talk. We can use Teams to message (better than emailing back and forth repeatedly if you have questions about something that you just want to write about) or to talk and share screens (e.g., to take a look at your code). I normally have Teams open on my computer all of the time, including during my office hours. During my office hours I will normally reply right away; at other times I will reply when I get a chance.


'''FastQ''' - [https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR4011874&display=metadata C15_0_1 sample] from GSE85331. This is a link to a page that hosts the reads data. A FastQ file can be downloaded using the SRA Toolkit.
==Canvas==
The course has a Canvas site. Click https://indstate.instructure.com/ to go to Canvas. You should see this course listed under your courses for the current term. If you don't, you may need to click on the Courses icon and then click the "All courses" link. The Canvas site is used for giving you your grades, for quizzes/exams, and for getting to online lectures (which are done using Zoom). Announcements will be sent through Canvas and to your university email. Links and such will be kept on this website.


'''Genome FastA''', '''Genome GFF''' - hg19 version of the human genome (which was used in the GSE85331 study as the reference genome), available [https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.13/ from NCBI] or [https://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19 from UCSC] (which is what GSE85331 said they used).
==Lectures (using Zoom) ==
Here at ISU, section numbers starting with the number 3 (e.g., 3xx: 301, 302, etc.) are generally online sections. There are 2 types of online sections: synchronous online and asynchronous online. Students in synchronous sections should join at the regularly scheduled time of the course, whereas students in asynchronous sections generally keep up with the material independently without regularly scheduled meetings. In general, async sections are more difficult to stay on top of and require a great deal of self-discipline (it is much easier to think "I can watch the videos tomorrow" and just get behind). So if you are in one of these sections, make sure you get off to a strong start, and ask for help sooner rather than later. If you are in an online section, check your course schedule for course meeting times; if you have a meeting time, then your section is synchronous, otherwise it is asynchronous (or there is an error in the system).


This course has a 301 section (synchronous online) and a 001 section (face to face). Students in either section can participate in whatever way they need to.


==Practice==
Practice with some of the data files...


For ISU's links to information on getting started with Zoom, see https://indstate.teamdynamix.com/TDClient/1851/Portal/KB/ArticleDet?ID=107534. You can also see the information linked at https://www.indstate.edu/services/student-success/cfss. You will get to the lectures for this course by going to Canvas, selecting this course, clicking Modules on the menu on the left, and clicking on the Zoom module. Once there, you should see a schedule of lectures and be able to view recorded lectures. Note that you should install the Zoom application for your computer, and you will need to be logged into Zoom with your ISU credentials to be able to connect. Also note that the lectures are recorded and only available to those in our class. Recorded lectures normally appear later the same day as the lecture.


===Practice with GSE85331 gene expression file in Excel===
Start with the [https://ftp.ncbi.nlm.nih.gov/geo/series/GSE85nnn/GSE85331/suppl/GSE85331%5Fall.gene.FPKM.output.replicates.txt.gz GSE85331 gene expression file]. Note that you will be adding columns and rows to the file. You should save the file as an xlsx file so formulas and such will be saved properly. Before doing anything else, you should create a new sheet called "log2" and copy/paste in the expression values with the formula log2(value+1) (in Excel, LOG(value+1, 2)). Then you should add columns at the end for the max, min, average, median, max-min, median of day0, median of CM, and day0-CM, each based on that row's expression values. You should also add rows at the bottom for the max, min, average, median, and max-min of each column. This should give you enough to answer the following questions. For some of these you will sort based on one of the columns or rows. See the following video walking you through this process - [https://indstate-edu.zoom.us/rec/play/5bBN25wjrLGeS1zijV1Ju0zzbY21D2AJ7ERi-6Dx8nwK34wPuqB8qdRpWsCK21jV_nBzExU2GltR3cqI.-a6D2hjqZpf3E2UI video walkthrough].
# How many genes are in this file? How many samples?
# Which sample has the highest expression for gene TNNT2? What is that expression value? ''For this question and all the rest, you should be using the log2 expression values created as instructed above.''
# What is the highest expression value in the entire file - which gene, which sample, and what is the expression value?
# Which gene has the highest median expression value? What is that value?
# Which sample has the highest median expression value, and what is that value? Which sample has the lowest median expression value, and what is that value?
# List the top 10 genes in terms of the median difference between the CM (day 30) values and the day 0 values. Give the genes and the difference. ''Note - you will be taking a median of 8 columns (the CM ones), subtract the median of 8 other columns (the day 0 ones).''
# For the following genes, describe their expression profile (at what time point are they highest, where lowest, etc.): POU5F1, T, GATA4, TBX2, TBX5, TNNT2.
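The same calculations can also be done in code. Below is a minimal Python sketch of the log2(value+1) transform and of question 6: for each gene, it takes the median of the CM columns minus the median of the day 0 columns, then ranks the genes. The column names and numbers are made up for illustration only. The real GSE85331 file has its own column headers, and reading the file is left out.

```python
import math
import statistics


def log2_plus_one(value: float) -> float:
    """Same transform as the spreadsheet formula log2(value+1)."""
    return math.log2(value + 1)


def cm_minus_day0(table: dict[str, dict[str, float]], day0_cols: list[str],
                  cm_cols: list[str], top_n: int = 10) -> list[tuple[str, float]]:
    """Rank genes by median(log2 CM values) - median(log2 day 0 values), largest first."""
    differences = {}
    for gene, row in table.items():
        day0 = statistics.median(log2_plus_one(row[col]) for col in day0_cols)
        cm = statistics.median(log2_plus_one(row[col]) for col in cm_cols)
        differences[gene] = cm - day0
    ranked = sorted(differences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up values and hypothetical column names (the real file has 8 day 0 and 8 CM columns).
toy_table = {
    "GENE_A": {"day0_r1": 0, "day0_r2": 1, "cm_r1": 15, "cm_r2": 31},
    "GENE_B": {"day0_r1": 7, "day0_r2": 7, "cm_r1": 3, "cm_r2": 3},
}
print(cm_minus_day0(toy_table, ["day0_r1", "day0_r2"], ["cm_r1", "cm_r2"]))
# [('GENE_A', 4.0), ('GENE_B', -1.0)]
```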


Note that if you have not used Zoom with your ISU account previously, you need to go to https://indstate-edu.zoom.us and log in with your ISU email address and password to get it set up.


=R Programming=


First get R installed (see the programs to install section above). Next...
* [https://indstate-edu.zoom.us/rec/play/IFMuCZHzVFO8VLgm_ltwUjLY0tG_te0gyfQHFkXksEAzh8xC08FhFraR9sJc-kv-GbiURPDBxiELud2G.GOFdI58_Gh6lzMlj watch this video introducing R].
* Look through [https://docs.google.com/presentation/d/1E1LSiWITG6iCqnmYeWi_CBSifQ_3dMVXxdpWsT30r4w Jeff's intro to R slides].
* Look through [https://cs.indstate.edu/wiki/index.php/R_Programming_-_Getting_Started R Getting Started].
* Pick another R tutorial or resource to start looking through (see the suggestions in R Getting Started, or ask the internet).
* Download the R files from class and try them out yourself. Files... [https://cs.indstate.edu/~jkinne/bioinformatics/R/gse85331_a.R gse85331_a.R] and [https://indstate-edu.zoom.us/rec/play/6FXvNrsK8pK9MboW-WfBjUXf7I-vy7EG7yI5LYg4enRCOC5e5PCUvoui_y9RGAKS6tQ5Xz2GmMSB4SOz.WG3UIiA2dhfkB0K_ video developing that file]. [https://cs.indstate.edu/~jkinne/bioinformatics/R/gse85331_b.R gse85331_b.R] and [https://indstate-edu.zoom.us/rec/play/jwbn6DxsUeGky_Op79OPjmdEmV40aZpG7Ojvob-dMeozjBOeUa3J3v81oAXQRMvIvARDs4pPAb8PKZfR.7vr1GWMY812yfoJv video developing that file]. [https://cs.indstate.edu/~jkinne/bioinformatics/R/GSE244362_lab3.R GSE244362_lab3.R] and the [https://indstate-edu.zoom.us/rec/play/rCNIUokP40Ro6s5S2aLCdD4RNXIWwF-VXkQzRx3GFCxsramj3CIAm090PxmebMVpDNjbOh7nmgQ-SpPF.-E9PFNCSfBaKJKEZ video] explaining what you are supposed to do in that lab.
* Take the R programming quiz until you can get 100%.
* Get started on the first R lab.
* Start keeping your own personal cheat sheet for how to do different things in R.
* R Markdown - read [https://rmarkdown.rstudio.com/articles_intro.html this introduction], see a cheat sheet [https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf like this one], and search online for any other formatting things you would want, or different behaviors for the code sections. Use [https://cs.indstate.edu/~jkinne/bioinformatics/R/gene_expression_lab4.Rmd this Rmd] file as a starting point, and make the changes described in [https://indstate-edu.zoom.us/rec/play/jA87CR9O3PFi7l5j2-pTupNU0ukTYG6QaqY1nkyuQ5rT1v1FOANXM8yUDerhFKygeM1LhtH2fZvbnjH6.XEzqY7AaPWLw8J25 this video].


==Participating online==
If you are participating online, please see the information at https://www.indstate.edu/services/student-success/cfss about participating in online courses. You are expected to either join lectures live through Zoom or watch the recordings once they are available. You will complete assignments, quizzes, and exams on the same schedule as the rest of the class. For quizzes and exams you will normally have a 24-hour period during which to take the quiz/exam (note that different students will have slightly different questions and any communication between students about quiz/exam content is academic misconduct).


See also the General Information section at the top of this page for setting up a normal check-in time with the instructor.


=Statistics=
Viewing lists - [https://www.youtube.com/watch?v=qBigTkBLU6g&list=PLblh5JKOoLUK0FLuzwntyYI10UQFUhsY9 StatQuest Statistics Fundamentals], [https://www.youtube.com/watch?v=0Jp4gsfOLMs&list=PLblh5JKOoLUJJpBNfk8_YadPwDTO2SCbx StatQuest Statistics and Machine Learning in R], [https://www.youtube.com/watch?v=tlf6wYJrwKY&list=PLblh5JKOoLUJo2Q6xK4tZElbIvAACEykp StatQuest High Throughput Sequencing]


=ISU Required Syllabus Items=
The items in this section are required and are the same for every ISU course.


==COVID-19 Information==
Information specific to CS courses - [[Start of Term Announcements]]


=High Throughput Sequencing workflow=
Viewing list -
* [https://www.youtube.com/watch?v=p4vKJJlNTKA Generation of genomic data (Hubbard Center for Genomic Studies)] gives background on DNA/RNA sequencing.
* [https://www.youtube.com/watch?v=tlf6wYJrwKY&list=PLblh5JKOoLUJo2Q6xK4tZElbIvAACEykp StatQuest High Throughput Sequencing] - watch the videos in this playlist as needed. Start with the first.


Basic steps in the workflow of going from RNA-seq files to counts files.
# Get input files. We use SRA, and take the first 1000 reads from SRR4011874 using fastq-dump from the SRA Toolkit.
# Trim adapters. We use cutadapt.
# Check for quality (could do that before trimming adapters as well). We use FastQC.
# Align to reference genome. We use hisat2.
# Convert files as needed by whatever comes next. For us, we convert SAM to BAM, sort BAM, create an index for BAM. We use samtools.
# Take alignment results and count how many reads per gene. We use htseq-count.
# Do the above for each sample in the dataset, and combine the individual counts files into a single matrix file (i.e., csv).
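Step 7 (combining the per-sample counts files) takes only a small amount of code. Below is a minimal Python sketch that makes a few assumptions. First, each htseq-count output file is assumed to have two tab-separated columns (gene ID and count), followed by summary lines whose names start with two underscores (such as __no_feature). Second, the sample names and file contents below are made up. Third, writing 0 for a gene missing from one sample is a defensive assumption, because files produced with the same annotation file normally list the same genes.

```python
import csv
import io
from typing import TextIO


def read_htseq_counts(handle: TextIO) -> dict[str, int]:
    """Parse one htseq-count output file: gene ID <tab> count on each line."""
    counts: dict[str, int] = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        gene, count = line.split("\t")
        if gene.startswith("__"):
            continue  # skip summary lines such as __no_feature and __ambiguous
        counts[gene] = int(count)
    return counts


def write_count_matrix(samples: dict[str, dict[str, int]], out: TextIO) -> None:
    """Write a CSV with one row per gene and one column per sample."""
    sample_names = list(samples)
    genes = sorted(set().union(*(counts.keys() for counts in samples.values())))
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene"] + sample_names)
    for gene in genes:
        # A gene missing from a sample's file is written as 0 (an assumption).
        writer.writerow([gene] + [samples[name].get(gene, 0) for name in sample_names])


# Made-up contents standing in for two htseq-count output files.
file_a = io.StringIO("GENE1\t10\nGENE2\t0\n__no_feature\t5\n")
file_b = io.StringIO("GENE1\t3\nGENE3\t7\n__ambiguous\t1\n")
samples = {"sample_a": read_htseq_counts(file_a), "sample_b": read_htseq_counts(file_b)}

out = io.StringIO()
write_count_matrix(samples, out)
print(out.getvalue())
```

For real files, you would replace the StringIO objects with open(path) for each sample's counts file and open the output CSV with open(path, "w", newline="").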


A shell script that has all of these steps for our test dataset is [https://cs.indstate.edu/~jkinne/bioinformatics/rna_seq_counts/rnaseq_pipeline.sh rnaseq_pipeline.sh]. A video walking through those steps is: [https://indstate-edu.zoom.us/rec/play/9RScrMk4-ini7aKa7q_n9p-zfx2GdB-UBAzX-q57LOciZh-dUM_WUWv8IfSWsu2rpAXSFIgUdFlaDPJO.lyYlRZY61VBkY06E?autoplay=true&startTime=1720628015000 here].
''Standard ISU language required in all syllabi (read this all once, then skim for your other courses)...''


=Genome Assembly=
Students are expected to adhere to course attendance policies, as stated in the course syllabus. Documented COVID-related absences will be treated like any other serious medical issue. Following University policy, students with a documented, serious medical issue must contact the Office of the Dean of Students for assistance. The Office of the Dean of Students will supply documentation for faculty. Students with a documented serious medical issue should not be penalized and will be given a reasonable chance to complete exams or assignments. Once notification is made, faculty will make reasonable efforts to accommodate the student’s absence and will communicate that accommodation directly to the student. Please note that faculty are not required to accommodate a serious medical issue with virtual content options, like streaming or recorded lectures. To avoid the potential of missing significant class time, students are strongly encouraged to receive the COVID vaccination that has been made available on campus. For more information about the vaccines or to find a vaccination site, go to: ourshot.in.gov. The ISU Health Center also administers COVID-19 vaccines by appointment.
Viewing list
* [https://www.youtube.com/watch?v=p4vKJJlNTKA Generation of genomic data (Hubbard Center for Genomic Studies)] gives background on DNA/RNA sequencing.
* [https://www.youtube.com/watch?v=ZmF6QROPlTU Sequencing and assembling a genome (Hubbard Center for Genomic Studies)] provides a nice overview of the steps and challenges.
* We will then have one or more videos working through all the steps of genome assembly.


=OLD=
Things here will be moved into other sections as we need them.


==Reading==
Potentially useful readings, tutorials, etc.:
* R: [[R Programming - Getting Started]] - programs to install, reading, etc.
* Other courses like this one:
** [https://microbiology.columbia.edu/icqb Introduction to Computational & Quantitative Biology - Columbia Dept Microbiology & Immunology]
** [https://bioboot.github.io/bggn213_f17/lectures/ Foundations of Bioinformatics - UC San Diego CS (UCSD)]
** [https://personal.utdallas.edu/~prr105020/biol6385/2018/lecture.html Computational Biology - UT Dallas Dept. Biology]
** [https://genomicsclass.github.io/book/ Biomedical Data Science - Harvard]


In particular, your assigned reading includes...
==Special Needs / Disability Services==
* From the R Programming Getting Started, start looking through each of the items linked in [https://cs.indstate.edu/wiki/index.php/R_Programming_-_Getting_Started#Reading Reading]
If I have not e-mailed you saying that this sentence doesn't apply to you, I have not received information from Student Support Services. If you believe I should have, check with me, Student Support Services, or whatever other program is relevant.
* [https://bioboot.github.io/bggn213_f17/lectures/#17 UCSD lecture 17 - Transcriptomics and the analysis of RNA-Seq data]
''Standard ISU language required in all syllabi...''
* Up through Figure 1 in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5576565/ Genome-Wide Temporal Profiling of Transcriptome and Open-Chromatin of Early Cardiomyocyte Differentiation Derived From hiPSCs and hESCs]
* [https://microbiology.columbia.edu/icqb Columbia] - skim each of the lectures to see what they cover, and refer back to them when we get to those parts. These lecture slides are at a very good level for what we are doing.
* SVM slides in Unit 6 from UT Dallas https://personal.utdallas.edu/~prr105020/biol6385/2018/lecture.html
* [https://stats.libretexts.org/Bookshelves/Computing_and_Modeling/RTG%3A_Classification_Methods/4%3A_Numerical_Experiments_and_Real_Data_Analysis/Preprocessing_of_categorical_predictors_in_SVM%2C_KNN_and_KDC_(contributed_by_Xi_Cheng) Dummy Variables in SVM / KNN]
* [http://topepo.github.io/caret Machine Learning with caret in R]
* [https://www.datacamp.com/community/tutorials/decision-trees-R Decision trees in R (datacamp)], [https://towardsdatascience.com/understanding-random-forest-58381e0602d2#:~:text=The%20random%20forest%20is%20a,that%20of%20any%20individual%20tree. Random forests (towards data science)]
* [https://docs.google.com/document/d/1Fe-w4GNSq7-2nPfZ2QF3byD36hGctZppQY-XIZWKaXs/edit# Jeff's notes on terms, etc.]


==Gene Expression==
Indiana State University recognizes that students with disabilities may have special needs that must be met to give them equal access to college programs and facilities. If you need course adaptations or accommodations because of a disability, please contact us as soon as possible in a confidential setting either after class or in my office. All conversations regarding your disability will be kept in strict confidence. Indiana State University's Student Support Services (SSS) office coordinates services for students with disabilities: documentation of a disability needs to be on file in that office before any accommodations can be provided. Student Support Services is located on the lower level of Normal Hall in the Center for Student Success and can be contacted at 812-237-2700, or you can visit the ISU website under A-Z, Disability Student Services and submit a Contact Form. Appointments to discuss accommodations with SSS staff members are encouraged.
Start by watching the [https://indstate-edu.zoom.us/rec/share/uhl2WGCOd5FmJPKagZBCB5GEIlufLVGKgTtq5W8r40eRF5mPw8O5az5-Z1NZEgWV.Bey1Q2Kfb2wHh8NL video introduction] (16 min; consider watching it at 1.5x or 2x speed).


We start by getting into this [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85331 GSE85331 dataset], described in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5576565/ this publication] (and see [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5576565/bin/NIHMS889282-supplement-Online_Data_Supplement.pdf supplementary information] for how they processed/analyzed the data).
Once a faculty member is notified by Student Support Services that a student is qualified to receive academic accommodations, a faculty member is obligated to provide or allow a reasonable classroom accommodation under ADA.


On your own computer, download [https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE85331&format=file&file=GSE85331%5Fall%2Egene%2EFPKM%2Eoutput%2Ereplicates%2Etxt%2Egz the dataset] and extract (uncompress) the file (on macOS or Linux just double-click it; on Windows use 7-Zip or something similar).
==Disclosures Regarding Sexual Misconduct==
''Standard ISU language required in all syllabi...''


===Spreadsheet===
Indiana State University Policy 923 strictly prohibits discrimination on the basis of: age, disability, genetic information, national origin, pregnancy, race/color, religion, sex, gender identity or expression, sexual orientation, veteran status, or any other class protected by federal and state statutes in ISU programs and activities or that interferes with the educational or workplace environment.
After extracting, you can open the file in Excel, Google Sheets, or LibreOffice. Note that it is a TSV (tab-separated values) file, so if you double-click it your OS may not know which program to use to open it; instead, start your spreadsheet program and then open the file from there. Some things are not too painful to do in a spreadsheet program. For example, you should verify that the following are all correct...
* Genes with highest H1_day0_0 values: SNORD97, SNHG25, EEF1A1, RPL38, RPS27.
* Genes with highest H1_CM_0 values: H19, MYL7, RPL31, SNORD9, RPS27.
* Number of genes (#rows - 1): 26257
* Median value for H1_day0_0: 0.539942
* Median value for H1_CM_0: 1.246015
* Average value for H1_day0_0: 15.86772859
* Average value for H1_CM_0: 16.4574767
It seems that this dataset might be normalized so that the average values for each column (sample) are similar.
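The same checks can also be scripted. The following is a minimal Python sketch, assuming the first row of the file is a header and the first column holds the gene names (I have not confirmed the exact header layout). The function name `column_summary` is hypothetical, and the three-gene table in the example is made up purely to show the output shape.

```python
import csv
import io
import statistics


def column_summary(tsv_text, column, top_n=5):
    """Summarize one sample column of a tab-separated expression table.

    Assumes the first row is a header and the first column holds gene names.
    Returns (number of genes, median, mean, names of the top_n genes).
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, data = rows[0], rows[1:]
    idx = header.index(column)  # raises ValueError if the column is absent
    pairs = [(r[0], float(r[idx])) for r in data]
    values = [v for _, v in pairs]
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:top_n]
    return len(data), statistics.median(values), statistics.mean(values), [g for g, _ in top]


# Made-up three-gene table, just to show the output shape
toy = "gene\tH1_day0_0\tH1_CM_0\nA\t1.0\t5.0\nB\t3.0\t2.0\nC\t2.0\t9.0\n"
print(column_summary(toy, "H1_day0_0", top_n=2))
# (3, 2.0, 2.0, ['B', 'C'])
```

On the real file, something like `column_summary(open("GSE85331_all.gene.FPKM.output.replicates.txt").read(), "H1_day0_0")` should reproduce the values listed above if the header assumption holds; Python's `gzip.open(path, "rt")` can also read the downloaded `.gz` file without extracting it first. Comparing the means of different columns is one quick way to check the normalization guess.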


That is about all we want to do in the spreadsheet for now. You can save it as an .xlsx file or import it into Google Sheets in case we want to do anything else with it manually.
Title IX of the Educational Amendments of 1972 prohibits discrimination based on sex, including sexual harassment. Sexual harassment includes quid pro quo harassment, unwelcome verbal or physical conduct, sexual assault, dating violence, domestic violence, and stalking.


===R and R Studio===
If you witness or experience any forms of the above discrimination, you may report to:
Start by watching the [https://indstate-edu.zoom.us/rec/share/jatjli-YpV3d4tM6HPtgRPMq59dGmyTdeeTafTV41aERUd6V0uMT2jw3F3zj68Y6.sjpykKNTn4KkK2N0 video about gse85331_first_look.R] (18min).


'''First Look''' Let's see what we can do with the same file in R and R Studio. First, install R and R Studio on your computer (see the links above). Then take a first look at the data and confirm the values we got from the spreadsheet. You can download the R file here - [https://cs.indstate.edu/~jkinne/cs618-s2022/code/FILES/gse85331_first_look.R gse85331_first_look.R] and run it to confirm this. See also [https://indstate-edu.zoom.us/rec/share/jatjli-YpV3d4tM6HPtgRPMq59dGmyTdeeTafTV41aERUd6V0uMT2jw3F3zj68Y6.sjpykKNTn4KkK2N0 this video] showing the file and explaining it.
''Office:'' Equal Opportunity & Title IX; (812) 237-8954; Rankin Hall, Room 426 <br>
''Email:'' ISU-equalopportunity-titleix@mail.indstate.edu <br>
''Online:'' https://cm.maxient.com/reportingform.php?IndianaStateUniv&layout_id=10


'''Differential expression''' According to the supplementary information for the publication, differentially expressed genes were found as follows: "Statistical analysis was performed for each cell line individually by pairwise comparisons across time-points and day 0 (control)." Let's see if we can reproduce that. You can download the R file here - [https://cs.indstate.edu/~jkinne/cs618-s2022/code/FILES/gse85331_diff_exp.R gse85331_diff_exp.R] and run it to see one way to do this. See also [https://indstate-edu.zoom.us/rec/share/pH4GRsBcimQreLtjsMsuRD9gblf6twUKKRb7yWraEVRMdWzOUIIH5dQXrmLI9aB6.lMvQHPD0Zoe1usGL this video] showing the file and explaining it.
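As a rough first pass at a single pairwise comparison (one time point versus day 0), here is a minimal Python sketch of a per-gene log2 fold change with a pseudocount. It is an illustration only: a fold change is not a statistical test, the pseudocount of 1 and the 2x threshold are arbitrary choices, and neither the paper nor gse85331_diff_exp.R is claimed to work this way. The function names are hypothetical.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-gene log2((treated + pc) / (control + pc)).

    control, treated: dicts mapping gene -> expression value (e.g. FPKM)
    for the day 0 sample and one later time point. The pseudocount avoids
    dividing by zero or taking log2(0) for unexpressed genes.
    """
    common = control.keys() & treated.keys()  # genes present in both samples
    return {
        g: math.log2((treated[g] + pseudocount) / (control[g] + pseudocount))
        for g in common
    }


def changed_genes(lfc, threshold=1.0):
    """Genes whose |log2 fold change| >= threshold (1.0 means at least 2x)."""
    return sorted(g for g, v in lfc.items() if abs(v) >= threshold)


day0 = {"A": 1.0, "B": 3.0, "C": 7.0}
later = {"A": 7.0, "B": 3.0, "C": 1.0}
lfc = log2_fold_changes(day0, later)
print(changed_genes(lfc))  # ['A', 'C']
```

Gene A goes from 1 to 7 (log2 of 8/2 = 2), B is unchanged (0), and C goes from 7 to 1 (log2 of 2/8 = -2), so A and C pass the 2x threshold.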
Disclosures made to the following confidential campus resources will not be reported to the Office of Equal Opportunity and Title IX:<br>
 
''ISU Student Counseling Center:'' (812) 237-3939; Gillum Hall, 2nd Floor <br>
'''Simulated data''' This runs the previous analysis on simulated data, where we know what the answer should be for each gene. You can download the R file here - [https://cs.indstate.edu/~jkinne/cs618-s2022/code/FILES/gse85331_diff_exp_simulated.R gse85331_diff_exp_simulated.R] and run it yourself. See also [https://indstate-edu.zoom.us/rec/share/lXucpVqYQWLBE4tBaerc-_eY9qvbwgh0aaTwbLh1m1k6Zfe-ybYSLKtivz7Wt8IT.mx8EVD0DorxGdhOK this video] showing the file and explaining it.
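The point of simulating is that we choose which genes truly change, so we can count true positives, false positives, and false negatives for any method. Here is a minimal Python sketch of that idea, using a fold-change cutoff as the method under test. The noise model (plus or minus 10% multiplicative noise), the 8x effect, and all names are made-up assumptions, not what gse85331_diff_exp_simulated.R does.

```python
import math
import random


def simulate(n_genes=100, n_changed=10, fold=8.0, seed=0):
    """Made-up data: each gene gets a baseline level, both samples get
    +/-10% multiplicative noise, and a known random subset of genes is
    multiplied by `fold` in the treated sample."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    genes = [f"gene{i}" for i in range(n_genes)]
    truly_changed = set(rng.sample(genes, n_changed))
    control, treated = {}, {}
    for g in genes:
        base = rng.uniform(10, 100)
        effect = fold if g in truly_changed else 1.0
        control[g] = base * rng.uniform(0.9, 1.1)
        treated[g] = base * effect * rng.uniform(0.9, 1.1)
    return control, treated, truly_changed


def called_changed(control, treated, threshold=1.0, pseudocount=1.0):
    """Genes whose |log2 fold change| (with a pseudocount) is >= threshold."""
    return {
        g
        for g in control
        if abs(math.log2((treated[g] + pseudocount) / (control[g] + pseudocount))) >= threshold
    }


control, treated, truth = simulate()
found = called_changed(control, treated)
true_pos = len(found & truth)
false_pos = len(found - truth)
false_neg = len(truth - found)
print(true_pos, false_pos, false_neg)  # 10 0 0
```

With an 8x effect and only 10% noise the method recovers every changed gene. Lowering `fold` toward 2 or widening the noise range is an easy way to see false positives and false negatives appear.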
''Victim Advocate:'' (812) 237-3829; HMSU 7th Floor <br>  
 
''UAP Clinic/ISU Health Center:'' (812) 237-3883; 567 N. 5th Street
'''Next up''' - Using R packages made specifically for differential gene expression (DESeq and edgeR) and comparing the results with plain old ANOVA.
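For the ANOVA side of that comparison: a one-way ANOVA for one gene compares the variation between time points to the variation among replicates within each time point. Below is a minimal Python sketch of the F statistic only (no p-value). The function name is hypothetical, and it assumes at least two groups, more values than groups, and some variation within groups.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    groups: list of lists; each inner list holds that gene's expression
    values (replicates) for one condition or time point.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of replicates around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

In practice SciPy's `scipy.stats.f_oneway` computes the same F statistic together with a p-value, and R's `aov` does the equivalent in R.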
 
==Shapefiles==
An R file showing how to get started with shapefiles - [https://cs.indstate.edu/~jkinne/cs618-s2022/code/FILES/shapefiles.R shapefiles.R]

The data files for it are in the CS 618 team in Microsoft Teams, under Papers, References / Files / Shapefiles.
 
Note that a shapefile often has a number of associated files with it (same basename, different file extension).  You likely need all of those for things to work properly.
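The shapefile format needs at least the .shp, .shx, and .dbf files with the same basename; a .prj file (coordinate system) is optional but commonly present. A small Python check that lists which required parts are missing next to a given .shp file might look like this (the helper name and the file name `counties.shp` are hypothetical):

```python
from pathlib import Path

# Parts the shapefile format needs side by side (same basename).
# Optional extras such as .prj (coordinate system) are not checked here.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the names of required sibling files that do not exist.

    with_suffix() replaces only the final extension, so names containing
    other dots (e.g. tl_2020.us_county.shp) are handled correctly.
    """
    shp = Path(shp_path)
    return [shp.with_suffix(ext).name for ext in REQUIRED_EXTENSIONS
            if not shp.with_suffix(ext).exists()]


missing = missing_shapefile_parts("counties.shp")  # hypothetical file name
if missing:
    print("Missing shapefile parts:", ", ".join(missing))
```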
 
==Miscellaneous==
Things that will be put somewhere else eventually, but for now will be here.
 
===Setup for using gitlab.indstate.edu from the CS server, Linux, or Mac===
* Go to https://gitlab.indstate.edu and log in with your ISU credentials.
* Run the following.
  <pre>
  mkdir -p ~/.ssh # -p: no error if the directory already exists
  cd ~/.ssh
  ssh-keygen -t ecdsa -f filename_for_your_ssh_key # pick whatever you want for the filename, press enter when prompted for passphrase
  </pre>
* Edit your <code>~/.ssh/config</code> file (creating it if it doesn't exist already), and include the following.
  <pre>
  Host gitlab.indstate.edu
  Hostname gitlab.indstate.edu
  # use whatever filename you chose above
  IdentityFile ~/.ssh/filename_for_your_ssh_key
  </pre>
* Take the public key file (filename_for_your_ssh_key.pub, or whatever filename you used with .pub on the end) and add it to your profile on gitlab.indstate.edu. Log in to gitlab.indstate.edu, click the User icon in the top right / Edit Profile / SSH Keys. Copy/paste the public key (.pub file) into the text field, leave “expires at” empty, set a title, and click Add key.
 
Now when you use git commands on the system that you setup like this, it should properly authenticate to gitlab.indstate.edu.

Revision as of 13:22, 17 August 2025

This page contains the syllabus for CS 510 and is used to keep track of assignments, etc.
(this syllabus heavily authored by Jeff Kinne. Updated by Xavier Saunders)

CS 510 is Fast Track Introduction to Programming. The course has no pre-reqs (can be taken by those with no prior CS or programming experience) and is meant to (a) get you programming (in python), and (b) get you ready to pass the admissions interview (python programming and basic algorithms / data structures). CS 510 counts as elective credit towards the MS degree. The course is meant for current ISU students in non-CS programs and for potential incoming CS MS students who need the course to get ready for the CS MS.

For more information on applying to the CS MS program, see http://cs.indstate.edu/info/apply.html

Who Should Register

Current ISU Students: For ISU grad students who are in non-CS programs, if you do not have much programming experience, CS 510 is your best starting point in CS coursework. If you are already a competent programmer in some language, then you likely would want to start with either CS 500 or CS 501. CS 500 is C and data structures and algorithms, while CS 501 is python and data science.

Non-ISU Students: Domestic students who are not quite ready to start the CS MS (or just aren't sure if they want to do a full MS) can apply to ISU as a Guest/Unclassified student and take the course either face to face or online. To do this, click on the Apply button, click to create an account if this is your first time starting an application at ISU, choose Graduate when prompted between Graduate or Undergraduate, for the Field of Interest choose "Guest Admission / Unclassified", and for Program of Study choose "Guest Admission (One Semester Only)". Choose the term that you plan to take the course, and complete the remaining required fields.

International students outside the US are generally not allowed to take online courses at a US institution without being enrolled in a degree program. International persons who are in the US on a visa of some type might be allowed to take the course (in particular, those on F2 or H4 visas likely would be allowed). Those who cannot take courses at ISU can check back here for course info (sample quizzes, reading assignments, programming assignments, tutorials, etc.) that will be posted publicly throughout the fall 2023 term.

For those with little or no prior programming and CS experience, we highly recommend taking this course to build your programming skills (and, for those interested in a CS MS, to get ready to apply to the CS MS program).

General Course Information

Course website - https://cs.indstate.edu/wiki/index.php/CS_510

Your Instructor

Xavier Saunders, xavier.saunders@indstate.edu
Office: In Person TBA and in Microsoft Teams, no phone number
Instructor Office Hours: TBA, TBA
Lecture, Exam

Lecture: Tuesday and Thursday 12:30-1:45 in person and on Zoom (in Canvas, see below), and recorded
Mid-term exam: TBA
Final exam: TBA
Asynchronous students: For students who will be mostly participating asynchronously even though the course is being offered synchronously, it is best if you are able to watch the most recent lecture before the next one occurs. You should make note of any questions or comments and send them to me by email or Teams. I will start the next lecture by answering any questions/comments that came via email or Teams.

Prerequisites - none.

CRN numbers - 52482, 52483

Required text

Class notes - I may try to keep notes in OneNote, and they may be available as PDF. A link will be here as soon as it is in working order. Note that you will need to authenticate with your ISU account to view the notebook.

Code from lectures - Some code from lectures will be at https://cs.indstate.edu/xsaunders/cs510/

Assignments and Announcements - The current assignment, announcements, and notes are available on the class Canvas site.


Course Description and Content

Course Description

The catalog description for this course is: "This is a first course in programming and computer science that is aimed at those with little to no previous experience in these areas. The main learning outcomes are proficiency in a useful and modern programming language and proficiency in basic data structures and algorithms. The course also prepares students to be ready to apply to the computer science master’s program. Notes: Not available to students with 9 or more CS graduate coursework."

The following is the compressed plan of the entire semester.

Course Outline

  • Getting started - system setup, linux, math background, development on your personal computer.
  • Python programming basics - operators, reserved words, data types, base systems, overflow.
  • Python programming containers - strings, tuples, lists, dictionaries, sets.
  • Python programming - object-oriented programming.
  • Python programming style - good programming style for reliability, readability, extensibility, security.
  • Data structures - understanding/use of most important data structures - arrays, linked lists, binary search trees, hash tables, heaps. Implementation of some of these in Python.
  • Algorithms - understanding/use of some basic algorithms - sorting (various), binary/linear search (and uses) - including some algorithms that are each of - greedy, heuristic, randomized, brute force / backtracking.
  • Vocab - additional terms, algorithms, concepts at a shallow level.

Learning Outcomes

  • System setup - personal computer setup for both remote (connecting to CS server with terminal, sftp, X windows) and local development (editor, compiler/interpreter).
  • Linux - proficient using the Linux terminal for development.
  • Math background - proficient in math background needed for data structures and algorithms.
  • Personal computer - is set up for development so you can do coursework from your home computer as well.
  • Python programming - understanding of most language features, proficiency in writing code using the most common ones, and writing code using good programming style.
  • Data structures - understanding of operations, efficiency, use cases, can use built-in python data structures and write python code for some data structures that are not included in python.
  • Algorithms - understanding of basic algorithms, arguments for correctness and efficiency, can use the algorithms to solve problems efficiently.

Grading and Assignments

We will be doing what I am calling "achievements-based" grading. There are a series of skills, knowledge, and experiences that I want you to achieve. Your final letter grades will be based strictly on which of these you have completed. For each achievement, you can earn a rating of incomplete, pass-, pass, or pass+. The following will be our starting point for how letter grades will be assigned. I will reevaluate this throughout the term to make sure we are on track. I will also be setting the standards for pass-, pass, and pass+ for each of the achievements as we get to them in the course.

C - lowest passing grade in a grad course

  • Pass or higher achievement for all of the following
  • Terminal text editors - basic use
  • Linux terminal commands, files - basic use
  • Math for CS basics (base systems, rules of exponents, logs, logic)
  • Python - basic development in the terminal on the CS server
  • Python - basic development on your personal computer
  • Text editor - on your personal computer
  • File transfers - between personal computer and CS server
  • Python programming basics - knowledge of keywords, concepts, operators, evaluation of expressions
  • Python programming containers - knowledge of basic operations on tuples, lists, sets, dictionaries

B - satisfactory

  • In addition to the above...
  • Object-oriented programming in python
  • Python programming style - good programming style for reliability, readability, extensibility, security.
  • Data structures - good understanding of how operations are implemented for - arrays, linked lists, binary search trees, hash tables, heaps. Able to properly "play computer" with these.
  • Algorithms - good understanding of linear and binary search and several sorting algorithms (including one efficient one).
  • Vocab - some additional terms, algorithms, concepts at a shallow level.

A - good/excellent

  • In addition to the above...
  • Pass+ rating on most of the above
  • Pass or higher achievement for all of the following
  • Basic data structures from B level - can write python code to implement the data structures.
  • Algorithms - can write python code to implement linear and binary search, several sorting algorithms (including one efficient one).
  • Algorithms - good understanding of some algorithms for each of - greedy, heuristic, randomized, brute force / backtracking.

Achievements can be earned based on quizzes, assignments, in-class work, and exams. Rather than having numerical scores for these, I will use them to mark off your achievements. Note that achievements can be "lost" if you demonstrate a skill early in the term and then demonstrate a lack of the skill later in the term. I expect this will not normally be the case, but I will continue to evaluate you based on all of the skills throughout the term.

Late Work - Assignments can generally still be handed in for around a week after their due date. Once the solutions are posted and discussed, late submissions will no longer be graded. Quizzes will normally need to be taken on the day they are due, or perhaps within a few days of when they are due. Solutions will normally be discussed or posted within a week of their due date. Late work that is more than about a week old is not accepted, in part because it takes much longer to grade quizzes/assignments that are no longer super fresh in the instructor's head, and in part to try to keep everyone in the class working on the same material.

Start Assignments and Quiz Studying Early - I suggest attempting an assignment the day it is given, or the day after, so that if you have a problem you can ask early. If you continue to have problems in trying to complete the assignment, you will have time to ask again. Many of the assignments require thought and problem solving, which takes "time on the calendar" not just "time on the clock". By that I mean that spending an hour on 3 consecutive days is likely to be more productive than trying to spend 3 hours at once on the assignment.

Expected Amount of Work - My expectation is that an average student will spend about 4-8 hours OUTSIDE of class each week (that is in addition to class time or viewing lecture videos) WORKING PRODUCTIVELY/EFFICIENTLY (not just staring at the computer) to complete their coursework for this class. Some students may spend less time than this, and some students will spend more. If you find yourself at the upper end of this range or beyond it, contact me and bring your notes and study schedule.

This is the foundation for the rest of CS, so it definitely pays off to do your best here.

Note - please find a way to spend enough time on this class (the investment will pay off in terms of skills, being able to get a job, etc.).

Grade Meanings - The letter grades are intended to have the following rough meaning. The list of achievements needed for each was chosen with this in mind.

  • A+/A: You understand everything and probably could teach the course yourself.
  • B+/A-: You understand nearly everything, and should be all set to use this knowledge in other courses or in a job.
  • C/C+/B-/B: Some things you understand very well and others you don't (more towards the former for a B and more towards the latter for a C).
  • D-/D+/C-: You did put some effort in, and understand many things at a high level, but you haven't mastered the details well enough to be able to use this knowledge in the future. Note that the lowest grade for grad courses is a C, so if you fall in the range below C then your letter grade will be an F.
  • F: Normally, students that get an F simply stopped doing the required work at some point.

CS-Specific Items

This section contains items that are generally the same for all CS courses (and in particular those taught by this instructor).

CS Course Policies

Note that this course follows all standard CS course policies. In particular, (a) cheating/plagiarism by graduate students results in an F in the course, and (b) there will be no makeup exams. See http://cs.indstate.edu/info/policies.html for details.

Lab Help

We have a few lab assistants who are available to help students in beginning computer science courses. Please see https://cs.indstate.edu/wiki/index.php/Unix_Lab_and_Help for details. The lab hours are in a calendar on the CS homepage, at http://cs.indstate.edu/info/index.php#lab_hours. You can join the lab when working on your programs. You can ask the lab assistants to look at your programs, and you can work with any other CS students that are there (you could use the lab as a regular meeting place to work with your classmates).

Course Announcements

Announcements regarding the course will be made both during class and via email to your @sycamores.indstate.edu email address. You should regularly check this email account or have it forwarded to an account that you check regularly. You can set the account to forward by logging into your indstate.edu email online (if you aren't able to find the option, try a different browser or search online for things like - outlook online forward email setting).

Classroom conduct

You may not use cell phones, iPods/music players, etc. during class. You should be civil and respectful to both the instructor and your classmates, and you should arrive to class a few minutes before the scheduled lecture so you are ready for lecture to begin on time. You may use your computer during class if you are using it to follow along with the examples that are being discussed. You should avoid spending time on email, Facebook, work on other courses, etc. during the lecture for this class (be fully present wherever you are, make the most of each experience).

Academic Integrity

Please follow these guidelines to avoid problems with academic misconduct in this course:

Homework: You may discuss the homework assignments, but should solve and finish them on your own. To make sure you are not violating this, if you discuss with someone, you should DESTROY any work or evidence of the discussion, go your separate ways, SPEND at least an hour doing something completely unrelated to the assignment, and then you should be able to RECREATE the program/solution on your own, then turn that in. If you cannot recreate the solution on your own, then it is not your work, and you should not turn it in.

Note on sources: if you use some other source, the web or whatever, you better cite it! Not doing so is plagiarism.

Exams: This should be clear: no cheating during exams. Each instructor has different rules for what is allowed on exams in terms of notes, etc. If not noted otherwise, you should assume that a quiz or exam is closed notes, no computer, no calculator.

Projects: You should not copy from the Internet or anywhere else. The project should be your own work. It will be fairly obvious to me if you do copy code from the Internet, and the consequences will be at the least a 0 on the project. If cheating is observed, you will at the least receive a 0 for the assignment (and may receive an F for the course), and I will file a Notification of Academic Integrity Violation Report with Student Judicial Programs, as required by the university's policy on Academic Integrity. A student who is caught cheating twice (whether in a single course or different courses) is likely to be brought before the All University Court hearing panel, which can impose sanctions up to and including suspension/expulsion. See http://www.indstate.edu/sjp/docs/code.pdf and http://www.indstate.edu/academicintegrity/ for more information.

Please ask the instructor if you have doubts about what is considered cheating in this course.

Office hours (using Teams)

Office hours will be through Microsoft Teams by default. If you would like to meet in person you should reserve an appointment using outlook calendar. I am normally in my office during my listed office hours, but by making an appointment you can be more certain. For meeting through Teams, you should start Teams in your browser or start the application. You should be logged in using your ISU credentials. Once you have Teams open you can message me to ask me questions or to ask to talk. We can use Teams to message (better than emailing back and forth repeatedly if you have questions about something that you just want to write about) or to talk and share screens (e.g., to take a look at your code). I normally have Teams open on my computer all of the time, including during my office hours. During my office hours I will normally reply right away; at other times I will reply when I get a chance.

Canvas

The course has a canvas site. Click https://indstate.instructure.com/ to go to canvas. You should see this course listed under your courses for the current term. If you don't you may need to click on the Courses icon and then click the "All courses" link. The canvas site is used for giving you your grades, for quizzes/exams, and for getting to online lectures (which are done using Zoom). Announcements will be sent through canvas and to your university email. Links and such will be kept on this website.

Lectures (using Zoom)

Here at ISU section numbers starting with the number 3 (e.g., 3xx: 301, 302, etc.) are generally online sections. There are 2 types of online sections, synchronous online and asynchronous online. Sections that are synchronous should be joined at the regularly scheduled time of the course, whereas sections that are asynchronous generally keep up with the material independently without regularly scheduled meetings. In general async sections are more difficult to stay on top of, and require a great deal of self-discipline (it is much easier to think "I can watch the videos tomorrow" and just get behind). So if you are in one of these sections make sure you get off to a strong start, and ask for help sooner rather than later. If you are in an online section, check your course schedule for course meeting times; if you have a meeting time, then your section is synchronous, otherwise it is asynchronous (or there is an error in the system).

This course has a 301 section (synchronous online) and 001 section (face to face). Students in either section can participate in whatever way you need to.

For ISU's links to information on getting started with Zoom, see https://indstate.teamdynamix.com/TDClient/1851/Portal/KB/ArticleDet?ID=107534. You can also see the information linked at https://www.indstate.edu/services/student-success/cfss. You will get to the lectures for this course by going to Canvas, select this course, click Modules on the menu on the left, and click on the Zoom module. Once there you should see a schedule of lectures and be able to view recorded lectures. Note that you should install the Zoom application for your computer, and you will need to be logged in to Zoom with your ISU credentials to be able to connect. Also note that the lectures are recorded and only available to those in our class. Recorded lectures normally appear later the same day as the lecture.

Note that if you have not used Zoom with your ISU account previously, you need to go to https://indstate-edu.zoom.us and login with your ISU email address and password to get it setup.

Participating online

If you are participating online, please see the information at https://www.indstate.edu/services/student-success/cfss about participating in online courses. You are expected to either join lectures live through Zoom or watch the recordings once they are available. You will complete assignments, quizzes, and exams on the same schedule as the rest of the class. For quizzes and exams you will normally have a 24 hour period during which to take the quiz/exam (note that different students will have slightly different questions and any communication between students about quiz/exam content is academic misconduct).

See also the General Information section at the top of this page for setting up a regular check-in time with the instructor.

ISU Required Syllabus Items

The items in this section are required and are the same for every ISU course.

COVID-19 Information

Information specific to CS courses - Start of Term Announcements

Standard ISU language required in all syllabi (read this all once, then skim for your other courses)...

Students are expected to adhere to course attendance policies, as stated in the course syllabus. Documented COVID-related absences will be treated like any other serious medical issue. Following University policy, students with a documented, serious medical issue must contact the Office of the Dean of Students for assistance. The Office of the Dean of Students will supply documentation for faculty. Students with a documented serious medical issue should not be penalized and will be given a reasonable chance to complete exams or assignments. Once notification is made, faculty will make reasonable efforts to accommodate the student’s absence and will communicate that accommodation directly to the student. Please note that faculty are not required to accommodate a serious medical issue with virtual content options, like streaming or recorded lectures. To avoid the potential of missing significant class time, students are strongly encouraged to receive the COVID vaccination that has been made available on campus. For more information about the vaccines or to find a vaccination site, go to: ourshot.in.gov. The ISU Health Center also administers COVID-19 vaccines by appointment.

Students should contact the Office of the Dean of Students with questions by calling 812-237-3829.

The information provided in this section of the syllabus is subject to modification based on guidance by public health authorities. Changes to Covid-related policies or updated information will, as always, be posted on the ISU website and communicated in multiple ways.

Special Needs / Disability Services

If I have not e-mailed you saying otherwise, I have not received information about you from Student Support Services. If you believe I should have, check with me, Student Support Services, or any other relevant program. Standard ISU language required in all syllabi...

Indiana State University recognizes that students with disabilities may have special needs that must be met to give them equal access to college programs and facilities. If you need course adaptations or accommodations because of a disability, please contact us as soon as possible in a confidential setting either after class or in my office. All conversations regarding your disability will be kept in strict confidence. Indiana State University's Student Support Services (SSS) office coordinates services for students with disabilities: documentation of a disability needs to be on file in that office before any accommodations can be provided. Student Support Services is located on the lower level of Normal Hall in the Center for Student Success and can be contacted at 812-237-2700, or you can visit the ISU website under A-Z, Disability Student Services and submit a Contact Form. Appointments to discuss accommodations with SSS staff members are encouraged.

Once a faculty member is notified by Student Support Services that a student is qualified to receive academic accommodations, a faculty member is obligated to provide or allow a reasonable classroom accommodation under ADA.

Disclosures Regarding Sexual Misconduct

Standard ISU language required in all syllabi...

Indiana State University Policy 923 strictly prohibits discrimination on the basis of: age, disability, genetic information, national origin, pregnancy, race/color, religion, sex, gender identity or expression, sexual orientation, veteran status, or any other class protected by federal and state statutes in ISU programs and activities or that interferes with the educational or workplace environment.

Title IX of the Education Amendments of 1972 prohibits discrimination based on sex, including sexual harassment. Sexual harassment includes quid pro quo harassment, unwelcome verbal or physical conduct, sexual assault, dating violence, domestic violence, and stalking.

If you witness or experience any form of the discrimination described above, you may report it to:

Office: Equal Opportunity & Title IX; (812) 237-8954; Rankin Hall, Room 426
Email: ISU-equalopportunity-titleix@mail.indstate.edu
Online: https://cm.maxient.com/reportingform.php?IndianaStateUniv&layout_id=10

Disclosures made to the following confidential campus resources will not be reported to the Office of Equal Opportunity and Title IX:
ISU Student Counseling Center: (812) 237-3939; Gillum Hall, 2nd Floor
Victim Advocate: (812) 237-3829; HMSU 7th Floor
UAP Clinic/ISU Health Center: (812) 237-3883; 567 N. 5th Street