ENS210 - Fall 2023

Instructor

Name: Ogun Adebali

E-mail: oadebali@sabanciuniv.edu

Office: FENS-1055

Office hours: Fri 10:40-11:30 (by appointment only)

Teaching Assistants

TA                    E-mail                          Office Day   Office Hours   Office
Veysel Ogulcan Kaya   vogulcan@sabanciuniv.edu        Monday       10:40-12:30    online
Yagmur Sozeri         yagmur.sozeri@sabanciuniv.edu   Tuesday      14:40-16:30    FENS-L038
Cem Azgari            cemazgari@sabanciuniv.edu       Wednesday    14:40-16:30    FENS-L038

Learning Assistants

LA E-mail
Bahar Sevgin baharsevgin@sabanciuniv.edu
Durmuş Erdem Kertmen ekertmen@sabanciuniv.edu
Deniz Muratli denizm@sabanciuniv.edu

Class hours

  • Wed 16:40-17:30 FASS-G022 (lecture/prelab)
  • Wed 17:40-19:30 FASS-G022 (lab)
  • Fri 12:40-14:30 FASS-G022 (lectures)

Content

Course Description

Have you ever considered how the code in each of your cells determines your physical appearance, disease risk, even your behaviors? Do you know why you and the annoying fly buzzing in the middle of the night are unique? Why does a diet work well for you whereas it might not for others? It is all in the genome! If the genome contains so much information, why can’t we design person-specific drugs, diets, treatments, etc.? It is because we don’t understand exactly what the code means! Identifying the code is no longer a barrier, but analyzing it is. In this course, we will learn the basics of computational genomics with the aim of gaining the fundamentals of bioinformatics applications. We will learn to use publicly available tools as well as to write custom Python scripts in order to answer biological questions. More details regarding the content, grading and policy can be found below.

Learning objectives

  • Explain why bioinformatics is necessary today.
  • Use UNIX environment to parse genome data files.
  • Write Python scripts to perform basic DNA and protein sequence analyses.
  • Find hypothetical genes in a given DNA sequence.
  • Translate a given DNA sequence into a protein sequence.
  • Use regular expressions to find protein motifs and visualize them on protein structure.
  • Understand what homology is and how homology information can be used to identify protein function.
  • Build and interpret multiple sequence alignments.
  • Build, visualize and analyze phylogenetic trees.
  • Understand what protein domains are and how they are predicted from a given protein sequence.
  • Know a variety of NGS methods and what they are designed for.
  • Build NGS analysis pipelines.

Requirements and expectations

  • There is no official textbook for the class. Slides will be made available after each class. The best way to succeed is to learn in class and take notes.
  • Being active in lectures and lab sessions is encouraged.
  • There is NO stupid question. Do not hesitate to ask any question.
  • Bring a laptop to every class and lab.
  • Late work will not be accepted.
  • Lab work should be completed within lab hours. The assignment system will have a firm deadline unless your instructor (or TA) agrees that extra time is required. If extra time is given, the new due date will be midnight. Therefore, please plan your schedule accordingly.

Honesty

  • All work should be completed individually unless stated otherwise. You will be assigned a single group project in which you are expected to collaborate; the rest will be individual assignments. For the group assignment, groups may not share their code with other groups; individuals may share their work (code) only with members of their own group.
  • Plagiarism will NOT be tolerated. This does not mean that you may not use the internet. However, you may not copy and paste any code from the internet. Whenever you draw on a reference or website, you must cite it properly; otherwise your work will be treated as plagiarism.
  • You are not allowed to share code in any case (except for group assignment).

Attendance

  • Attendance is required. If you are not able to attend, send me an e-mail stating your excuse before the class. 6 unexcused absences from lectures or 2 from labs will be considered grounds for grade reduction.
  • Make-ups are only given for midterms and the final examination. A medical report must be brought.
  • No make-up will be given for any missed lab.

Group presentations

  • At the time of group presentations, you must be present and ready to present your group work in class. One of the group members will randomly be called to give their presentation. The group members might receive different grades as personal contribution is a component (20%).
  • Late work will not be accepted.

Academic Integrity

To uphold the Sabanci University Academic Integrity Statement:

I will not lie and cheat in my academic work.

I will act (by letting the instructor know) if the academic integrity is compromised.

I will not share the video conference link and lecture records with anyone else.

By being registered in this class, you will be assumed to have accepted the rules written in this syllabus.

Evaluation

Component Weight
Lab/quiz/homework/participation 30%
Group project 10%
Midterm I 15%
Midterm II 15%
Final 30%

Each lab, homework and announced quiz will have a weight of 2 units; a pop quiz will have a weight of 1 unit.

You may receive Tophat questions throughout the course.

Your lowest lab score will be dropped. No make-ups will be given for missed labs, and no points will be given for unexcused missed labs.

Each lab will be evaluated out of 10 points. Homework and quizzes will be evaluated out of 10 points. Tophat questions will be evaluated based on the points assigned in the Tophat system. All the points will be summed up at the end of the semester. The total will comprise 30% of your total score.

Attendance and active participation are expected. Each of you will receive a participation score at the end of the semester (extra 2 points out of 100). The participation score will contribute to the Lab/quiz/homework/participation segment and won’t be dropped. Please note that the participation score will be subjective and will be given by the instructor in light of your participation in lectures and labs.

Enrol in Tophat

Please go to this link to enrol in the Tophat classroom.

If you miss an exam (midterm or final) or more than two labs you will automatically fail and get NA.

Objections

After the results are announced for each exam, objection days and time slots will be announced. You will only be able to object on the announced days. If the time slots don’t fit your schedule, you are supposed to request an appointment from the instructor on the same date as the objection announcement.

If you miss the objection period, you won’t be given a second chance to see your exam papers.

Grading

The grading will be based on the class performance. Curve-based grading will be applied.

There will be no extra homework/project to increase grades at the end of semester. This is not negotiable.

Individual graduation situations will not be taken into account, and they won’t change your letter grade at the end of the semester.

ANY kind of misconduct including code sharing, plagiarism, cheating etc will NOT be tolerated. You will fail the course. Disciplinary actions will be taken.

Course Plan

The course plan given below is subject to change.


Week # Date Topic
1 4 Oct Course introduction (starts at 17:40 for this day only)
  6 Oct Lab 0: Git setup
     
2 11 Oct Introduction to Git + UNIX
  11 Oct Lab 1: Analyze Files in Linux
  13 Oct Introduction to Genomics
     
3 18 Oct PROJECT description
  18 Oct Useful command line tools
  18 Oct Lab 2: Analyze Genomic Files in Linux
  20 Oct What is a gene?
     
4 25 Oct Introduction to Python
  25 Oct Lab 3: Sequence processing in Python
  27 Oct What is a gene?
     
5 1 Nov Codon tables
  1 Nov Lab 4: Finding a gene
  3 Nov From DNA to Protein
     
6 8 Nov Compare two sequences
  8 Nov Lab 5: DNA to Protein
  10 Nov Homology
     
7 15 Nov Lab 6: Protein to DNA
  15 Nov DEADLINE: Project first report (by 23:59)
  17 Nov Project - Variant Calling Results (presentations)
     
8 22 Nov BLAST
  22 Nov Lab 7: BLAST
  24 Nov Homology - Multiple sequence comparison
     
9 29 Nov How to align multiple sequences
  29 Nov Lab 8: Multiple Sequence Alignment
  1 Dec Midterm
     
10 6 Dec Conservation analysis from multiple sequence alignment
  6 Dec Lab 9: Measure Conservation
  8 Dec Multiple sequence alignment algorithms
     
11 13 Dec Phylogenetic Trees
  13 Dec Lab 10: Phylogenetics
  15 Dec Protein Domains and Motifs
     
12 20 Dec Lab Midterm
  20 Dec Lab Midterm
  22 Dec Phylogenetics
     
13 27 Dec Project Q&A
  27 Dec Lab 11: Molecular Docking
  29 Dec Phylogenetics, NGS methods
  1 Jan DEADLINE: Project final report (by 23:59)
     
14 3 Jan Group presentations I
  3 Jan Group presentations II
  5 Jan Group presentations II




Week 1

Setup for the lab





Lab-0

  • Go to the following assignment

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.




Week 2

Wednesday

  • Introduction to the Course - slides

  • Introduction to Genomics - slides

  • Genome statistics
  • Central Dogma of Biology
  • Chargaff’s First Parity Rule
  • Structure of Nucleic Acids DNA and RNA
  • DNA structure discovery
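
Chargaff's first parity rule from the topic list above says that in double-stranded DNA the amount of A equals T and the amount of G equals C. The following is a minimal sketch of how this could be checked in Python. The function names and the toy strand are made up for illustration and are not taken from the course material.

```python
from collections import Counter


def base_composition(sequence):
    """Return the fraction of each base (A, C, G, T) in a DNA sequence."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def double_stranded_composition(strand):
    """Composition of a duplex: the given strand plus its complementary strand."""
    complement = strand.upper().translate(str.maketrans("ACGT", "TGCA"))
    return base_composition(strand + complement)


# Hypothetical toy strand, not real genomic data.
toy_strand = "ATGCGCGATTAACG"
composition = double_stranded_composition(toy_strand)
print(composition)
print("A == T:", composition["A"] == composition["T"])
print("G == C:", composition["G"] == composition["C"])
```

Because every A on one strand pairs with a T on the other, the duplex composition satisfies the rule by construction, whatever the single strand looks like.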

Lab-1

  • Go to the following assignment. This link will also be available on SUcourse at 16:40 on the class day.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Extra-material (required)
Command Line Basics - PDF



Documentation for tr

Documentation for wc

Documentation for grep

Friday

  • What is a Gene? - slides

  • Gene definition
  • Variation effect at different levels
  • Eukaryotic vs Prokaryotic cells
  • Gene structure
  • Eukaryotic genes vs Prokaryotic genes
  • Alternative splicing
  • Epigenetic regulation
  • Operon structure
  • Lac operon




Week 3

Wednesday

  • Introduction to the Group Project

Lab-2

  • Go to the following assignment. This link will also be available on SUcourse at 16:40 on the class day.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

A short tutorial for awk

Friday

  • Introduction to Genomics II - slides

  • DNA vs RNA
  • RNA structures
  • How to predict RNA structure
  • Sanger sequencing
  • Gel electrophoresis
  • Shotgun sequencing
  • How to calculate the size of a genome in bytes
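
As a worked example for the last topic, the sketch below estimates storage size from the number of bases. It assumes 2 bits per base when only A, C, G and T are stored, and 1 byte per base for a plain text file. The genome length used here is an assumed round figure for a haploid human genome, not a number from the slides.

```python
import math


def genome_size_bytes(num_bases, bits_per_base=2):
    """Bytes needed to store num_bases when each base takes bits_per_base bits."""
    return math.ceil(num_bases * bits_per_base / 8)


# Assumption: roughly 3.1 billion bases in a haploid human genome.
HUMAN_GENOME_BASES = 3_100_000_000

packed = genome_size_bytes(HUMAN_GENOME_BASES)          # 2 bits per base
plain_text = genome_size_bytes(HUMAN_GENOME_BASES, 8)   # 1 byte per base

print(f"2-bit encoding: {packed / 1e6:.0f} MB")
print(f"plain text:     {plain_text / 1e9:.1f} GB")
```

Real files are usually larger than the 2-bit figure because they also hold headers, line breaks and ambiguous bases such as N.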




Week 4

Wednesday

  • Group Project - Progress of groups

Lab-3

  • Go to the following assignment. This link will also be available on SUcourse at 16:40 on the class day.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • What is a Gene? - slides

  • Key points in transcription.
  • Sense vs antisense strand, coding vs non-coding strand, template vs non-template strand, transcribed vs non-transcribed strand
  • How to find a gene or motif on both strands
  • How to predict a eukaryotic and prokaryotic gene?
  • What is genome annotation?
  • Genome size and complexity discussion.
  • Gene size differences across species.
  • How to measure the performance of a tool?
  • Why is CpG island relevant in gene prediction?
  • How to find a CpG island.
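
For the topic of finding a motif on both strands, one common approach is to search the forward sequence for the motif and for its reverse complement. A hit for the reverse complement means the motif itself sits on the opposite strand at that position. The sketch below is one possible way to do this. The function names and the toy sequence are illustrative assumptions.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(sequence, pattern):
    """Return every 0-based start where pattern occurs, overlaps included."""
    width = len(pattern)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == pattern]


def find_motif_both_strands(sequence, motif):
    """Return (strand, start) hits; start is a 0-based forward-strand coordinate."""
    if not motif:
        raise ValueError("motif must not be empty")
    sequence = sequence.upper()
    motif = motif.upper()
    hits = [("+", pos) for pos in scan(sequence, motif)]
    # A reverse-complement hit on the forward strand is a motif hit on the minus strand.
    hits += [("-", pos) for pos in scan(sequence, reverse_complement(motif))]
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence.
print(find_motif_both_strands("CCATGGCATAA", "ATG"))
```

A palindromic motif (one equal to its own reverse complement) is reported once per strand at each position.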

  • Nussinov algorithm - slides




Week 5

Wednesday

  • Group Project - Progress of groups
  • How to read and write files in Python
  • Introduction to FASTA format
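
A minimal sketch of reading and writing FASTA records in plain Python follows. It assumes the usual layout in which a header line starts with ">" and the sequence continues on the following lines. The function names are illustrative, and an in-memory text buffer stands in for a real file.

```python
import io


def read_fasta(handle):
    """Parse FASTA records from an open text handle into {header: sequence}."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def write_fasta(records, handle, width=60):
    """Write {header: sequence} to a text handle, wrapping sequence lines."""
    for name, sequence in records.items():
        handle.write(f">{name}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Toy input held in memory instead of a file on disk.
example = io.StringIO(">seq1 toy record\nATGC\nGGTA\n>seq2\nTTAA\n")
records = read_fasta(example)
print(records)
```

With a real file the same functions can be used inside `with open(path) as handle:`, where `path` is whatever FASTA file you are working on.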

Lab-4

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • How to apply Nussinov algorithm
  • How to deal with bifurcation (Nussinov)
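
The Nussinov recurrence, including the bifurcation case, can be sketched as below. This is a minimal version that only counts the maximum number of nested base pairs. It assumes Watson-Crick pairs only (no G-U wobble) and an optional `min_loop` parameter added here for illustration; the version shown in class may differ.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov(rna, min_loop=0):
    """Return the maximum number of nested base pairs in an RNA string."""
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    score = [[0] * n for _ in range(n)]
    for span in range(1, n):              # span = j - i, shortest intervals first
        for i in range(n - span):
            j = i + span
            # Case 1 and 2: leave position i or position j unpaired.
            best = max(score[i + 1][j], score[i][j - 1])
            # Case 3: pair i with j, with more than min_loop bases in between.
            if span > min_loop and (rna[i], rna[j]) in PAIRS:
                inner = score[i + 1][j - 1] if span >= 2 else 0
                best = max(best, inner + 1)
            # Case 4: bifurcation, split the interval into two independent parts.
            for k in range(i + 1, j):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score[0][n - 1]


print(nussinov("GGGAAAUCC"))
print(nussinov("AUGC"))  # needs a bifurcation: A-U next to G-C
```

The triple loop makes the running time grow with the cube of the sequence length, which is the usual cost of this dynamic programming scheme.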

  • DNA to Protein - slides
  • Genetic code
  • Features of the genetic code
  • Stop codon introduction
  • Steps of translation
  • tRNA
  • Codon vs anti-codon
  • Translation-transcription coupling
  • Wobble pairing
  • Codon usage
  • Amino acid structure
  • Amino acid groupings
  • Protein structure
  • Membrane proteins
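
To make the genetic code concrete, the sketch below builds the standard codon table (NCBI translation table 1) and translates a DNA coding sequence. The function name, the `to_stop` parameter and the use of "X" for unknown codons are choices made for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a DNA coding sequence, reading frame 0, into a protein string."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")  # X = unknown codon
        if amino_acid == "*" and to_stop:
            break  # stop codon reached
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))                    # MA
print(translate("ATGGCCTAAGGG", to_stop=False))  # MA*G
```

Trailing bases that do not fill a complete codon are ignored.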




Week 6

Wednesday

  • Group Project - Progress of groups
  • How to use a while loop in Python
  • How to use the find function
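
The two Python topics above combine naturally: `str.find` returns the first position of a substring at or after a given start, or -1 when there is none, and a while loop can repeat the search until that happens. The sketch below shows the pattern on a toy sequence. The function name is illustrative.

```python
def find_all_positions(sequence, pattern):
    """Return every 0-based position where pattern starts in sequence."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Continue one position to the right so overlapping hits are also found.
        start = sequence.find(pattern, start + 1)
    return positions


print(find_all_positions("ATGCCATGATG", "ATG"))  # [0, 5, 8]
print(find_all_positions("AAAA", "AA"))          # [0, 1, 2]
```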

Lab-5

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • Homology - slides
  • Similarity vs homology
  • Pairwise sequence comparison
  • Sequence variations
  • Insertions, deletions and protein structure
  • The space of global alignment
  • Gap penalty functions
  • How to score an alignment
  • How can we find the best alignment?
  • Dynamic programming
  • Global alignment
  • Needleman-Wunsch Algorithm
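
A minimal sketch of Needleman-Wunsch global alignment with a linear gap penalty is given below. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, not the values used in the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # a aligned against leading gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap            # b aligned against leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the top-left corner.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        substitution = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(needleman_wunsch("ACGT", "AGT"))
```

When several alignments share the best score, this traceback returns only one of them, preferring the diagonal move.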




Week 7

Wednesday

Lab-6

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • Group Projects - Stage 1




Week 8

Wednesday

Lab-7

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • Discussion on Patient X’s gender. How to reveal it with WES?
  • Why does Patient X have many variants for spermatogenesis?
  • Homology (continued)- slides
  • Local alignment
  • Smith-Waterman Algorithm
  • Blast algorithm
  • Substitution Matrices
  • Local vs Global Alignment
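
For comparison with global alignment, the sketch below implements Smith-Waterman local alignment. The key differences are that cell scores never drop below zero and that the traceback starts at the highest-scoring cell and stops at the first zero. The scoring values are assumptions for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere instead of carrying a negative score.
            score[i][j] = max(0, diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Traceback from the best cell until a zero is reached.
    aligned_a, aligned_b = [], []
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        substitution = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + substitution:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```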




Week 9

Wednesday

Lab-8

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • Midterm I




Week 10

Wednesday

Lab-9

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday




Week 11

Wednesday

Lab-10

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • Multiple Sequence Alignment - slides
  • Assumption of MSA
  • The use of MSA
  • Why is MSA useful compared to pairwise sequence alignment
  • How to score MSA
  • Dynamic Programming and its complexity
  • Star Alignment and its Problems
  • Progressive Alignment
  • Iterative Alignment
  • Progressive alignment; get pairs from Newick tree.
  • Template-based alignment
  • MUSCLE
  • MAFFT
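
One simple way to score a multiple sequence alignment is the sum-of-pairs score, which adds up the pairwise scores of all residue pairs in every column. The sketch below assumes a toy scoring scheme (match +1, mismatch -1, residue against gap -2, gap against gap 0) chosen for illustration.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Return the sum-of-pairs score of a list of equal-length aligned rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue            # gap against gap contributes nothing
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


# Hypothetical toy alignment.
toy_alignment = ["ACG-", "ACGT", "A-GT"]
print(sum_of_pairs(toy_alignment))
```

With k sequences every column contributes k(k-1)/2 pairs, so the score grows quickly with the number of sequences and should only be compared between alignments of the same sequence set.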

  • Protein Domain and Motif - Slides
  • Protein Domain vs Motif
  • Domain evolution
  • Sequence-based domain identification




Week 12

Wednesday

Lab Midterm

  • Go to the following assignment.

  • Accept the assignment.

  • Copy the link of your repository

  • Clone the repo to your local machine with git clone REPOSITORY_LINK

  • Follow the instructions in the readme.md file in your cloned repository.

Friday

  • PSSM
  • CDD
  • PSI-BLAST
  • RPS-BLAST
  • Three ways of identifying domains
  • Consensus sequence
  • Advantages/Disadvantages of PSI-BLAST
  • RPS-BLAST
  • HMM
  • HMMER tools
  • Databases: Pfam, CDD, Tigrfam
  • HHsearch
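
The ideas of a consensus sequence and a PSSM can be illustrated on a small ungapped alignment. The sketch below takes the most frequent residue per column as the consensus and builds log-odds scores against a uniform background with a pseudocount. The uniform background, the pseudocount of 1 and the toy alignment are simplifying assumptions; tools such as PSI-BLAST use more elaborate weighting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_counts(alignment):
    """Count the residues in each column of an alignment."""
    return [Counter(column) for column in zip(*alignment)]


def consensus(alignment):
    """Most frequent residue per column; ties go to the alphabetically first one."""
    return "".join(
        max(sorted(counts.items()), key=lambda item: item[1])[0]
        for counts in column_counts(alignment)
    )


def pssm(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Log2-odds score per column and residue against a uniform background."""
    background = 1 / len(alphabet)
    matrix = []
    for counts in column_counts(alignment):
        total = sum(counts[residue] for residue in alphabet)
        denominator = total + pseudocount * len(alphabet)
        matrix.append({
            residue: math.log2((counts[residue] + pseudocount) / denominator / background)
            for residue in alphabet
        })
    return matrix


# Hypothetical ungapped toy alignment.
toy_alignment = ["MKVL", "MKIL", "MRVL"]
print(consensus(toy_alignment))
print(round(pssm(toy_alignment)[0]["M"], 3))
```

Residues seen more often than the background expects get positive scores, and residues that never appear in a column get negative scores.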

  • Phylogenetics

  • How to build phylogenetic trees
  • Rooting trees
  • Species tree vs Gene tree
  • Horizontal Gene Transfer
  • How to interpret phylogenetic trees
  • Maximum parsimony




Week 13

Wednesday

Check your mailbox.

Friday

  • UPGMA
  • Neighbor joining
  • Maximum likelihood
  • Bootstrapping
  • Reconciled trees
  • Paralogy/Orthology
  • Differential gene loss
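
As an illustration of distance-based tree building, the sketch below implements a bare-bones UPGMA that repeatedly merges the two closest clusters and averages their distances to the remaining clusters, weighted by cluster size. It returns only the topology as a Newick string without branch lengths. The input format (a dictionary keyed by pairs of names) and the toy distances are assumptions for this example.

```python
from itertools import combinations


def upgma(names, distances):
    """Return a Newick topology built by UPGMA.

    distances maps (name_a, name_b) pairs to a distance; each pair is given once.
    """
    if not names:
        raise ValueError("at least one taxon is required")
    sizes = {name: 1 for name in names}
    dist = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2), key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Size-weighted average distance from the new cluster to every other cluster.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Hypothetical toy distance matrix.
toy_distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], toy_distances))
```

UPGMA assumes a constant rate of evolution along all lineages, which is one reason neighbor joining or maximum likelihood is often preferred on real data.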

  • Next Generation Sequencing
  • PCR
  • Sanger Sequencing
  • Whole Genome Sequencing
  • Coverage concept
  • How much coverage do we need?
  • Exome sequencing
  • Microarray vs RNA-seq

Week 14

Wednesday

  • RNA-seq normalization
  • RNA-seq pipeline
  • NET-seq, GRO-seq
  • ChIP-seq
  • ATAC-seq
  • DNase-seq and others
  • Hi-C-seq
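
As one example of RNA-seq normalization, the sketch below computes TPM (transcripts per million). Read counts are first divided by gene length in kilobases, and the resulting rates are then rescaled so that they sum to one million. The gene names, counts and lengths are made-up values, and other normalization methods may be the ones covered in class.

```python
def tpm(counts, lengths_bp):
    """Return {gene: TPM} from raw read counts and gene lengths in base pairs."""
    # Step 1: reads per kilobase corrects for gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rates.values())
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: rescale so the values of one sample sum to one million.
    return {gene: rate / scale * 1_000_000 for gene, rate in rates.items()}


# Hypothetical toy data: same read count, different gene lengths.
toy_counts = {"geneA": 100, "geneB": 100}
toy_lengths = {"geneA": 1000, "geneB": 2000}
print(tpm(toy_counts, toy_lengths))
```

Both genes have the same number of reads, but the shorter gene ends up with twice the TPM because its reads come from half as much sequence.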

Final Group Projects - 4 groups

Friday

Final Group Projects - 4 groups