CS 502

Algorithms in Computational Molecular Biology

CS 502 • Spring 2025 • University of Illinois Chicago

Introduction

The revolutionary development of genomics has produced an immense volume of digital genetic information, offering vast opportunities for biological and medical discoveries. Analyzing this data requires advanced computational methods, which has given rise to the field of Computational Molecular Biology. The field attracts computer scientists because genes, proteins, and genomes can be viewed as strings of symbols, making techniques from text processing applicable. Moreover, large-scale projects such as ENCODE and the Human BioMolecular Atlas Program (HuBMAP) have generated massive datasets that pose complex computational challenges.
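As a small illustration of the "strings of symbols" view, the sketch below treats a DNA sequence as an ordinary Python string and applies two text-processing style operations to it: reverse complementation and counting overlapping substrings of length k (k-mers). The function names and the toy sequence are hypothetical and chosen only for this example; they are not taken from the course materials.

```python
from collections import Counter

# Watson-Crick base pairing, used to compute the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string over the alphabet ACGT."""
    return dna.translate(COMPLEMENT)[::-1]


def count_kmers(dna: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of dna, including overlaps."""
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


if __name__ == "__main__":
    genome = "ACGTACGTGA"  # toy sequence for illustration, not real data
    print(reverse_complement(genome))
    print(count_kmers(genome, 3).most_common(2))
```

Both operations are plain string manipulation, which is why ideas from text processing carry over so directly to sequence data.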

The course focuses on the design and analysis of efficient (combinatorial) algorithms for important problems in computational molecular biology. The final two weeks of lectures cover recent advances in deep learning for computational biology. The format includes lectures by the instructor, class discussion, directed reading, homework, and student presentations. We emphasize mathematics, algorithms, and data structures over biological implications and applications, although some relevant biological background and motivation will be discussed.


Logistics

Textbooks

  • Bioinformatics Algorithms: An Active Learning Approach (UIC edition)
    Phillip Compeau and Pavel Pevzner
    Active Learning Publishers, 2018

The following textbooks may also be helpful as references.

  • An Introduction to Bioinformatics Algorithms
    Neil C. Jones and Pavel Pevzner
    MIT Press, 2004

  • Introduction to Computational Biology: Maps, Sequences and Genomes
    Michael S. Waterman
    Chapman & Hall, 1995

Prerequisites

  • Prerequisite: CS 401 (Computer Algorithms I)
  • Recommended background: CS 501 and some exposure to basic biology.

Syllabus

  • Introductory lecture (1 lecture)
  • DNA mapping (1 lecture) (Slides)
  • Motif finding (2 lectures) (Slides)
  • Genome rearrangements (2 lectures) (Slides)
  • Sequence alignment (2.5 lectures) (Slides)
  • Multiple sequence alignment (1 lecture)
  • Graph algorithms for DNA sequencing (2 lectures) (Slides)
  • Reconstruction of evolutionary trees (3 lectures) (Slides)
  • Hidden Markov models (HMMs) (1.5 lectures) (Slides)
  • Gene expression analysis (1 lecture) (Slides)
  • Deep learning (DL) for protein structure prediction (2 lectures)
  • Deep learning (DL) for single-cell genomics (2 lectures)

Reading assignments


Grading

Grading is based on three problem sets containing programming or written questions (50%), a presentation (40%), and participation (10%). Attendance at lectures is important because the class moves quickly.
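For a rough sense of how these weights combine, the sketch below computes a weighted course score. It rests on assumptions that the policy above does not state: every component is scored on a 0-100 scale, and the three problem sets count equally within their 50% share. The function name is hypothetical, and the actual computation may differ.

```python
# Weights taken from the grading policy above; they sum to 1.0.
WEIGHTS = {"problem_sets": 0.50, "presentation": 0.40, "participation": 0.10}


def course_score(problem_sets: list[float], presentation: float,
                 participation: float) -> float:
    """Weighted course score on a 0-100 scale.

    Assumption: the problem sets are weighted equally within their share.
    """
    ps_average = sum(problem_sets) / len(problem_sets)
    return (WEIGHTS["problem_sets"] * ps_average
            + WEIGHTS["presentation"] * presentation
            + WEIGHTS["participation"] * participation)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(course_score([80, 90, 100], presentation=85, participation=100))
```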

Paper presentation

Choose a paper from the Proceedings of RECOMB 2023 or 2024. Reserve your spot early using the provided sign-up sheet. Presentations will begin on March 20th and continue through the last week of instruction, ending on Thursday, May 1st. Please email your PowerPoint file to me at least two days before your scheduled presentation. Each presentation should last approximately 20 minutes, followed by 5 minutes for audience questions.

Ensure your presentation covers the following sections:

  (1) Biological background (~3 minutes): introduce the biological context. For example, if the paper discusses protein structure prediction, explain what proteins are, the significance of their structure, how it is determined, and why it matters.
  (2) Computational problem overview (~2 minutes): highlight the main computational problems addressed in the paper and the prediction target.
  (3) Existing approaches (~2 minutes): briefly discuss existing algorithms or methods used to solve the stated problem.
  (4) Proposed method (~5 minutes): describe the method introduced in the paper in detail.
  (5) Results (~5 minutes): summarize the key findings and results presented in the paper.
  (6) Future directions (~3 minutes): conclude with potential future research directions based on the paper's work.


  • Instructor: Hao Chen
  • Email: chenhao [AT] uic [DOT] edu
  • Office hours: Thursday 12:30 - 1:30 pm