Statistics and Machine Learning for Genomics
BIOL-GA 2031
Next‐generation sequencing has produced large, noisy biological datasets that require increasingly advanced analytical methods to yield biological insight. This course aims to enable students to analyze diverse types of genomic data, ranging from human genetics studies (e.g., genome‐wide association studies) to functional genomics (e.g., ChIP‐seq or RNA‐seq, extending to the single‐cell level). To accomplish this, we will review the theory and implementation behind key concepts and methods in statistical learning and apply them to genomic datasets. The course will be divided roughly into two sections: supervised and unsupervised learning. In the first section, we will focus on predictive algorithms that perform classification and regression based on training datasets. In the second section, we will explore methods used to identify hidden structure in large genomic datasets, in particular clustering algorithms and dimensionality reduction techniques.