The following is a list of electives that students in the Biomedical Informatics program have taken in the past. These courses are not guaranteed to run every year and serve only as examples of suitable elective courses. Students should choose electives with the consultation of the Program Director.

Covers basic statistical techniques that are important for analyzing data arising from epidemiology, environmental health and biomedical and other public health-related research. Major topics include descriptive statistics, elements of probability, introduction to estimation and hypothesis testing, nonparametric methods, techniques for categorical data, regression analysis, analysis of variance, and elements of study design. Applications are stressed. Designed as an alternate to BIO200, for students desiring more emphasis on theoretical developments. Background in algebra and calculus strongly recommended.

Covers research design, sample selection, questionnaire construction, interviewing techniques, the reduction and interpretation of data, and related facets of population survey investigations. Focuses primarily on the application of survey methods to problems of health program planning and evaluation. Treatment of methodology is sufficiently broad to be suitable for students who are concerned with epidemiological, nutritional, or other types of survey research. Formerly BIO212

This course will introduce students involved with clinical research to the practical application of multiple regression analysis. Linear regression, logistic regression and proportional hazards survival models will be covered, as well as general concepts in model selection, goodness-of-fit, and testing procedures. Each lecture will be accompanied by a data analysis using SAS and a classroom discussion of the results. The course will introduce, but will not attempt to develop, the underlying likelihood theory. Background in SAS programming required.

Designed for individuals interested in the scientific, policy, and management aspects of clinical trials. Topics include types of clinical research, study design, treatment allocation, randomization and stratification, quality control, sample size requirements, patient consent, and interpretation of results. Students design a clinical investigation in their own field of interest, write a proposal for it, and critique recently published medical literature. Course Prerequisites: BIO201 or ID200 or ID201 or ID207 or BIO202&203 or BIO206&207 or BIO206&208 or BIO206&209 Formerly BIO214

This course is intended for students who are already very comfortable with fundamental techniques in statistics. The course will cover methods for building and interpreting linear regression models, including statistical assumptions and diagnostics, estimation and testing, and model building techniques. These models will be extended to handle data arising from longitudinal studies employing repeated measurement of subjects over time. Summer/Residential Course Note (Section 1): Lectures will be accompanied by computing exercises using the SAS statistical package. Online Course Note (Section 2): Lectures will be accompanied by computing exercises using the Stata statistical package. Course Prerequisites: EPI522 or BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208 Formerly BIO501

The goal of this course is to enable scientists and public health professionals who already have an introductory background in biostatistics and clinical trials to acquire the competencies in quantitative skills and systems thinking required to understand and participate in drug development and regulatory review processes. The course illustrates how statistical and quantitative methods are used to transform information into evidence demonstrating the safety, efficacy and effectiveness of drugs and devices over the course of the product's life cycle from a regulatory perspective. Content is delivered using a blended-learning approach involving lectures, web-based media and selected case study examples derived from actual FDA decision-making and regulatory assessments to highlight and describe each phase of the regulatory drug approval process. Case studies will illustrate regulatory science in action and practice and will include content publicly available from the FDA's website that can be used in conjunction with FDA science-based guidance and decision precedents. Course Prerequisites: ID538 or [(BIO200 or ID200 or BIO201 or BIO202&203 or BIO206&207/8/9) and (EPI200 or EPI201 or EPI208 or EPI505)] Formerly BIO523

This course will provide a basic yet thorough introduction to the probability theory and mathematical statistics that underlie many of the commonly used techniques in public health research. Topics to be covered include probability distributions (normal, binomial, Poisson), means, variances and expected values, finite sampling distributions, parameter estimation (method of moments, maximum likelihood), confidence intervals, and hypothesis testing (likelihood ratio, Wald and score tests). All theoretical material will be motivated with problems from epidemiology, biostatistics, environmental health and other public health areas. This course is aimed towards second-year doctoral students in fields other than Biostatistics. Background in algebra and calculus required. Course Prerequisites: BST210 or BST213 Formerly BIO222

Topics will include types of censoring, hazard, survivor, and cumulative hazard functions, Kaplan-Meier and actuarial estimation of the survival distribution, comparison of survival using log rank and other tests, regression models including the Cox proportional hazards model and the accelerated failure time model, adjustment for time-varying covariates, and the use of parametric distributions (exponential, Weibull) in survival analysis. Methods for recurrent survival outcomes and competing risks will also be discussed, as well as design of studies with survival outcomes. Class material will include presentation of statistical methods for estimation and testing along with current software (SAS, Stata) for implementing analyses of survival data. Applications to real data will be emphasized. Course Prerequisite(s): BST210 or BST213 or BST 230, or permission of instructor required. BST 213 may be taken concurrently. Formerly BIO223

This course introduces students to the diverse statistical methods used throughout the process of statistical genetics, from familial aggregation and segregation studies to linkage scans and association studies. Topics covered include basic principles from population genetics, multipoint and model-free linkage analysis, family-based and population-based association testing, and Genome Wide Association analysis. Instructors use ongoing research into the genetics of respiratory disease, psychiatric disorders and cancer to illustrate basic principles. Weekly homework supplements reading, course lectures, discussion and section. Relevant concepts in genetics and molecular genetics will be reviewed in lectures and labs. The emphasis of the course is fundamental principles and concepts. Course Prerequisites: BST210 (concurrent enrollment allowed) Course Note: There will be a weekly lab section; the time will be scheduled at first meeting. Formerly BIO227

This course is a practical introduction to the Bayesian analysis of biomedical data. It is an intermediate Master's level course in the philosophy, analytic strategies, implementation, and interpretation of Bayesian data analysis. Specific topics that will be covered include: the Bayesian paradigm; Bayesian analysis of basic models; Bayesian computing: Markov Chain Monte Carlo; STAN R software package for Bayesian data analysis; linear regression; hierarchical regression models; generalized linear models; meta-analysis; models for missing data. Programming and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST210 and BST222, or permission of the instructor

Introduction to the data structures and computer algorithms that are relevant to statistical computing. The implementation of data structures and algorithms for data management and numerical computations is discussed. Course Prerequisite(s): Instructor's Permission. Formerly BIO514

BST247 is a seminar-style course with readings selected from the literature in areas of expertise of the participating faculty. Content may vary from year to year. The specific objectives are (1) to train students to critically read foundational papers and current journal articles in Statistical Genetics, (2) to train students to present sophisticated ideas to an audience of peers, and (3) to prepare students to engage in doctoral level research in the area. After the course, students are expected to have an in-depth and broad understanding of important topics in statistical genetics research. Course Prerequisite(s): BIO227 and (BIO231 or EPI511). BIO231 may be taken concurrently. Formerly BIO257

This course is the second course in the foundational sequence of the School’s newly approved Master’s Degree in Health Data Science. The course will build upon our existing course, BST260 Introduction to Data Science, in presenting a set of tools for modeling and understanding complex datasets. Specifically, the course will provide practical regression and tree-based techniques for big data. Specific topics that will be covered include: linear model selection and regularization: LASSO and regularization; principal component regression and partial least squares; tree-based methods: decision trees; bagging, random forests, and boosting; unsupervised learning: principal components analysis, cluster analysis. Programming (Python and R) and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST260 or permission of instructor

Many systems of scientific and societal interest consist of a large number of interacting components. The structure of these systems can be represented as networks where network nodes represent the components and network edges the interactions between the components. Network analysis can be used to study how pathogens, behaviors and information spread in social networks, having important implications for our understanding of epidemics and the planning of effective interventions. In a biological context, at a molecular level, network analysis can be applied to gene regulation networks, signal transduction networks, protein interaction networks, and more. This introductory course covers some basic network measures, models, and processes that unfold on networks. The covered material applies to a wide range of networks, but we will focus on social and biological networks. To analyze and model networks, we will learn the basics of the Python programming language and its NetworkX module. The course contains a number of hands-on computer lab sessions. There are five homework assignments and four reading assignments that will be discussed in class. In addition, each student will complete a final project that applies network analysis techniques to study a public health problem. Course Prerequisites: BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208 Formerly BIO521

This course is an introduction to modern statistical computing techniques used to characterize and interpret cancer genome sequencing datasets. This Master's level course will begin with a basic introduction to DNA, genes, and genomes for students with no biology background. It will then introduce cancer as an evolutionary process, review landmarks in the history of cancer genetics, and discuss the basics of sequencing technology and modern Next Generation Sequencing. The course will cover the main steps involved in turning billions of short sequencing reads into a representation of the somatic genetic alterations characterizing an individual patient’s cancer, and will build on this foundation to study topics related to identifying mutations under positive selection from multiple tumors sampled in a population. By the end of the course, students will be able to apply state-of-the-art analysis to cancer genome datasets and to critically evaluate papers employing cancer genome data.

Epigenetics is a fast-growing field, with increasing applicability in environmental and epidemiologic studies, focusing on the alterations in chromatin structure that can stably and heritably influence gene expression. Epigenetic changes can be as profound as those exerted by mutation, but, unlike mutations, are reversible and responsive to environmental influences. The course will focus on epigenetic mechanisms and laboratory methods for DNA methylation, histone modifications, small non-coding RNAs, and epigenomics. Ongoing experimental and epidemiologic studies (cohort, case-control, cross-sectional and repeated measurement studies) will be presented to introduce students to the epigenetic effects, in prenatal/early and adult life, of environmental factors, including air pollution, metals, pesticides, benzene, PCBs, persistent organic pollutants, and diet. The course will enable students to understand and apply epigenetic methods in multiple areas, including cardiovascular and respiratory disease, aging, reproductive health, inflammation/immunity, and cancer.

EPI201 introduces the principles and methods used in epidemiologic research. The course discusses the conceptual and practical issues encountered in the design and analysis of epidemiologic studies for description and causal inference. EPI201 is the first course in the series of methods courses designed for students majoring in Epidemiology, Biostatistics and related fields, and those interested in a detailed introduction to the design and conduct of epidemiologic studies. Students who take EPI201 are expected to take EPI202 (Methods II). Course Note: Thursday or Friday lab required.

This course will present an introduction to the methods of data mining and predictive modeling, with applications to both genetic and clinical data. Basic concepts and philosophy of supervised and unsupervised data mining as well as appropriate applications will be discussed. Topics covered will include multiple comparisons adjustment, cluster analysis, principal component analysis, and predictive model building through logistic regression, classification and regression trees (CART), multivariate adaptive splines (MARS), neural networks, random forests, and bagging and boosting. Course Activities: Computer labs. Course Note: Students should be familiar with logistic regression.

Like all living things, pathogens have evolved by natural selection. The application of evolutionary principles to infectious disease epidemiology is crucial to such diverse subjects as outbreak analysis, the understanding of how different genomic combinations of virulence and drug resistance determinants emerge, and how selection acts to produce successful pathogens that balance the costs and benefits of virulence and transmission. The goal of this course is to introduce basic evolutionary concepts, highlighting the importance of transmission to fitness, as illustrated by comparisons of the adaptive process among different sorts of pathogenic microorganisms. Students will also learn the basics of phylogenetic sequence analysis for the study of outbreaks and transmission, and the construction of simple mathematical models that probe the adaptive process. Students outside of HSPH must request instructor permission to enroll in this course.

This is an introductory level class on the analysis of mortality, fertility and population change. It is required for all master's and doctoral students in the department of Global Health and Population. Students are introduced to the core literature in this field through lectures and assigned readings selected from peer-reviewed journals and textbooks. Together, these provide a graduate-level introduction to the principal sources and characteristics of population data and to the essential methods used for the analysis of population problems. The emphasis throughout is on understanding the key processes, models and assumptions used primarily for the analysis of demographic components. Practical training will be given through a required weekly laboratory session, assignments, and a final examination. Examples presented in class and used in assignments are drawn from several countries, combining both developed and developing world realities.

This course is designed to introduce the student to the methods and growing range of applications of decision analysis and cost-effectiveness analysis in health technology assessment, medical and public health decision making, and health resource allocation. The objectives of the course are: (1) to provide a basic technical understanding of the methods used, (2) to give the student an appreciation of the practical problems in applying these methods to the evaluation of clinical interventions and public health policies, and (3) to give the student an appreciation of the uses and limitations of these methods in decision making at the individual, organizational, and policy level both in developed and developing countries.

Moving from simple (two-party, one-shot, price deals) to complex (multiple parties and issues, internal divisions, long time-frames, cross-border deals), the course integrates three complementary perspectives: analytic, behavioral, and contextual. While we will analyze a number of traditional case studies, the heart of the course is a series of interactive negotiation exercises. These exercises will give you hands-on negotiating experience. You will learn first by actually negotiating, and then by stepping back to compare your approach and results with others. You will be able to test your analytic ability and tactical skill, and to experiment with new approaches.

The course is a laboratory in which you will be both experimenter and subject. Sometimes the most important learning comes from apparent "failure", and so the course is designed to let you fail in the safe setting of a classroom, and thus help you avoid costly real mistakes.

This course will provide a firm foundation for understanding the relationship between molecular biology, developmental biology, genetics, genomics, bioinformatics, and medicine. The goal is to develop explicit connections between basic research, medical understanding, and the perspective of patients. During the course the principles of human genetics will be reviewed. Students will become familiar with the translation of clinical understanding into analysis at the level of the gene, chromosome and molecule, the concepts and techniques of molecular biology and genomics, and the strategies and methods of genetic analysis, including an introduction to bioinformatics. The course will extend beyond basic principles to current research activity in human genetics.

Usability and design as keys to successful technology. Covers user observation techniques, needs assessment, low and high fidelity prototyping, usability testing methods, as well as theory of human perception and performance, and design best practices. Focuses on understanding and applying the lessons of human interaction to the design of usable systems; will also look at lessons to be learned from less usable systems. The course includes several small and one large project.

Data Science 1 is the first half of a one-year introduction to data science. The course will focus on the analysis of messy, real life data to perform predictions using statistical and machine learning methods. Material covered will integrate the five key facets of an investigation using data: (1) data collection - data wrangling, cleaning, and sampling to get a suitable data set; (2) data management - accessing data quickly and reliably; (3) exploratory data analysis - generating hypotheses and building intuition; (4) prediction or statistical learning; and (5) communication - summarizing results through visualization, stories, and interpretable summaries. Recommended: Programming knowledge at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended).

Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, nonlinear statistical models, and deep learning.

The Harvard Catalyst Postgraduate Education Program in Clinical & Translational Science provides training to clinical investigators through a range of educational offerings. This course is part of the advanced curriculum and is designed for independent researchers.

This course offers a comprehensive introduction to biostatistics in medical research. The course includes a review of the most common techniques in the field, as well as the manner in which these techniques are applied in standard statistical software. At the conclusion of the course, participants will be able to choose an appropriate study design, calculate the sample size needed to complete a study, analyze the collected data, and communicate the results from their experiment.