Observation Theory: Estimating the Unknown (edX)

Learn how to estimate parameters from observational data for real-world engineering applications and assess the quality of the results. Are you an engineer, scientist, or technician? Are you dealing with measurements or big data but unsure how to proceed? This is the course that teaches you how to find the best estimates of unknown parameters from noisy observations. You will also learn how to assess the quality of your results.

TU Delft’s approach to observation theory is world-leading and based on decades of experience in research and teaching in geodesy and the wider geosciences. The theory, however, can be applied to all engineering sciences where measurements are used to estimate unknown parameters.

The course introduces a standardized approach for parameter estimation, using a functional model (relating the observations to the unknown parameters) and a stochastic model (describing the quality of the observations). Using the concepts of least squares and best linear unbiased estimation (BLUE), parameters are estimated and analyzed in terms of precision and significance.

The course ends with the overall model test, which uses hypothesis testing to check the validity of the parameter estimation results. Emphasis is placed on developing a standardized way to deal with estimation problems. Most of the course effort will be on examples and exercises from different engineering disciplines, especially in the domain of Earth Sciences.

This course is aimed at Engineering and Earth Sciences students at the Bachelor’s, Master’s, and postgraduate levels.

Course Syllabus

Week 1: Introduction

An introduction to what “estimation” is and when we need it. What are the generic sources of uncertainty in observations, and which concepts are needed to deal with them, e.g. deterministic vs. stochastic parameters, random vs. systematic errors, precision vs. accuracy, bias, and the probability distribution function as a metric of randomness? All of these concepts are explained through practical examples.

Week 2: Mathematical models

Learn how to develop a systematic approach for translating real-life problems into mathematical models in the form of an observation-equation system, which comprises four fundamental building blocks: the vector of observations, the vector of unknown parameters, a linear (or linearized) functional relation between observations and unknowns, and the stochastic characteristics of the observations in the form of the dispersion (or covariance matrix) of the observation vector. The week also discusses related concepts such as linear vs. nonlinear models, functional vs. stochastic models, consistent vs. inconsistent models, over- vs. underdetermined models, redundancy, and the solvability of observation-equation systems. All of these concepts are explained through practical examples.
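As a compact illustration of these four building blocks, a linear observation-equation system is often written as follows (the notation here is a common geodetic convention, assumed by us rather than quoted from the course):

```latex
% Functional model: m observations y related to n unknown parameters x
E\{y\} = A\,x
% Stochastic model: dispersion (covariance matrix) of the observations
D\{y\} = Q_{yy}
```

Here A is the m-by-n design matrix; the system is redundant (overdetermined) when m > n and underdetermined when m < n.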

Week 3: Least Squares Estimation (LSE)

Given a mathematical model, how do you find an estimate that predicts the observations as closely as possible? An introduction to (weighted) least squares estimation (WLSE), its mathematical logic, and its main properties. Different applications of WLSE are demonstrated through practical examples, together with a discussion of the numerical and computational aspects of applying WLSE.
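As a minimal numerical sketch of WLSE, assuming a linear model y = Ax + e and a given weight matrix W (NumPy and all names below are our illustrative choices, not course material), the estimate solves the normal equations x̂ = (AᵀWA)⁻¹AᵀWy:

```python
import numpy as np

def wlse(A, y, W):
    """Weighted least squares: x_hat = (A^T W A)^{-1} A^T W y."""
    N = A.T @ W @ A      # normal matrix
    rhs = A.T @ W @ y    # right-hand side
    return np.linalg.solve(N, rhs)

# Example: fit a straight line y = a + b*t to noisy observations
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.9, 4.2, 5.8])
A = np.column_stack([np.ones_like(t), t])  # design matrix
W = np.diag([1.0, 1.0, 0.5, 0.5])          # weights (e.g. inverse variances)
print(wlse(A, y, W))                       # estimated intercept and slope
```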

Week 4: Best Linear Unbiased Estimation (BLUE)

How do you find the most precise and accurate estimate in a linear model? An introduction to the concept of Best Linear Unbiased Estimation (BLUE), its theory and implications, and its relation to other estimators such as WLSE, maximum likelihood, and minimum variance estimators. The concept of BLUE and its application to various real problems are demonstrated through examples and exercises.
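Under the functional and stochastic model sketched above, the BLUE of x takes a standard closed form: it is WLSE with the weight matrix chosen as the inverse covariance matrix of the observations (notation again assumed by us):

```latex
\hat{x} = \left(A^{\mathsf{T}} Q_{yy}^{-1} A\right)^{-1} A^{\mathsf{T}} Q_{yy}^{-1}\, y
```

Under normally distributed observation errors, this estimator also coincides with the maximum likelihood estimator, which is one way to connect BLUE to the other estimators mentioned above.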

Week 5: How precise is the estimate?

A discussion of how the uncertainty/randomness in the observations (described by the stochastic model) propagates to the uncertainty/randomness of the estimates (described by the probability density function or covariance matrix of the estimators). An introduction to the concept of error propagation and its application in specifying the uncertainty/precision of estimates, inferring confidence intervals or statistical tolerance levels of the results, and describing the expected variability of the results of an estimation. The interpretation of covariance matrices and confidence intervals is discussed and clarified through different examples and exercises.
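A worked illustration of the error propagation this week describes is the standard linear propagation law (notation as in the earlier sketches):

```latex
% A linear estimator z = F y inherits its covariance from the observations
\hat{z} = F\,y \quad\Longrightarrow\quad Q_{\hat{z}\hat{z}} = F\, Q_{yy}\, F^{\mathsf{T}}
% Applied to BLUE, this yields the covariance matrix of the estimated parameters
Q_{\hat{x}\hat{x}} = \left(A^{\mathsf{T}} Q_{yy}^{-1} A\right)^{-1}
```

The diagonal of this covariance matrix gives the variances of the individual parameter estimates, from which confidence intervals can be inferred.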

Week 6: Does the estimate make sense?

An introduction to a probabilistic decision-making process (statistical hypothesis testing) for validating the results of an estimation, in order to avoid incorrect decisions or interpretations of the results. Students learn how to verify the validity of a chosen mathematical model, and how to detect and identify model misspecifications. The concepts are explained through various practical examples.
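A minimal sketch of an overall model test of the kind this week (and the course introduction) describes, assuming the linear model from the earlier weeks; the helper name and the use of NumPy/SciPy are our illustrative choices. The weighted squared-residual statistic is compared against a chi-square quantile with the redundancy m − n as degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

def overall_model_test(A, y, Q, alpha=0.05):
    """Test H0: E{y} = A x, with observation covariance matrix Q."""
    Q_inv = np.linalg.inv(Q)
    x_hat = np.linalg.solve(A.T @ Q_inv @ A, A.T @ Q_inv @ y)
    e_hat = y - A @ x_hat               # least-squares residuals
    T = float(e_hat @ Q_inv @ e_hat)    # overall model test statistic
    dof = A.shape[0] - A.shape[1]       # redundancy: m observations - n unknowns
    critical = chi2.ppf(1.0 - alpha, dof)
    return T, critical, T <= critical   # H0 is not rejected if T <= critical
```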

This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.
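The course lets you choose SAS or Python. As a minimal, non-authoritative Python sketch (assuming the statsmodels package, which may or may not be the course's exact toolchain), a simple linear regression with a diagnostic summary could look like:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: outcome y driven by a single predictor x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

X = sm.add_constant(x)       # add an intercept column to the design matrix
model = sm.OLS(y, X).fit()   # ordinary least squares fit
print(model.summary())       # coefficients, p-values, R^2, and diagnostics
```

The summary output is one starting point for the diagnostic checks the course covers, such as examining residual plots for violations of the regression assumptions.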

Have you ever had the perfect data science experience? The data pull went perfectly. There were no merging errors or missing data. Hypotheses were clearly defined prior to analyses. Randomization was performed for the treatment of interest. The analytic plan was outlined prior to analysis and followed exactly. The conclusions were clear and actionable decisions were obvious. Has that ever happened to you? Of course not. Data analysis in real life is messy. How does one manage a team facing real data analyses? In this one-week course, we contrast the ideal with what happens in real life. By contrasting the ideal, you will learn key concepts that will help you manage real-life analyses.

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) in performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies, and nuance.

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows for people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary.

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
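The course itself works with R's plotting systems. Purely as an illustrative sketch in Python (the language used by the other examples in this catalog, with NumPy, matplotlib, and scikit-learn assumed), one common multivariate technique for visualizing high-dimensional data is projection onto the first principal components:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Illustrative high-dimensional data: 200 samples of 10 correlated features
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
data = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))

# Project onto the first two principal components for plotting
scores = PCA(n_components=2).fit_transform(data)
plt.scatter(scores[:, 0], scores[:, 1], s=10)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("High-dimensional data projected onto two components")
plt.show()
```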

In business, data and algorithms create economic value when they reduce uncertainty about financially important outcomes. This course teaches the concepts and mathematical methods behind the most powerful and universal metrics used by data scientists to evaluate the uncertainty reduction, or information gain, that predictive models provide. We focus on the two most common types of predictive model, binary classification and linear regression, and you will learn metrics to quantify for yourself the exact reduction in uncertainty each can offer. These metrics are applicable to any model that uses new information to improve predictions cast in the form of a known probability distribution, the standard way of representing forecasts in data science.
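As a hedged sketch of one metric of this kind for binary classification (the function name and the choice of cross-entropy in nats are our assumptions, not necessarily the course's exact metric), information gain can be measured as the drop in average log loss relative to always forecasting the base rate:

```python
import numpy as np

def information_gain(y_true, p_model):
    """Per-observation reduction in cross-entropy (nats) achieved by the
    model's probabilities versus always forecasting the base rate."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_model, dtype=float), 1e-12, 1 - 1e-12)
    base = np.clip(y.mean(), 1e-12, 1 - 1e-12)  # base-rate forecast
    ll_model = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    ll_base = -(base * np.log(base) + (1 - base) * np.log(1 - base))
    return ll_base - ll_model  # positive => the model reduces uncertainty

# Example: probabilities that track the outcomes fairly well
print(information_gain([0, 0, 1, 1, 1, 0], [0.1, 0.3, 0.8, 0.7, 0.9, 0.2]))
```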

In this course, you will develop and test hypotheses about your data. You will learn a variety of statistical tests, as well as strategies to know how to apply the appropriate one to your specific data and question. Using your choice of two powerful statistical software packages (SAS or Python), you will explore ANOVA, Chi-Square, and Pearson correlation analysis. This course will guide you through basic statistical principles to give you the tools to answer questions you have developed. Throughout the course, you will share your progress with others to gain valuable feedback and provide insight to other learners about their work.
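As a minimal Python sketch of the three tests the course names (SciPy assumed; all data below is toy data we made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# ANOVA: does the mean response differ across three groups?
g1, g2, g3 = rng.normal(0, 1, 30), rng.normal(0.5, 1, 30), rng.normal(1, 1, 30)
f_stat, p_anova = stats.f_oneway(g1, g2, g3)

# Chi-square: are two categorical variables independent?
table = np.array([[20, 15], [10, 30]])  # observed contingency table
chi2_stat, p_chi2, dof, expected = stats.chi2_contingency(table)

# Pearson correlation: linear association between two quantitative variables
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(scale=0.8, size=50)
r, p_pearson = stats.pearsonr(x, y)

print(p_anova, p_chi2, p_pearson)
```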

This course is for novice programmers or business people who'd like to understand the core tools used to wrangle and analyze big data. With no prior experience required, you'll have the opportunity to walk through hands-on examples with Hadoop and Spark, two of the most common frameworks in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments, you will be guided through how data scientists apply important concepts and techniques, such as MapReduce, to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.
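To make the MapReduce idea concrete, here is a minimal single-machine sketch in plain Python (illustrative only; Hadoop distributes these same phases across a cluster): a word count expressed as map, shuffle, and reduce phases.

```python
from collections import defaultdict

# Map phase: each "mapper" emits (word, 1) pairs for its chunk of input
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group values by key, as the framework does between nodes
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: each "reducer" combines the values for one key
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data tools", "big data frameworks", "data everywhere"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'big': 2, 'data': 3, 'tools': 1, 'frameworks': 1, 'everywhere': 1}
```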

One of the skills that characterizes great business data analysts is the ability to communicate practical implications of quantitative analyses to any kind of audience member. Even the most sophisticated statistical analyses are not useful to a business if they do not lead to actionable advice, or if the answers to those business questions are not conveyed in a way that non-technical people can understand. In this course you will learn how to become a master at communicating business-relevant implications of data analyses.

Get an overview of the data, questions, and tools that data analysts and data scientists work with. This is the first course in the Johns Hopkins Data Science Specialization, and it introduces the main tools and ideas in the data scientist's toolbox. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program, such as version control, markdown, git, GitHub, R, and RStudio.

MOOCs – Massive Open Online Courses – enable students around the world to take university courses online. This guide, by the instructors of edX’s most successful MOOC in 2013-2014, Principles of Written English (based on both enrollments and rate of completion), advises current and future students how to get the most out of their online study, covering areas such as what types of courses are offered and who offers them, what resources students need, how to register, how to work effectively with other students, how to interact with professors and staff, and how to handle assignments. This second edition offers a new chapter on how to stay motivated. This book is suitable for both native and non-native speakers of English, and is applicable to MOOC classes on any subject (and indeed, for just about any type of online study).