▼ This dissertation consists of five chapters focused on modeling spatial and temporal data. In Chapter 1, we explained the terminology and principles that appear frequently in the analysis of spatial and temporal data; these concepts were treated in detail to form a basis and motivation for the research. In particular, measures of spatial autocorrelation and the various methods of computing them were discussed in detail. In Chapter 2, spatial modeling techniques for lattice data were discussed. In addition to ordinary least squares (OLS), the conventional method of modeling spatial data, various spatial regression techniques were covered, including simultaneous autoregressive (SAR), conditional autoregressive (CAR), generalized least squares (GLS), linear mixed effects (LME), and geographically weighted regression (GWR) models. Comparative studies of these modeling techniques were carried out using a real-world dataset and an artificially generated spatial dataset. In Chapter 3, geographically weighted regression, a recently developed spatial analytical tool, was used to deal with spatial non-stationarity in modeling the crop residue yield potential of the North Central region of the USA. The explanatory power of the OLS and GWR models was assessed by an approximate likelihood ratio test. Furthermore, the effect of sample size on the spatial heterogeneity of the GWR parameters was investigated using datasets with small and large samples. In Chapter 4, a statistical analysis of the land cover of South Dakota was carried out. In particular, the work focused on how the land cover of the 66 counties of South Dakota changed over the years 2001-2006, and on the relationships between population density and agricultural land cover for those counties during these years. In Chapter 5, we conclude the study with a summary of the results and directions for future research.
Advisors/Committee Members: Gary D. Hatfield.
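
The spatial autocorrelation measures discussed in Chapter 1 can be illustrated with Moran's I. Below is a minimal NumPy sketch (my own illustration, not code from the dissertation) that computes Moran's I for values observed on areal units, given a user-supplied spatial weight matrix; the toy data and weights are hypothetical.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x on n areal units with spatial weight matrix W (n x n).

    I = (n / S0) * (z' W z) / (z' z), where z = x - mean(x) and S0 is the sum
    of all weights. Values well above the null expectation -1/(n-1) indicate
    positive spatial autocorrelation (neighbours tend to be similar).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()
    s0 = W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)

# Toy example: a chain of 5 units with binary adjacency weights.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(morans_i([1.0, 2.0, 3.0, 4.0, 5.0], W))  # 0.5: strongly positive
```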

▼ Markov distributions describe multivariate data with conditional independence structures. Dawid and Lauritzen (1993) extended this idea to hyper Markov laws for prior distributions. A hyper Markov law is a distribution over Markov distributions whose marginals satisfy the same conditional independence constraints. These laws have been used for Gaussian mixtures (Escobar, 1994; Escobar and West, 1995) and contingency tables (Liu and Massam, 2006; Dobra and Massam, 2009). In this paper, we develop a family of non-parametric hyper Markov laws that we call hyper Dirichlet processes, combining the ideas of hyper Markov laws and non-parametric processes. Hyper Dirichlet processes are joint laws with Dirichlet process laws for particular marginals. We also describe a more general class of Dirichlet processes that are not hyper Markov but still have useful properties for describing graphical data. These graphical Dirichlet processes are simply Dirichlet processes with a hyper Markov base measure. This class allows an extremely straightforward application of existing Dirichlet knowledge and technology to graphical settings. Given the widespread use of Dirichlet processes, there are many applications of this framework waiting to be explored. One broad class of applications, known as Dirichlet process mixtures, has been used for constructing mixture densities such that the underlying number of components may be determined by the data (Lo, 1984; Escobar, 1994; Escobar and West, 1995). We consider the use of the new graphical Dirichlet process in this setting, which imparts a conditional independence structure inside each component. In other words, given the component or cluster membership, the data exhibit the desired independence structure. We discuss two applications. Expanding on the work of Escobar and West (1995), we estimate a non-parametric mixture of Markov Gaussians using a Gibbs sampler. Secondly, we employ the Mode-Oriented Stochastic Search of Dobra and Massam (2009) for determining a suitable conditional independence model, focusing on contingency tables. In general, the mixing induced by a Dirichlet process does not drastically increase the complexity beyond that of a simpler Bayesian hierarchical model sans mixture components. We provide a specific representation for decomposable graphs with useful algorithms for local updates.
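
For readers unfamiliar with Dirichlet processes, Sethuraman's stick-breaking construction gives a concrete way to draw one. The sketch below is a generic truncated illustration (not the hyper Markov construction of the paper), with a standard normal base measure standing in for G0.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_dp(alpha, base_sampler, K=100):
    """Truncated stick-breaking draw from DP(alpha, G0).

    beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k}(1 - beta_j);
    atoms are iid draws from the base measure G0.
    """
    betas = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining
    atoms = base_sampler(K)
    return weights, atoms

weights, atoms = stick_breaking_dp(2.0, lambda k: rng.standard_normal(k))
print(weights[:5].round(3), weights.sum().round(4))  # weights sum to ~1 for large K
```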

▼ In this thesis, I derive generalization error bounds — bounds on the expected inaccuracy of the predictions — for time series forecasting models. These bounds allow forecasters to select among competing models and to declare that, with high probability, their chosen model will perform well, without making strong assumptions about the data-generating process or appealing to asymptotic theory. Expanding upon results from statistical learning theory, I demonstrate how these techniques can help time series forecasters to choose models which behave well under uncertainty. I also show how to estimate the β-mixing coefficients for dependent data so that my results can be used empirically. I use the bound explicitly to evaluate different predictive models for the volatility of IBM stock and for a standard set of macroeconomic variables. Taken together, my results show how to control the generalization error of time series models with fixed or growing memory.
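
As a rough illustration of how such a bound is used in practice, here is a generic Hoeffding-style bound with dependence handled by blocking (not the thesis's exact result). It treats approximately independent blocks of the series as the effective sample; in practice the block length would be justified by the estimated β-mixing coefficients.

```python
import numpy as np

def generalization_bound(losses, block_len, delta=0.05):
    """Hoeffding-style bound with non-overlapping blocks as the effective sample.

    Assumes losses lie in [0, 1] and that blocks of length `block_len` are
    approximately independent (i.e., the mixing coefficients at that lag are
    negligible). Returns an upper bound on expected loss that holds with
    probability at least 1 - delta under those assumptions.
    """
    losses = np.asarray(losses, dtype=float)
    n_blocks = losses.size // block_len
    block_means = losses[: n_blocks * block_len].reshape(n_blocks, block_len).mean(axis=1)
    emp_risk = block_means.mean()
    slack = np.sqrt(np.log(1.0 / delta) / (2.0 * n_blocks))
    return emp_risk + slack

rng = np.random.default_rng(1)
print(generalization_bound(rng.uniform(size=2000), block_len=20))
```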

▼ The regular regression procedure predicts the value of the dependent variable using an error-free independent variable, but in reality we may always have some degree of measurement error in the independent variable, which makes the parameter estimates biased. In these cases, we need to look for an alternative to the regular regression model for correct prediction of the dependent variable. Alternatives to Ordinary Least Squares (OLS) include the Reduced Major Axis (RMA) and grouping methods when the error variances of the independent and dependent variables are not exactly known, whereas when the error variances are exactly known, Deming/orthogonal regression can be the best alternative to OLS. This paper gives a brief introduction to the different error models, with major emphasis on the application of the Deming regression procedure, which is built on the principle of projecting data points onto the regression line so that the total squared distance between the data points and the regression line is minimized. The angle of projection of each data point onto the regression line is determined by the ratio of the error variances of the independent and dependent variables. When the assumption of an error-free independent variable is violated, the parameter estimates from Deming regression remain unbiased, in contrast to the OLS estimates, giving more accurate predictions. In this study of predicting starch concentration in corn flour using the difference in spectral composition as an input variable, the data points are 34% closer to the Deming line than to the OLS line. For research in the biological sciences where the independent variable has noise greater than 10%, predictions made by Deming regression beat OLS and reduce or eliminate bias. We can consider 10% noise in the independent variable as a threshold for switching from OLS to Deming regression.
Advisors/Committee Members: Thomas Brandenburger.
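
A minimal sketch of the closed-form Deming estimator (standard textbook formulas, not code from the paper); `delta` is the assumed ratio of the y-error variance to the x-error variance, and the data are simulated placeholders.

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Deming regression slope and intercept.

    delta = var(errors in y) / var(errors in x); delta = 1 gives orthogonal
    regression. Minimizes weighted squared perpendicular distances to the line.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    slope = (syy - delta * sxx
             + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, y.mean() - slope * x.mean()

rng = np.random.default_rng(2)
truth = rng.uniform(0, 10, 200)
x = truth + rng.normal(scale=1.0, size=200)        # noisy predictor (measurement error)
y = 2.0 * truth + 1.0 + rng.normal(scale=1.0, size=200)
print(deming_fit(x, y, delta=1.0))  # near (2, 1); the OLS slope would be attenuated
```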

▼ Dual-process theories provide a useful framework for exploring the potential constraints and sources of errors on reasoning tasks, such as the classic base rate reasoning task. Three competing accounts of base rate neglect have been offered in the literature (i.e., knowledge-deficit, monitoring-failure, and inhibition-failure). However, efforts to test the underlying processes and sources of errors offered by these accounts have been limited by 1) a lack of proper problem type comparisons, 2) a lack of individual difference measures, and 3) the use of binary selection paradigms, which lack the sensitivity to detect more nuanced cases of base rate utilization (e.g., when base rate use does not simply manifest itself in terms of which group is selected as “more probable”). The current study addressed these issues by independently manipulating the utility of base rate probabilities (i.e., “Bayesian priors”) and the diagnosticity of feature probabilities. This study also more directly measured deviations from Bayesian inference by collecting subjective posterior probability judgments for each problem and comparing these with objective Bayesian posterior probability estimates. Furthermore, individual difference measures related to each of the three prevailing accounts of base rate neglect were used to predict performance. Findings indicate that base rate neglect is partly due to a lack of knowledge of how to apply base rates, even in the absence of a prepotent heuristic response. The results support a multifaceted account of why people neglect base rate information, indicate that forced-choice response paradigms inflate the incidence of base rate neglect, and suggest that base rate application approximates a simple averaging norm more than Bayesian norms.
Advisors/Committee Members: Griffin, Thomas D (advisor).
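
The objective posterior against which the subjective judgments are compared follows directly from Bayes' theorem. A small worked sketch with hypothetical numbers:

```python
def bayes_posterior(prior_a, p_feature_given_a, p_feature_given_b):
    """P(A | feature) for two exhaustive groups A and B via Bayes' theorem."""
    prior_b = 1.0 - prior_a
    numer = p_feature_given_a * prior_a
    return numer / (numer + p_feature_given_b * prior_b)

# Base rate 3% for group A; the described feature is 4x as likely under A as under B.
print(round(bayes_posterior(0.03, 0.80, 0.20), 3))  # ~0.110: the base rate still dominates
```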

▼ Multiclass classification with high-dimensional data is an applied topic in both statistics and machine learning. The classification procedure can be done in various ways. In this thesis, we review the theory of the lasso procedure, which provides a parameter estimator while simultaneously achieving dimension reduction through a property of the L1 norm. The lasso with elastic net penalty and the sparse group lasso are also reviewed. Our data are high-dimensional proteomic data (iTRAQ ratios) from breast cancer patients with four subtypes of breast cancer. We use multinomial logistic regression to train our classifier and use the misclassification rates obtained from cross-validation to compare models.
Advisors/Committee Members: Todd Kuffner, Jose Figueroa-Lopez, Nan Lin.
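
A condensed sketch of the pipeline the abstract describes, using scikit-learn's L1-penalized multinomial logistic regression; the synthetic data merely stand in for the proteomic iTRAQ ratios, and the penalty strength is an arbitrary placeholder.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the data: 100 samples, 500 features, 4 subtypes.
X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# L1-penalized multinomial logistic regression: the saga solver supports the L1
# penalty, and smaller C means stronger shrinkage (more coefficients exactly zero).
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
scores = cross_val_score(clf, X, y, cv=5)
print("CV misclassification rate:", round(1.0 - scores.mean(), 3))
```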

▼ Understanding the complexities involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein-coding genetic paradigm. Latent variable models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft-thought-useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for an association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression. This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation of the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function, such as GC content and type of mutation, into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is…
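
As a toy illustration of the changepoint idea (far simpler than the thesis's Bayesian multi-species model), one can score every possible single changepoint position in a binary sequence, such as a 0/1 encoding of high/low GC content along an alignment, by its log-likelihood:

```python
import numpy as np

def best_single_changepoint(x):
    """Maximum-likelihood single changepoint for a Bernoulli sequence.

    For each split point k, fit separate success rates to x[:k] and x[k:] and
    keep the split with the highest total log-likelihood.
    """
    x = np.asarray(x, dtype=float)
    n = x.size

    def loglik(seg):
        p = np.clip(seg.mean(), 1e-9, 1 - 1e-9)
        return seg.sum() * np.log(p) + (seg.size - seg.sum()) * np.log(1 - p)

    scores = [loglik(x[:k]) + loglik(x[k:]) for k in range(1, n)]
    return int(np.argmax(scores)) + 1

rng = np.random.default_rng(3)
seq = np.concatenate([rng.random(60) < 0.3, rng.random(40) < 0.7]).astype(int)
print(best_single_changepoint(seq))  # near the true changepoint at position 60
```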

▼ The research objectives of this thesis were to contribute to Bayesian statistical methodology, specifically to risk assessment methodology and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and to use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems: risk assessment analyses for wastewater, and a four-dimensional dataset assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four-dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants, thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day's data from the agricultural dataset in a way that satisfactorily captured the complexities of the data; to build a model for several days' data, in order to consider how the full data might be modelled; and finally to build a model for the full four-dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. This work forms five papers, two of which have been published, two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of those data as needing to be modelled as an ‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours…

Donald, M. (2011). Using Bayesian methods for the estimation of uncertainty in complex statistical models. (Thesis). Queensland University of Technology. Retrieved from https://eprints.qut.edu.au/47132/
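
For readers who want to see the MCMC-for-credible-intervals idea in miniature (a generic sketch, unrelated to the thesis's DAG models), here is a random-walk Metropolis sampler for a binomial success probability with a uniform prior; the 95% credible interval is read off the posterior draws.

```python
import numpy as np

rng = np.random.default_rng(4)

def metropolis_binomial(k, n, n_iter=20000, step=0.05):
    """Random-walk Metropolis for p given k successes out of n, Uniform(0,1) prior."""
    def log_post(p):
        return -np.inf if not 0 < p < 1 else k * np.log(p) + (n - k) * np.log(1 - p)

    p, draws = 0.5, []
    for _ in range(n_iter):
        prop = p + rng.normal(scale=step)
        if np.log(rng.random()) < log_post(prop) - log_post(p):
            p = prop  # accept the proposal
        draws.append(p)
    return np.array(draws[n_iter // 4:])  # discard burn-in

draws = metropolis_binomial(k=7, n=50)
print(np.percentile(draws, [2.5, 97.5]).round(3))  # 95% credible interval for p
```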

▼ It is difficult to understand data and statistical models in high-dimensional space. One way to approach the problem is conditional visualisation, but methods in this area have lagged behind the considerable advances in statistical modelling in recent decades. This thesis presents a new approach to conditional visualisation which uses interactive computer graphics and supports the exploration of a broad range of statistical models.
The new approach consists of visualising a single low-dimensional section at a time, showing fitted models on the section, and enhancing the section by displaying observed data which are near the section according to a similarity measure. Two ways of choosing sections are given: choosing sections interactively using data summary graphics, and choosing sections programmatically according to some criteria.
The visualisations in this thesis necessitate interactive graphics, which are implemented in the condvis package in R.
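
A tiny sketch of the "near the section" idea (my own illustration, not code from condvis): fix the conditioning predictors at chosen values to define the section, then weight each observation by its distance to the section in those conditioned coordinates.

```python
import numpy as np

def section_weights(X_cond, section_point, sigma=0.5):
    """Similarity of each observation to a low-dimensional section.

    X_cond: values of the conditioned-on predictors for each observation (n x q).
    section_point: where those predictors are fixed to define the section (length q).
    Returns Gaussian kernel weights in (0, 1]; observations far from the section
    get weight near 0 and would be faded out or hidden in a display.
    """
    d2 = ((np.asarray(X_cond) - np.asarray(section_point)) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(5)
X_cond = rng.normal(size=(8, 2))  # two conditioned predictors, 8 observations
print(section_weights(X_cond, [0.0, 0.0]).round(2))
```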

▼ Statistical models of non-rigid deformable shape have wide application in many fields, including computer vision, computer graphics, and biometry. We show that shape deformations are well represented through nonlinear manifolds that are also matrix Lie groups. These pattern-theoretic representations lead to several advantages over other alternatives, including a principled measure of shape dissimilarity and a natural way to compose deformations. Moreover, they enable building models using statistics on manifolds. Consequently, such models are superior to those based on Euclidean representations. We demonstrate this by modeling 2D and 3D human body shape. Shape deformations are only one example of manifold-valued data. More generally, in many computer-vision and machine-learning problems, nonlinear manifold representations arise naturally and provide a powerful alternative to Euclidean representations. Statistics is traditionally concerned with data in a Euclidean space, relying on the linear structure and the distances associated with such a space; this renders it inappropriate for nonlinear spaces. Statistics can, however, be generalized to nonlinear manifolds. Moreover, by respecting the underlying geometry, the statistical models result in not only more effective analysis but also consistent synthesis. We go beyond previous work on statistics on manifolds by showing how, even on these curved spaces, problems related to modeling a class from scarce data can be dealt with by leveraging information from related classes residing in different regions of the space. We show the usefulness of our approach with 3D shape deformations. To summarize our main contributions: 1) We define a new 2D articulated model of deformable human shape, more expressive than traditional ones, that factors body-shape, pose, and camera variations. Its high realism is obtained from training data generated from a detailed 3D model. 2) We define a new manifold-based representation of 3D shape deformations that yields statistical deformable-template models that are better than the current state-of-the-art. 3) We generalize a transfer learning idea from Euclidean spaces to Riemannian manifolds. This work demonstrates the value of modeling manifold-valued data and their statistics explicitly on the manifold. Specifically, the methods here provide new tools for shape analysis.
Advisors/Committee Members: Black, Michael (Director), Bienenstock, Elie (Reader), Sudderth, Erik (Reader), Fisher III, John (Reader).
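
A minimal example of "statistics on manifolds" (illustrative only, unrelated to the body-shape models): computing the intrinsic (Fréchet) mean of points on the unit sphere by iterating the log and exp maps, rather than averaging in the ambient Euclidean space.

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing toward q."""
    d = np.arccos(np.clip(p @ q, -1.0, 1.0))
    if d < 1e-12:
        return np.zeros_like(p)
    v = q - (p @ q) * p
    return d * v / np.linalg.norm(v)

def sphere_exp(p, v):
    """Exp map on the unit sphere: follow tangent vector v from p along a geodesic."""
    t = np.linalg.norm(v)
    return p if t < 1e-12 else np.cos(t) * p + np.sin(t) * v / t

def frechet_mean(points, n_iter=50):
    """Gradient descent for the point minimizing summed squared geodesic distances."""
    mu = points[0]
    for _ in range(n_iter):
        mu = sphere_exp(mu, np.mean([sphere_log(mu, q) for q in points], axis=0))
    return mu

pts = np.array([[1, 0, 0], [0, 1, 0], [0.6, 0.8, 0]], dtype=float)
print(frechet_mean(pts).round(3))  # stays on the sphere, 'between' the inputs
```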

This thesis is concerned with aspects of the integrable Temperley–Lieb loop (TL(n)) model on a vertically infinite lattice with two non-trivial boundaries. The TL(n) model is central in the field of integrable lattice models, and different values of n relate to different physical models. For instance, the point n = 0 relates to critical dense polymers and a corresponding logarithmic conformal field theory. The point n = 1 corresponds to critical bond percolation on the square lattice, and has connections with a combinatorial counting problem of alternating sign matrices and their symmetry classes. For general n, the TL(n) model is closely related to the XXZ quantum spin chain and the 6-vertex model.
We construct the transfer matrix of the model, which describes the weights of all the possible configurations of one row of the lattice. When n = 1 the ground state eigenvector of this matrix can be interpreted as a probability distribution of the possible states of the system. Because of special properties exhibited by the transfer matrix at n = 1, we can show that the eigenvector is a solution of the q-deformed Knizhnik–Zamolodchikov equation, and we use this fact to explicitly calculate some of the components of the eigenvector. In addition, recursive properties of this transfer matrix allow us to compute the normalisation of the eigenvector, and show that it is the product of four Weyl characters of the symplectic group. Previous work in this area has produced results for the TL(1) loop model with periodic boundary conditions, two trivial boundaries and mixed (one trivial, one non-trivial) boundaries, but until recently little progress had been made on the case with two non-trivial boundaries. This boundary condition lends itself to calculations relating to horizontal percolation, which is not possible with the other boundary conditions.
One of these calculations is a type of correlation function that can be interpreted as the density of percolation cluster crossings between the two boundaries of the lattice. It is an example of a class of parafermionic observables recently introduced in an attempt to rigorously prove conformal invariance of the scaling limit of critical two-dimensional lattice models. It also corresponds to the spin current in the Chalker–Coddington model of the quantum spin Hall effect. We derive an exact expression for this correlation function, using properties of the transfer matrix of the TL(1) model, and find that it can be expressed in terms of the same symplectic characters as the normalisation.
In order to better understand these solutions, we use Sklyanin’s scheme to perform separation of variables on the symplectic character. We construct an invertible separating operator that transforms the multivariate character into a product of single variable polynomials. Analysing the asymptotics of these polynomials will lead, via the inverse transformation, to the…

▼ Prediction of rainfall over the Amazonian rainforest during the wet season is fundamental to assessing the regional water and energy balance and global carbon-climate feedbacks. Previous observational analysis has identified large-scale atmospheric dynamic and thermodynamic conditions that can influence rainfall anomalies during the wet season. Based on these observed persistent conditions, which start between June and August (JJA, dry season), we have developed and evaluated several statistical models to predict rainfall conditions during September to November (SON, early wet season) for Southern Amazonia (5–15°S, 50–70°W). Multivariate Empirical Orthogonal Function (EOF) analysis is applied to the following four fields during JJA from the ECMWF Reanalysis (ERA-Interim) spanning the years 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition energy (CIN) index and convective available potential energy (CAPE), to filter out noise and highlight the most coherent spatial and temporal variations. The first 10 EOF modes, accounting for at least 70% of the total variance in the predictor fields, are retained as inputs to the statistical models. A 12-fold cross-validation method is then used to estimate the tuning parameters of the regression algorithms. Ridge regression and lasso regression are able to capture the spatial pattern and magnitude of the rainfall anomalies. This statistical prediction system outperforms seasonal predictions based on dynamical climate models, showing longer and more accurate predictive persistence of the rainfall anomalies. In addition, we use logistic regression and neural networks to predict the categorical states of rainfall over the Southern Amazon by classifying the rainfall states into two categories, dry and wet. Our statistical models show overall better predictions of the categorical rainfall states than of the magnitudes of rainfall in our study region. The accuracy of the statistical prediction based on the neural network method can reach greater than 90%, much higher than the simple logistic regression method, indicating the non-linearity of the atmospheric processes. Predictions of the magnitudes and states of rainfall anomalies can be combined to provide more accurate information. The models we have developed have broad implications for the future development of seasonal climate predictions and can be used for real-time forecasts.
Advisors/Committee Members: Daniels, Michael Joseph (advisor).

-8196-3436. (2016). On the predictability of rainfall anomalies over the Southern Amazonia : a comparison between NMME and statistical models. (Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/46021
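
A condensed sketch of the EOF-plus-regularized-regression pipeline (PCA standing in for the multivariate EOF step; the data shapes and names are placeholders, not the study's code):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
X = rng.normal(size=(37, 4000))  # 37 JJA seasons x stacked predictor grid points
y = rng.normal(size=37)          # SON rainfall anomaly index (placeholder)

# Keep 10 leading modes (the EOF truncation), then fit ridge with a CV-chosen penalty.
model = make_pipeline(PCA(n_components=10), RidgeCV(alphas=np.logspace(-3, 3, 13)))
model.fit(X, y)
print("in-sample R^2:", round(model.score(X, y), 3))
```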

▼ Outlier detection is one of the most important challenges in many present-day applications. Outliers can occur due to uncertainty in data-generating mechanisms or due to errors in data recording and processing. Outliers can drastically change a study's results and make predictions less reliable. Detecting outliers in longitudinal studies is quite challenging because such studies work with observations that change over time; the same subject can produce an outlier at one point in time and regular observations at all other time points. A Bayesian hierarchical model assigns parameters that quantify whether each observation is an outlier or not. The purpose of this thesis is to detect outlying observations by developing three techniques and comparing them under different data-generating mechanisms. In the first chapter, we introduce the important concepts in Bayesian inference with three examples. The first two examples (binomial and Poisson distributions) illustrate the idea behind the Monte Carlo method, while the last example (normal distribution) illustrates Markov chain Monte Carlo (MCMC). We visit three different types of MCMC methods: Metropolis-Hastings, the Gibbs sampler and the slice sampler, all of which we use in the three outlier-detection algorithms. In Chapter Two, we use Gibbs sampler techniques with the linear regression model. Simulated data with three covariates are used, and then we apply our method to a real dataset, the Strong Rock data, explaining the findings using diagrams. In Chapter Three, we focus on the core problem of identifying outliers using three methods, applied to four simulated datasets. We find that the first two methods do not work well under assumptions of systematic heteroscedasticity, but the last one does an efficient job, as expected, even when the functional form of heteroscedasticity is not correctly specified. Next, we formulate our model for the real data, so we can apply the methods developed in Chapter Three. Given access to real data with large numbers of observations, we will apply these methods there.
Advisors/Committee Members: Avishek Chakraborty, Mark Arnold, Giovanni Petris.

Al-Sharea, Z. (2017). Bayesian Model for Detection of Outliers in Linear Regression with Application to Longitudinal Data. (Masters Thesis). University of Arkansas. Retrieved from http://scholarworks.uark.edu/etd/2591
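
A minimal sketch of one common way to implement the indicator-parameter idea the abstract describes: a variance-inflation outlier model for linear regression fit by Gibbs sampling. The priors are simplified (known error variance and outlier rate), the data are simulated, and this is an illustration rather than the thesis's exact model.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data: simple linear regression with two planted outliers.
n = 50
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)
y[[10, 40]] += 4.0

sigma2, K, pi = 0.3 ** 2, 50.0, 0.05  # error variance, inflation factor, outlier rate
z = np.zeros(n)                       # latent outlier indicators
beta_draws, z_draws = [], []

for it in range(3000):
    # beta | z: weighted least squares posterior (flat prior on beta),
    # with variance sigma2 * K for observations currently flagged as outliers.
    w = 1.0 / (sigma2 * np.where(z == 1, K, 1.0))
    V = np.linalg.inv(X.T @ (w[:, None] * X))
    m = V @ X.T @ (w * y)
    beta = rng.multivariate_normal(m, V)

    # z_i | beta: compare likelihoods under the regular and inflated variances.
    r2 = (y - X @ beta) ** 2
    like1 = pi * np.exp(-r2 / (2 * sigma2 * K)) / np.sqrt(K)
    like0 = (1 - pi) * np.exp(-r2 / (2 * sigma2))
    z = (rng.random(n) < like1 / (like0 + like1)).astype(float)

    if it >= 1000:  # keep post-burn-in draws
        beta_draws.append(beta)
        z_draws.append(z)

print("posterior mean beta:", np.mean(beta_draws, axis=0).round(2))
print("highest outlier probabilities at indices:", np.argsort(np.mean(z_draws, axis=0))[-2:])
```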

▼ Statistical models have been very popular for estimating the performance of highway safety improvement programs which are intended to reduce motor vehicle crashes. The traditional Poisson and Poisson-gamma (negative binomial) models are the most popular probabilistic models used by transportation safety analysts for analyzing traffic crash data. The Poisson-gamma model is usually preferred over the traditional Poisson model since crash data usually exhibit over-dispersion. Although the Poisson-gamma model is popular in traffic safety analysis, it has limitations, particularly when crash data are characterized by small sample size and low sample mean values. Also, researchers have found that the Poisson-gamma model has difficulties in handling under-dispersed crash data. The primary objective of this research is to evaluate the performance of the Conway-Maxwell-Poisson (COM-Poisson) model for various situations and to examine its application for analyzing traffic crash datasets exhibiting over- and under-dispersion. This study makes use of various simulated and observed crash datasets for accomplishing these objectives.
Using a simulation study, it was found that the COM-Poisson model can handle under-, equi- and over-dispersed datasets with different mean values, although the credible intervals are found to be wider for low sample mean values. The computational burden of its implementation is also not prohibitive. Using intersection crash data collected in Toronto and segment crash data collected in Texas, the results show that COM-Poisson models perform as well as Poisson-gamma models in terms of goodness-of-fit statistics and predictive performance. With the use of crash data collected at railway-highway crossings in South Korea, several COM-Poisson models were estimated and it was found that the COM-Poisson model can handle crash data when the modeling output shows signs of under-dispersion. The results also show that the COM-Poisson model provides better statistical performance than the gamma probability and traditional Poisson models. Furthermore, it was found that the COM-Poisson model has limitations similar to those of the Poisson-gamma model when handling data with low sample mean and small sample size. Despite its limitations for low sample mean values in over-dispersed datasets, the COM-Poisson is still a flexible method for analyzing crash data.
Advisors/Committee Members: Lord, Dominique (advisor), Guikema, Seth (committee member), Hart, Jeff (committee member), Sinha, Samiran (committee member).
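
The COM-Poisson distribution the study evaluates has pmf P(Y = y) = λ^y / (y!)^ν / Z(λ, ν), where ν = 1 recovers the Poisson, ν &gt; 1 gives under-dispersion and ν &lt; 1 over-dispersion. A small numerical sketch (my own illustration, with the infinite normalizing series truncated):

```python
from math import lgamma, exp, log

def com_poisson_pmf(y, lam, nu, max_terms=500):
    """COM-Poisson pmf: lam^y / (y!)^nu / Z(lam, nu), with Z truncated."""
    log_terms = [j * log(lam) - nu * lgamma(j + 1) for j in range(max_terms)]
    m = max(log_terms)
    log_Z = m + log(sum(exp(t - m) for t in log_terms))  # log-sum-exp for stability
    return exp(y * log(lam) - nu * lgamma(y + 1) - log_Z)

# nu = 1 should match the Poisson pmf exactly (here at y = 3, lam = 2).
print(com_poisson_pmf(3, 2.0, 1.0), exp(-2.0) * 2.0 ** 3 / 6)
```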

▼ Climate change is one of the great challenges facing agriculture in the 21st century. The goal of this study was to produce projections of crop yields for the central United States in the 2030s, 2060s, and 2090s based on the relationship between weather and yield from historical crop yields from 1980 to 2010. These projections were made across 16 states in the US, from Louisiana in the south to Minnesota in the north. They include projections for maize, soybeans, cotton, spring wheat, and winter wheat.
Simulated weather variables based on three climate scenarios were used to project future crop yields. In addition, factors of soil characteristics, topography, and fertilizer application were used in the crop production models. Two technology scenarios were used: one simulating a future in which crop technology continues to improve and the other a future in which crop technology remains similar to where it is today.
Results showed future crop yields to be responsive to both the different climate scenarios and the different technology scenarios. The effects of a changing climate regime on crop yields varied both geographically throughout the study area and from crop to crop. One broad geographic trend was greater potential for crop yield losses in the south and greater potential for gains in the north.
Whether or not new technologies enable crop yields to continue to increase as the climate becomes less favorable is a major factor in agricultural production in the coming century. Results of this study indicate the degree to which society relies on these new technologies will be largely dependent on the degree of the warming that occurs.
Continued research into the potential negative impacts of climate change on the current crop system in the United States is needed to mitigate the widespread losses in crop productivity that could result. In addition to study of negative impacts, study should be undertaken with an interest to determine any potential new opportunities for crop development with the onset of higher temperatures as a result of climate change. Studies like this one with a broad geographic range should be complemented by studies of narrower scope that can manipulate climatic variables under controlled conditions. Investment into these types of agricultural studies will give the agricultural sector in the United States greater tools with which they can mitigate the disruptive effects of a changing climate.
Advisors/Committee Members: Christopher Lant, Emily Burchfield, Justin Schoof.

Matthews-Pennanen, N. (2018). Assessment of Potential Changes in Crop Yields in the Central United States Under Climate Change Regimes. (Masters Thesis). Utah State University. Retrieved from https://digitalcommons.usu.edu/etd/7017

▼ The estimation of variance components serves as an integral part of the evaluation of variation, and is of interest and required in a variety of applications (Hugo, 2012). Estimation of the among-group variance components is often desired for quantifying the variability and effectively understanding these measurements (Van Der Rijst, 2006). The methodology for determining Bayesian tolerance intervals for the one-way random effects model was originally proposed by Wolfinger (1998), using both informative and non-informative prior distributions (Hugo, 2012); Wolfinger (1998) also provided relationships with frequentist methodologies. From a Bayesian point of view, it is important to investigate and compare the effect on coverage probabilities if negative variance components are either replaced by zero or completely disregarded from the simulation process. This research presents a simulation-based approach for determining Bayesian tolerance intervals in variance component models under these two treatments of negative variance components. The approach handles different kinds of tolerance intervals in a straightforward fashion. It makes use of a computer-generated sample (Monte Carlo process) from the joint posterior distribution of the mean and variance parameters to construct a sample from other relevant posterior distributions. This research uses only non-informative Jeffreys' prior distributions and three Bayesian simulation methods. Comparative results for the different tolerance intervals obtained when negative variance components are either replaced by zero or completely disregarded from the simulation process are investigated and discussed.
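
A small sketch of the simulation-based idea: a two-sided tolerance interval for 95% of the population, summarized over posterior draws, with the two negative-variance treatments the research compares. This is a generic illustration under simplifying assumptions (normal population, hard-coded normal quantile, made-up posterior draws), not Wolfinger's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(8)

def tolerance_interval(mu_draws, var_between, var_within, alpha=0.05, negative="zero"):
    """Simulation-based two-sided tolerance interval for 95% population content.

    For each posterior draw, the interval covering 95% of the population is
    mu +/- z * sqrt(total variance). Draws with a negative between-group
    variance are either replaced by zero or discarded; an alpha-level
    envelope over the draws gives the reported interval.
    """
    vb = np.asarray(var_between, float)
    keep = np.ones(vb.size, bool) if negative == "zero" else vb >= 0
    vb = np.where(vb < 0, 0.0, vb)[keep]
    mu = np.asarray(mu_draws, float)[keep]
    vw = np.asarray(var_within, float)[keep]
    z = 1.959964  # standard normal quantile for 95% content
    half = z * np.sqrt(vb + vw)
    return np.quantile(mu - half, alpha / 2), np.quantile(mu + half, 1 - alpha / 2)

# Hypothetical posterior draws (in practice these come from the MCMC output).
mu = rng.normal(10, 0.2, 5000)
vb = rng.normal(0.5, 0.4, 5000)  # can go negative, the case under study
vw = rng.gamma(50, 0.02, 5000)
print(tolerance_interval(mu, vb, vw, negative="zero"))
print(tolerance_interval(mu, vb, vw, negative="discard"))
```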

▼ In this thesis, we study three different problems in financial risk management. The first is to prove that the sample autocovariance at lag 1 (ACV(1)) of the increments of a mixed Poisson process converges to 0 almost surely as the sample size goes to infinity. Pólya processes, a good candidate for modeling the arrival of events that cluster in time, are also mixed Poisson processes. The sample ACV(1) of the increments of a Pólya process converges to 0, rather than to the theoretical population ACV(1) of the increments, almost surely as the sample size goes to infinity, implying that one cannot use the sample ACV in parameter estimation for a Pólya process by the method of moments. The second problem concerns comparing the power of backtests of value-at-risk at level 1% and expected shortfall at level 2%. With simulation, we show that 2% expected shortfall performs better than 1% value-at-risk in detecting misspecification of the forecast model. The third problem concerns representations of risk measures incorporating uncertainty in probability measures. We propose a new set of axioms for risk measures and obtain a representation of the risk measures that satisfy these axioms. The representation of the new class of risk measures, called natural risk measures, provides a theoretical framework that includes the Basel II and Basel III risk measures as special cases.
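
A quick simulation of the first result (my own illustration): along a single path of a Pólya process the intensity is drawn once, so the increments are conditionally iid and the sample lag-1 autocovariance drifts to 0 as the path grows, even though increments are positively correlated across paths through the shared intensity.

```python
import numpy as np

rng = np.random.default_rng(9)

def sample_acv1(x):
    """Sample autocovariance at lag 1: mean of (x_t - xbar)(x_{t+1} - xbar)."""
    x = np.asarray(x, float)
    xbar = x.mean()
    return np.mean((x[:-1] - xbar) * (x[1:] - xbar))

# One path: draw the random intensity once (Gamma mixing), then unit-time
# increments are conditionally iid Poisson given that intensity.
lam = rng.gamma(shape=2.0, scale=1.5)
for n in (100, 10_000, 1_000_000):
    print(n, round(sample_acv1(rng.poisson(lam, size=n)), 4))  # shrinks toward 0

# Across paths the increments ARE correlated: population ACV(1) = Var(intensity).
pairs = rng.poisson(rng.gamma(2.0, 1.5, size=(200_000, 1)), size=(200_000, 2))
print("population ACV(1) ~", round(np.cov(pairs[:, 0], pairs[:, 1])[0, 1], 2),
      " vs Var(Gamma) =", 2.0 * 1.5 ** 2)
```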

▼ In the first part of this thesis, we consider a simplified version of the Wealth Game, which is an agent-based financial market model with many interesting features resembling the real stock market. Market makers are not present in the game so that the majority traders are forced to reduce the amount of stocks they trade, in order to have a balance in the supply and demand. The strategy space is also simplified so that the market is only left with strategies resembling the decisions of optimistic or pessimistic fundamentalists and trend-followers in the real stock market. A phase transition between a trendsetters' phase and a bouncing phase is discovered in the space of price sensitivity and market impact. In the second part, analysis based on a semi-empirical approach is carried out to explain the phase transition and locate the phase boundary. Another phase transition is also observed when the fraction of trend-following strategies increases, which can be explained macroscopically by matching the supply and demand of stocks. Finally, we examine the evaluation sensitivity investment scheme, which is based on local evaluation of simple strategies. The dependence on ways to switch positions, market impact and multiple-period evaluation are studied and the performance of the best combination of methods of this scheme is judged by the random agent benchmark. The study on the probability distribution of decision switching according to this scheme reveals power law characteristics of the first-passage time of market price and supports the existence of market trends instead of pure random walk.

Cheung, W. Y. (2010). Dynamics of simplified wealth game without market makers and the testing of the evaluation sensitivity investment scheme. (Thesis). Hong Kong University of Science and Technology. Retrieved from https://doi.org/10.14711/thesis-b1114579 ; http://repository.ust.hk/ir/bitstream/1783.1-6893/1/th_redirect.html

▼ In the research field of Ergonomics and Human Factors, Fitts' Law has been widely studied by researchers from various aspects and applications. In this study, a validation experiment is conducted to reveal how Fitts' Law works with differences in the reaction speed of the mouse cursor (a.k.a. the D/C ratio or mouse gain), using a hand-controlled computer mouse as an input device. The time to perform a hand control movement and its endpoint variability are two important properties when the hand movement model is being investigated. In total, 16 subjects participated in this study using a computer mouse, and movement time and endpoint variability were measured to verify the models. Compared with previous research, a relatively high D/C ratio (38.6) is also investigated. The objectives of this study are to verify Fitts' Law for both ballistic and visually controlled movements, to verify an endpoint variability model for ballistic movements, and to discover patterns of an endpoint variability model for visually controlled movements under a wide range of D/C ratios. This study can provide supportive knowledge of hand-controlled computer input devices for human-computer interaction designers: when a computer interface or a computer game is designed, the movement time between UI objects can be estimated with a given gain, which also applies to virtual reality UI and UX design.
Keywords: Fitts' Law, Ballistic Movement, Visually Controlled Movement, Human-Computer Interface, Mouse Gain, D/C Ratio
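
For reference, the Shannon formulation of Fitts' Law commonly used in HCI predicts movement time from target distance and width; the coefficients below are invented placeholders, not the study's fitted values.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time (s) via the Shannon formulation of Fitts' Law.

    MT = a + b * ID, with index of difficulty ID = log2(D/W + 1) in bits.
    a (intercept) and b (slope, s/bit) are fit per device/gain; values here
    are hypothetical.
    """
    index_of_difficulty = math.log2(distance / width + 1.0)
    return a + b * index_of_difficulty

# A farther or smaller target raises ID and hence the predicted movement time.
print(round(fitts_movement_time(distance=512, width=32), 3))  # ID ~ 4.09 bits
```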

▼ Previous literature has documented mixed evidence on estimating the mutual fund size-performance relation. This thesis tests the diseconomies-of-scale hypothesis of Berk and Green (2004) by exploiting an exogenous variation in capital flow generated by investors' attention to a mutual fund ranking list. The Wall Street Journal publishes the top ten performing mutual funds in every category each quarter, which enables a clean RD design setting. Mutual funds that just make the list receive 2.4 percentage points of additional capital flow in the next quarter compared with the ones that just miss the list. I find that a 10% unexpected increase in capital flow causes a 0.95 percentage point reduction in alpha during the following quarter. The RDD estimate is around seven times larger than the OLS estimate, which provides strong evidence in favour of the Berk and Green hypothesis.

▼ Link prediction in complex networks has found applications in a wide range of real-world domains involving relational data. The goal is to predict some hidden relations between individuals based on the observed relations. Existing models are unsatisfactory when more general multiple membership in latent groups can be found in the network data. Taking the nonparametric Bayesian approach, we propose a multiple membership latent group model for link prediction. Besides, we argue that existing performance evaluation methods for link prediction, which regard it as a binary classification problem, do not satisfy the nature of the problem. As another contribution of this work, we propose a new evaluation method by regarding link prediction as ranking. Based on this new evaluation method, we compare the proposed model with two related state-of-the-art models and find that the proposed model can learn more compact structure from the network data.
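
A tiny sketch of what a ranking-based evaluation can look like (a generic reciprocal-rank metric, not necessarily the paper's exact measure): score all candidate links, sort, and reward placing true hidden links near the top.

```python
import numpy as np

def reciprocal_rank(scores, truth):
    """Rank-based evaluation of link prediction for one node.

    scores: predicted affinity to each candidate link; truth: 0/1 hidden links.
    Returns the reciprocal rank of the best-ranked true hidden link.
    """
    order = np.argsort(-np.asarray(scores))       # best-scored candidates first
    ranks = np.where(np.asarray(truth)[order] == 1)[0] + 1
    return 1.0 / ranks[0] if ranks.size else 0.0

# Hypothetical scores over five candidate links, one of which is a true hidden link.
print(reciprocal_rank([0.9, 0.2, 0.7, 0.4, 0.1], [0, 0, 1, 0, 0]))  # 0.5 (rank 2)
```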

▼ Simulations were performed to compare two methods that detect quantitative trait loci on plant data. Karl Broman's interval mapping algorithm, which uses only one observation value per plant line, was compared to a hierarchical Bayesian model that allows replicates into the analysis and takes into account the variability within each plant line. The simulation study utilized the genetic map of the Bay-0 x Shahdara plant with 38 genetic markers on 5 chromosomes. It is shown through these simulations that the hierarchical Bayesian model and Broman's interval mapping algorithm are both able to detect quantitative trait loci (QTL) when only a single location was chosen, but the hierarchical model was more powerful when two locations were chosen. This work shows that when analyzing plant replicates, the variability within each line has a strong impact on the success of the overall analyses.
Subjects/Keywords: Bayesian statistical decision theory; Genetics – Mathematical models
Advisors/Committee Members: Susan Simmons (advisor).

► This dissertation presents results of research in the development, testing and application of an automated calibration and uncertainty analysis framework for distributed environmental models based…

▼ This dissertation presents an approach to robot programming by demonstration based on two key concepts: demonstrator intent is the most meaningful signal that the robot can observe, and the robot should have a basic level of behavioral competency from which to interpret observed actions. Intent is a teleological, robust teaching signal invariant to many common sources of noise in training. The robot can use the knowledge encapsulated in sensorimotor schemas to interpret the demonstration. Furthermore, knowledge gained in prior demonstrations can be applied to future sessions. I argue that programming by demonstration should be organized into declarative and procedural components. The declarative component represents a reusable outline of underlying behavior that can be applied to many different contexts. The procedural component represents the dynamic portion of the task that is based on features observed at run time. I describe how statistical models, and Bayesian methods in particular, can be used to model these components. These models have many features that are beneficial for learning in this domain, such as tolerance for uncertainty and the ability to incorporate prior knowledge into inferences. I demonstrate this architecture through experiments on a bimanual humanoid robot using tasks from the pick-and-place domain. Additionally, I develop and experimentally validate a model, learned from demonstration data, for generating grasp candidates using visual features. This model is especially useful in the context of pick-and-place tasks.
Advisors/Committee Members: Roderic A. Grupen, Oliver Brock, Andrew H. Fagg.

▼ This work investigates online purchasers and how to predict such sales. Advertising as a field has long been required to pay for itself – money spent reaching potential consumers will evaporate if that potential is not realized. Academic marketers look at advertising through a traditional lens, measuring input (advertising) and output (purchases) with methods from TV and print advertising. Online advertising practitioners have developed their own models for predicting purchases. Moreover, online advertising generates an enormous amount of data, long the province of statisticians. My work sits at the intersection of these three areas: marketing, statistics and computer science. Academic statisticians have approached the modeling of response to advertising through a proportional hazard framework.
We extend that work and modify the underlying software to allow estimation of voluminous online data sets. We investigate a data visualization technique that allows online advertising histories to be compared easily. We also provide a framework to use existing clustering algorithms to better understand the paths to conversion taken by consumers. We modify an existing solution to the number-of-clusters problem to allow application to mixed-variable data sets. Finally, we marry the leading edge of online advertising conversion attribution (Engagement Mapping) to the proportional hazard model, showing how this tool can be used to find optimal settings for advertiser models of conversion attribution.
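
A minimal sketch of the proportional-hazards setup for time-to-conversion data, using the third-party lifelines package; the column names and data are hypothetical placeholders, not the modified software described above.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical conversion data: time until purchase (or end of observation),
# whether a purchase occurred, and two advertising-exposure covariates.
df = pd.DataFrame({
    "days_to_event": [5, 30, 12, 45, 7, 60, 3, 25],
    "converted":     [1,  0,  1,  0, 1,  0, 1,  1],
    "display_ads":   [9,  2,  7,  1, 12, 0, 15, 6],
    "search_clicks": [3,  0,  2,  1, 4,  0, 5,  2],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="days_to_event", event_col="converted")
cph.print_summary()  # hazard ratios per unit of advertising exposure
```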

▼ Due to its well-understood nature and its ability to model many phenomena in the physical world extremely well, probability theory is the method of choice for dealing with uncertainty in many science and engineering disciplines. However, as a tool for building representative models of complex real-world systems, probability theory has a rather recent history, which starts with the introduction of Bayesian Networks (BN).
Broadly construed, the BN model of a system is the compact representation of a joint probability distribution of the variables comprising the system. Many complex real-world systems are naturally represented by hybrid models which contain both discrete and continuous variables. However, when it comes to modeling uncertainty and to performing probabilistic inferencing about hybrid systems, what BNs have to offer is quite limited. Although exact inferencing in BNs composed only of discrete variables is well understood, no exact inferencing algorithms exist for general hybrid BNs.
In this thesis we concentrate on the problem of inferencing in Hybrid Bayesian Networks (HBNs). Our focus, and hence our contributions, are three-fold: theoretical, algorithmic and practical. From a theoretical point of view, we provide a novel framework implementing a hybrid methodology that complements probability theory with fuzzy sets to perform exact inferencing in general HBNs, composed of both discrete and continuous variables, with no graph-structural restrictions, to model uncertainty in complex systems. From an algorithmic perspective, we provide a suite of inferencing algorithms for general HBNs, including two new inferencing algorithms for the two types of Fuzzy-Bayesian Networks introduced in this study. Finally, from a practical perspective, we apply our framework, methodology, and techniques to the task of assessing system safety risk due to the introduction of emergent Unmanned Aircraft Systems into the National Airspace System.
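
For context, here is the discrete-only exact inference that the text notes is well understood (inference by enumeration over a tiny chain-structured BN with made-up conditional probability tables); the hybrid/fuzzy extension is the thesis's contribution and is not sketched here.

```python
from itertools import product

# Tiny discrete BN: A -> B -> C, with illustrative conditional probability tables.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.9, False: 0.1}, False: {True: 0.4, False: 0.6}}

def joint(a, b, c):
    """Chain-rule factorization P(A)P(B|A)P(C|B) encoded by the BN structure."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# P(A = true | C = true): sum the joint over the hidden variable B, then normalize.
num = sum(joint(True, b, True) for b in (True, False))
den = sum(joint(a, b, True) for a, b in product((True, False), repeat=2))
print(round(num / den, 4))  # ~0.4324
```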

▼ Ecological problems often require multivariate analyses. Ever since Bray and Curtis (1957) drew an analogy between Euclidean distance and community dissimilarity, most multivariate ecological inference has been based on geometric ideas. For example, ecologists routinely use distance-based ordination methods (e.g. multidimensional scaling) to enhance the interpretability of multivariate data. More recently, distance-based diversity indices that account for functional differences between species are now routinely used. But in most other areas of science, inference is based on Fisher's (1922) likelihood concept; statisticians view likelihood as an advance over purely geometric approaches. Nevertheless, likelihood-based reasoning is rare in multivariate statistical ecology. Using ordination and functional diversity as case studies, my thesis addresses the questions: Why is likelihood rare in multivariate statistical ecology? Can likelihood be of practical use in multivariate analyses of real ecological data? Should the likelihood concept replace multidimensional geometry as the foundation for multivariate statistical ecology? I trace the history of quantitative plant ecology to argue that the geometric focus of contemporary multivariate statistical ecology is a legacy of an early 20th century debate on the nature of plant communities. Using the Rao-Blackwell and Lehmann-Scheffé theorems, which both depend on the likelihood concept, I show how to reduce bias and sampling variability in estimators of functional diversity. I also show how to use likelihood-based information criteria to select among ordination methods. Using computationally intensive Markov-chain Monte Carlo methods, I demonstrate how to expand the range of likelihood-based ordination procedures that are computationally feasible. Finally, using philosophical ideas from formal measurement theory, I argue that a likelihood-based multivariate statistical ecology outperforms the geometry-based alternative by providing a stronger connection between analysis and the real world. Likelihood should be used more often in multivariate ecology.
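
A toy illustration of likelihood-based model selection via an information criterion (generic, not the thesis's ordination criteria): compare two candidate sampling models for the same data by maximized log-likelihood penalized for parameter count.

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.exponential(scale=2.0, size=200)  # data actually from an exponential model

def aic_normal(x):
    mu, sd = x.mean(), x.std()  # MLEs for the normal model (two parameters)
    ll = np.sum(-0.5 * np.log(2 * np.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2))
    return 2 * 2 - 2 * ll

def aic_exponential(x):
    lam = 1.0 / x.mean()        # MLE for the exponential model (one parameter)
    ll = np.sum(np.log(lam) - lam * x)
    return 2 * 1 - 2 * ll

# The exponential model should win (lower AIC), matching how the data were generated.
print("AIC normal:", round(aic_normal(x), 1), " AIC exponential:", round(aic_exponential(x), 1))
```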

▼ In this thesis, we perform a survival analysis for right-censored data of populations with a cure rate. We consider two cure rate models, based on the Geometric and Poisson distributions, which are special cases of the Conway-Maxwell-Poisson distribution. The models are based on the assumption that the number of competing causes of the event of interest follows a Conway-Maxwell-Poisson distribution. For various sample sizes, we implement a simulation process to generate samples with a cure rate. Under this setup, we obtain the maximum likelihood estimates (MLEs) of the model parameters using the gamlss R package. Using the asymptotic distribution of the MLE as well as the parametric bootstrap method, we discuss the construction of confidence intervals for the model parameters, and their performance is then assessed through Monte Carlo simulations.

▼ In the last ten years, important breakthroughs in the understanding of the topology of complexity have been made in the framework of network science. Indeed, it has been found that many networks belong to the universality classes called small-world networks or scale-free networks, and that the complex architecture of real-world networks strongly affects the critical phenomena defined on these structures. Nevertheless, the main focus of the research has been the characterization of single and static networks.
Recently, temporal networks and interacting networks have attracted large interest. Many networks are interacting or formed by a multilayer structure; examples are found in social networks, where an individual might at the same time be part of different social networks, in economic and financial networks, in physiology, and in infrastructure systems. Moreover, many networks are temporal, i.e. the links appear and disappear on a fast time scale. Examples of these networks are social networks of contacts, such as face-to-face interactions or mobile-phone communication, and the time-dependent correlations in brain activity. Understanding the evolution of temporal and multilayer networks and characterizing critical phenomena in these systems is crucial if we want to describe, predict and control the dynamics of complex systems.
In this thesis, we investigate several statistical mechanics models of temporal and interacting networks, to shed light on the dynamics of this new generation of complex networks. First, we investigate a model of temporal social networks aimed at characterizing human social interactions, such as face-to-face interactions and phone-call communication. Thanks to the availability of data on these interactions, we are in a position to compare the proposed model to real data, finding good agreement.
Second, we investigate the entropy of temporal networks and growing networks, to provide a new framework for quantifying the information encoded in these networks and to answer a fundamental problem in network science: how complex are temporal and growing networks?
Finally, we consider two examples of critical phenomena in interacting networks. On one side, we investigate the percolation of interacting networks by introducing antagonistic interactions. On the other side, we investigate a model of political elections based on the percolation of antagonistic networks. The aim of this research is to show how antagonistic interactions change the physics of critical phenomena on interacting networks.
We believe that the work presented in this thesis offers the possibility to appreciate the large variability of problems that can be addressed in the new framework of temporal and interacting networks.

▼ In this paper, we study a new family of random variables that arise as the distribution of extrema of a random number N of independent and identically distributed random variables X1, X2, ..., XN, where each Xi has a common continuous distribution with support on [0,1]. The general scheme is first outlined, and the SUG and CSUG models are introduced in detail for the case where Xi is distributed as U[0,1]. Some features of the proposed distributions are studied via their means, variances, moments and moment-generating functions. Moreover, we make other choices for the continuous random variables, such as Arcsine and Topp-Leone, and N is chosen to be Geometric or Zipf. Wherever appropriate, we estimate the parameter in the one-parameter family in question and test hypotheses about the parameter. In the last section, two permutation distributions are introduced and studied.
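
A quick numerical check of the scheme for one concrete case (my own illustration): if N ~ Geometric(p) on {1, 2, ...} and the Xi are U[0,1], then P(max ≤ x) = E[x^N] = px / (1 − (1 − p)x), the probability generating function of N evaluated at x.

```python
import numpy as np

rng = np.random.default_rng(12)

def simulate_random_max(p=0.4, n_sims=200_000):
    """Maximum of a Geometric(p) number of iid U[0,1] random variables."""
    n = rng.geometric(p, size=n_sims)  # supported on {1, 2, ...}
    # Given N = n, the maximum has cdf x^n, so it can be sampled as U^(1/n).
    return rng.random(n_sims) ** (1.0 / n)

x = 0.8
emp = (simulate_random_max() <= x).mean()
theory = 0.4 * x / (1.0 - 0.6 * x)  # pgf of Geometric(0.4) evaluated at x
print(round(emp, 4), round(theory, 4))  # both ~0.6154
```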