Sample records for molecular ensemble based

The demonstration of strong and ultrastrong coupling regimes of cavity QED with polyatomic molecules has opened new routes to control chemical dynamics at the nanoscale. We show that strong resonant coupling of a cavity field with an electronic transition can effectively decouple collective electronic and nuclear degrees of freedom in a disordered molecular ensemble, even for molecules with high-frequency quantum vibrational modes having strong electron-vibration interactions. This type of polaron decoupling can be used to control chemical reactions. We show that the rate of electron transfer reactions in a cavity can be orders of magnitude larger than in free space for a wide class of organic molecular species.

The variant of the NVE ensemble known as the molecular dynamics ensemble was recently redefined by Ray and Zhang [Phys. Rev. E 59, 4781 (1999)] to include the specification of a time-invariant G (a function of phase and, explicitly, the time) in addition to the total linear momentum M. We reformulate this ensemble slightly as the NVEMR ensemble, in which R/N is the center-of-mass position, and consider the equation of state of the hard-sphere system in this ensemble through both the virial function and the Boltzmann entropy. We test the quasiergodic hypothesis by a comparison of old molecular dynamics and Monte Carlo results for the compressibility factor of the 12-particle hard-disk system. The virial approach, which had previously been found to support the hypothesis in the NVEM ensemble, remains unchanged in the NVEMR ensemble. The entropy S approach depends on whether S is defined through the phase integral over the energy sphere or the energy shell, the parameter θ being 0 or 1, respectively. The ergodic hypothesis is found to be supported for θ=0 but not for θ=1. PMID:11304233
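
For reference, the two entropy definitions at issue have the standard textbook forms below (this rendering is ours, not quoted from the paper); Γ denotes a phase-space point, H the Hamiltonian, and normalization constants are absorbed into the measure:

```latex
% theta = 0: entropy from the phase volume enclosed by the energy sphere
S_0(E) = k_B \ln \Omega(E), \qquad
\Omega(E) = \int d\Gamma \, \Theta\bigl(E - H(\Gamma)\bigr),
% theta = 1: entropy from the phase-space density on the energy shell
S_1(E) = k_B \ln \omega(E), \qquad
\omega(E) = \int d\Gamma \, \delta\bigl(E - H(\Gamma)\bigr).
```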

Ensemble/Legacy is a toolkit extension of the Object Technology Framework (OTF) that exposes an object oriented interface for accessing and manipulating ensembles (collections of molecular conformations that share a common chemical topology) and driving Legacy programs (such as MSMS, AMBER, X-PLOR, CORMA/MARDIGRAS, Dials and Windows, and CURVES). Ensemble/Legacy provides a natural programming interface for running Legacy programs on ensembles of molecules and accessing the resulting data. Using the OTF reduces the time cost of developing a new library to store and manipulate molecular data and also allows Ensemble/Legacy to integrate into the Chimera visualization program. The extension to Chimera exposes the Legacy functionality using a graphical user interface that greatly simplifies the process of modeling and analyzing conformational ensembles. Furthermore, all the C++ functionality of the Ensemble/Legacy toolkit is "wrapped" for use in the Python programming language. More detailed documentation on using Ensemble/Legacy is available online (http://picasso.nmr.ucsf.edu/dek/ensemble.html). PMID:10902174

In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
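
The backward stepwise stage can be sketched as follows; this is an illustrative reconstruction, not the authors' code, and the `fitness` function (e.g. in-sample accuracy of a majority vote) is a placeholder:

```python
import numpy as np

def backward_stepwise_collection(min_sufficient, X, y, fitness):
    """Generate a nested collection of ensembles by backward stepwise removal.
    Starting from the minimum-sufficient ensemble, repeatedly drop the base
    classifier whose removal degrades in-sample fitness the least, yielding a
    collection with descending fitness and descending structural complexity.
    fitness(members, X, y) is a placeholder scoring function to maximise."""
    collection = [list(min_sufficient)]
    current = list(min_sufficient)
    while len(current) > 1:
        # try removing each member and keep the removal that hurts fitness least
        candidates = [[c for c in current if c is not dropped] for dropped in current]
        scores = [fitness(members, X, y) for members in candidates]
        current = candidates[int(np.argmax(scores))]
        collection.append(current)
    return collection
```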

Liouville's theorem in a grand ensemble, that is, for situations where a system is in equilibrium with a reservoir of energy and particles, is a subject that, to our knowledge, has not been explicitly treated in the literature related to molecular simulation. Instead, Liouville's theorem, a central concept for the correct employment of molecular simulation techniques, is implicitly considered only within the framework of systems where the total number of particles is fixed. However, the pressing demand of applied science in treating open systems leads to the question of the existence and possible exact formulation of Liouville's theorem when the number of particles changes during the dynamical evolution of the system. The intention of this paper is to stimulate a debate about this crucial issue for molecular simulation.

Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high quality knowledge for ensemble classifier and improve the performance of classification. PMID:25966359

Schemes for implementation of CNOT gates in atomic ensembles are important for realization of quantum computing. We present here a theoretical scheme of a CNOT^N gate with an ensemble of three-level atoms in the lambda configuration and a single two-level control atom. We work in the regime of Rydberg blockade for the ensemble atoms due to excitation of the Rydberg control atom. It is shown that using STIRAP, atoms from one ground state of the ensemble can be adiabatically transferred to the other ground state, depending on the state of the control atom. A thorough analysis of adiabatic conditions for this scheme and the influence of the radiative decay is provided. We show that the CNOT^N process is immune to the decay rate of the excited level in ensemble atoms. This work is supported by the ARL, the IARPA LogiQ program, and the AFOSR MURI program.

Breast cancer is molecularly heterogeneous and categorized into four molecular subtypes: Luminal-A, Luminal-B, HER2-amplified and Triple-negative. In this study, we aimed to apply an ensemble decision approach to identify the ultrasound and clinical features related to the molecular subtypes. We collected ultrasound and clinical features from 1,000 breast cancer patients and performed immunohistochemistry on these samples. We used the ensemble decision approach to select unique features and to construct decision models. The decision model for Luminal-A subtype was constructed based on the presence of an echogenic halo and post-acoustic shadowing or indifference. The decision model for Luminal-B subtype was constructed based on the absence of an echogenic halo and vascularity. The decision model for HER2-amplified subtype was constructed based on the presence of post-acoustic enhancement, calcification, vascularity and advanced age. The model for Triple-negative subtype followed two rules. One was based on irregular shape, lobulate margin contour, the absence of calcification and hypovascularity, whereas the other was based on oval shape, hypovascularity and micro-lobulate margin contour. The accuracies of the models were 83.8%, 77.4%, 87.9% and 92.7%, respectively. We identified specific features of each molecular subtype and expanded the scope of ultrasound for making diagnoses using these decision models. PMID:26046791

This critical review discusses different approaches towards protection of photoactive materials based on triplet excited state ensembles against deactivation by molecular oxygen through quenching and photooxidation mechanisms. Passive protection, based on the application of barrier materials for packaging, sealing, or encapsulation of the active substances to prevent oxygen molecules from penetrating and making physical contact with excited states, is compared with active protection, based on the application of oxygen-scavenging species. Efficiencies of the different approaches, together with examples and prospects of their applications, are outlined. PMID:27277068

Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748

Protein remote homology detection is one of the central problems in bioinformatics. Although some computational methods have been proposed, the problem is still far from being solved. In this paper, an ensemble classifier for protein remote homology detection, called SVM-Ensemble, was proposed with a weighted voting strategy. SVM-Ensemble combined three basic classifiers based on different feature spaces, including Kmer, ACC, and SC-PseAAC. These features consider the characteristics of proteins from various perspectives, incorporating both the sequence composition and the sequence-order information along the protein sequences. Experimental results on a widely used benchmark dataset showed that the proposed SVM-Ensemble can obviously improve the predictive performance for the protein remote homology detection. Moreover, it achieved the best performance and outperformed other state-of-the-art methods. PMID:27294123
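
The weighted voting step itself is simple; below is a minimal sketch, assuming each base SVM emits {-1, +1} labels and that the per-classifier weights (placeholders here) come from, e.g., validation accuracy:

```python
import numpy as np

def weighted_vote(predictions, weights):
    """Combine {-1, +1} predictions of base classifiers by a weighted vote.
    predictions: (n_classifiers, n_samples) array; weights: one per classifier."""
    w = np.asarray(weights, dtype=float)[:, None]
    return np.sign((w * np.asarray(predictions, dtype=float)).sum(axis=0))

# e.g. three SVMs trained on Kmer, ACC and SC-PseAAC feature spaces
# (the weights below are illustrative placeholders, not the paper's values):
# combined = weighted_vote([pred_kmer, pred_acc, pred_pseaac], [0.90, 0.85, 0.88])
```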

The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers from around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is used to add value to individual model projections and construct a consensus projection. Previous reports for the IPCC establish climate change projections based on an equal-weighted average of all model projections. However, individual models reproduce certain climate processes better than other models. Should models be weighted based on performance? Unequal ensemble averages have previously been constructed using a variety of mean state metrics. What metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequal weighting multi-model ensembles. The intention is to produce improved ("intelligent") unequal-weight ensemble averages. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables—e.g., outgoing longwave radiation and surface temperature. Several climate process metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and Earth's Radiant Energy System (CERES) instrument in combination with surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing the equal-weighted ensemble average and an ensemble weighted using the process-based metric. Additionally, this study investigates the dependence of the metric weighting scheme on the climate state using a combination of model simulations including a non-forced preindustrial control experiment, historical simulations, and
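
A minimal sketch of the unequal-weighting idea, with a hypothetical per-model skill score standing in for the process-based metric described above:

```python
import numpy as np

def weighted_ensemble_mean(projections, metric_scores):
    """Unequal-weight multi-model average. projections: (n_models, ...) array of
    projected fields; metric_scores: one skill value per model (higher = better),
    e.g. agreement of a simulated OLR-surface temperature relationship with
    CERES observations. Weights are normalised scores; the equal-weighted
    consensus is recovered by passing uniform scores."""
    w = np.asarray(metric_scores, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.asarray(projections, dtype=float), axes=1)
```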

In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. PMID:23668348

This paper develops efficient ensemble Kalman filter (EnKF) implementations based on shrinkage covariance estimation. The forecast ensemble members at each step are used to estimate the background error covariance matrix via the Rao-Blackwell Ledoit and Wolf estimator, which has been specifically developed to approximate high-dimensional covariance matrices using a small number of samples. Two implementations are considered: in the EnKF full-space (EnKF-FS) approach, the assimilation process is performed in the model space, while the EnKF reduce-space (EnKF-RS) formulation performs the analysis in the subspace spanned by the ensemble members. In the context of EnKF-RS, additional samples are taken from the normal distribution described by the background ensemble mean and the estimated background covariance matrix, in order to increase the size of the ensemble and reduce the sampling error of the filter. This increase in the size of the ensemble is obtained without running the forward model. After the assimilation step, the additional samples are discarded and only the model-based ensemble members are propagated further. Methodologies to reduce the impact of spurious correlations and under-estimation of sample variances in the context of the EnKF-FS and EnKF-RS implementations are discussed. An adjoint-free four-dimensional extension of EnKF-RS is also discussed. Numerical experiments carried out with the Lorenz-96 model and a quasi-geostrophic model show that the use of shrinkage covariance matrix estimation can mitigate the impact of spurious correlations during the assimilation process.
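
The Rao-Blackwell Ledoit and Wolf estimator referred to above has a closed form (Chen et al., 2010). Below is a small dense-matrix sketch assuming the full covariance fits in memory; an operational EnKF would work with deviations from the ensemble mean and exploit low-rank structure instead of forming the p x p matrix:

```python
import numpy as np

def rblw_shrinkage(X):
    """Rao-Blackwellized Ledoit-Wolf (RBLW) shrinkage covariance estimate.
    X: (n_members, p) array of ensemble states, one member per row.
    Returns (1 - rho) * S + rho * (tr(S)/p) * I, shrinking the sample
    covariance S toward a scaled identity target."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                      # sample covariance (1/n normalisation)
    trS, trS2 = np.trace(S), np.trace(S @ S)
    rho = ((n - 2.0) / n * trS2 + trS**2) / ((n + 2.0) * (trS2 - trS**2 / p))
    rho = float(np.clip(rho, 0.0, 1.0))    # shrinkage intensity in [0, 1]
    return (1.0 - rho) * S + rho * (trS / p) * np.eye(p)
```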

A molecular simulation method to study the dynamics of chemically reacting mixtures is presented. The method uses a combination of stochastic and dynamic simulation steps, allowing for the simulation of both thermodynamic and transport properties. The method couples a molecular dynamics simulation cell (termed dynamic cell) to a reaction mixture simulation cell (termed control cell) that is formulated upon the reaction ensemble Monte Carlo (RxMC) method, hence the term reaction ensemble molecular dynamics. Thermodynamic and transport properties are calculated in the dynamic cell by using a constant-temperature molecular dynamics simulation method. RxMC forward and reverse reaction steps are performed in the control cell only, while molecular dynamics steps are performed in both the dynamic cell and the control cell. The control cell, which acts as a sink and source reservoir, is maintained at reaction equilibrium conditions via the RxMC algorithm. The reaction ensemble molecular dynamics method is analogous to the grand canonical ensemble molecular dynamics technique, while using some elements of the osmotic molecular dynamics method, and so simulates conditions that directly relate to real, open systems. The accuracy and stability of the method is assessed by considering the ammonia synthesis reaction N2 + 3H2 ⇔ 2NH3. It is shown to be a viable method for predicting the effects of nonideal environments on the dynamic properties (particularly diffusion) as well as reaction equilibria for chemically reacting mixtures.

In this report we compare time sampling and ensemble averaging as two different methods available for phase space sampling. For the comparison, we calculate thermal conductivities of solid argon and silicon structures, using equilibrium molecular dynamics. We introduce two different schemes for the ensemble averaging approach, and show that both can reduce the total simulation time as compared to time averaging. It is also found that velocity rescaling is an efficient mechanism for phase space exploration. Although our methodology is tested using classical molecular dynamics, the ensemble generation approaches may find their greatest utility in computationally expensive simulations such as first principles molecular dynamics. For such simulations, where each time step is costly, time sampling can require long simulation times because each time step must be evaluated sequentially and therefore phase space averaging is achieved through sequential operations. On the other hand, with ensemble averaging, phase space sampling can be achieved through parallel operations, since each ensemble is independent. For this reason, particularly when using massively parallel architectures, ensemble sampling can result in much shorter simulation times and exhibits similar overall computational effort.
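
As an illustration of the parallel ensemble-averaging idea, the sketch below averages Green-Kubo heat-flux autocorrelations over independent runs rather than over one long trajectory. The Green-Kubo route and the one-component prefactor are standard, but the report does not specify its exact estimator here, so treat this as schematic:

```python
import numpy as np

def green_kubo_kappa(J_members, dt, V, T, kB=1.380649e-23):
    """Thermal conductivity from an ensemble of independent equilibrium runs.
    J_members: list of (n_steps,) time series, each one Cartesian component of
    the volume-normalised heat flux from one ensemble member (members can be
    generated in parallel, e.g. from velocity-rescaled initial conditions).
    kappa = V / (kB T^2) * integral_0^inf <J(0) J(t)> dt, with <.> taken as an
    ensemble average; prefactor conventions vary between codes."""
    n = min(len(J) for J in J_members)
    acfs = []
    for J in J_members:
        J = np.asarray(J[:n], dtype=float)
        # unbiased autocorrelation estimate at lags 0 .. n-1
        acf = np.correlate(J, J, mode='full')[n - 1:] / np.arange(n, 0, -1)
        acfs.append(acf)
    acf_ens = np.mean(acfs, axis=0)   # ensemble average replaces long-time average
    return V / (kB * T**2) * np.trapz(acf_ens, dx=dt)
```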

The use of ensemble prediction systems allows for an uncertainty estimation of the forecast. Most end users do not require all the information contained in an ensemble and prefer the use of a single uncertainty measure. This measure is the ensemble spread, which serves to forecast the forecast error. It is, however, unclear how the quality of these forecasts can best be assessed based on spread and forecast error only. The spread-error verification is intricate for two reasons: first, for each probabilistic forecast only a single verifying observation is available, and second, the spread is not meant to provide an exact prediction for the error. Despite these facts, several advances were recently made, all based on traditional deterministic verification of the error forecast. In particular, Grimit and Mass (2007) and Hopson (2014) considered in detail the strengths and weaknesses of the spread-error correlation, while Christensen et al. (2014) developed a proper-score extension of the mean squared error. However, due to the strong variance of the error given a certain spread, the error forecast should preferably be considered as probabilistic in nature. In the present work, different probabilistic error models are proposed depending on the spread-error metrics used. Most of these models allow for the discrimination of a perfect forecast from an imperfect one, independent of the underlying ensemble distribution. The new spread-error scores are tested on the ensemble prediction system of the European Centre for Medium-Range Weather Forecasts (ECMWF) over Europe and Africa. References: Christensen, H. M., Moroz, I. M. and Palmer, T. N., 2014: Evaluation of ensemble forecast uncertainty using a new proper score: application to medium-range and seasonal forecasts. In press, Quarterly Journal of the Royal Meteorological Society. Grimit, E. P., and C. F. Mass, 2007: Measuring the ensemble spread-error relationship with a probabilistic approach: Stochastic ensemble results. Mon. Wea. Rev., 135, 203
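
For context, the classical diagnostic that the probabilistic error models above aim to improve upon is the spread-error correlation; a minimal sketch:

```python
import numpy as np

def spread_error_correlation(forecasts, observations):
    """Classical spread-error diagnostic: correlate, over many forecast cases,
    the ensemble spread with the absolute error of the ensemble mean.
    forecasts: (n_cases, n_members); observations: (n_cases,). Because the
    error is only probabilistically related to the spread, this correlation
    stays well below 1 even for a perfectly reliable ensemble."""
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    spread = f.std(axis=1, ddof=1)
    error = np.abs(f.mean(axis=1) - o)
    return np.corrcoef(spread, error)[0, 1]
```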

Ensemble docking can be a successful virtual screening technique that addresses the innate conformational heterogeneity of macromolecular drug targets. Yet, lacking a method to identify a subset of conformational states that effectively segregates active and inactive small molecules, ensemble docking may result in the recommendation of a large number of false positives. Here, three knowledge-based methods that construct structural ensembles for virtual screening are presented. Each method selects ensembles by optimizing an objective function calculated using the receiver operating characteristic (ROC) curve: either the area under the ROC curve (AUC) or a ROC enrichment factor (EF). As the number of receptor conformations, N, becomes large, the methods differ in their asymptotic scaling. Given a set of small molecules with known activities and a collection of target conformations, the most resource intense method is guaranteed to find the optimal ensemble but scales as O(2^N). A recursive approximation to the optimal solution scales as O(N^2), and a more severe approximation leads to a faster method that scales linearly, O(N). The techniques are generally applicable to any system, and we demonstrate their effectiveness on the androgen nuclear hormone receptor (AR), cyclin-dependent kinase 2 (CDK2), and the peroxisome proliferator-activated receptor δ (PPAR-δ) drug targets. Conformations that consisted of a crystal structure and molecular dynamics simulation cluster centroids were used to form AR and CDK2 ensembles. Multiple available crystal structures were used to form PPAR-δ ensembles. For each target, we show that the three methods perform similarly to one another on both the training and test sets. PMID:27097522
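
A sketch of a linearly scaling greedy variant, assuming a docking-score matrix where lower scores indicate stronger predicted binding and a molecule's ensemble score is its best score over the selected conformations (a common convention, assumed here rather than taken from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def greedy_ensemble(scores, labels):
    """O(N) greedy selection of receptor conformations for ensemble docking.
    scores: (n_conformations, n_molecules) docking scores (lower = better);
    labels: (n_molecules,) 1 for actives, 0 for decoys. Each pass adds the
    conformation that most improves the ROC AUC of the ensemble scores,
    stopping when no conformation yields further gain."""
    selected, best_auc = [], 0.0
    current = np.full(scores.shape[1], np.inf)   # running best score per molecule
    while True:
        gains = []
        for i in range(scores.shape[0]):
            if i in selected:
                gains.append(-np.inf)
                continue
            trial = np.minimum(current, scores[i])
            gains.append(roc_auc_score(labels, -trial))  # negate: low score = active
        i_best = int(np.argmax(gains))
        if gains[i_best] <= best_auc:
            return selected, best_auc
        best_auc = gains[i_best]
        selected.append(i_best)
        current = np.minimum(current, scores[i_best])
```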

Fluorescence lifetimes of thin, rhodamine 6G-doped polymer layers in front of a mirror have been determined as a function of the emitter-mirror separation and the conditions of excitation and observation. Lifetime is well known to depend on the spatial emitter-mirror separation. The explanation of experimental data needs to consider direction, polarization, and numerical aperture of the experimental system. As predicted theoretically, experimental results depend on the conditions of illumination and observation, because of the different lifetimes of emitters aligned horizontally or vertically with respect to the plane of interfaces and their selection by the experimental system. This effect is not observable when ions are used as a source of fluorescence, because ensemble averaging depends on the properties of sources.

A major challenge in understanding the cellular diversity of the brain has been linking activity during behavior with standard cellular typology. For example, it has not been possible to determine whether principal neurons in prefrontal cortex active during distinct experiences represent separable cell types, and it is not known whether these differentially active cells exert distinct causal influences on behavior. Here, we develop quantitative hydrogel-based technologies to connect activity in cells reporting on behavioral experience with measures for both brain-wide wiring and molecular phenotype. We find that positive- and negative-valence experiences in prefrontal cortex are represented by cell populations that differ in their causal impact on behavior, long-range wiring, and gene expression profiles, with the major discriminant being expression of the adaptation-linked gene NPAS4. These findings illuminate cellular logic of prefrontal cortex information processing and natural adaptive behavior and may point the way to cell-type-specific understanding and treatment of disease-associated states. PMID:27238022

With the objective of understanding the usefulness of thermostats in the study of dynamic critical phenomena in fluids, we present results for transport properties in a binary Lennard-Jones fluid that exhibits a liquid-liquid phase transition. Various collective transport properties, calculated from molecular dynamics (MD) simulations in the canonical ensemble with different thermostats, are compared with those obtained from MD simulations in the microcanonical ensemble. It is observed that the Nosé-Hoover and dissipative particle dynamics thermostats are useful for the calculations of mutual diffusivity and shear viscosity. The Nosé-Hoover thermostat, however, as opposed to the latter, appears inadequate for the study of bulk viscosity. PMID:26687057

Global Circulation Models (GCMs) are sophisticated tools to study the future evolution of the climate. Yet, the coarse scale of GCMs of hundreds of kilometers raises questions about the suitability for agricultural impact assessments. These assessments are often made at field level and require consideration of interactions at sub-GCM grid scale (e.g., elevation-dependent climatic changes). Regional climate models (RCMs) were developed to provide climate projections at a spatial scale of 25-50 km for limited regions, e.g. Europe (Giorgi and Mearns, 1991). Climate projections from GCMs or RCMs are available as multi-model ensembles. These ensembles are based on large data sets of simulations produced by modelling groups worldwide, who performed a set of coordinated climate experiments in which climate models were run for a common set of experiments and various emissions scenarios (Knutti et al., 2010). The use of multi-model ensembles in climate change studies is an important step in quantifying uncertainty in impact predictions, which will underpin more informed decisions for adaptation and mitigation to changing climate (Semenov and Stratonovitch, 2010). The objective of our study was to evaluate the effect of the spatial scale of climate projections on climate change impacts for cereals in Belgium. Climate scenarios were based on two multi-model ensembles, one comprising 15 GCMs of the Coupled Model Intercomparison Project phase 3 (CMIP3; Meehl et al., 2007) with spatial resolution of 200-300 km, the other comprising 9 RCMs of the EU-ENSEMBLES project (van der Linden and Mitchell, 2009) with spatial resolution of 25 km. To be useful for agricultural impact assessments, the projections of GCMs and RCMs were downscaled to the field level. Long series (240 cropping seasons) of local-scale climate scenarios were generated by the LARS-WG weather generator (Semenov et al., 2010) via statistical inference. Crop growth and development were simulated with the Aqua

Non-processive molecular motors have to work together in ensembles in order to generate appreciable levels of force or movement. In skeletal muscle, for example, hundreds of myosin II molecules cooperate in thick filaments. In non-muscle cells, by contrast, small groups with few tens of non-muscle myosin II motors contribute to essential cellular processes such as transport, shape changes, or mechanosensing. Here we introduce a detailed and analytically tractable model for this important situation. Using a three-state crossbridge model for the myosin II motor cycle and exploiting the assumptions of fast power stroke kinetics and equal load sharing between motors in equivalent states, we reduce the stochastic reaction network to a one-step master equation for the binding and unbinding dynamics (parallel cluster model) and derive the rules for ensemble movement. We find that for constant external load, ensemble dynamics is strongly shaped by the catch bond character of myosin II, which leads to an increase of the fraction of bound motors under load and thus to firm attachment even for small ensembles. This adaptation to load results in a concave force-velocity relation described by a Hill relation. For external load provided by a linear spring, myosin II ensembles dynamically adjust themselves towards an isometric state with constant average position and load. The dynamics of the ensembles is now determined mainly by the distribution of motors over the different kinds of bound states. For increasing stiffness of the external spring, there is a sharp transition beyond which myosin II can no longer perform the power stroke. Slow unbinding from the pre-power-stroke state protects the ensembles against detachment.
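
The one-step master equation underlying the parallel cluster model has the generic form below; the specific load-dependent rates are derived in the paper, so the rates indicated in the comments are placeholders:

```latex
% generic one-step master equation for the probability p_i(t) of having
% i bound motors (out of N); g_i and r_i are binding and unbinding rates,
% with boundary conditions p_{-1} = p_{N+1} = 0
\frac{dp_i}{dt} = g_{i-1}\,p_{i-1} + r_{i+1}\,p_{i+1} - \left(g_i + r_i\right) p_i,
\qquad i = 0, \dots, N.
% e.g. g_i = (N - i)\,k_\mathrm{on}; the catch-bond character of myosin II
% enters through an unbinding rate r_i that decreases with the load per
% bound motor, which is what stabilises attachment under load.
```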

Polarimetric SAR image interpretation has become a topic of great interest, in which the construction of a reasonable and effective image classification technique is of key importance. Sparse representation represents the data using the most succinct sparse atoms of an over-complete dictionary, and its advantages have been confirmed in the field of PolSAR classification. However, like any ordinary classifier, it is not perfect in every respect. Ensemble learning is therefore introduced to address this issue: multiple different learners are trained, and their individual outputs are combined to obtain more accurate and robust results. Accordingly, this paper presents a polarimetric SAR image classification method based on the ensemble learning of sparse representation to achieve the optimal classification.

MERRA-2 is the latest aerosol reanalysis produced at NASA's Global Modeling and Assimilation Office (GMAO) from 1979 to present. This reanalysis is based on a version of the GEOS-5 model radiatively coupled to GOCART aerosols and includes assimilation of bias-corrected Aerosol Optical Depth (AOD) from AVHRR over ocean, MODIS sensors on both Terra and Aqua satellites, MISR over bright surfaces and AERONET data. In order to assimilate lidar profiles of aerosols, we are updating the aerosol component of our assimilation system to an Ensemble Kalman Filter (EnKF) type of scheme using ensembles generated routinely by the meteorological assimilation. Following the work performed with NASA's first aerosol reanalysis (MERRAero), we first validate the vertical structure of MERRA-2 aerosol assimilated fields using CALIOP data over regions of particular interest during 2008.

The identification of bound conformations, namely, the conformations adopted by ligands when binding their targets, is critical for target-based and ligand-based drug design. Bound conformations can be obtained computationally from unbound conformational ensembles generated by conformational search tools. However, these tools also generate many nonrelevant conformations, thus requiring a focusing mechanism. To identify such a mechanism, this work focuses on a comparison of energies and structural properties of bound and unbound conformations for a set of FDA-approved drugs whose complexes are available in the PDB. Unbound conformational ensembles were initially obtained with three force fields. These were merged, clustered, and reminimized using the same force fields and four QM methods. Bound conformations of all ligands were represented by their crystal structures or by approximations to these structures. Energy differences were calculated between global minima of the unbound state or the Boltzmann-averaged energies of the unbound ensemble and the approximated bound conformations. Ligand conformations which resemble the X-ray conformation (RMSD < 1.0 Å) were obtained in 91%-97% and 96%-98% of the cases using the ensembles generated by the individual force fields and the reminimized ensembles, respectively, yet only in 52%-56% (original ensembles) and 47%-65% (reminimized ensembles) as global energy minima. The energy window within which the different methods identified the bound conformation (approximated by its closest local energy minimum) was found to be at 4-6 kcal/mol with respect to the global minimum and marginally lower with respect to a Boltzmann-averaged energy of the unbound ensemble. Better approximations to the bound conformation obtained with a constrained minimization using the crystallographic B-factors or with a newly developed Knee Point Detection (KPD) method gave lower values (2-5 kcal/mol). Overall, QM methods gave lower energy differences than the force fields.
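
The Boltzmann-averaged unbound-ensemble energy used as a reference above has the standard form (our rendering, not quoted from the paper):

```latex
% Boltzmann-averaged energy of the unbound conformational ensemble,
% with the sum running over its minimized members
\langle E \rangle_\mathrm{unbound}
  = \frac{\sum_i E_i \, e^{-E_i / k_B T}}{\sum_i e^{-E_i / k_B T}},
\qquad
\Delta E = E_\mathrm{bound} - \langle E \rangle_\mathrm{unbound}.
```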

Motivation: Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model onto its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect the intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model with the native needs to be evaluated differently depending on the flexibility of residues in a protein. Results: We propose a score named FlexScore for comparing protein structures that considers the flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts' intuitive assessment of computational models and provides information on the practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307633

The extended system Hamiltonian for carrying out grand canonical ensemble molecular dynamics simulations is reformulated. This new Hamiltonian includes a generalized treatment of the reference state partition function of the total chemical potential that reproduces the ideal gas behavior and various previous partitionings of ideal and excess terms. Initial calculations are performed on a system of Lennard-Jones particles near the triple point and on liquid water at room temperature.

The electrocardiogram (ECG) has been used extensively for detection of heart disease. Frequently the signal is corrupted by various kinds of noise such as muscle noise, electromyogram (EMG) interference, instrument noise, etc. In this paper, a new ECG denoising method is proposed based on the recently developed ensemble empirical mode decomposition (EEMD). The noisy ECG signal is decomposed into a series of intrinsic mode functions (IMFs). The statistically significant information content is built from an empirical energy model of the IMFs. Noisy ECG signals collected from clinical recordings are processed using the method. The results show that, in contrast to traditional methods, the novel denoising method can achieve optimal denoising of the ECG signal.
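
A minimal sketch of the decomposition-selection-reconstruction pattern, assuming the PyEMD package (pip install EMD-signal) is available; note that the IMF-selection rule here (dropping a fixed number of leading, noise-dominated IMFs) is a simplified placeholder for the paper's empirical energy model:

```python
import numpy as np
from PyEMD import EEMD  # assumed third-party dependency

def eemd_denoise(signal, t, drop=1, trials=100):
    """EEMD-based ECG denoising sketch: decompose the noisy signal into IMFs,
    discard the first `drop` (highest-frequency, typically noise-dominated)
    IMFs, and reconstruct the signal from the remaining components."""
    eemd = EEMD(trials=trials)
    imfs = eemd.eemd(np.asarray(signal, dtype=float), np.asarray(t, dtype=float))
    return imfs[drop:].sum(axis=0)
```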

Gene expression deviates from its normal composition when a patient has cancer. This variation can be used as an effective tool to find cancer. In this study, we propose a novel gene expressions based colon classification scheme (GECC) that exploits the variations in gene expressions for classifying colon gene samples into normal and malignant classes. The novelty of GECC lies in two complementary ways. First, to cater to the overwhelmingly larger size of gene based data sets, various feature extraction strategies, like chi-square, F-score, principal component analysis (PCA) and minimum redundancy and maximum relevancy (mRMR), have been employed, which select discriminative genes amongst a set of genes. Second, a majority voting based ensemble of support vector machines (SVM) has been proposed to classify the given gene based samples. Previously, individual SVM models have been used for colon classification; however, their performance is limited. In this research study, we propose an SVM-ensemble-based new approach for gene based classification of colon, wherein the individual SVM models are constructed through the learning of different SVM kernels, like linear, polynomial, radial basis function (RBF), and sigmoid. The predicted results of the individual models are combined through majority voting. In this way, the combined decision space becomes more discriminative. The proposed technique has been tested on four colon data sets and several other binary-class gene expression data sets, and improved performance has been achieved compared to previously reported gene based colon cancer detection techniques. The computational time required for the training and testing of the 208 × 5,851 data set was 591.01 s and 0.019 s, respectively. PMID:26357050
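
The kernel-diverse majority-voting ensemble maps naturally onto scikit-learn's VotingClassifier; a sketch with illustrative default hyperparameters, not the paper's settings (feature selection, e.g. mRMR or PCA, would precede this step):

```python
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

# Majority-voting ensemble of SVMs with four different kernels:
ensemble = VotingClassifier(
    estimators=[
        ('linear',  SVC(kernel='linear')),
        ('poly',    SVC(kernel='poly', degree=3)),
        ('rbf',     SVC(kernel='rbf')),
        ('sigmoid', SVC(kernel='sigmoid')),
    ],
    voting='hard',  # hard voting = majority vote over predicted class labels
)
# ensemble.fit(X_train, y_train); y_pred = ensemble.predict(X_test)
```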

Grand canonical ensemble molecular dynamics simulation is employed to calculate the solubility of water in polyamide-6,6. It is shown that performing two separate simulations, one in the polymeric phase and one in the gaseous phase, is sufficient to find the phase coexistence point. In this method, the chemical potential of water in the polymer phase is expanded as a first-order Taylor series in terms of pressure. Knowing the chemical potential of water in the polymer phase in terms of pressure, another simulation for water in the gaseous phase, in the grand canonical ensemble, is done in which the target chemical potential is set in terms of pressure in the gas phase. The phase coexistence point can easily be calculated from the results of these two independent simulations. Our calculated sorption isotherms and solubility coefficients of water in polyamide-6,6, over a wide range of temperatures and pressures, agree with experimental data. PMID:21031194

In this paper, we propose a new and accurate technique for uncertainty analysis and uncertainty visualization based on fiber orientation distribution function (ODF) glyphs, associated with high angular resolution diffusion imaging (HARDI). Our visualization applies volume rendering techniques to an ensemble of 3D ODF glyphs, which we call SIP functions of diffusion shapes, to capture their variability due to underlying uncertainty. This rendering elucidates the complex heteroscedastic structural variation in these shapes. Furthermore, we quantify the extent of this variation by measuring the fraction of the volume of these shapes, which is consistent across all noise levels, the certain volume ratio. Our uncertainty analysis and visualization framework is then applied to synthetic data, as well as to HARDI human-brain data, to study the impact of various image acquisition parameters and background noise levels on the diffusion shapes. PMID:24466504

This paper describes the development of glucose biosensors based on carbon nanotube (CNT) nanoelectrode ensembles (NEEs) for the selective detection of glucose. Glucose oxidase was covalently immobilized on CNT NEEs via carbodiimide chemistry by forming amide linkages between their amine residues and carboxylic acid groups on the CNT tips. The catalytic reduction of hydrogen peroxide liberated from the enzymatic reaction of glucose oxidase upon the glucose and oxygen on CNT NEEs leads to the selective detection of glucose. The biosensor effectively performs selective electrochemical analysis of glucose in the presence of common interferents (e.g. acetaminophen, uric and ascorbic acids), avoiding the generation of an overlapping signal from such interferents. Such an operation eliminates the need for permselective membrane barriers or artificial electron mediators, thus greatly simplifying the sensor design and fabrication.

The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences, from a single sequence to an entire superfamily, and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics, such as Markov state models (MSMs), which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling. We demonstrate the power of this approach by constructing models for all catalytic domains in the human tyrosine kinase family.

The interactions of transmembrane (TM) α-helices with the phospholipid membrane and with one another are central to understanding the structure and stability of integral membrane proteins. These interactions may be analysed via coarse-grained molecular dynamics (CGMD) simulations. To obtain statistically meaningful analysis of TM helix interactions, large (N ca. 100) ensembles of CGMD simulations are needed. To facilitate the running and analysis of such ensembles of simulations we have developed Sidekick, an automated pipeline software for performing high throughput CGMD simulations of α-helical peptides in lipid bilayer membranes. Through an end-to-end approach, which takes as input a helix sequence and outputs analytical metrics derived from CGMD simulations, we are able to predict the orientation and likelihood of insertion into a lipid bilayer of a given helix or family of helix sequences. We illustrate this software via analysis of insertion into a membrane of short hydrophobic TM helices containing a single cationic arginine residue placed at different positions along the length of the helix. From analysis of these ensembles of simulations we estimate apparent energy barriers to insertion which are comparable to experimentally determined values. In a second application we use CGMD simulations to examine self-assembly of dimers of TM helices from the ErbB1 receptor tyrosine kinase, and analyse the numbers of simulation repeats necessary to obtain convergence of simple descriptors of the mode of packing of the two helices within a dimer. Our approach offers a proof-of-principle platform for the further employment of automation in large ensemble CGMD simulations of membrane proteins. PMID:26580541

The Smart Water Grid (SWG) concept has emerged globally over the last decade and has gained significant recognition in South Korea. In particular, growing interest in water demand forecasting and optimal pump operation has led to various studies regarding energy saving and improvement of water supply reliability. Existing water demand forecasting models are categorized into two groups according to how they model and predict behavior in time series. One group considers embedded patterns such as seasonality, periodicity and trends, and the other comprises autoregressive models using short-memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of such models is that the predictability of water demand at sub-daily scales is limited because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allows us to estimate uncertainties across different model classes. The proposed model consists of two parts. One is a multi-model scheme based on a combination of independent prediction models. The other is a cross-validation scheme, namely the bagging approach introduced by Breiman (1996), used to derive weighting factors corresponding to individual models. The individual forecasting models used in this study are linear regression, polynomial regression, multivariate adaptive regression splines (MARS), and support vector machines (SVM). The concepts are demonstrated through application to data observed from water plants at several locations in South Korea. Keywords: water demand, non-linear model, ensemble forecasting model, uncertainty. Acknowledgements This subject is supported by the Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)".

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces. PMID:23203987

The Gibbs ensemble molecular dynamics algorithm introduced in the preceding paper (paper I) [C. Bratschi and H. Huber, J. Chem. Phys. 126, 164104 (2007)] is applied to two recently published CO2 ab initio pair potentials, the Bock-Bich-Vogel and symmetry-adapted perturbation theory site-site potentials. The critical properties of these potentials are calculated for the first time. Critical values and points in the single and two-phase zones are compared with Monte Carlo results to demonstrate the accuracy of the molecular dynamics algorithm, and are compared with experiment to test the accuracy of the potentials. Pressure calculations in the liquid, gas, and supercritical states are carried out and are used to explain potential-related effects and systematic discrepancies. The best ab initio potential yields results in good agreement with experiment. PMID:17477587

The Huaihe catchment plays a very important role in the political, economic, and cultural development of China. However, hydrological disasters frequently occur in the Huaihe catchment, and thus hydrological simulation in this area has very important significance. The Variable Infiltration Capacity (VIC) model, a macroscale distributed hydrological model, is applied to the upper Huaihe catchment to simulate the discharge of the basin outlet Bengbu station from 1970 to 1999. The uncertainty in the calibration of the VIC model parameters has been analyzed, and the best set of parameters in the training period of 1970-1993 is obtained using the Generalized Likelihood Uncertainty Estimation (GLUE) method. The study also addresses the influence of different likelihood functions on the parameter sensitivity as well as the uncertainty of the discharge simulation. Results show that among the six chosen parameters, the soil thickness of the second layer (d2) is the most sensitive one, followed by the saturation capacity curve shape parameter (B). Moreover, the parameter selection is sensitive to different likelihood functions. For example, the soil thickness of the third layer (d3) is sensitive when using the Nash coefficient as the likelihood function, while d3 is not sensitive when using relative error as the likelihood function. With the 95% confidence interval, the coverage rate of the simulated discharge versus the observed discharge is small (around 0.4), indicating that the uncertainty in the model is large. The coverage rate when selecting relative error as the likelihood function is bigger than that when selecting the Nash coefficient. Based on the calibration and sensitivity studies, hydrological ensemble forecasts have been established using multiple parameter sets. The ensemble mean forecasts show better simulations than the control forecast (i.e. the simulation using the best set of parameters) for the long-term trend of discharge, while the control forecast is better in the simulation of
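
A minimal GLUE sketch in the spirit of the study above; `simulate` stands in for a VIC model run, and the Nash-Sutcliffe efficiency is used as the informal likelihood with a placeholder behavioural threshold:

```python
import numpy as np

def glue(simulate, prior_samples, obs, threshold=0.5, quantiles=(0.025, 0.975)):
    """Minimal GLUE sketch. simulate(theta) returns a simulated discharge series;
    prior_samples: (n_sets, n_params) Monte Carlo parameter sets; obs: observed
    discharge. Parameter sets whose Nash-Sutcliffe efficiency falls below
    `threshold` are rejected as non-behavioural; the rest are likelihood-weighted
    to form prediction bounds (e.g. the 95% interval)."""
    obs = np.asarray(obs, dtype=float)
    sims, likes = [], []
    for theta in prior_samples:
        q = np.asarray(simulate(theta), dtype=float)
        nse = 1.0 - np.sum((q - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
        if nse > threshold:                       # behavioural parameter set
            sims.append(q)
            likes.append(nse)
    sims = np.asarray(sims)
    w = np.asarray(likes) / np.sum(likes)         # normalised likelihood weights
    bounds = np.empty((len(quantiles), sims.shape[1]))
    for t in range(sims.shape[1]):                # weighted quantiles per time step
        order = np.argsort(sims[:, t])
        cum = np.cumsum(w[order])
        for k, q_level in enumerate(quantiles):
            pos = min(int(np.searchsorted(cum, q_level)), len(order) - 1)
            bounds[k, t] = sims[order[pos], t]
    return bounds, w
```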

Ensemble sensitivity can reveal important weather features early in a forecast window that are relevant to the predictability of high-impact events later in time. On synoptic scales, sensitivity with simulated observations has been shown to be useful in identifying ensemble subsets that are more skillful than the full ensemble mean, which may add value to operational guidance for high-impact events. On convective scales, with highly nonlinear ensemble perturbation evolution and very non-Gaussian distributions of severe weather responses (e.g., simulated reflectivity above some threshold), it becomes more difficult to apply linear-based ensemble sensitivity to improve predictability of severe events. Here we test the ability of ensemble sensitivity to improve predictability of a severe convective event by identifying errors in sensitive regions of different members early in a forecast period using radar and surface-based observations. In this case, through inspection of a number of operational models, an overnight mesoscale convective system (MCS) and its associated cold pool appeared to strongly influence whether or not severe convection would occur the following afternoon. Since both the overnight MCS and the next-day convection are associated with strong nonlinearity and non-Gaussian distributions in the ensemble, this case provides a stringent test of using ensemble sensitivity and related techniques with observations for convective events. The performance of the sensitivity-based technique will be presented, and integration into an operational tool for severe convection will be discussed.
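
The linear ensemble-sensitivity diagnostic mentioned above is, at its core, a per-gridpoint regression of a scalar forecast response onto an earlier ensemble state; a minimal sketch with synthetic data (not the study's setup):

    import numpy as np

    def ensemble_sensitivity(X, J):
        """Linear ensemble sensitivity: regression of a scalar forecast
        response J (one value per member) onto an earlier-time state X
        (members x gridpoints): dJ/dx_i ~ cov(J, x_i) / var(x_i)."""
        Xa = X - X.mean(axis=0)
        Ja = J - J.mean()
        cov = Xa.T @ Ja / (len(J) - 1)
        var = Xa.var(axis=0, ddof=1)
        return cov / np.where(var > 0, var, np.inf)

    # toy use: 50 members, 1000 gridpoints of an early-time field;
    # J = a later severe-weather response tied to gridpoint 10
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 1000))
    J = 2.0 * X[:, 10] + rng.normal(scale=0.5, size=50)
    s = ensemble_sensitivity(X, J)
    print(int(np.argmax(np.abs(s))))   # -> 10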

It is a challenging task to automatically segment glioblastoma multiforme (GBM) brain tumors on T1w post-contrast isotropic MR images. A semi-automated system using fuzzy connectedness has recently been developed for computing the tumor volume that reduces the cost of manual annotation. In this study, we propose an ensemble method that combines multiple segmentation results into a final consensus segmentation. The method is evaluated on a dataset of 20 cases from a multi-center pharmaceutical drug trial and compared to the fuzzy connectedness method. Three individual methods were used in the framework: fuzzy connectedness, GrowCut, and voxel classification. The combination method is a confidence map averaging (CMA) method. The CMA method shows an improved ROC curve compared to the fuzzy connectedness method (p < 0.001), and the CMA ensemble result is more robust than the three individual methods.
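
Confidence map averaging is straightforward to state in code: average per-voxel confidence maps from the individual segmenters and threshold the mean (sweeping the threshold traces out the ROC curve reported above). A minimal sketch with random stand-in maps:

    import numpy as np

    def cma_segment(confidence_maps, threshold=0.5):
        """Confidence-map averaging sketch: average per-voxel confidence
        maps in [0, 1] from several segmenters (e.g., fuzzy connectedness,
        GrowCut, voxel classification) and threshold the mean."""
        mean_conf = np.mean(confidence_maps, axis=0)
        return mean_conf, mean_conf >= threshold

    # toy example: three methods on a 4x4 slice
    maps = np.stack([np.random.default_rng(i).random((4, 4)) for i in range(3)])
    conf, mask = cma_segment(maps)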

Molecular Dynamics (MD) and Monte Carlo (MC) simulations are the most popular simulation techniques for many-particle systems. Although they are often applied to similar systems, it is unclear to what extent one should expect quantitative agreement between the two techniques. In this work, we present a quantitative comparison of MD and MC simulations in the microcanonical ensemble. For three test examples, we study first- and second-order phase transitions with a focus on liquid-gas like transitions. We present MD analysis techniques to compensate for conservation-law effects due to linear and angular momentum conservation. Additionally, we apply the weighted histogram analysis method to microcanonical histograms reweighted from MD simulations. By this means, we are able to estimate the density of states from many microcanonical simulations at various total energies. This further allows us to compute estimates of canonical expectation values. PMID:26450299
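
Once a density-of-states estimate is in hand, canonical expectation values follow by reweighting; a minimal sketch (with a made-up density of states) of the final step described above:

    import numpy as np

    def canonical_average(E, lnG, A, beta):
        """Canonical expectation <A>_beta from a density-of-states estimate:
        <A> = sum_E A(E) g(E) exp(-beta E) / sum_E g(E) exp(-beta E),
        computed with a log-sum-exp shift for numerical stability."""
        logw = lnG - beta * E
        logw -= logw.max()
        w = np.exp(logw)
        return np.sum(A * w) / np.sum(w)

    # toy density of states on a bounded spectrum
    E = np.linspace(0.0, 10.0, 200)
    lnG = 2.0 * np.log(E + 1e-3)        # hypothetical g(E) ~ E^2
    print(canonical_average(E, lnG, A=E, beta=1.0))  # mean energy at beta=1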

Identifying the most appropriate hydrological model for a given problem is more than fitting the parameters of a fixed model structure to reproduce the measured hydrograph. The most appropriate model structure depends on the modeling objective, the characteristics of the system under investigation, and the available data. To adapt to different conditions and to propose different hypotheses about the underlying system, a flexible model structure is preferred, in combination with a rejectionist analysis based on different diagnostics supporting the modeling objective. By confronting the model structures with the model diagnostics, an identification of the dominant processes is attempted. In the presented work, a set of 24 model structures was constructed by combining interchangeable components representing different hypotheses about the system under study, the Nete catchment in Belgium. To address the effect of different model diagnostics on the performance of the selected model structures, the model structures were optimized to identify the parameter sets minimizing specific objective functions focusing on low- or high-flow conditions. Furthermore, the different model structures are compared simultaneously within the Generalized Likelihood Uncertainty Estimation (GLUE) approach. The rejection of inadequate model structures by specifying limits of acceptance, and the weighting of the accepted ones, form the basis of the GLUE approach. Multiple measures are combined to give guidance about the suitability of the different structures, and information about the identifiability and uncertainty of the parameters is extracted from the ensemble of selected structures. The results of the optimization demonstrate the relationship between the selected objective function and the behaviour of the model structures, but also the compensation for structural differences by different parameter values resulting in similar performance. The optimization gives

Background: The ability to accurately forecast census counts in hospital departments has considerable implications for hospital resource allocation. In recent years several different methods have been proposed for forecasting census counts; however, many of these approaches do not use available patient-specific information. Methods: In this paper we present an ensemble-based methodology for forecasting the census under a framework that simultaneously incorporates both (i) arrival trends over time and (ii) patient-specific baseline and time-varying information. The proposed model for predicting census has three components, namely: current census count, number of daily arrivals, and number of daily departures. To model the number of daily arrivals, we use a seasonality-adjusted Poisson Autoregressive (PAR) model where the parameter estimates are obtained via conditional maximum likelihood. The number of daily departures is predicted by modeling the probability of departure from the census using logistic regression models that are adjusted for the amount of time spent in the census and incorporate both patient-specific baseline and time-varying covariate information. We illustrate our approach using neonatal intensive care unit (NICU) data collected at Women & Infants Hospital, Providence, RI, which consist of 1001 consecutive NICU admissions between April 1st 2008 and March 31st 2009. Results: Our results demonstrate statistically significant improved prediction accuracy for 3-, 5-, and 7-day census forecasts and increased precision of our forecasting model compared to a forecasting approach that ignores patient-specific information. Conclusions: Forecasting models that utilize patient-specific baseline and time-varying information make the most of data typically available and have the capacity to substantially improve census forecasts. PMID:23721123
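
A compressed sketch of the three-component idea (current census, Poisson arrivals, logistic departures), simulated forward to produce forecast bands; all coefficients below are illustrative stand-ins for fitted values, not the paper's estimates:

    import numpy as np

    def forecast_census(los_days, arr_coef, dep_coef, horizon=7, n_sims=1000, seed=0):
        """Sketch: arrivals come from a log-linear (seasonality-adjusted,
        autoregressive) Poisson rate; each current patient departs with a
        logistic probability in length of stay."""
        rng = np.random.default_rng(seed)
        counts = np.zeros((n_sims, horizon))
        for s in range(n_sims):
            los = list(los_days)                 # days in unit, current patients
            prev = 3                             # yesterday's arrival count
            for d in range(horizon):
                lam = np.exp(arr_coef[0] + arr_coef[1] * np.log1p(prev)
                             + arr_coef[2] * np.sin(2 * np.pi * d / 7.0))
                prev = rng.poisson(lam)          # today's arrivals
                p_dep = 1.0 / (1.0 + np.exp(-(dep_coef[0] + dep_coef[1] * np.array(los))))
                stay = rng.random(len(los)) > p_dep
                los = [t + 1 for t, keep in zip(los, stay) if keep] + [1] * prev
                counts[s, d] = len(los)
        return counts.mean(axis=0), np.percentile(counts, [5, 95], axis=0)

    mean, band = forecast_census([2] * 10 + [15] * 10, (1.0, 0.1, 0.2), (-2.0, 0.05))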

The ensemble Kalman filter (EnKF) and ensemble square root filter (ESRF) are data assimilation methods used to combine high dimensional, nonlinear dynamical models with observed data. Despite their widespread usage in climate science and oil reservoir simulation, very little is known about the long-time behavior of these methods and why they are effective when applied with modest ensemble sizes in large dimensional turbulent dynamical systems. By following the basic principles of energy dissipation and controllability of filters, this paper establishes a simple, systematic and rigorous framework for the nonlinear analysis of EnKF and ESRF with arbitrary ensemble size, focusing on the dynamical properties of boundedness and geometric ergodicity. The time uniform boundedness guarantees that the filter estimate will not diverge to machine infinity in finite time, a potential threat for the EnKF and ESRF known as catastrophic filter divergence. Geometric ergodicity ensures in addition that the filter has a unique invariant measure and that initialization errors dissipate exponentially in time. We establish these results by introducing a natural notion of observable energy dissipation. The time uniform bound is achieved through a simple Lyapunov function argument; this result applies to systems with complete observations and strong kinetic energy dissipation, but also to concrete examples with incomplete observations. With the Lyapunov function argument established, geometric ergodicity is obtained by verifying the controllability of the filter processes; in particular, such analysis for the ESRF relies on a careful multivariate perturbation analysis of the covariance eigenstructure.
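
For reference, the stochastic EnKF analysis step being analyzed has a compact generic form (perturbed-observation variant shown; the paper's contribution is the theory, not this code):

    import numpy as np

    def enkf_update(X, y, H, R, rng):
        """Stochastic EnKF analysis step with perturbed observations.
        X: (n, N) ensemble of states; y: (m,) observation; H: (m, n)
        observation operator; R: (m, m) observation-error covariance."""
        n, N = X.shape
        A = X - X.mean(axis=1, keepdims=True)
        HA = H @ A
        P_yy = HA @ HA.T / (N - 1) + R                    # innovation covariance
        K = (A @ HA.T / (N - 1)) @ np.linalg.inv(P_yy)    # Kalman gain
        Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
        return X + K @ (Y - H @ X)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(3, 20))                # 20 members, 3-d state
    H = np.array([[1.0, 0.0, 0.0]])             # observe first component only
    Xa = enkf_update(X, np.array([0.5]), H, np.eye(1) * 0.1, rng)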

Self-guided Langevin dynamics (SGLD) is a method to accelerate conformational searching. The method is unique in that it selectively enhances and suppresses molecular motions based on their frequency, accelerating conformational searching without modifying energy surfaces or raising temperatures. It has been applied to studies of many long-time-scale events, such as protein folding. Recent progress in understanding the conformational distribution in SGLD simulations also makes SGLD an accurate method for quantitative studies. The SGLD partition function provides a way to convert the SGLD conformational distribution to the canonical ensemble distribution and to calculate ensemble average properties through reweighting. Based on the SGLD partition function, this work presents a force-momentum-based self-guided Langevin dynamics (SGLDfp) simulation method to directly sample the canonical ensemble. This method includes interaction forces in its guiding force to compensate for the perturbation caused by the momentum-based guiding force, so that it can approximately sample the canonical ensemble. Using several example systems, we demonstrate that SGLDfp simulations can approximately maintain the canonical ensemble distribution and significantly accelerate conformational searching. With optimal parameters, SGLDfp and SGLD simulations can cross energy barriers of more than 15 kT and 20 kT, respectively, at rates similar to those at which LD simulations cross energy barriers of 10 kT. The SGLDfp method is size extensive and works well for large systems. For studies where preserving accessible conformational space is critical, such as free energy calculations and protein folding studies, SGLDfp is an efficient approach to search and sample the conformational space.
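
The guiding-force idea can be caricatured in a few lines: a standard Langevin update plus a force proportional to a low-pass-filtered momentum, which pumps energy into slow motions. This is a generic sketch of the concept, not the SGLDfp equations:

    import numpy as np

    def guided_langevin(force, x, n_steps, dt=0.002, gamma=1.0, kT=1.0,
                        guide=0.2, avg_tau=50.0, seed=0):
        """Minimal guided-Langevin sketch (unit mass): Euler force kick,
        exact Ornstein-Uhlenbeck friction/noise step, plus a guiding force
        proportional to a running (low-pass filtered) average of momentum,
        which selectively boosts slow, low-frequency motions."""
        rng = np.random.default_rng(seed)
        p = np.zeros_like(x)
        p_avg = np.zeros_like(x)                 # local average of momentum
        a = np.exp(-dt / avg_tau)
        traj = []
        for _ in range(n_steps):
            p_avg = a * p_avg + (1 - a) * p
            p += dt * (force(x) + guide * gamma * p_avg)   # guiding term
            p = p * np.exp(-gamma * dt) + np.sqrt(
                kT * (1 - np.exp(-2 * gamma * dt))) * rng.normal(size=x.shape)
            x = x + dt * p
            traj.append(x.copy())
        return np.array(traj)

    # double-well toy: barriers are crossed far more often with guide > 0
    traj = guided_langevin(lambda x: -4 * x * (x**2 - 1), np.array([-1.0]), 100000)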

We show direct formal relationships between the Wang-Landau iteration [PRL 86, 2050 (2001)], metadynamics [PNAS 99, 12562 (2002)] and statistical temperature molecular dynamics [PRL 97, 050601 (2006)], the major Monte Carlo and molecular dynamics workhorses for sampling from a generalized, multicanonical ensemble. We aim to help consolidate the developments in the different areas by indicating how methodological advancements can be transferred in a straightforward way, avoiding the parallel, largely independent development tracks observed in the past.
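
Of the three methods being related, the Wang-Landau iteration is the easiest to sketch; a textbook version on a discrete state space (toy example, not tied to the paper):

    import numpy as np

    def wang_landau(energy, n_states, n_steps=200000, f_init=1.0, flat=0.8, seed=0):
        """Textbook Wang-Landau iteration: random walk in state space,
        accept with min(1, g(E_old)/g(E_new)), update ln g and a visit
        histogram, halve ln f whenever the histogram is sufficiently flat.
        `energy` maps a state index to an energy bin."""
        rng = np.random.default_rng(seed)
        lnG = np.zeros(n_states)              # running estimate of ln g(E)
        hist = np.zeros(n_states)
        lnf = f_init
        s = 0
        for _ in range(n_steps):
            s_new = rng.integers(n_states)
            if np.log(rng.random()) < lnG[energy(s)] - lnG[energy(s_new)]:
                s = s_new
            lnG[energy(s)] += lnf
            hist[energy(s)] += 1
            if hist.min() > flat * hist.mean():   # flatness check
                hist[:] = 0
                lnf /= 2.0                        # refine the modification factor
        return lnG - lnG.min()

    lnG = wang_landau(lambda s: s, n_states=8)    # trivial one-state-per-bin toy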

Ensembles of forecasts are obtained from multiple runs of numerical weather forecasting models with different initial conditions and are typically employed to account for forecast uncertainties. However, biases and dispersion errors often occur in forecast ensembles; they are usually under-dispersive and uncalibrated and require statistical post-processing. We present an Ensemble Model Output Statistics (EMOS) method for calibration of wind speed forecasts based on the log-normal (LN) distribution, and we also show a regime-switching extension of the model which combines the previously studied truncated normal (TN) distribution with the LN. Both models are applied to wind speed forecasts of the eight-member University of Washington mesoscale ensemble, the fifty-member ECMWF ensemble, and the eleven-member ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service, and their predictive performance is compared to that of the TN- and generalized extreme value (GEV) distribution based EMOS methods and to the TN-GEV mixture model. The results indicate improved calibration of probabilistic forecasts and improved accuracy of point forecasts in comparison to the raw ensemble and to climatological forecasts. Further, the TN-LN mixture model outperforms the traditional TN method, and its predictive performance keeps up with the models utilizing the GEV distribution without assigning mass to negative values.
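
A minimal sketch of the log-normal EMOS idea: link the predictive LN mean and variance to the ensemble mean and variance, then fit the link coefficients on training data. For brevity this sketch fits by maximum likelihood, whereas EMOS models are usually fitted by CRPS minimization:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import lognorm

    def fit_ln_emos(ens_mean, ens_var, obs):
        """LN-EMOS sketch: predictive distribution LN(mu, sigma) with
        mean m = a + b*ens_mean and variance v = exp(c) + exp(d)*ens_var
        (exponentials keep v positive), converted to (mu, sigma)."""
        def nll(p):
            a, b, c, d = p
            m = a + b * ens_mean
            v = np.exp(c) + np.exp(d) * ens_var
            if np.any(m <= 0):
                return 1e10
            sigma2 = np.log1p(v / m**2)
            mu = np.log(m) - 0.5 * sigma2
            return -np.sum(lognorm.logpdf(obs, s=np.sqrt(sigma2), scale=np.exp(mu)))
        return minimize(nll, x0=[0.1, 1.0, 0.0, 0.0], method="Nelder-Mead").x

    # toy training data: 200 past forecast cases
    rng = np.random.default_rng(0)
    ens_mean = rng.uniform(2, 12, 200)
    ens_var = rng.uniform(0.5, 4.0, 200)
    obs = ens_mean * rng.lognormal(0.0, 0.25, 200)
    print(fit_ln_emos(ens_mean, ens_var, obs))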

Ensembles are a well established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to, and evaluate its performance on, a set of problems of automated predictive modeling in three lake ecosystems, using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient. PMID:27078633

The current study proposes an integrated uncertainty and ensemble-based data assimilation framework (ICEA) and evaluates its viability in providing operational streamflow predictions via assimilating snow water equivalent (SWE) data. This step-wise framework applies a parameter uncertainty analysis algorithm (ISURF) to identify the uncertainty structure of sensitive model parameters, which is subsequently formulated into an Ensemble Kalman Filter (EnKF) to generate updated snow states for streamflow prediction. The framework is coupled to the US National Weather Service (NWS) snow and rainfall-runoff models. Its applicability is demonstrated for an operational basin of a western River Forecast Center (RFC) of the NWS. Performance of the framework is evaluated against the existing operational baseline (RFC predictions), the stand-alone ISURF, and the stand-alone EnKF. Results indicate that the ensemble-mean prediction of ICEA considerably outperforms predictions from the other three scenarios investigated, particularly in the context of predicting high flows (top 5th percentile). The ICEA streamflow ensemble predictions capture the variability of the observed streamflow well; however, the ensemble is not wide enough to consistently contain the range of streamflow observations in the study basin. Our findings indicate that ICEA has the potential to supplement the current operational (deterministic) forecasting method by providing improved single-valued (e.g., ensemble mean) streamflow predictions as well as meaningful ensemble predictions.

Long-lived quantum memories are essential components of a long-standing goal of remote distribution of entanglement in quantum networks. These can be realized by storing the quantum states of light as single-spin excitations in atomic ensembles. However, spin states are often subject to different dephasing processes that limit the storage time, which in principle could be overcome using spin-echo techniques. Theoretical studies suggest this to be challenging due to unavoidable spontaneous emission noise in ensemble-based quantum memories. Here, we demonstrate spin-echo manipulation of a mean spin excitation of 1 in a large solid-state ensemble, generated through storage of a weak optical pulse. After a storage time of about 1 ms we optically read out the spin excitation with a high signal-to-noise ratio. Our results pave the way for long-duration optical quantum storage using spin-echo techniques for any ensemble-based memory. PMID:26196785

Casein kinase-1 (CK1) isoforms actively participate in the down-regulation of the canonical Wnt signaling pathway; however, recent studies have shown their active roles in oncogenesis of various tissues through this pathway. Functional loss of two isoforms (CK1-α/ε) has been shown to activate the carcinogenic pathway, which involves the stabilization of cytoplasmic β-catenin. Development of anticancer therapeutics is a laborious task that depends upon the structural and conformational details of the target. This study focuses on how the structural dynamics and conformational changes of the two CK1 isoforms are synchronized in the carcinogenic pathway. The conformational dynamics in kinases is responsible for their action, as supported by the molecular docking experiments. PMID:26788877

Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring, in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) network are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and the tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, a homogeneous ensemble learning model and a majority voting strategy were also adopted for comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
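
Stacking heterogeneous base classifiers is directly expressible with scikit-learn; in this sketch an MLP stands in for the RBF network and the HMM is omitted (scikit-learn has no HMM classifier), so it illustrates the strategy rather than reproducing the paper's setup:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    # toy feature matrix standing in for mRMR-selected harmonic features;
    # three classes standing in for tool wear states
    X, y = make_classification(n_samples=300, n_features=12, n_classes=3,
                               n_informative=6, random_state=0)

    # base classifiers feed a logistic-regression meta-learner (stacking)
    clf = StackingClassifier(
        estimators=[("svm", SVC(probability=True, random_state=0)),
                    ("mlp", MLPClassifier(max_iter=1000, random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000))
    clf.fit(X, y)
    print(clf.score(X, y))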

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license. PMID:25352552
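
A minimal example of querying the REST server from Python, using the documented gene-lookup endpoint (endpoint behavior may change; see http://rest.ensembl.org for current documentation):

    import requests

    server = "https://rest.ensembl.org"
    # look up a gene by symbol; JSON is requested via the Content-Type header
    r = requests.get(server + "/lookup/symbol/homo_sapiens/BRCA2",
                     headers={"Content-Type": "application/json"},
                     timeout=30)
    r.raise_for_status()
    gene = r.json()
    print(gene["id"], gene.get("description", ""))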

A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states, since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods has emerged, which can flexibly and efficiently sample conformational space without being trapped and allows calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems has not been available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPUs) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34-residue protein, simulating 1.5 ms over 8 months with a peak aggregate performance of 1000 ns/h. Comparison was made with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy. PMID:25207854
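
The heart of any WE method is the split/merge step that keeps bins evenly populated while conserving probability; a generic resampling-style sketch (one simple variant, not the AWE-WQ implementation):

    import numpy as np

    def we_resample(walkers, weights, bin_of, target_per_bin=4, rng=None):
        """One weighted-ensemble resampling step: within each occupied bin,
        redraw walkers proportionally to weight so the bin holds
        target_per_bin equal-weight walkers; total weight per bin is
        conserved, which is what keeps rate estimates unbiased."""
        rng = rng or np.random.default_rng(0)
        bins = {}
        for x, w in zip(walkers, weights):
            bins.setdefault(bin_of(x), []).append((x, w))
        out_x, out_w = [], []
        for members in bins.values():
            ws = np.array([w for _, w in members], dtype=float)
            W = ws.sum()
            idx = rng.choice(len(ws), size=target_per_bin, p=ws / W)
            out_x.extend(members[i][0] for i in idx)
            out_w.extend([W / target_per_bin] * target_per_bin)
        return out_x, out_w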

Lung cancer is one of the leading causes of death worldwide. There are three major types of lung cancer: non-small cell lung cancer (NSCLC), small cell lung cancer (SCLC) and carcinoid. NSCLC is further classified into lung adenocarcinoma (LADC), squamous cell lung cancer (SQCLC) and large cell lung cancer. Many previous studies have demonstrated that DNA methylation markers have emerged as potential lung cancer-specific biomarkers. However, whether there exists a set of DNA methylation markers simultaneously distinguishing these three types of lung cancer remains elusive. In the present study, ROC (Receiver Operating Characteristic) analysis, RFs (Random Forests) and mRMR (Maximum Relevance and Minimum Redundancy) were used to capture unbiased, informative and compact molecular signatures, followed by machine learning methods to classify LADC, SQCLC and SCLC. As a result, a panel of 16 DNA methylation markers exhibits an ideal classification power with an accuracy of 86.54% and 84.6%, and a recall of 84.37% and 85.5%, in the leave-one-out cross-validation (LOOCV) and independent data set test experiments, respectively. Besides, comparison results indicate that ensemble-based feature selection methods outperform individual ones when combined with the incremental feature selection (IFS) strategy, in terms of the informativeness and compactness of features. Taken together, the results suggest the effectiveness of the ensemble-based feature selection approach and the possible existence of a common panel of DNA methylation markers among the three types of lung cancer tissue, which would facilitate clinical diagnosis and treatment. PMID:25512221
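
The incremental feature selection (IFS) strategy mentioned above is a simple loop: walk down a ranked feature list, adding one feature at a time, and keep the prefix with the best cross-validated accuracy. A sketch with a toy ranking (in the study the rankings come from ROC, RF and mRMR):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def incremental_feature_selection(X, y, ranked, cv=5):
        """IFS: score growing prefixes of a ranked feature list and return
        the prefix size with the best cross-validated accuracy."""
        scores = []
        for k in range(1, len(ranked) + 1):
            acc = cross_val_score(SVC(), X[:, ranked[:k]], y, cv=cv).mean()
            scores.append(acc)
        return int(np.argmax(scores)) + 1, scores

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    # toy ranking by absolute correlation with the label
    ranked = np.argsort(-np.abs(np.corrcoef(X.T, y)[-1, :-1]))
    print(incremental_feature_selection(X, y, ranked)[0])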

A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and their classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15,300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training samples. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This implies that the performance of the SWLDA is improved by overlapped partitioning and that the ensemble classifier with overlapped partitioning requires less training data than one with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance. PMID:24695550
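
A sketch of the overlapped-partitioning idea: adjacent training partitions share a fraction of samples, each partition trains its own LDA, and the ensemble averages the discriminant scores. The partition bookkeeping here is illustrative, not the paper's exact scheme:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def overlapped_lda_ensemble(X, y, n_parts=5, overlap=0.5, seed=0):
        """Train LDA classifiers on overlapping windows of a shuffled index:
        adjacent partitions share a fraction `overlap` of samples, so each
        classifier sees more data than with naive disjoint partitioning."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        size = int(len(y) / (n_parts * (1 - overlap) + overlap))
        step = int(size * (1 - overlap))
        models = []
        for k in range(n_parts):
            part = idx[k * step: k * step + size]
            models.append(LinearDiscriminantAnalysis().fit(X[part], y[part]))
        return models

    def predict_vote(models, X):
        # average discriminant scores; binary P300 / non-P300 decision
        scores = np.mean([m.decision_function(X) for m in models], axis=0)
        return (scores > 0).astype(int)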

Monte Carlo computational methods have been introduced into data assimilation for nonlinear systems in order to alleviate the computational burden of updating and propagating the full probability distribution. By propagating an ensemble of representative states, algorithms like the ensemble Kalman filter (EnKF) and the resampled particle filter (RPF) rely on the existing modeling infrastructure to approximate the distribution based on the evolution of this ensemble. This work presents an ensemble-based smoother that is applicable to Monte Carlo filtering schemes such as the EnKF and RPF. At the minor cost of retrospectively updating a set of weights for ensemble members, this smoother has demonstrated superior capabilities in state tracking for two highly nonlinear problems: the double-well potential and trivariate Lorenz systems. The algorithm does not require retrospective adaptation of the ensemble members themselves, and is thus suited to a streaming operational mode. The accuracy of the proposed backward-update scheme in estimating non-Gaussian distributions is evaluated by comparison to the more accurate estimates provided by a Markov chain Monte Carlo algorithm.
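
The retrospective weight update can be caricatured as follows: store the ensemble trajectories, fold each new observation's per-member log-likelihood into one set of weights, and re-weight (rather than move) past states. A minimal sketch, not the paper's exact scheme:

    import numpy as np

    def reweighting_smoother(traj, loglik):
        """Weight-based ensemble smoothing sketch.
        traj: (T, N, n) stored ensemble states; loglik: (T, N) per-member
        observation log-likelihoods. Past states are re-weighted in the
        light of all data; returns the smoothed mean at every past time."""
        logw = loglik.sum(axis=0)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        return np.tensordot(w, traj, axes=(0, 1))   # (T, n) smoothed means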

We present a new ensemble system for stock market return prediction in which the continuous wavelet transform (CWT) is used to analyze return series and backpropagation neural networks (BPNNs) are used for processing CWT-based coefficients, determining the optimal ensemble weights, and providing final forecasts. Particle swarm optimization (PSO) is used for finding optimal weights and biases for each BPNN. To capture symmetry/asymmetry in the underlying data, three wavelet functions with different shapes are adopted. The proposed ensemble system was tested on three Asian stock markets: the Hang Seng, KOSPI, and Taiwan stock market data. Three statistical metrics were used to evaluate the forecasting accuracy, including the mean absolute error (MAE), root mean squared error (RMSE), and mean absolute deviation (MAD). Experimental results showed that our proposed ensemble system outperformed the individual CWT-ANN models, each with a different wavelet function. In addition, the proposed ensemble system outperformed the conventional autoregressive moving average process. As a result, the proposed ensemble system is suitable for capturing symmetry/asymmetry in financial data fluctuations for better prediction accuracy.
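
As a simplified stand-in for the PSO-trained combiner, the sketch below uses a plain PSO loop to find convex combination weights for several base forecasts by minimizing validation RMSE:

    import numpy as np

    def pso_ensemble_weights(preds, target, n_particles=30, iters=200, seed=0):
        """Toy PSO over ensemble weights: each particle is a weight vector,
        fitness is RMSE of the weighted combination on a validation window."""
        rng = np.random.default_rng(seed)
        M = preds.shape[0]
        pos = rng.random((n_particles, M))
        vel = np.zeros_like(pos)

        def rmse(w):
            w = np.abs(w)
            w = w / w.sum()
            return np.sqrt(np.mean((w @ preds - target) ** 2))

        pbest = pos.copy()
        pbest_f = np.array([rmse(p) for p in pos])
        gbest = pbest[pbest_f.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, M))
            vel = 0.7 * vel + 1.4 * r1 * (pbest - pos) + 1.4 * r2 * (gbest - pos)
            pos += vel
            f = np.array([rmse(p) for p in pos])
            better = f < pbest_f
            pbest[better], pbest_f[better] = pos[better], f[better]
            gbest = pbest[pbest_f.argmin()].copy()
        w = np.abs(gbest)
        return w / w.sum()

    # toy usage: three base forecasts of a sine series
    t = np.linspace(0, 10, 200)
    target = np.sin(t)
    preds = np.vstack([np.sin(t) + 0.1, 0.8 * np.sin(t), 0.2 * np.cos(t)])
    print(pso_ensemble_weights(preds, target))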

Ensemble techniques are widely used in the modelling community to combine different modelling results and thereby reduce uncertainties. This approach can also be adapted to satellite measurements. Aerosol_cci is an ESA-funded project in which most of the European aerosol retrieval groups work together. The different algorithms are homogenized as far as it makes sense, but remain essentially different. Datasets are compared with ground-based measurements and with each other. Within this project, three AATSR algorithms (the Swansea University aerosol retrieval, the ADV aerosol retrieval by FMI, and the Oxford aerosol retrieval ORAC) provide 17-year global aerosol records. Each of these algorithms also provides uncertainty information at pixel level. In the presented work, an ensemble of the three AATSR algorithms is constructed. The advantage over each single algorithm is the higher spatial coverage due to more measurement pixels per gridbox. Validation against ground-based AERONET measurements still shows a good correlation of the ensemble compared with the single algorithms. Annual mean maps show the global aerosol distribution based on a combination of the three aerosol algorithms. In addition, the pixel-level uncertainties of each algorithm are used to weight the contributions, in order to reduce the uncertainty of the ensemble. Results of different versions of the ensembles for aerosol optical depth will be presented and discussed. The results are validated against ground-based AERONET measurements. Higher spatial coverage on a daily basis allows better results in annual mean maps. The benefit of using pixel-level uncertainties is analysed.
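
The uncertainty-weighted ensemble reduces to inverse-variance averaging of the per-pixel retrievals; a minimal sketch with toy values:

    import numpy as np

    def weighted_aod_ensemble(aod, sigma):
        """Combine per-pixel AOD retrievals from several algorithms using
        inverse-variance weights from their reported pixel-level
        uncertainties; NaNs mark missing retrievals. Returns the ensemble
        mean and its (reduced) uncertainty."""
        w = 1.0 / np.square(sigma)
        w[np.isnan(aod)] = 0.0
        wsum = w.sum(axis=0)
        mean = np.nansum(w * aod, axis=0) / wsum
        return mean, 1.0 / np.sqrt(wsum)

    # three algorithms, toy 2x2 grid; two missing retrievals
    aod = np.array([[[0.20, 0.31], [np.nan, 0.15]],
                    [[0.22, 0.28], [0.40, 0.18]],
                    [[0.18, 0.30], [0.44, np.nan]]])
    sig = np.full_like(aod, 0.05)
    mean, unc = weighted_aod_ensemble(aod, sig)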

Wide-angle x-ray scattering (WAXS) experiments of biomolecules in solution have become increasingly popular because of technical advances in light sources and detectors. However, the structural interpretation of WAXS profiles is problematic, partly because accurate calculations of WAXS profiles from structural models have remained challenging. In this work, we present the calculation of WAXS profiles from explicit-solvent molecular dynamics (MD) simulations of five different proteins. Using only a single fitting parameter that accounts for experimental uncertainties due to buffer subtraction and dark currents, we find excellent agreement with experimental profiles both at small and wide angles. Because explicit solvation eliminates free parameters associated with the solvation layer or the excluded solvent, which would require fitting to experimental data, we minimize the risk of overfitting. We further find that the influence of water models and protein force fields on calculated profiles is insignificant up to q ≈ 15 nm⁻¹. Using a series of simulations that allow increasing flexibility of the proteins, we show that incorporating thermal fluctuations into the calculations significantly improves agreement with experimental data, demonstrating the importance of protein dynamics in the interpretation of WAXS profiles. In addition, free MD simulations of up to one microsecond suggest that the calculated profiles are highly sensitive to minor conformational rearrangements of proteins, such as an increased flexibility of a loop or an increase of the radius of gyration by < 1%. The present study suggests that quantitative comparison between MD simulations and experimental WAXS profiles emerges as an accurate tool to validate solution ensembles of biomolecules. PMID:25028885

The vast majority of microscopic life on earth consists of microbes that do not grow in laboratory culture. To profile the microbial diversity in environmental and clinical samples, we have devised and employed molecular probe technology, which detects and identifies bacteria that do and do not grow in culture. The only requirement is a short sequence of contiguous bases (currently 60 bases) unique to the genome of the organism of interest. The procedure is relatively fast, inexpensive, customizable, robust, and culture independent and uses commercially available reagents and instruments. In this communication, we report improving the specificity of the molecular probes substantially and increasing the complexity of the molecular probe set by over an order of magnitude (>1,200 probes) and introduce a new final readout method based upon Illumina sequencing. In addition, we employed molecular probes to identify the bacteria from vaginal swabs and demonstrate how a deliberate selection of molecular probes can identify less abundant bacteria even in the presence of much more abundant species. PMID:24795371

In multiple instance learning, objects are sets (bags) of feature vectors (instances) rather than individual feature vectors. In this paper, we address the problem of how these bags can best be represented. Two standard approaches are to use (dis)similarities between bags and prototype bags, or between bags and prototype instances. The first approach results in a relatively low-dimensional representation, determined by the number of training bags, whereas the second approach results in a relatively high-dimensional representation, determined by the total number of instances in the training set. However, an advantage of the latter representation is that the informativeness of the prototype instances can be inferred. In this paper, a third, intermediate approach is proposed, which links the two approaches and combines their strengths. Our classifier is inspired by a random subspace ensemble, and considers subspaces of the dissimilarity space, defined by subsets of instances, as prototypes. We provide insight into the structure of some popular multiple instance problems and show state-of-the-art performances on these data sets. PMID:27214351
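The dissimilarity representation underlying this approach is compact to sketch: each bag is mapped to a vector of distances to prototype instances, after which subsets of prototypes define the subspaces for the ensemble. The minimum-distance choice below is one common option, not necessarily the paper's:

    import numpy as np

    def bag_dissimilarity(bags, prototypes):
        """Map each bag (set of instance vectors) to its dissimilarities to
        prototype instances: here, the minimum Euclidean distance from any
        instance in the bag to each prototype."""
        reps = []
        for bag in bags:
            d = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=2)
            reps.append(d.min(axis=0))
        return np.array(reps)

    # toy: 4 bags of 5 instances in 3-d; prototypes drawn from the instances
    bags = [np.random.default_rng(i).normal(size=(5, 3)) for i in range(4)]
    prototypes = np.vstack(bags)[::3]
    R = bag_dissimilarity(bags, prototypes)     # (n_bags, n_prototypes)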

To improve the accuracy in prediction, a Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of the individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of the individual NNs, a GA is used to optimize the cluster centers. Empirical results in predicting the carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than a Radial Basis Function Neural Network (RBFNN), a bagging NN ensemble, and ANNE. © 2007 IEEE.

Multi-sensor systems (MSS) have been increasingly applied in pattern classification, while the search for the optimal classification framework is still an open problem. The development of the classifier ensemble seems to provide a promising solution. The classifier ensemble is a learning paradigm where many classifiers are jointly used to solve a problem, and it has proven an effective method for enhancing classification ability. In this paper, by introducing the concepts of Meta-feature (MF) and Trans-function (TF) for describing the relationship between the nature and the measurement of the observed phenomenon, classification in a multi-sensor system can be unified in the classifier ensemble framework. Then an approach called Genetic Algorithm based Classifier Ensemble in Multi-sensor systems (GACEM) is presented, where a genetic algorithm is utilized to optimize both the selection of feature subsets and the decision combination simultaneously. GACEM first trains a number of classifiers based on different combinations of feature vectors and then selects the classifiers whose weights are higher than a pre-set threshold to make up the ensemble. An empirical study shows that, compared with conventional feature-level and decision-level voting, GACEM not only achieves better and more robust performance but also simplifies the system markedly.

Accurate forecasts of regional climate, including temperature and precipitation, have significant implications for human activities, not just economically but socially. Sub-Saharan Africa is a region that has displayed an exceptional propensity for devastating civil wars. Recent research in political economy has revealed a strong statistical relationship between year-to-year fluctuations in precipitation and civil conflict in this region in the 1980s and 1990s. Investigating how climate change may modify the regional risk of civil conflict in the future requires a probabilistic regional forecast that explicitly accounts for the community's uncertainty in the evolution of rainfall under anthropogenic forcing. We approach the regional climate prediction aspect of this question through the application of a recently demonstrated method called generalized scalar prediction (Leroy et al. 2009), which predicts arbitrary scalar quantities of the climate system. This method can predict change in any variable, or linear combination of variables, of the climate system averaged over a wide range of spatial scales, from regional to hemispheric to global. Generalized scalar prediction utilizes an ensemble of model predictions to represent the community's uncertainty range in climate modeling, in combination with a time series of any type of observational data that exhibits sensitivity to the scalar of interest. It is not necessary to prioritize models in deriving the final prediction. We present the results of applying generalized scalar prediction to regional forecasts of temperature and precipitation in Sub-Saharan Africa. We use the climate predictions along with the established statistical relationship between year-to-year rainfall variability and civil conflict in Sub-Saharan Africa to investigate the potential impact of climate change on civil conflict within that region.

Conceptualization of the fracture network at a disposal site is important for the safety assessment of a subsurface repository for radioactive waste. To account for the uncertainty of stochastically conceptualized discrete fracture networks (DFNs), the ensemble variability of the equivalent permeability was evaluated by defining different network structures with various fracture densities and characterization levels, and analyzing the ensemble mean and variability of the equivalent permeability of the networks, where the characterization level is defined as the ratio of the number of deterministically conceptualized fractures to the total number of fractures in the domain. The results show that the hydraulic properties of the generated networks were similar among the ensembles when the fracture density was larger than the specific fracture density at which the domain size equals the correlation length of a given fracture network. In a sparsely fractured network, where the fracture density was smaller than the specific fracture density, the ensemble variability was too large to ensure a consistent property from the stochastic DFN modeling. Deterministic information for a portion of a fracture network could reduce the uncertainty of the hydraulic property only when the fracture density was larger than the specific fracture density. Based on these results, the DFN modeling domain size for KAERI's (Korea Atomic Energy Research Institute) URT (Underground Research Tunnel) site that guarantees a less variable hydraulic property of the fracture network was determined by calculating the correlation length, and verified by evaluating the ensemble variability of the equivalent permeability.

The National Weather Service (NWS) has federal responsibility for issuing public flood warnings in the United States. Additionally, the NWS has been engaged in longer range water resources forecasts for many years, particularly in the Western U.S. In the past twenty years, longer range forecasts have increasingly incorporated ensemble techniques. Ensemble techniques are attractive because they allow a great deal of flexibility, both temporally and in content. They also provide for the influence of additional forcings (e.g., ENSO) through either pre- or post-processing techniques. More recently, attention has turned to the use of ensemble techniques in the short-term streamflow forecasting process. While considerably more difficult, the development of reliable short-term probabilistic streamflow forecasts has clear application and value for many NWS customers and partners. During flood episodes, expensive mitigation actions are initiated or withheld and critical reservoir management decisions are made in the absence of uncertainty and risk information. Limited emergency services resources and the optimal use of water resources facilities necessitate the development of a risk-based decision making process, and reliable short-term probabilistic streamflow forecasts are an essential ingredient in that process. This paper addresses the utility of short-term ensemble streamflow forecasts and the considerations that must be addressed as techniques and operational capabilities are developed. Verification and validation information is discussed from both a scientific and a customer perspective. Education and training related to the interpretation and use of ensemble products are also addressed.

The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is often used to add value to individual projections: consensus projections have been shown to consistently outperform individual models. Previous IPCC reports establish climate change projections based on an equal-weighted average of all model projections. However, certain models reproduce climate processes better than others. Should models be weighted based on performance? Unequal ensemble averages have previously been constructed using a variety of mean-state metrics. Which metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequally weighting multi-model ensembles. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables, e.g., outgoing longwave radiation and surface temperature. Metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and the Earth's Radiant Energy System (CERES) instrument and surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing weighted and unweighted model ensembles. For example, one tested metric weights the ensemble by how well models reproduce the time-series probability distribution of the cloud forcing component of reflected shortwave radiation. The weighted ensemble for this metric indicates lower simulated precipitation (up to 0.7 mm/day) in tropical regions than the unweighted ensemble: since CMIP5 models have been shown to
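
One common form of performance weighting (used here as an illustration, not as the project's specific scheme) weights each model by a Gaussian function of its historical metric error:

    import numpy as np

    def skill_weighted_mean(projections, errors, kappa=1.0):
        """Performance-weighted multi-model average: each model's projection
        is weighted by exp(-(E/kappa)**2), where E is its error in a chosen
        process-based metric over the historical period; choosing the metric
        and the shape parameter kappa is the crux of the problem."""
        w = np.exp(-(np.asarray(errors) / kappa) ** 2)
        w /= w.sum()
        return np.tensordot(w, np.asarray(projections), axes=1)

    # toy: 4 models, regional precipitation-change fields on a 2x2 grid
    proj = np.random.default_rng(0).normal(0.0, 1.0, size=(4, 2, 2))
    errs = [0.2, 0.5, 1.0, 2.0]      # metric errors vs. observations
    print(skill_weighted_mean(proj, errs))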

We focus on developing a pattern recognition method suitable for performing supervised analysis tasks on molecular data resulting from microarray experiments. Molecular characterization of tissue samples using microarray gene expression profiling is expected to uncover fundamental aspects related to cancer diagnosis and drug discovery. There is therefore a need for reliable, accurate classification methods. In this study, we propose a framework for constructing an ensemble of individually trained SVM classifiers, each of them specialized on a subset of the input space. The fuzzy approach used for partitioning the data produces overlapping subsets of the input space that facilitate subsequent classification tasks. PMID:17946338

We analyze the magnetic dipole coupling of an ensemble of spins to a superconducting microwave stripline structure, incorporating a Josephson junction based transmon qubit. We show that this system is described by an embedded Jaynes-Cummings model: in the strong coupling regime, collective spin-wave excitations of the ensemble of spins pick up the nonlinearity of the cavity mode, such that the two lowest eigenstates of the coupled spin wave-microwave cavity-Josephson junction system define a hybrid two-level system. The proposal described here enables new avenues for nonlinear optics using optical photons coupled to spin ensembles via Raman transitions. The possibility of strong coupling cavity QED with magnetic dipole transitions also opens up the possibility of extending quantum information processing protocols to spins in silicon or graphene, without the need for single-spin confinement.

An ensemble performs well when the component classifiers are diverse yet accurate, so that the failure of one is compensated for by others. A number of methods have been investigated for constructing ensembles, some of which train classifiers with generated patterns. This study investigates a new technique of training pattern generation. The method alters input feature values of some patterns using the values of other patterns to generate different patterns for different classifiers. The effectiveness of a neural network ensemble based on the proposed technique was evaluated using a suite of 25 benchmark classification problems, and was found to achieve performance better than or competitive with related conventional methods. Experimental investigation of different input value alteration techniques finds that alteration with pattern values in the same class is better for generalization, although other alteration techniques may offer more diversity. PMID:22262526
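
The alteration technique lends itself to a few lines of code: for selected training patterns, overwrite a subset of feature values with those of a randomly chosen donor from the same class (the variant the experiments found best for generalization); the fractions and subset sizes below are illustrative:

    import numpy as np

    def altered_training_set(X, y, frac=0.3, seed=0):
        """Generate a classifier-specific training set: for a fraction of
        samples, replace a random subset of feature values with the same
        features from another sample of the same class, preserving labels
        while diversifying inputs."""
        rng = np.random.default_rng(seed)
        Xn = X.copy()
        n, d = X.shape
        for i in rng.choice(n, size=int(frac * n), replace=False):
            donor = rng.choice(np.flatnonzero(y == y[i]))   # same-class donor
            cols = rng.choice(d, size=max(1, d // 4), replace=False)
            Xn[i, cols] = X[donor, cols]
        return Xn

    # build an ensemble by training each member on its own altered copy,
    # e.g., with a different `seed` per member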

Data assimilation (DA) techniques like the local ensemble transform Kalman filter (LETKF) not only offer the opportunity to update model predictions by assimilating new measurement data in real time, but also provide an improved basis for real-time (DA-based) control. This study focuses on the optimization of real-time irrigation scheduling for fields of citrus trees near Picassent (Spain). For three selected fields the irrigation was optimized with DA-based control, and for the other fields irrigation was optimized on the basis of a more traditional approach in which reference evapotranspiration for citrus trees was estimated using the FAO method. The performance of the two methods is compared for the year 2013. The DA-based real-time control approach is based on ensemble predictions of soil moisture profiles using the Community Land Model (CLM). The uncertainty in the model predictions is introduced by feeding the model with weather predictions from an ensemble prediction system (EPS) and with uncertain soil hydraulic parameters. The model predictions are updated daily by assimilating soil moisture data measured by capacitance probes. The measurement data are assimilated with the help of the LETKF. The irrigation need was calculated for each of the ensemble members and averaged, and logistical constraints (hydraulics, energy costs) were taken into account in the final assignment of irrigation in space and time. For the operational scheduling based on this approach, only model states and no model parameters were updated. Other, non-operational simulation experiments for the same period were carried out in which (1) neither the ensemble weather forecast nor DA was used (open loop), (2) only the ensemble weather forecast was used, (3) only DA was used, (4) soil hydraulic parameters were also updated in the data assimilation, and (5) both soil hydraulic and plant-specific parameters were updated. The FAO-based and DA-based real-time irrigation control are compared in terms of soil moisture
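
The ensemble-averaged irrigation requirement at the core of the scheduling step is simple to state; a toy sketch with illustrative thresholds and depths (the operational system adds the logistical constraints discussed above):

    import numpy as np

    def irrigation_from_ensemble(theta_pred, theta_target=0.25, root_depth_mm=600.0):
        """Per-member irrigation requirement from ensemble soil-moisture
        forecasts (volumetric, m3/m3), averaged over members; the target
        moisture and rooting depth are illustrative assumptions."""
        deficit = np.clip(theta_target - theta_pred, 0.0, None)
        return float(np.mean(deficit * root_depth_mm))      # mm of water

    print(irrigation_from_ensemble(np.array([0.22, 0.20, 0.26, 0.24])))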

Objective. This study aims to establish a model to analyze the clinical experience of veteran doctors of traditional Chinese medicine (TCM). We propose an ensemble learning based framework to analyze clinical records with ICD-10 label information for effective diagnosis and acupoint recommendation. Methods. We propose an ensemble learning framework for the analysis task. A set of base learners composed of decision trees (DT) and support vector machines (SVM) are trained by bootstrapping the training dataset. The base learners are sorted by accuracy and diversity through a nondominated sort (NDS) algorithm and combined through a deep ensemble learning strategy. Results. We evaluate the proposed method in comparison with two currently successful methods on a clinical diagnosis dataset with manually labeled ICD-10 information. ICD-10 label annotation and acupoint recommendation are evaluated for the three methods. The proposed method achieves an accuracy rate of 88.2% ± 2.8% measured by zero-one loss for the first evaluation session and 79.6% ± 3.6% measured by Hamming loss, which are superior to the other two methods. Conclusion. The proposed ensemble model can effectively model the implied knowledge and experience in historical clinical records. The computational cost of training a set of base learners is relatively low. PMID:26504897

Forecasts from seven air quality models and surface ozone data collected over the eastern USA and southern Canada during July and August 2004 provide a unique opportunity to assess benefits of ensemble-based ozone forecasting and devise methods to improve ozone forecasts. In this investigation, past forecasts from the ensemble of models and hourly surface ozone measurements at over 350 sites are used to issue deterministic 24-h forecasts using a method based on dynamic linear regression. Forecasts of hourly ozone concentrations as well as maximum daily 8-h and 1-h averaged concentrations are considered. It is shown that the forecasts issued with the application of this method have reduced bias and root mean square error and better overall performance scores than any of the ensemble members and the ensemble average. Performance of the method is similar to another method based on linear regression described previously by Pagowski et al., but unlike the latter, the current method does not require measurements from multiple monitors since it operates on individual time series. Improvement in the forecasts can be easily implemented and requires minimal computational cost.
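
A recursive least-squares regression with exponential forgetting is one simple way to realize a per-site dynamic linear regression over the ensemble members; this is a generic stand-in for the paper's method, sketched for a single monitor:

    import numpy as np

    def dlr_forecast(ens_forecasts, obs, lam=0.99):
        """Recursive (exponentially forgetting) least squares on one site's
        time series: regress the observation on the ensemble-member
        forecasts plus an intercept, updating coefficients at each step."""
        T, M = ens_forecasts.shape
        theta = np.zeros(M + 1)
        P = np.eye(M + 1) * 100.0
        preds = np.empty(T)
        for t in range(T):
            x = np.concatenate(([1.0], ens_forecasts[t]))
            preds[t] = x @ theta                   # forecast before seeing obs
            k = P @ x / (lam + x @ P @ x)          # gain
            theta += k * (obs[t] - x @ theta)
            P = (P - np.outer(k, x @ P)) / lam
        return preds

    # toy usage: 7 members, 500 hours, observation = scaled ensemble mean + noise
    rng = np.random.default_rng(0)
    F = rng.normal(8.0, 2.0, size=(500, 7))
    o = 0.9 * F.mean(axis=1) + 3.0 + rng.normal(0, 1, 500)
    pred = dlr_forecast(F, o)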

Molecular density functional theory (MDFT) offers an efficient implicit-solvent method to estimate molecular solvation free energies while conserving a fully molecular representation of the solvent. Even within a second-order approximation for the free-energy functional, the so-called homogeneous reference fluid approximation, we show that the hydration free energies computed for a data set of 500 organic compounds are of similar quality as those obtained from molecular dynamics free-energy perturbation simulations, with a computer cost reduced by 2-3 orders of magnitude. This requires introducing the proper partial volume correction to transform the results from the grand canonical to the isobaric-isothermal ensemble that is pertinent to experiments. We show that this correction can be extended to 3D-RISM calculations, giving a sound theoretical justification to empirical partial molar volume corrections that have been proposed recently. PMID:26273876

The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and European hedgehog; the fish genomes of stickleback and medaka; and second examples of sea squirt (Ciona savignyi) and mosquito (Aedes aegypti) genomes. Some of the major features added during the year include the first complete gene sets for genomes with low sequence coverage, the introduction of new strain variation data, and the introduction of new ortholog/paralog annotations based on gene trees. PMID:17148474

Long-lived quantum memories are essential components of a long-standing goal of remote distribution of entanglement in quantum networks. These can be realized by storing the quantum states of light as single-spin excitations in atomic ensembles. However, spin states are often subject to dephasing processes that limit the storage time, which in principle could be overcome using spin-echo techniques. Theoretical studies suggest this to be challenging due to unavoidable spontaneous emission noise in ensemble-based quantum memories. Here, we demonstrate spin-echo manipulation of a mean spin excitation of 1 in a large solid-state ensemble, generated through storage of a weak optical pulse. After a storage time of about 1 ms we optically read out the spin excitation with a high signal-to-noise ratio. Our results pave the way for long-duration optical quantum storage using spin-echo techniques for any ensemble-based memory. PMID:26196785

Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify specimens by sex. Colour measurements were performed on gonads extracted from female and male mackerel (fresh and thawed) to identify differences between the sexes. Several linear and nonlinear classifiers, such as Support Vector Machines (SVM), k-Nearest Neighbours (k-NN) or Diagonal Linear Discriminant Analysis (DLDA), can be applied to this problem. However, they are usually based on Euclidean distances, which fail to reflect the sample proximities accurately. Classifiers based on non-Euclidean dissimilarities misclassify different sets of patterns. We therefore combine different kinds of dissimilarity-based classifiers, inducing diversity by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
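
A minimal sketch of the combination idea, assuming scikit-learn's k-NN over a set of complementary built-in dissimilarities and a simple majority vote with integer class labels; the authors' actual dissimilarities and combination rule may differ.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    METRICS = ["euclidean", "manhattan", "chebyshev", "canberra", "braycurtis"]

    def fit_dissimilarity_ensemble(X, y, k=5):
        # one k-NN classifier per dissimilarity measure
        return [KNeighborsClassifier(n_neighbors=k, metric=m).fit(X, y)
                for m in METRICS]

    def predict_majority(ensemble, X):
        votes = np.array([clf.predict(X) for clf in ensemble])
        # majority vote per sample (labels assumed to be 0, 1, 2, ...)
        return np.array([np.bincount(col).argmax() for col in votes.T])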

Intrinsically disordered proteins (IDPs) and regions are highly prevalent in eukaryotic proteomes, and like folded proteins, they perform essential biological functions. Interaction sites in folded proteins are generally formed by tertiary structures, whereas IDPs use short segments called linear motifs (LMs). Despite their short length and lack of stable structure, LMs may have considerable structural propensities, which often resemble bound-state conformations with targets. Structural data are crucial for understanding the molecular basis of protein interactions and for the development of targeted pharmaceuticals, but IDPs present considerable challenges to experimental techniques. As a result, IDPs are largely underrepresented in the Protein Data Bank. In the face of experimental challenges, molecular dynamics (MD) simulations have proven to be a useful tool for structural characterization of IDPs. Here, the free state ensemble of the nuclear receptor corepressor 1 (NCOR1) CoRNR box 3 motif, which is important for binding to nuclear receptors to control gene expression, is studied using MD simulations totaling 8 μs. Transitions between disordered and α-helical conformations resembling a bound-state structure were observed throughout the trajectory, indicating that the motif may have a natural conformational bias toward bound-state structures. The data show that the disordered and folded populations are separated by a low energy barrier (4-6 kJ/mol) and reveal off-pathway intermediates that lead to a C-terminally folded species unable to transition efficiently into the completely folded conformation. Structural transitions and folding pathways within the free state ensemble were well described by principal component analysis (PCA) of the peptide backbone dihedral angles, with the analysis providing insight for increasing the structural homogeneity of the ensemble. PMID:26794929
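
A sketch of the dihedral-angle PCA named above: angles are mapped to sin/cos pairs so their periodicity is respected before an ordinary PCA; the function name and array layout are assumptions of this example.

    import numpy as np

    def dihedral_pca(angles_deg, n_components=2):
        # angles_deg: (n_frames, n_dihedrals) backbone phi/psi angles
        rad = np.deg2rad(angles_deg)
        Z = np.concatenate([np.sin(rad), np.cos(rad)], axis=1)
        Z -= Z.mean(axis=0)                         # centre the circular coords
        _, s, Vt = np.linalg.svd(Z, full_matrices=False)
        proj = Z @ Vt[:n_components].T              # frames in PC space
        evr = (s**2 / np.sum(s**2))[:n_components]  # explained variance ratio
        return proj, evr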

Each piece of Western clothing has a unique temperature rating (TR); a single Tibetan robe ensemble, however, can be used in various environments of the Tibetan plateau depending on how it is worn. To explain this environmental adaptation, the thermal insulation and TR values of Tibetan robe ensembles in three typical wearing styles were measured by manikin testing and wear trials, respectively, and TR prediction models for Tibetan robe ensembles were built in this research. The results showed that the thermal insulation of Tibetan robe ensembles changed from 0.26 clo to 0.91 clo, and the corresponding TRs ranged from 9.90 °C to 16.86 °C, depending on the wearing style. Not only the thermal insulation but also the way of wearing the Tibetan robe was important in determining its TR value. The three TR models and a triangle area for each piece of Tibetan clothing explain its positive adaptation to the environment; this differs from the current TR models for Western clothing. PMID:22321946

In this study, we report new classes of potent tyrosinase inhibitors identified by enhanced structure-based virtual screening and confirmed by enzyme and melanin content assays. Tyrosinase, a type-3 copper protein, catalyses two distinct reactions in melanin biosynthesis: hydroxylation of tyrosine to DOPA and conversion of DOPA to dopaquinone. Although numerous inhibitors of this enzyme have been reported, the discovery of new functional moieties has lagged. In order to improve the performance of virtual screening, we first produced an ensemble of 10,000 structures using molecular dynamics simulation. Quantum mechanical calculations were used to determine the partial charges of the catalytic copper ions in the met and deoxy states. Second, we selected the structure showing an optimal receiver operating characteristic (ROC) curve against known direct binders and their physicochemically matched decoys. This structure revealed more than 10-fold higher enrichment at 1% of the ROC curve than that observed for X-ray structures. Third, high-throughput virtual screening with DOCK 3.6 was performed using a library of approximately 400,000 small molecules derived from the ZINC database. Fourth, we obtained the top 60 molecules and tested their inhibition of mushroom tyrosinase. Extended assays included 21 analogs of the 21 initial hits to probe their inhibition properties. Here, the tetrazole and triazole moieties were identified as new binding cores interacting with the dicopper catalytic center. All 42 inhibitors showed inhibitory constants (Ki) ranging from 11.1 nM to 33.4 μM, with a tetrazole compound exhibiting the strongest activity. Among the 42 molecules, five displayed more than a 30% reduction in melanin production when applied to B16F10 melanoma cells, with cell viability >90% at 20 μM. In particular, a thiosemicarbazone-containing compound reduced melanin content by 55%. PMID:26750991
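
A small helper for the snapshot-selection criterion, assuming scikit-learn's roc_curve and defining early enrichment as the true-positive rate at a given false-positive rate divided by that rate; the study's exact enrichment definition may differ.

    import numpy as np
    from sklearn.metrics import roc_curve

    def early_enrichment(scores, is_active, fpr_level=0.01):
        # is_active: 1 for known binders, 0 for property-matched decoys
        fpr, tpr, _ = roc_curve(is_active, scores)
        return np.interp(fpr_level, fpr, tpr) / fpr_level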

Microarray data have few samples and high dimensionality, and they contain a significant number of irrelevant and redundant genes. This paper proposes a hybrid ensemble method based on double disturbance to improve classification performance. First, the original genes are ranked by the ReliefF algorithm and a subset is selected, from which a new training set is generated. Second, D bootstrap training subsets are produced from this training set by bootstrapping. Third, an attribute reduction method based on neighborhood mutual information with varying radius is used to reduce the genes in each bootstrap subset, and each resulting subset is used to train a base classifier. Finally, a subset of the base classifiers is selected by teaching-learning-based optimization to build an ensemble through weighted voting. Experimental results on six benchmark cancer microarray datasets showed that the proposed method decreased the ensemble size and obtained higher classification performance compared with Bagging, AdaBoost, and Random Forest. PMID:26405970
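
A condensed sketch of the two disturbances (sample bootstrap plus per-subset feature filtering) and the weighted vote; scikit-learn's mutual_info_classif stands in for the neighborhood mutual information reduction, validation-accuracy weights stand in for the TLBO selection, and labels are assumed to be integers 0..C-1.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.tree import DecisionTreeClassifier

    def double_disturbance_ensemble(X, y, D=15, n_feat=50,
                                    rng=np.random.default_rng(1)):
        members = []
        for _ in range(D):
            idx = rng.integers(0, len(X), len(X))      # sample disturbance
            Xb, yb = X[idx], y[idx]
            mi = mutual_info_classif(Xb, yb, random_state=0)
            feats = np.argsort(mi)[-n_feat:]           # feature disturbance
            members.append((DecisionTreeClassifier().fit(Xb[:, feats], yb),
                            feats))
        return members

    def weighted_vote(members, weights, X, n_classes):
        scores = np.zeros((len(X), n_classes))
        for (clf, feats), w in zip(members, weights):
            scores[np.arange(len(X)), clf.predict(X[:, feats])] += w
        return scores.argmax(axis=1)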

In this paper, we present a design methodology for integrating heterogeneous classifier ensembles by employing a diversity-based hybrid classifier fusion approach, whose aggregator module consists of two classifier combiners, to achieve improved classification performance for motor unit potential classification during electromyographic (EMG) signal decomposition. Following the so-called overproduce-and-choose strategy for classifier ensemble combination, the developed system allows the construction of a large set of base classifiers and then automatically chooses subsets of classifiers to form candidate classifier ensembles for each combiner. The system exploits the kappa statistic diversity measure to design classifier teams by estimating the level of agreement between base classifier outputs. The pool of base classifiers consists of different kinds of classifiers (the adaptive certainty-based, the adaptive fuzzy k-NN, and the adaptive matched template filter classifiers) and utilizes different types of features. Performance of the developed system was evaluated using real and simulated EMG signals and was compared with the performance of the constituent base classifiers. Across the EMG signal datasets used, the developed system had better average classification performance overall, especially in terms of reducing classification errors. For simulated signals of varying intensity, the developed system had an average correct classification rate (CCr) of 93.8% and an error rate (Er) of 2.2%, compared to 93.6% and 3.2%, respectively, for the best base classifier in the ensemble. For simulated signals with varying amounts of shape and/or firing pattern variability, the developed system had a CCr of 89.1% with an Er of 4.7%, compared to 86.3% and 5.6%, respectively, for the best classifier. For real signals, the developed system had a CCr of 89.4% with an Er of 3.9%, compared to 84.6% and 7.1%, respectively, for the best classifier. PMID:19171524
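
A minimal implementation of the pairwise kappa agreement statistic used for team design, in its usual form (observed agreement corrected for the chance agreement implied by the two prediction marginals); low kappa between two members indicates high diversity.

    import numpy as np

    def kappa(pred_a, pred_b):
        labels = np.union1d(pred_a, pred_b)
        po = np.mean(pred_a == pred_b)                        # observed
        pe = sum(np.mean(pred_a == c) * np.mean(pred_b == c)  # by chance
                 for c in labels)
        return (po - pe) / (1.0 - pe + 1e-12)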

Timely and cost-effective analytics over social networks has emerged as a key ingredient for success in many business and government endeavors. Community detection is an active research area relevant to the analysis of online social networks. The choice of a particular community detection algorithm is crucial if the aim is to unveil the community structure of a network, since different algorithms have different advantages and depend on tuning specific parameters, and this choice can affect the outcome of the experiments. In this paper, we propose a community division model based on game theory, which combines the advantages of previous algorithms to obtain a better community classification result. Experiments on standard datasets verify that our game-theory-based community detection model is valid and performs better.

Machine learning techniques have been widely applied to the problem of predicting protein secondary structure from the amino acid sequence, and they have gained substantial success in this research area. Many methods have been used, including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted attention recently. Today, the main goal remains to improve the prediction quality of the secondary structure elements. The prediction accuracy has been continuously improved over the years, especially through the use of hybrid or ensemble methods and the incorporation of evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences. In this paper, we investigate how best to combine k-NNs, ANNs and Multi-class SVMs (M-SVMs) to improve the secondary structure prediction of globular proteins. An ensemble method which combines the outputs of two feed-forward ANNs, a k-NN and three M-SVM classifiers has been applied, with the ensemble members combined using two variants of the majority voting rule. A heuristic-based filter has also been applied to refine the prediction. To quantify the improvement the ensemble provides over the individual classifiers that compose it, we have experimented with the proposed system on the two widely used benchmark datasets RS126 and CB513, using cross-validation tests and including PSI-BLAST position-specific scoring matrix (PSSM) profiles as inputs. The experimental results reveal that the proposed system yields significant performance gains when compared with the best individual classifier. PMID:22058650
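
A sketch of such a heterogeneous committee using scikit-learn's hard-voting combiner; the member hyperparameters are illustrative, and the paper's second voting variant and heuristic filter are not reproduced here.

    from sklearn.ensemble import VotingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    members = [
        ("ann1", MLPClassifier(hidden_layer_sizes=(75,), max_iter=500)),
        ("ann2", MLPClassifier(hidden_layer_sizes=(150,), max_iter=500)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
        ("svm1", SVC(kernel="rbf", C=1.0)),
        ("svm2", SVC(kernel="rbf", C=10.0)),
        ("svm3", SVC(kernel="poly", degree=3)),
    ]
    ensemble = VotingClassifier(estimators=members, voting="hard")
    # ensemble.fit(X_train, y_train); y_pred = ensemble.predict(X_test)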

We study the ensemble velocity of non-processive motor proteins, described with multiple chemical states. In particular, we discuss the velocity as a function of ATP concentration. Even a simple model which neglects the strain dependence of transition rates, reverse transition rates and nonlinearities in the elasticity can show interesting functional dependencies, which deviate significantly from the frequently assumed Michaelis–Menten form. We discuss how the order of events in the duty cycle can be inferred from the measured dependence. The model also predicts the possibility of velocity reversal at a certain ATP concentration if the duty cycle contains several conformational changes of opposite directionalities. PMID:25485083

In ensemble-based sequential data assimilation, the probability density function (PDF) at each time step is represented by ensemble members. These members are usually assumed to be Monte Carlo samples drawn from the PDF, with the probability density associated with the concentration of the members. Under the Monte Carlo approximation, the forecast ensemble, obtained by applying the dynamical model to each member, approximates the forecast PDF through the Chapman-Kolmogorov integral. In practical cases, however, the ensemble size is limited by the available computational resources and is typically much smaller than the system dimension. In such situations the Monte Carlo approximation does not work well: when the ensemble size is less than the system dimension, the ensemble forms a simplex in a subspace. The simplex cannot represent the third- or higher-order moments of the PDF; it can represent only its Gaussian features. As noted by Wang et al. (2004), the forecast ensemble obtained by applying the dynamical model to each member of the simplex ensemble approximates the mean and covariance of the forecast PDF under a Taylor expansion of the dynamical model up to second order, except that uncertainties which cannot be represented by the ensemble members are ignored. Since the third- and higher-order nonlinearities are discarded, the forecast ensemble introduces some bias into the forecast. Using a small nonlinear model, the Lorenz 63 model, we performed state estimation experiments with both the simplex representation and the Monte Carlo representation, corresponding to the limited-size and large-size ensemble cases, respectively. With the simplex representation, the estimates are found to carry a bias that is likely caused by the nonlinearity of the system.
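
A compact sketch of one forecast/analysis cycle of a stochastic (perturbed-observation) ensemble Kalman filter on the Lorenz 63 model. This is a generic textbook formulation for illustration, not the exact scheme evaluated above; with fewer members than state dimensions the ensemble spans the simplex discussed in the text.

    import numpy as np

    def lorenz63(x, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        dx = np.array([sigma * (x[1] - x[0]),
                       x[0] * (rho - x[2]) - x[1],
                       x[0] * x[1] - x[2]])
        return x + dt * dx                                  # forward-Euler step

    def enkf_cycle(ens, y, H, R, rng=np.random.default_rng(0)):
        # ens: (m, 3) ensemble, y: observation, H: obs operator, R: obs cov
        ens = np.array([lorenz63(x) for x in ens])          # forecast
        A = ens - ens.mean(axis=0)                          # anomalies
        P = A.T @ A / (len(ens) - 1)                        # sample covariance
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # Kalman gain
        for i in range(len(ens)):                           # perturbed-obs update
            yo = y + rng.multivariate_normal(np.zeros(len(y)), R)
            ens[i] = ens[i] + K @ (yo - H @ ens[i])
        return ens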

Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
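
A minimal sketch of the cluster-based ensemble idea: per-class clusters are built on several feature subsets, and a new instance is classified by voting over the class owning the nearest cluster in each collection. The feature subsets, k and integer labels are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def fit_cluster_ensemble(X, y, feature_subsets, k=3):
        models = []
        for feats in feature_subsets:
            coll = {c: KMeans(n_clusters=k, n_init=10).fit(X[y == c][:, feats])
                    for c in np.unique(y)}      # one set of clusters per class
            models.append((feats, coll))
        return models

    def predict_one(models, x):
        votes = []
        for feats, coll in models:
            best = min(coll, key=lambda c: np.min(np.linalg.norm(
                coll[c].cluster_centers_ - x[feats], axis=1)))
            votes.append(best)                  # class of the closest cluster
        return np.bincount(np.array(votes)).argmax()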

This paper presents a new supervised method for segmentation of blood vessels in retinal photographs. The method uses an ensemble system of bagged and boosted decision trees and utilizes a feature vector based on the orientation analysis of the gradient vector field, morphological transformation, line strength measures, and Gabor filter responses. The feature vector encodes information to handle both healthy and pathological retinal images. The method is evaluated on the publicly available DRIVE and STARE databases, frequently used for this purpose, and also on a new public retinal vessel reference dataset, CHASE_DB1, which is a subset of retinal images of multiethnic children from the Child Heart and Health Study in England (CHASE) dataset. The performance of the ensemble system is evaluated in detail, and its accuracy, speed, robustness, and simplicity make the algorithm a suitable tool for automated retinal image analysis. PMID:22736688

Satellite communication antennas are key devices on a measurement ship, supporting integrated voice, data, fax and video services. Condition monitoring of mechanical equipment from vibration measurements is significant for guaranteeing safe operation and avoiding unscheduled breakdowns, so a condition monitoring system for ship-based satellite communication antennas was designed and developed. Planetary gearboxes play an important role in the transmission train of a satellite communication antenna; however, their condition monitoring still faces challenges due to structural complexity and weak condition features. This paper addresses planetary gearbox condition monitoring by proposing an ensemble multiwavelet analysis method. Benefiting from multi-resolution analysis and multiple wavelet basis functions, the multiwavelet has an advantage in characterizing non-stationary signals. In order to realize accurate detection of the condition features and multi-resolution analysis over the whole frequency band, an adaptive multiwavelet basis function is constructed by increasing the multiplicity, and the vibration signal is then processed by the ensemble multiwavelet transform. Finally, a normalized ensemble multiwavelet transform information entropy is computed to describe the condition of the planetary gearbox. The effectiveness of the proposed method is first validated through condition monitoring of an experimental planetary gearbox; the method is then used for planetary gearbox condition monitoring of ship-based satellite communication antennas, and the results support its feasibility.
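
A toy version of the final entropy step, using an ordinary discrete wavelet decomposition (PyWavelets) as a stand-in for the adaptive ensemble multiwavelet transform; the normalization to [0, 1] and the interpretation in the comments are assumptions of this sketch.

    import numpy as np
    import pywt

    def wavelet_entropy(signal, wavelet="db4", level=5):
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        energies = np.array([np.sum(c ** 2) for c in coeffs])
        p = energies / energies.sum()           # relative sub-band energies
        H = -np.sum(p * np.log(p + 1e-12))
        return H / np.log(len(p))               # 0: concentrated, 1: smeared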

The incorporation of numerical weather predictions (NWP) into a flood forecasting system can increase forecast lead times from a few hours to a few days. A single NWP forecast from a single forecast centre, however, is insufficient, as it involves considerable non-predictable uncertainty and leads to a high number of false alarms. The availability of global ensemble numerical weather prediction systems through the THORPEX Interactive Grand Global Ensemble (TIGGE) offers a new opportunity for flood forecasting. The Grid-Xinanjiang distributed hydrological model, which is based on the Xinanjiang model theory and the topographical information of each grid cell extracted from the Digital Elevation Model (DEM), is coupled with ensemble weather predictions based on the TIGGE database (CMC, CMA, ECMWF, UKMO, NCEP) for flood forecasting. This paper presents a case study using the coupled flood forecasting model on the Xixian catchment (a drainage area of 8826 km2) located in Henan province, China. A probabilistic discharge is provided as the end product of the flood forecast. Results show that the combination of the Grid-Xinanjiang model and the TIGGE database provides a promising tool for early warning of flood events several days ahead.

This paper presents a new approach to detect and segment liver tumors. The detection and segmentation of liver tumors can be formulated as a novelty detection or two-class classification problem. Each voxel is characterized by a rich feature vector, and a classifier using a random feature subspace ensemble is trained to classify the voxels. Since the Extreme Learning Machine (ELM) has the advantages of very fast learning speed and good generalization ability, it is chosen as the base classifier in the ensemble. Majority voting is incorporated to fuse the classification results from the ensemble of base classifiers. In order to further increase testing accuracy, an ELM autoencoder is implemented as a pre-training step. In automatic liver tumor detection, the ELM is trained as a one-class classifier with only healthy liver samples, and the performance is compared with a two-class ELM. In liver tumor segmentation, a semi-automatic approach is adopted by selecting samples in 3D space to train the classifier. The proposed method is tested and evaluated on a group of patients' CT data, and experiments show promising results. PMID:25571035
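
A self-contained sketch of a random-subspace ensemble built on a minimal ELM (random hidden layer plus a ridge-regularized linear readout); class labels are assumed to be integers 0..C-1, and the subspace fraction and hidden-layer size are illustrative.

    import numpy as np

    class TinyELM:
        def __init__(self, n_hidden=200, reg=1e-3, seed=0):
            self.n_hidden, self.reg, self.seed = n_hidden, reg, seed
        def _h(self, X):
            return np.tanh(X @ self.W + self.b)        # random hidden layer
        def fit(self, X, y):
            rng = np.random.default_rng(self.seed)
            self.W = rng.normal(size=(X.shape[1], self.n_hidden))
            self.b = rng.normal(size=self.n_hidden)
            H, T = self._h(X), np.eye(y.max() + 1)[y]  # one-hot targets
            self.beta = np.linalg.solve(
                H.T @ H + self.reg * np.eye(self.n_hidden), H.T @ T)
            return self
        def predict(self, X):
            return (self._h(X) @ self.beta).argmax(axis=1)

    def random_subspace_ensemble(X, y, n_members=11, frac=0.5, seed=0):
        rng = np.random.default_rng(seed)
        members = []
        for i in range(n_members):
            feats = rng.choice(X.shape[1], int(frac * X.shape[1]),
                               replace=False)           # random feature subspace
            members.append((feats, TinyELM(seed=i).fit(X[:, feats], y)))
        return members

    def majority_vote(members, X):
        votes = np.array([m.predict(X[:, f]) for f, m in members])
        return np.array([np.bincount(v).argmax() for v in votes.T])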

Surrogate models are widely used to develop computationally efficient simulation-optimization models for solving complex groundwater management problems. Artificial intelligence based models are most often used for this purpose, trained using predictor-predictand data obtained from a numerical simulation model. Most often this is implemented under the assumption that the parameters and boundary conditions used in the numerical simulation model are perfectly known; however, in most practical situations these values are uncertain, and under such circumstances the applicability of these approximation surrogates becomes limited. In our study we develop a surrogate model based coupled simulation-optimization methodology for determining optimal pumping strategies for coastal aquifers considering parameter uncertainty. An ensemble surrogate modeling approach is used along with multiple-realization optimization. The methodology is used to solve a multi-objective coastal aquifer management problem with two conflicting objectives, with hydraulic conductivity and aquifer recharge considered as uncertain values. The three-dimensional coupled flow and transport simulation model FEMWATER is used to simulate the aquifer responses for a number of scenarios corresponding to Latin hypercube samples of pumping rates and uncertain parameters, generating input-output patterns for training the surrogate models. Non-parametric bootstrap sampling of this original data set is used to generate multiple data sets that belong to different regions of the multi-dimensional decision and parameter space. These data sets are used to train and test multiple surrogate models based on genetic programming, and the ensemble of surrogate models is then linked to a multi-objective genetic algorithm to solve the pumping optimization problem. The two conflicting objectives, viz., maximizing the total pumping from beneficial wells and minimizing the total pumping from barrier wells for hydraulic control of saltwater intrusion, are thus considered.

Super Ensemble based Aviation Turbulence Guidance (SEATG), an ensemble of ten turbulence metrics from time-lagged ensemble members of weather forecast data, is developed using the Weather Research and Forecasting (WRF) model and in-situ eddy dissipation rate (EDR) observations from equipped commercial aircraft over the contiguous United States. SEATG is a sequence of five procedures: weather modeling, calculating turbulence metrics, mapping to the EDR scale, evaluating metrics, and producing the final SEATG forecast. It uses a methodology similar to the operational Graphic Turbulence Guidance (GTG), with three major improvements. First, SEATG uses a higher-resolution (3-km) WRF model to capture cloud-resolving-scale phenomena. Second, SEATG computes turbulence metrics for multiple forecasts that are combined at the same valid time, resulting in a time-lagged ensemble of multiple turbulence metrics. Third, SEATG provides both deterministic and probabilistic turbulence forecasts to take into account weather uncertainties and user demands. The SEATG forecasts are found to match well with observed radar reflectivity along a surface front as well as with convectively induced turbulence outside the clouds on 7-8 Sep 2012, and the overall performance skill of deterministic SEATG against the observed EDR data during this period is superior to that of any single turbulence metric. Finally, probabilistic SEATG is used in an example application of turbulence forecasting for air-traffic management: a simple Wind-Optimal Route (WOR) passing through the potential turbulence areas identified by probabilistic SEATG and a Lateral Turbulence Avoidance Route (LTAR) taking the SEATG into account are calculated at z = 35000 ft (z = 12 km) from Los Angeles to John F. Kennedy international airports. As a result, WOR takes a total of 239 minutes, with 16 minutes within SEATG areas of 40% moderate turbulence potential, while LTAR takes a total of 252 minutes, so that about 5% more fuel would be consumed to entirely avoid the areas of potential turbulence.

The short and medium range operational forecasts, warnings and alarms for severe weather are among the most important activities of the Hungarian Meteorological Service. Our study provides a comprehensive summary of newly developed methods based on ECMWF ensemble forecasts to assist the successful prediction of convective weather situations. In the first part of the study a brief overview is given of the components of atmospheric convection: the atmospheric lifting force, convergence and vertical wind shear. Atmospheric instability is often characterized by so-called instability indices, one of the most popular and frequently used being the convective available potential energy. Heavy convective events, such as intensive storms, supercells and tornadoes, require vertical instability, adequate moisture and vertical wind shear. As a first step, various statistical analyses of these three parameters were performed on a nine-year time series of the 51-member ensemble forecast model covering the convective summer periods. The relationship between the ratio of convective to total precipitation and the above three parameters was studied by different statistical methods. Four visualization methods were applied to support successful forecasting of severe weather. Two of the four, the ensemble meteogram and the ensemble vertical profiles, were already available at the beginning of our work; both show the probability of meteorological parameters for a selected location. In addition, two new methods have been developed. The first provides a probability map of an event exceeding predefined values, so that the spatial uncertainty is well characterized; since convective weather events often occur sporadically in space rather than where expected, the event area can be delimited so that the ensemble forecasts give very good support. Another new visualization tool shows time

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes, with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year, including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii), bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project. PMID:22086963

The development of magnetoresistive sensors based on magnetic nanoparticles immersed in conductive gel matrices requires detailed information about the corresponding magnetoresistive properties in order to obtain optimal sensor sensitivities. Crucial parameters here are the particle concentration, the viscosity of the gel matrix and the particle structure. Experimentally, it is not possible to obtain detailed information about the magnetic microstructure, i.e., the orientations of the magnetic moments of the particles that define the magnetoresistive properties. Numerical simulations, however, allow the magnetic microstructure to be studied theoretically, although this requires performing classical spin dynamics and molecular dynamics simulations simultaneously. Here, we present such an approach, which allows us to calculate the orientation and trajectory of every single magnetic nanoparticle. This enables us to study not only the static magnetic microstructure but also the dynamics of the structuring process in the gel matrix itself. With our hybrid approach, arbitrary sensor configurations can be investigated and their magnetoresistive properties optimized. PMID:26580623

This work investigates the added value of ensembles constructed from seventeen lumped hydrological models against their simple average counterparts. It is thus hypothesized that there is more information in the full set of model outputs than in their single aggregated predictor. For all of the 1061 available catchments, results showed that the mean continuous ranked probability score of the ensemble simulations was better than the mean absolute error of the aggregated simulations, confirming the added value of retaining all the components of the model outputs. Reliability of the simulation ensembles is achieved for about 30% of the catchments, as assessed by rank histograms and reliability plots. Despite this imperfection, the ensemble simulations were shown to have better skill than the deterministic simulations at discriminating between events and non-events, as confirmed by relative operating characteristic scores, especially for larger streamflows. Based on a genetic algorithm search optimizing the continuous ranked probability score, 7 to 10 models are deemed sufficient to construct ensembles with improved performance. In fact, many model subsets were found to improve the performance of the reference ensemble, so it is not essential to implement as many as seventeen lumped hydrological models. The gain in performance of the optimized subsets is accompanied by some improvement of ensemble reliability in most cases. Nonetheless, a calibration of the predictive distribution is still needed for many catchments.
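
A small helper for the headline metric, the continuous ranked probability score of an ensemble forecast against one observation, in its standard empirical form (the textbook estimator, not code from the study); it reduces to the absolute error for a single member.

    import numpy as np

    def crps_ensemble(members, obs):
        m = np.asarray(members, dtype=float)
        term1 = np.mean(np.abs(m - obs))                        # accuracy
        term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))  # spread
        return term1 - term2                                    # lower is better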

This article investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of conceptual assumptions that can affect its accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our cross-validation study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods, with improvements ranging from 45 to 73% over other method classes. This study also compares BMA's predictive performance to other ensemble-based techniques and demonstrates that BMA can outperform these approaches, with improvements ranging from 27 to 60%. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy. PMID:23946048
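
A toy illustration of the averaging step: posterior model weights estimated from Gaussian likelihoods on a calibration set (uniform prior, fixed sigma), then a weighted combination of the per-method predictions. This simplified weighting is an assumption, not the paper's full BMA machinery.

    import numpy as np

    def bma_weights(pred_matrix, measured, sigma=1.0):
        # pred_matrix: (n_methods, n_residues) predicted pKa values
        r = pred_matrix - measured
        loglik = -0.5 * np.sum((r / sigma) ** 2, axis=1)
        w = np.exp(loglik - loglik.max())       # numerically stable softmax
        return w / w.sum()

    def bma_predict(pred_matrix, weights):
        return weights @ pred_matrix            # aggregated pKa estimates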

This paper presents a simple quantum memory method for efficient storage and retrieval of light. The technique is based on the principle of controlled reversible inhomogeneous broadening, in which the quantum state of the light is imprinted in an ensemble of two-level atoms and recalled by flipping an external nonuniform electric field. In the present work, the induced Stark shift varies linearly with position, and a numerical analysis of this protocol is presented. It shows that the storage efficiency can approach 100% for a sufficiently large optical depth; the optimal broadening for a given pulse width is also analyzed.

We propose a scheme to realize optical quantum memories in an ensemble of nitrogen-vacancy centers in diamond that are coupled to a microcavity. The scheme is based on off-resonant Raman coupling, which allows one to circumvent optical inhomogeneous broadening and store optical photons in the electronic spin coherence. This approach promises a storage time of order 1 s and a time-bandwidth product of order 10^7. We include all possible optical transitions in a nine-level configuration, numerically evaluate the efficiencies, and discuss the requirements for achieving high efficiency and fidelity.

The risks and damages associated with coastal flooding, which naturally increase with the magnitude of extreme storm surges, are one of the largest concerns of countries with extensive low-lying nearshore areas. These risks are even more pronounced for semi-enclosed water bodies such as the Baltic Sea, where subtidal (weekly-scale) variations in the water volume of the sea contribute substantially to the water level and lead to a large spread in projections of future extreme water levels. We explore the options for using large ensembles of projections to evaluate return periods of extreme water levels more reliably. Single projections of the ensemble are constructed by fitting several sets of block maxima with various extreme value distributions. The ensemble is based on two simulated data sets produced at the Swedish Meteorological and Hydrological Institute: a hindcast by the Rossby Centre Ocean model sampled with a resolution of 6 h, and a similar hindcast by the circulation model NEMO with a resolution of 1 h. As the annual maxima of water levels in the Baltic Sea are not always uncorrelated, we employ maxima both for calendar years and for stormy seasons. As the shape parameter of the Generalised Extreme Value distribution changes sign and varies substantially in magnitude along the eastern coast of the Baltic Sea, the use of a single distribution for the entire coast is inappropriate. The ensemble therefore involves projections based on the Generalised Extreme Value, Gumbel and Weibull distributions. The parameters of these distributions are evaluated in three different ways: by the maximum likelihood method and by the method of moments based on both biased and unbiased estimates. The total number of projections in the ensemble is 40. As some of the resulting estimates contain limited additional information, the members of pairs of projections that are highly correlated are assigned weights of 0.6. A comparison of the ensemble-based projection of
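
A minimal sketch of the per-projection building block: fitting a GEV to block maxima by maximum likelihood and converting the fit to return levels with scipy (note that scipy's shape parameter c equals -xi in the common GEV convention).

    import numpy as np
    from scipy.stats import genextreme

    def gev_return_levels(block_maxima, periods=(10, 50, 100)):
        shape, loc, scale = genextreme.fit(block_maxima)   # ML estimation
        p = 1.0 - 1.0 / np.asarray(periods, dtype=float)   # non-exceedance
        return genextreme.ppf(p, shape, loc=loc, scale=scale)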

Reverse engineering approaches to constructing gene regulatory networks (GRNs) based on genome-wide mRNA expression data have led to significant biological findings, such as the discovery of novel drug targets. However, the reliability of the reconstructed GRNs needs to be improved. Here, we propose an ensemble-based network aggregation approach to improving the accuracy of network topologies constructed from mRNA expression data. To evaluate the performances of different approaches, we created dozens of simulated networks from combinations of gene-set sizes and sample sizes and also tested our methods on three Escherichia coli datasets. We demonstrate that the ensemble-based network aggregation approach can be used to effectively integrate GRNs constructed from different studies – producing more accurate networks. We also apply this approach to building a network from epithelial mesenchymal transition (EMT) signature microarray data and identify hub genes that might be potential drug targets. The R code used to perform all of the analyses is available in an R package entitled “ENA”, accessible on CRAN (http://cran.r-project.org/web/packages/ENA/). PMID:25390635
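
A bare-bones illustration of aggregating several reconstructed networks by normalized edge ranks, broadly in the spirit of the ENA package cited above (whose actual aggregation details may differ).

    import numpy as np
    from scipy.stats import rankdata

    def aggregate_networks(adjacency_list):
        # rank edge weights within each network, then average the ranks
        ranks = [rankdata(A.ravel()) / A.size for A in adjacency_list]
        return np.mean(ranks, axis=0).reshape(adjacency_list[0].shape)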

A neural network based ensemble methodology is presented in this study to improve the accuracy of meteorological input fields for regional air quality modeling. Through nonlinear integration of simulation results from two meteorological models (MM5 and WRF), the ensemble approach focused on optimizing the values of meteorological variables (temperature, surface air pressure, and wind field) in the vertical layer near the ground. To illustrate the proposed approach, a case study in northern China during two selected air pollution events in 2006 was conducted. The performances of MM5, WRF, and the ensemble approach were assessed using different statistical measures. The results indicated that the ensemble approach had a higher simulation accuracy than the MM5 and WRF models, with performance improved by more than 12.9% for temperature, 18.7% for the surface air pressure field, and 17.7% for the wind field. The atmospheric PM10 concentrations in the study region were also simulated by coupling the air quality model CMAQ with the MM5 model, the WRF model, and the ensemble model. The modeling accuracy of the ensemble-CMAQ model was improved by more than 7.0% and 17.8% compared to the MM5-CMAQ and WRF-CMAQ models, respectively. The proposed neural network based meteorological modeling approach holds great potential for improving the performance of regional air quality modeling. PMID:23000477

Cyclogenesis and long-fetched winds along the southeastern coast of South America may lead to floods in populated areas, such as the Buenos Aires Province, with important economic and social impacts. A numerical model (SMARA) has already been implemented in the region to forecast storm surges. The propagation time of the surge in such an extensive and shallow area allows the detection of anomalies from observations several hours up to about a day prior to the event. Here, we investigate the impact and potential benefit of assimilating storm surge level data into the SMARA model, with the objective of improving the forecast. In the experiments, the surface wind stress from an ensemble prediction system drives a storm surge model ensemble based on the operational 2-D depth-averaged SMARA model. A 4-D Local Ensemble Transform Kalman Filter (4D-LETKF) initializes the ensemble in a 6-h cycle, assimilating satellite altimeter data and the very few tide gauge observations available along the northern coast. The sparse coverage of the altimeters is a challenge for data assimilation; however, the evolving covariance of the 4D-LETKF ensemble perturbations provides realistic cross-track analysis increments. Improvements in the forecast ensemble mean show the potential of an effective use of the sparse satellite altimeter and tide gauge observations in the data assimilation prototype. Furthermore, the effects of the localization scale and of the observational errors of coastal altimetry and tide gauges on the data assimilation approach are assessed.

In order to cope with the steady decline in the number of in situ gauges worldwide, there is a growing need for alternative methods to estimate runoff. We present an Ensemble Kalman Filter based approach that allows us to infer runoff for poorly or irregularly gauged basins. The approach focuses on the application of publicly available global hydrometeorological data sets for precipitation (GPCC, GPCP, CRU, UDEL), evapotranspiration (MODIS, FLUXNET, GLEAM, ERA-Interim, GLDAS), and water storage changes (GRACE, WGHM, GLDAS, MERRA-Land). Furthermore, runoff data from the GRDC and satellite altimetry derived estimates are used. We follow a least squares prediction that exploits the joint temporal and spatial auto- and cross-covariance structures of precipitation, evapotranspiration, water storage changes and runoff, and we consider time-dependent uncertainty estimates derived from all data sets. Our in-depth analysis comprises 29 large river basins of different climate regions, for 16 of which runoff is predicted. Six configurations are analyzed: the Ensemble Kalman Filter (Smoother) and the hard (soft) constrained Ensemble Kalman Filter (Smoother). Comparing the predictions to observed monthly runoff shows correlations larger than 0.5, percentage biases lower than ±20%, and NSE values larger than 0.5. A modified NSE metric, stressing the difference to the mean annual cycle, shows an improvement of the runoff predictions for 14 of the 16 basins. The proposed method is able to provide runoff estimates for nearly 100 poorly gauged basins covering an area of more than 11,500,000 km2 with a freshwater discharge, in volume, of more than 125,000 m3/s.
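
A small helper for the skill metrics quoted above: the standard Nash-Sutcliffe efficiency, plus a variant benchmarked against the mean annual cycle, implemented here under the assumption that the modified metric simply replaces the overall mean with the monthly climatology.

    import numpy as np

    def nse(sim, obs):
        sim, obs = np.asarray(sim), np.asarray(obs)
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

    def nse_annual_cycle(sim, obs, month):
        # month: month number (1..12) of each monthly sample
        sim, obs, month = map(np.asarray, (sim, obs, month))
        clim = np.array([obs[month == m].mean() for m in range(1, 13)])
        bench = clim[month - 1]                 # mean annual cycle benchmark
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - bench) ** 2)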

A global ocean data assimilation system based on ensemble optimum interpolation (EnOI) has been under development as the Chinese contribution to the Global Ocean Data Assimilation Experiment. The system uses an eddy-permitting global ocean general circulation model developed by the Institute of Atmospheric Physics of the Chinese Academy of Sciences. In this paper, the implementation of the system is described in detail. We describe the sampling strategy used to generate the stationary ensembles for EnOI, and we introduce technical methods to deal with the massive memory required to hold the stationary ensembles of the global ocean. The system can assimilate observations such as satellite altimetry, sea surface temperature (SST), and in situ temperature and salinity from Argo, XBT, the Tropical Atmosphere Ocean (TAO) array, and other sources in a straightforward way. As a first step, an assimilation experiment covering 1997 to 2001 was carried out by assimilating sea level anomaly (SLA) data from TOPEX/Poseidon. We evaluate the performance of the system by comparing the results with various types of observations and find that SLA assimilation has a very positive impact on the modeled fields. The SST and sea surface height fields are clearly improved in terms of both the standard deviation and the root mean square difference. In addition, the assimilation produces some improvements in regions where mesoscale processes cannot be resolved at the horizontal resolution of this model. Comparisons with TAO profiles in the Pacific show that the temperature and salinity fields are improved to varying degrees in the upper ocean; the biases with respect to the independent TAO profiles are reduced by up to about 0.25 °C and 0.1 psu for the time-averaged temperature and salinity. The improvements in temperature and salinity also have a positive impact on the subsurface currents: the Equatorial Undercurrent is enhanced in the Pacific.

One important method to obtain continuous surfaces of soil properties from point samples is spatial interpolation. In this paper, we propose a method that combines ensemble learning with ancillary environmental information for improved interpolation of soil properties (hereafter, EL-SP). First, we calculated the trend value of soil potassium content at the Qinghai Lake region in China based on measured values. Then, the remaining residual was simulated with the ensemble learning model based on soil type, geology type, land use type, and slope data. Next, the EL-SP method was applied to interpolate soil potassium content at the study site. To evaluate the utility of the EL-SP method, we compared its performance with other interpolation methods, including universal kriging, inverse distance weighting, ordinary kriging, and ordinary kriging combined with geographic information. Results show that EL-SP had a lower mean absolute error and root mean square error than the other models tested in this paper. Notably, the EL-SP maps describe more locally detailed information and more accurate spatial patterns for soil potassium content than the other methods because of the combined use of different types of environmental information, and they are capable of showing abrupt boundary information for soil potassium content. Furthermore, the EL-SP method not only reduces prediction errors but also complements other environmental information, which makes the spatial interpolation of soil potassium content more reasonable and useful. PMID:25928138

Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows one to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies, the Marina catchment (Singapore) and the Canning River (Western Australia), representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
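
Scikit-learn ships this estimator directly; a minimal usage sketch with hypothetical variable names and illustrative hyperparameters:

    from sklearn.ensemble import ExtraTreesRegressor

    # totally randomized tree ensemble for streamflow prediction
    model = ExtraTreesRegressor(n_estimators=500, min_samples_leaf=5, n_jobs=-1)
    # model.fit(X_train, q_train)             # lagged rainfall/flow predictors
    # ranking = model.feature_importances_    # relative input importance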

Based on the combination of uninformative variable elimination (UVE), bootstrap and mutual information (MI), a simple ensemble algorithm, named ESPLS, is proposed for spectral multivariate calibration (MVC). In ESPLS, uninformative variables are first removed; a preparatory training set is then produced by bootstrap, on which an MI spectrum of the retained variables is calculated. The variables that exhibit MI above a defined threshold form a subspace on which a candidate partial least-squares (PLS) model is constructed. This process is repeated, and after a number of candidate models have been obtained, a small subset of the models is selected to construct an ensemble model by simple or weighted averaging. Four near/mid-infrared (NIR/MIR) spectral datasets concerning the determination of six components are used to verify the proposed ESPLS. The results indicate that ESPLS is superior to UVEPLS and to its combination with MI-based variable selection (SPLS) in terms of both accuracy and robustness. Besides, from the perspective of end users, ESPLS does not increase the complexity of a calibration while enhancing its performance.
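
A condensed sketch of the loop just described (bootstrap, MI threshold, PLS sub-model, then averaging a few members) using scikit-learn parts; the UVE pre-filter is omitted, and out-of-bag error is an assumed stand-in for the paper's model-selection criterion.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.feature_selection import mutual_info_regression

    def espls_like(X, y, n_models=50, n_pick=10, q=0.75,
                   rng=np.random.default_rng(0)):
        candidates = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), len(X))        # bootstrap set
            mi = mutual_info_regression(X[idx], y[idx])  # MI spectrum
            keep = np.where(mi > np.quantile(mi, q))[0]  # MI threshold
            pls = PLSRegression(n_components=min(5, len(keep)))
            pls.fit(X[idx][:, keep], y[idx])
            oob = np.setdiff1d(np.arange(len(X)), idx)   # out-of-bag check
            err = np.mean((pls.predict(X[oob][:, keep]).ravel() - y[oob]) ** 2)
            candidates.append((err, keep, pls))
        best = sorted(candidates, key=lambda t: t[0])[:n_pick]
        def predict(Xnew):                               # simple average
            return np.mean([p.predict(Xnew[:, k]).ravel()
                            for _, k, p in best], axis=0)
        return predict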

CLARIS-LPB was an EU FP7-financed Europe-South America Network for Climate Change Assessment and Impact Studies in La Plata Basin, and it has created the first-ever ensemble of RCM downscalings over South America. Here we present climate change scenarios for a near future period (2011-2040) and a far future period (2071-2100). The ensemble is based on seven RCMs driven by three CMIP3 GCMs under the SRES A1B emission scenario. The RCM model domains cover all of South America with a horizontal resolution of approximately 50 km, but the project focus has been on results over the La Plata Basin. The ensemble mean temperature change shows more warming over tropical South America than over the southern part of the continent. During summer (DJF) the Low-Parana and Uruguay regions show less warming than the surrounding regions. For the ensemble mean precipitation changes, the patterns are almost the same for the near and far future but with larger values for the far future; the overall trends thus do not change with time. The near future in general shows small changes over large areas (less than ±10%). For JJA a dry tendency is seen over eastern Brazil that becomes stronger and extends geographically with time: in the near future most models show a drying trend over this area, and in the far future almost all models agree on the drying. For DJF a wet tendency is seen over the La Plata basin area which becomes stronger with time; in the near future almost all downscalings agree on this wet tendency, and in the far future all downscalings agree on the sign. The RCM ensemble is unbalanced with respect to the forcing GCMs: 6 out of 11 (10) simulations use ECHAM5 for the near (far) future period, while 4 (3) use HadCM3 and only one uses IPSL. All ensemble mean values will thus be tilted towards ECHAM5. It would be possible to compensate for this imbalance among GCMs by weighting, but no such weighting has been applied in the current analysis. The north-south gradient in warming is in general stronger in

DEER (double electron-electron resonance) is a powerful pulsed ESR (electron spin resonance) technique allowing the determination of distance histograms between pairs of nitroxide spin-labels linked to a protein in a native-like solution environment. However, exploiting the huge amount of information provided by ESR/DEER histograms to refine structural models is extremely challenging. In this study, a restrained ensemble (RE) molecular dynamics (MD) simulation methodology is developed to address this issue. In RE simulation, the spin-spin distance distribution histograms calculated from a multiple-copy MD simulation are enforced, via a global ensemble-based energy restraint, to match those obtained from ESR/DEER experiments. The RE simulation is applied to 51 ESR/DEER distance histogram data from spin-labels inserted at 37 different positions in T4 lysozyme (T4L). The rotamer population distribution along the five dihedral angles connecting the nitroxide ring to the protein backbone is determined and shown to be consistent with available information from X-ray crystallography. For the purpose of structural refinement, the concept of a simplified nitroxide dummy spin-label is designed and parametrized on the basis of these all-atom RE simulations with explicit solvent. It is demonstrated that RE simulations with the dummy nitroxide spin-labels imposing the ESR/DEER experimental distance distribution data are able to systematically correct and refine a series of distorted T4L structures, while simple harmonic distance restraints are unsuccessful. This computationally efficient approach allows experimental restraints from DEER experiments to be incorporated into RE simulations for efficient structural refinement. PMID:23510103

Subsurface aquifer characterization often involves high parameter dimensionality and requires tremendous computational resources if a full Bayesian approach is employed. Ensemble-based data assimilation techniques, including filtering and smoothing, are computationally efficient alternatives. Despite the increasing number of applications of ensemble-based methods in assimilating flow- and transport-related data for subsurface aquifer characterization, most are limited to either synthetic studies or two-dimensional problems. In this study, we applied ensemble-based techniques to assimilate field tracer experimental data obtained from the Integrated Field Research Challenge (IFRC) site at the Hanford 300 Area. The forward problem was simulated using the massively parallel three-dimensional flow and transport code PFLOTRAN to effectively deal with the highly transient flow boundary conditions at the site and to meet the computational demands of ensemble-based methods. This study demonstrates the effectiveness of ensemble-based methods for characterizing a heterogeneous aquifer by sequentially assimilating multiple types of data. High-performance computing is shown to be necessary to enable increasingly mechanistic, non-linear forward simulations to be performed within the data assimilation framework for a complex system with reasonable turnaround time.
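
The study does not reproduce its filter equations here; as a reference point, a minimal stochastic ensemble Kalman filter analysis step with perturbed observations can be sketched as follows (the linear observation operator H stands in for PFLOTRAN-simulated observation equivalents; names and shapes are illustrative):

```python
import numpy as np

def enkf_update(X, y, H, R):
    """Stochastic ensemble Kalman filter analysis step.

    X : (n_state, n_ens) forecast ensemble of parameters/states
    y : (n_obs,) observed data (e.g. tracer concentrations)
    H : (n_obs, n_state) linearized observation operator
    R : (n_obs, n_obs) observation error covariance
    """
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)        # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                    # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R) # Kalman gain
    # perturbed observations give the analysis ensemble the correct spread
    Y = y[:, None] + np.random.multivariate_normal(
        np.zeros(len(y)), R, size=n_ens).T
    return X + K @ (Y - H @ X)
```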

In biology, nucleic acids are carriers of molecular information: DNA's base sequence stores and imparts genetic instructions, while RNA's sequence plays the role of a messenger and a regulator of gene expression. As biopolymers, nucleic acids also have exciting physicochemical properties, which can be rationally influenced by the base sequence in myriad ways. Consequently, in recent years nucleic acids have also become important building blocks for bottom-up nanotechnology: as molecules for the self-assembly of molecular nanostructures and also as a material for building machinelike nanodevices. In this Review we will cover the most important developments in this growing field of nucleic acid nanodevices. We also provide an overview of the biochemical and biophysical background of this field and the major "historical" influences that shaped its development. Particular emphasis is laid on DNA molecular motors, molecular robotics, molecular information processing, and applications of nucleic acid nanodevices in biology. PMID:21432950

This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exist for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of conceptual assumptions that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our cross-validation study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods, with improvements ranging from 45-73% over other method classes. This study also compares BMA's predictive performance to other ensemble-based techniques and demonstrates that BMA can outperform these approaches, with improvements ranging from 27-60%. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy. PMID:23946048
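
The paper's exact BMA formulation is not reproduced in this record; a common simplification weights each method by its posterior probability under a Gaussian error model fit on training residues, as in this hedged sketch (equal model priors and the per-method variance estimate are illustrative assumptions):

```python
import numpy as np

def bma_combine(preds, y_train, train_preds):
    """Bayesian Model Averaging over per-method pKa predictors.

    preds       : (n_methods, n_targets) predictions to be combined
    y_train     : (n_train,) measured pKa values used to weight methods
    train_preds : (n_methods, n_train) each method's predictions for the
                  training residues
    """
    n_train = len(y_train)
    rss = ((train_preds - y_train) ** 2).sum(axis=1)
    sigma2 = np.maximum(rss / n_train, 1e-12)      # per-method error variance
    # Gaussian log-likelihood of each method on the training data
    loglik = -0.5 * n_train * (np.log(2 * np.pi * sigma2) + 1.0)
    w = np.exp(loglik - loglik.max())              # equal model priors assumed
    w /= w.sum()                                   # posterior model probabilities
    return w @ preds, w                            # weighted prediction, weights
```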

Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.

Epidemiologic studies utilizing source apportionment (SA) of fine particulate matter have shown that particles from certain sources might be more detrimental to health than others; however, it is difficult to quantify the uncertainty associated with a given SA approach. In the present study, we examined associations between source contributions of fine particulate matter and emergency department visits for pediatric asthma in Atlanta, Georgia (2002–2010) using a novel ensemble-based SA technique. Daily source contributions from four SA approaches were combined into an ensemble source contribution for each of six sources. To better account for exposure uncertainty, 10 source profiles were sampled from their posterior distributions, resulting in 10 time series with daily SA concentrations. For each of these time series, Poisson generalized linear models with varying lag structures were used to estimate the health associations for the six sources. The rate ratios for the source-specific health associations from the 10 imputed source contribution time series were combined, resulting in health associations with inflated confidence intervals that better account for exposure uncertainty. Adverse associations with pediatric asthma were observed for 8-day exposure to particles generated from diesel-fueled vehicles (rate ratio = 1.06, 95% confidence interval: 1.01, 1.10) and gasoline-fueled vehicles (rate ratio = 1.10, 95% confidence interval: 1.04, 1.17). PMID:25776011
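
The combination of rate ratios across the 10 imputed time series is consistent with standard multiple-imputation pooling; assuming Rubin's rules on the log rate-ratio scale (the paper's exact procedure may differ), a sketch looks like:

```python
import numpy as np

def pool_rate_ratios(log_rr, se):
    """Pool source-specific health associations across imputed SA time
    series using Rubin's rules on the log rate-ratio scale.

    log_rr : (m,) log rate ratios from the m imputed time series
    se     : (m,) their standard errors from the Poisson GLMs
    """
    m = len(log_rr)
    qbar = log_rr.mean()                      # pooled log rate ratio
    within = np.mean(se ** 2)                 # average within-imputation variance
    between = log_rr.var(ddof=1)              # between-imputation variance
    total = within + (1 + 1 / m) * between    # inflated total variance
    ci = np.exp(qbar + np.array([-1.96, 1.96]) * np.sqrt(total))
    return np.exp(qbar), ci                   # pooled rate ratio, 95% CI
```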

The Proton Exchange Membrane Fuel Cell (PEMFC) is considered the most versatile of the available fuel cell technologies, which qualifies it for diverse applications. However, the large-scale industrial deployment of PEMFCs is limited by their short life span and high exploitation costs. Ensuring fuel cell service for a long duration is therefore of vital importance, which has led to Prognostics and Health Management of fuel cells. More precisely, prognostics of the PEMFC is a major area of focus nowadays, which aims at identifying degradation of a PEMFC stack at an early stage and estimating its Remaining Useful Life (RUL) for life cycle management. This paper presents a data-driven approach for prognostics of the PEMFC stack using an ensemble of constraint-based Summation Wavelet-Extreme Learning Machine (SW-ELM) models. This development aims at improving the robustness and applicability of PEMFC prognostics for an online application with limited learning data. The proposed approach is applied to real data from two different PEMFC stacks and compared with ensembles of well-known connectionist algorithms. The comparison of results on long-term prognostics of both PEMFC stacks validates our proposition.

Background: Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Results: Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. Conclusions: In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework. PMID:26356316
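
As a concrete illustration of the self-training side, one simple way to keep the labeled pool balanced during the semi-supervised iterations is to pseudo-label the most confident unlabeled examples while adding the same number from each class. The paper's dynamic-balancing scheme is not reproduced exactly; helper names and the base learner are illustrative:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier

def self_train(base, X_lab, y_lab, X_unlab, n_iter=5, per_class=50):
    """Binary self-training sketch: each iteration pseudo-labels the most
    confident unlabeled examples and adds an equal number per class."""
    X_l, y_l, X_u = X_lab, y_lab, X_unlab
    model = clone(base)
    for _ in range(n_iter):
        model.fit(X_l, y_l)
        if len(X_u) < 2 * per_class:
            break
        proba = model.predict_proba(X_u)[:, 1]
        pos = np.argsort(-proba)[:per_class]     # most confident positives
        neg = np.argsort(proba)[:per_class]      # most confident negatives
        idx = np.concatenate([pos, neg])
        X_l = np.vstack([X_l, X_u[idx]])
        y_l = np.concatenate([y_l, np.ones(per_class, int),
                              np.zeros(per_class, int)])
        X_u = np.delete(X_u, np.unique(idx), axis=0)
    model.fit(X_l, y_l)                          # final refit on the grown pool
    return model

# e.g. model = self_train(RandomForestClassifier(class_weight="balanced"),
#                         X_labeled, y_labeled, X_unlabeled)
# An ensemble is then built by training several such self-trained
# classifiers on resamples and majority-voting their predictions.
```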

The folding and unfolding of protein domains is an apparently cooperative process, but transient intermediates have been detected in some cases. Such (un)folding intermediates are challenging to investigate structurally, as they are typically not long-lived, and their role in the (un)folding reaction has often been questioned. One of the best-studied (un)folding pathways is that of the Drosophila melanogaster Engrailed homeodomain (EnHD): this 61-residue protein forms a three-helix bundle in the native state and folds via a helical intermediate. Here we used molecular dynamics simulations to derive sample conformations of EnHD in the native, intermediate, and unfolded states and selected the relevant structural clusters by comparing to small/wide angle X-ray scattering data at four different temperatures. The results are corroborated using residual dipolar couplings determined by NMR spectroscopy. Our results agree well with the previously proposed (un)folding pathway. However, they also suggest that the fully unfolded state is present at a low fraction throughout the investigated temperature interval, and that the (un)folding intermediate is highly populated at the thermal midpoint, in line with the view that this intermediate can be regarded as the denatured state under physiological conditions. Further, the combination of ensemble structural techniques with MD allows for the determination of the structures and populations of multiple interconverting structures in solution. PMID:25946337

Ab initio molecular dynamics simulations in the isobaric-isothermal ensemble have been performed to study the low- and high-temperature crystalline and liquid phases of cryolite. The temperature-induced transitions from the low-temperature solid (α) to the high-temperature solid phase (β), and from the β phase to the liquid phase, have been simulated using a series of MD runs performed at gradually increasing temperature. The structure of the crystalline and liquid phases is analysed in detail, and our computational approach is shown to reliably reproduce the available experimental data over a wide range of temperatures. Relatively frequent reorientations of the AlF6 octahedra observed in our simulation of the β phase explain the thermal disorder in the positions of the F(-) ions observed in X-ray diffraction experiments. The isolated AlF6(3-), AlF5(2-) and AlF4(-), as well as the bridged Al2Fm(6-m), ionic entities have been identified as the main constituents of the cryolite melt. In accord with previous high-temperature NMR and Raman spectroscopic experiments, AlF5(2-) has been shown to be the most abundant Al-containing species formed in the melt. The characteristic vibrational frequencies of the AlFn(3-n) species in a realistic environment have been determined, and the computed values have been found to be in good agreement with experiment. PMID:26874492

DNA base extrusion is a crucial component of many biomolecular processes. Elucidating how bases are selectively extruded from the interior of double-stranded DNA is pivotal to accurately understanding and efficiently sampling this general type of conformational transition. In this work, the on-the-path random walk (OTPRW) method, the first generalized ensemble sampling scheme designed for finite-temperature-string path optimizations, was improved and applied to obtain the minimum free energy path (MFEP) and the free energy profile of a classical B-DNA major-groove base extrusion pathway. Along the MFEP, an intermediate state and the corresponding transition state were located and characterized. The MFEP result suggests that a base-plane-elongation event, rather than the commonly focused base-flipping event, is dominant in the transition state formation portion of the pathway, and that the energetic penalty at the transition state is mainly introduced by the stretching of the Watson-Crick base pair. Moreover, to facilitate the essential base-plane-elongation dynamics, the surrounding environment of the flipped base needs to be intimately involved. Further, taking advantage of the extended-dynamics nature of the OTPRW Hamiltonian, an equilibrium generalized ensemble simulation was performed along the optimized path, and based on the collected samples, several base-flipping (opening) angle collective variables were evaluated. Consistent with the MFEP result, the collective variable analysis reveals that none of these commonly employed flipping (opening) angles alone can adequately represent the base extrusion pathway, especially in the pre-transition-state portion. As further revealed by the analysis, the base-pairing partner of the extrusion target undergoes a series of in-plane rotations to facilitate the base-plane-elongation dynamics. A base-plane rotation angle is identified to be a possible reaction coordinate to represent

Paramagnetic NMR is a useful technique to study proteins and protein complexes, and the use of paramagnetic relaxation enhancement (PRE) for this purpose has become widespread. PREs are commonly generated using paramagnetic spin labels (SLs) that contain an unpaired electron in the form of a nitroxide radical, with 1-oxyl-2,2,5,5-tetramethyl-2,5-dihydropyrrol-3-ylmethyl methane thiosulfonate (MTSL) being the most popular tag. The inherent flexibility of the SL causes sampling of several conformations in solution, which can be problematic, as over- or underestimation of the spatial distribution of the unpaired electron in structural calculations will lead to errors in the distance restraints. We investigated the effect of this mobility on the accuracy of protein-protein docking calculations using intermolecular PRE data by comparing MTSL and the less mobile 3-methanesulfonilthiomethyl-4-(pyridin-3-yl)-2,2,5,5-tetramethyl-2,5-dihydro-1H-pyrrol-1-yloxyl (pyMTSL) on the dynamic complex of cytochrome c and cytochrome c peroxidase. No significant differences were found between the two SLs. Docking was performed using either single or multiple conformers and either fixed or flexible SLs. It was found that the mobility of the SLs is the limiting factor for obtaining accurate solutions. Optimization of SL conformer orientations using intra-molecular PRE improves the accuracy of docking. PMID:26356049

Complex RNA structures are constructed from helical segments connected by flexible loops that move spontaneously and in response to binding of small molecule ligands and proteins. Understanding the conformational variability of RNA requires the characterization of the coupled time evolution of interconnected flexible domains. To elucidate the collective molecular motions and explore the conformational landscape of the HIV-1 TAR RNA, we describe a new methodology that utilizes energy-minimized structures generated by the program “Fragment Assembly of RNA with Full-Atom Refinement (FARFAR)”. We apply structural filters in the form of experimental residual dipolar couplings (RDCs) to select a subset of discrete energy-minimized conformers and carry out principal component analyses (PCA) to corroborate the choice of the filtered subset. We use this subset of structures to calculate solution T1 and T1ρ relaxation times for 13C spins in multiple residues in different domains of the molecule using two simulation protocols that we previously published. We match the experimental T1 times to within 2% and the T1ρ times to within less than 10% for helical residues. These results introduce a protocol to construct viable dynamic trajectories for RNA molecules that accord well with experimental NMR data and support the notion that the motions of the helical portions of this small RNA can be described by a relatively small number of discrete conformations exchanging over time scales longer than 1 μs. PMID:24479561

In the context of a national energy company (EDF: Électricité de France), hydro-meteorological forecasts are necessary to ensure the safety and security of installations, meet environmental standards, and improve water resources management and decision making. Hydrological ensemble forecasts allow a better representation of the uncertainties of meteorological and hydrological forecasts and improve the human expertise applied to hydrological forecasts, which is essential to synthesize the available information coming from different meteorological and hydrological models and from human experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and has been used since 2010 on more than 30 watersheds in France. This ensemble forecasting chain is characterized by ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where a large amount of human expertise is solicited. The aim of this paper is to compare two hydrological ensemble post-processing methods developed at EDF in order to improve the reliability of ensemble forecasts (similar to Montanari & Brath, 2004; Schaefli et al., 2007). The aim of the post-processing methods is to dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (called the empirical approach) is based on a statistical model of the empirical error of perfect forecasts, using streamflow sub-samples by quantile class and lead time. The second method (called the dynamical approach) is based on streamflow sub-samples by quantile class, streamflow variation, and lead time. On a set of 20 watersheds used for operational forecasts, results show that both approaches are necessary to ensure good post-processing of the hydrological ensemble, allowing a clear improvement of the reliability, skill and sharpness of the ensemble forecasts. The comparison of the empirical and dynamical approaches shows the limits of the empirical approach, which is not able to take into account hydrological
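
A hedged sketch of the empirical approach follows: historical errors of "perfect" forecasts are binned by streamflow quantile class, and a raw ensemble member is dressed by sampling errors from its class. The per-lead-time stratification is omitted for brevity, and the multiplicative error form is an assumption:

```python
import numpy as np

def build_error_classes(q_sim, errors, n_classes=5):
    """Bin historical relative errors of the hydrological model (run with
    observed meteorology, i.e. 'perfect' forcing) by simulated-streamflow
    quantile class. Returns the class edges and per-class error samples."""
    edges = np.quantile(q_sim, np.linspace(0, 1, n_classes + 1))
    classes = np.clip(np.searchsorted(edges, q_sim, side="right") - 1,
                      0, n_classes - 1)
    return edges, [errors[classes == k] for k in range(n_classes)]

def dress_member(q_fcst, edges, class_errors, n_draws=50, rng=None):
    """Dress one raw ensemble streamflow forecast with errors sampled from
    its quantile class, yielding a post-processed sub-ensemble."""
    rng = rng or np.random.default_rng()
    k = int(np.clip(np.searchsorted(edges, q_fcst, side="right") - 1,
                    0, len(class_errors) - 1))
    return q_fcst * (1.0 + rng.choice(class_errors[k], size=n_draws))
```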

In this paper, a human electrocardiogram (ECG) identification system based on ensemble empirical mode decomposition (EEMD) is designed. A robust preprocessing method comprising noise elimination, heartbeat normalization and quality measurement is proposed to eliminate the effects of noise and heart rate variability, making the system independent of the heart rate. The ECG signal is decomposed into a number of intrinsic mode functions (IMFs) and Welch spectral analysis is used to extract the significant heartbeat signal features. Principal component analysis is used to reduce the dimensionality of the feature space, and the K-nearest neighbors (K-NN) method is applied as the classifier. The proposed human ECG identification system was tested on standard MIT-BIH ECG databases: the ST change database, the long-term ST database, and the PTB database. The system achieved an identification accuracy of 95% for 90 subjects, demonstrating the effectiveness of the proposed method in terms of accuracy and robustness. PMID:23698274
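
Downstream of the EEMD denoising, the described feature and classification stages map naturally onto a scikit-learn pipeline; this sketch assumes Welch spectra per heartbeat segment, with the sampling rate, window length, and component counts chosen for illustration only:

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def welch_features(beats, fs=360):
    # one Welch power spectrum per (already EEMD-denoised) heartbeat segment
    return np.vstack([welch(b, fs=fs, nperseg=128)[1] for b in beats])

# Welch spectra -> standardization -> PCA -> K-NN, mirroring the stages above
clf = make_pipeline(
    FunctionTransformer(welch_features),
    StandardScaler(),
    PCA(n_components=20),
    KNeighborsClassifier(n_neighbors=3),
)
# clf.fit(train_beats, subject_ids); clf.predict(test_beats)
```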

We propose a new and computationally efficient data-worth analysis and quantification framework keyed to the characterization of target state variables in groundwater systems. We focus on dynamically evolving plumes of dissolved chemicals migrating in randomly heterogeneous aquifers. An accurate prediction of the detailed features of solute plumes requires collecting a substantial amount of data. At the same time, constraints dictated by the availability of financial resources and ease of access to the aquifer system make it important to assess the expected value of data before they are actually collected. Data-worth analysis is targeted at quantifying the impact of new potential measurements on the expected reduction of predictive uncertainty based on a given process model. Integration of the Ensemble Kalman Filter method within a data-worth analysis framework enables us to assess data worth sequentially, which is a key desirable feature for monitoring scheme design in a contaminant transport scenario. However, it is remarkably challenging because of the (typically) high computational cost involved, considering that repeated solutions of the inverse problem are required. As a computationally efficient scheme, we embed in the data-worth analysis framework a modified version of the Probabilistic Collocation Method-based Ensemble Kalman Filter proposed by Zeng et al. (2011), taking advantage of its ability to assimilate data sequentially in time through a surrogate model constructed via the polynomial chaos expansion. We illustrate our approach on a set of synthetic scenarios involving solutes migrating in a two-dimensional random permeability field. Our results demonstrate the computational efficiency of our approach and its ability to quantify the impact of the design of the monitoring network on the reduction of uncertainty associated with the characterization of a migrating contaminant plume.
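
Although the polynomial chaos surrogate is beyond a snippet, the core data-worth quantity has a compact linear-Gaussian form: for a Kalman-type update, the expected reduction of total ensemble variance from one candidate measurement can be scored before any data are collected, as in this sketch (names are illustrative):

```python
import numpy as np

def expected_variance_reduction(X, h_row, r):
    """Data worth of one candidate measurement, scored as the expected
    reduction of total ensemble variance after a Kalman-type update.

    X     : (n_state, n_ens) current ensemble (e.g. plume concentrations)
    h_row : (n_state,) linearized observation operator of the candidate
    r     : scalar observation error variance
    """
    A = X - X.mean(axis=1, keepdims=True)
    P = A @ A.T / (X.shape[1] - 1)
    ph = P @ h_row                  # cross-covariance with the observation
    s = h_row @ ph + r              # innovation variance
    # posterior covariance is P - ph ph^T / s; its trace drops by:
    return (ph @ ph) / s
```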

In complex geological systems such as fluvial aquifers, carbonate systems and naturally fractured aquifers, multiple-point statistics (MPS) based modeling methods are required to characterize complex, curvilinear features. History matching with MPS calls for an effective inverse method that can not only honor the observed dynamic data, but also preserve the curvilinear geologic features that impact aquifer remediation. We introduce a novel pattern-matching approach to history matching that uses an ensemble of prior models capturing the prior uncertainty in geology. In the developed method, multiple-point pattern search is implemented to identify not only the pattern of conductivity variability in the neighborhood of a simulation node but also the pattern of state variables (for example, piezometric head). The unknown parameter and state values are simultaneously and sequentially simulated by pattern searching through an ensemble of realizations rather than by optimizing an objective function. To improve computational efficiency, the pattern search is applied only at predefined pilot point locations. Subsequently, a fast MPS method is employed to extrapolate the spatial patterns away from the pilot points. The pattern search algorithm also utilizes a flexible search radius that can be optimized for the estimation of either large-scale or short-scale structures. The algorithm is evaluated for both categorical and continuous conductivity fields by continuous conditioning to the observed dynamic data. The results show that the measured conductivity and head data can be updated in a continuous fashion as dynamic data become available, and flow predictions are more accurate. Furthermore, curvilinear geologic structures are preserved after data integration. The significant advantages of this method are: (1) parameter and state variables do not have to be modeled using the multi-Gaussian distribution; (2) the relationship between parameters and

Ensemble data assimilation techniques, including the Ensemble Transform Kalman Filter (ETKF), have been successfully used to improve prediction skill in cases where a numerical model for forecasting has been developed. Here, these techniques are developed for systems for which no model exists, using the reconstruction of phase space from time series data. For many natural systems, the complete set of equations governing their evolution is not known, and observational data are available for only a limited number of physical variables. However, for a dissipative system in which the variables are coupled nonlinearly, the dimensionality of the phase space is greatly reduced, and it is possible to reconstruct the details of the phase space from a single scalar time series of observations. A combination of phase space reconstruction with the ETKF yields a new technique for forecasting using only time series data. This technique is used to forecast magnetic field variations in the magnetosphere, which exhibits low-dimensional behavior on the substorm time scale. The time series data of the magnetic field variations monitored by the network of ground-based magnetometers in the auroral region are used for forecasting in two stages. In the first stage, the auroral electrojet indices computed from the magnetometer data are used for forecasting and yield forecasts that are better than persistence. In the second stage, the multivariate time series from several auroral-region magnetometers is used to reconstruct the phase space of the magnetosphere-solar wind system using Multi-channel Singular Spectrum Analysis. The ETKF is applied to ensemble forecasts made using model data constructed from long time series of the data from each magnetometer and observations of the magnetometer measurements. The improved prediction skill, e.g., with respect to persistence, is achieved from the use of the dynamical behavior of nearby trajectories. The near-real time forecasts of space weather
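
The phase-space reconstruction step is standard time-delay (Takens) embedding; a minimal sketch, with the embedding dimension and delay chosen arbitrarily for illustration:

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Reconstruct a phase-space trajectory from a scalar time series by
    time-delay embedding: each state vector collects `dim` delayed copies
    of the series, spaced `tau` samples apart."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# e.g. embed an auroral-electrojet-like index with dimension 4, delay 8 samples
x = np.sin(np.linspace(0, 60, 2000)) + 0.1 * np.random.randn(2000)
states = delay_embed(x, dim=4, tau=8)   # (n_states, 4) reconstructed states
```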

We present the application of interactive 3-D visualization of ensemble weather predictions to forecasting warm conveyor belt (WCB) situations during aircraft-based atmospheric research campaigns. Motivated by the forecast requirements of the T-NAWDEX-Falcon 2012 campaign, a method to predict 3-D probabilities of the spatial occurrence of warm conveyor belts has been developed. Probabilities are derived from Lagrangian particle trajectories computed on the forecast wind fields of the ECMWF ensemble prediction system. Integration of the method into the 3-D ensemble visualization tool Met.3D, introduced in the first part of this study, facilitates interactive visualization of WCB features and derived probabilities in the context of the ECMWF ensemble forecast. We investigate the sensitivity of the method with respect to trajectory seeding and forecast wind field resolution. Furthermore, we propose a visual analysis method to quantitatively analyse the contribution of ensemble members to a probability region and, thus, to assist the forecaster in interpreting the obtained probabilities. A case study, revisiting a forecast case from T-NAWDEX-Falcon, illustrates the practical application of Met.3D and demonstrates the use of 3-D and uncertainty visualization for weather forecasting and for planning flight routes in the medium forecast range (three to seven days before take-off).

Accurate forecasts of the track, intensity and structure of a landfalling hurricane can save lives and mitigate social impacts. Over the last two decades, significant improvements have been achieved in hurricane forecasts. However, only a few studies have emphasized landfalling hurricanes. Specifically, there are difficulties in predicting hurricane landfall due to the uncertainties in representing the atmospheric near-surface conditions in numerical weather prediction models, the complicated interaction between the atmosphere and the ocean, and the multiple-scale dynamical and physical processes accompanying storm development. In this study, the impact of the assimilation of conventional and satellite observations on the predictability of landfalling hurricanes is examined by using the mesoscale community Weather Research and Forecasting (WRF) model and an ensemble Kalman filter developed in the NCAR Data Assimilation Research Testbed (DART). Hurricane Katrina (2005) was chosen as a case study since it was one of the deadliest disasters in US history. The minimum sea level pressure from the best track, QuikScat ocean surface wind vectors, surface mesonet observations, airborne Doppler radar derived wind components and available conventional observations are assimilated in a series of experiments to examine the data impacts on the predictability of Hurricane Katrina. The analyses and forecasts show that ensemble-based data assimilation significantly improves the forecast of Hurricane Katrina. The assimilation improves the track forecast by modifying the storm structure and related environmental fields. Cyclonic increments are clearly seen in the vorticity and wind analyses. Temperature and humidity fields are also modified by the data assimilation. The changes in the relevant fields help organize the structure of the storm, intensify the circulation, and result in a positive impact on the evolution of the storm in both analyses and forecasts. The forecasts in the

The physicochemical description of numerous cell processes is fundamentally based on the energy landscapes of the protein molecules involved. Although the whole energy landscape is difficult to reconstruct, increased attention to particular targets has provided enough structures for mapping functionally important subspaces associated with the unbound and bound protein structures. The subspace mapping produces a discrete representation of the landscape, further called the energy spectrum. We compiled and characterized ensembles of bound and unbound conformations of six small proteins and explored their spectra in implicit solvent. First, the analysis of the unbound-to-bound changes points to conformational selection as the binding mechanism for four proteins. Second, the results show that bound and unbound spectra often significantly overlap. Moreover, the larger the overlap, the smaller the root mean square deviation (RMSD) between the bound and unbound conformational ensembles. Third, the center of the unbound spectrum has a higher energy than the center of the corresponding bound spectrum of the dimeric and multimeric states for most of the proteins. This suggests that the unbound states often have larger entropy than the bound states. Fourth, exhaustively long minimization, making small intrarotamer adjustments (all-atom RMSD ≤ 0.7 Å), dramatically reduces the distance between the centers of the bound and unbound spectra as well as the extent of the spectra. It condenses the unbound and bound energy levels into a thin layer at the bottom of the energy landscape, with energy spacings that vary between 0.8–4.6 and 3.5–10.5 kcal/mol for the unbound and bound states, respectively. Finally, the analysis of protein energy fluctuations showed that protein vibrations themselves can excite the interstate transitions, including the unbound-to-bound ones. PMID:23526684

Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least from the classical age of ancient Greece, and it was formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community in ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, most notably the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers in this area [14,62,85,149,173]. Several theories have been
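
The majority-vote ensemble mentioned above is simple enough to state in a few lines; a minimal sketch for integer class labels:

```python
import numpy as np

def majority_vote(predictions):
    """Majority-vote ensemble: `predictions` is (n_classifiers, n_samples)
    of integer class labels; the ensemble outputs the most-voted class."""
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # per-sample vote counts, shape (n_classes, n_samples)
    votes = np.apply_along_axis(np.bincount, 0, predictions, None, n_classes)
    return votes.argmax(axis=0)

# three "experts" disagree on some samples; the committee follows the majority
print(majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]]))   # -> [0 1 0]
```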

This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exist for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of assumptions that have inherent bias and sensitivities that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study, with improvements of 40-70% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.

Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine whether a given individual has already appeared over the camera network. Individual recognition often uses faces as the biometric trait and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitations of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the “small sample size” (SSS) problem, arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0–1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system. PMID:25494350
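
The paper's tailored 0–1 knapsack formulation is not given in this record; purely to illustrate the selection idea, the generic dynamic-programming sketch below picks base classifiers to maximize summed validation accuracy under an integer "diversity cost" budget (the value/cost assignment is an assumption):

```python
def select_classifiers(acc, div_cost, budget):
    """Toy 0-1 knapsack: choose base classifiers maximizing total validation
    accuracy subject to an integer diversity-cost budget, via standard DP."""
    n = len(acc)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                     # skip classifier i-1
            if div_cost[i - 1] <= b:                        # or take it
                cand = best[i - 1][b - div_cost[i - 1]] + acc[i - 1]
                best[i][b] = max(best[i][b], cand)
    chosen, b = [], budget                                  # backtrack selection
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= div_cost[i - 1]
    return sorted(chosen)

print(select_classifiers([0.8, 0.75, 0.9, 0.6], [3, 2, 4, 1], budget=6))
# -> [0, 1, 3]
```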

With the rapid development of mobile devices and pervasive computing technologies, acceleration-based human activity recognition, a difficult yet essential problem in mobile apps, has received intensive attention recently. Different acceleration signals representing different activities, or even the same activity, have different attributes, which causes difficulties in normalizing the signals. We thus cannot directly compare these signals with each other, because they are embedded in a nonmetric space. Therefore, we present a nonmetric scheme that retains discriminative and robust frequency-domain information by developing a novel ensemble manifold rank preserving (EMRP) algorithm. EMRP simultaneously considers three aspects: 1) it encodes the local geometry using the ranking order information of intraclass samples distributed on local patches; 2) it keeps the discriminative information by maximizing the margin between samples of different classes; and 3) it finds the optimal linear combination of the alignment matrices to approximate the intrinsic manifold lying in the data. Experiments are conducted on the South China University of Technology naturalistic 3-D acceleration-based activity dataset and the naturalistic mobile-device-based human activity dataset to demonstrate the robustness and effectiveness of the new nonmetric scheme for acceleration-based human activity recognition. PMID:25265635

Artificial neural network (ANN) based hydrologic models have gained a lot of attention among water resources engineers and scientists owing to their potential for more accurate prediction of flood flows compared with conceptual or physics-based hydrologic models. The ANN approximates the non-linear functional relationship between the complex hydrologic variables in arriving at the river flow forecast values. Despite a large number of applications, there is still some criticism that an ANN's point prediction lacks reliability, since the uncertainty of the predictions is not quantified, and this limits its use in practical applications. A major concern in applying traditional uncertainty analysis techniques to a neural network framework is its parallel computing architecture with large degrees of freedom, which makes the uncertainty assessment a challenging task. Very few studies have considered assessment of the predictive uncertainty of ANN-based hydrologic models. In this study, a novel method is proposed that helps construct the prediction interval of an ANN flood forecasting model during calibration itself. The method is designed to have two stages of optimization during calibration: at stage 1, the ANN model is trained with a genetic algorithm (GA) to obtain the optimal set of weights and biases; during stage 2, the optimal variability of the ANN parameters (obtained in stage 1) is identified so as to create an ensemble of predictions. During the second stage, the optimization is performed with multiple objectives: (i) minimum residual variance for the ensemble mean, (ii) maximum number of measured data points falling within the estimated prediction interval, and (iii) minimum width of the prediction interval. The method is illustrated using a real-world case study of an Indian basin. The method was able to produce an ensemble that has an average prediction interval width of 23.03 m3/s, with 97.17% of the total validation data points (measured) lying within the interval. The derived
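
The two interval-quality objectives of stage 2 (coverage and width) are easy to score for any candidate ensemble; a minimal sketch, with the percentile bounds chosen for illustration:

```python
import numpy as np

def interval_scores(ensemble, observed, lo=2.5, hi=97.5):
    """Score an ensemble-derived prediction interval: coverage (% of
    observations falling inside) and average width, the two quantities
    balanced by the stage-2 multi-objective calibration described above.

    ensemble : (n_members, n_times) streamflow predictions
    observed : (n_times,) measured flows
    """
    lower = np.percentile(ensemble, lo, axis=0)
    upper = np.percentile(ensemble, hi, axis=0)
    coverage = np.mean((observed >= lower) & (observed <= upper)) * 100
    width = np.mean(upper - lower)
    return coverage, width
```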

Ensemble forecasting has been used for operational numerical weather prediction in the United States and Europe since the early 1990s. An ensemble of weather or climate forecasts is used to characterize the two main sources of uncertainty in computer models of physical systems: ...

The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects showing the benefits of a completely open approach to software development and distribution. PMID:15608235

Climate analogues, also denoted Space-For-Time, may be used to identify regions where the present climatic conditions resemble the conditions of a past or future state of another location or region, based on robust climate variable statistics in combination with projections of how these statistics change over time. The study focuses on assessing climate analogues for Denmark based on observations from a current climate data set (E-OBS) as well as the ENSEMBLES database of future climates, with the aim of projecting future precipitation extremes. The local present precipitation extremes are assessed by means of intensity-duration-frequency curves for urban drainage design for the relevant locations, namely France, the Netherlands, Belgium, Germany, the United Kingdom, and Denmark. Based on this approach, projected increases of extreme precipitation by 2100 of 9 and 21% are expected for 2- and 10-year return periods, respectively. The results should be interpreted with caution, as the best region to represent future conditions for Denmark is the coastal area of Northern France, for which only little information is available with respect to present precipitation extremes. PMID:25714642

It is meaningful and important in biology to identify cytokines and investigate their various functions and biochemical mechanisms. However, several issues remain, including the large scale of benchmark datasets, the serious imbalance of the data, and the discovery of new gene families. In this paper, we employ a machine learning approach based on a novel ensemble classifier to predict cytokines. We directly selected amino acid sequences as research objects. First, we pretreated the benchmark data accurately. Next, we analyzed the physicochemical properties and distribution of the whole amino acid sequences and then extracted a group of 120-dimensional (120D) valid features to represent the sequences. Third, in view of the serious imbalance in the benchmark datasets, we utilized a sampling approach based on the synthetic minority oversampling technique (SMOTE) algorithm and a K-means clustering undersampling algorithm to rebuild the training set. Finally, we built a library for dynamic selection and circulating combination based on clustering (LibD3C) and employed the new training set to realize cytokine classification. Experiments showed that the geometric mean of sensitivity and specificity obtained through our approach is as high as 93.3%, which proves that our approach is effective for identifying cytokines. PMID:24027761
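
The described oversample-then-undersample rebuild of the training set can be approximated with the imbalanced-learn package (assumed here as a stand-in; the paper's exact sampling parameters and the LibD3C classifier are not reproduced):

```python
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import ClusterCentroids

# toy imbalanced data standing in for the 120-D sequence features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 120)), rng.normal(1, 1, (50, 120))])
y = np.r_[np.zeros(950, int), np.ones(50, int)]

# SMOTE synthesizes minority examples up to half the majority count ...
X_os, y_os = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)
# ... then K-means-based undersampling (ClusterCentroids) shrinks the
# majority class to match, giving a roughly balanced training set
X_bal, y_bal = ClusterCentroids(random_state=0).fit_resample(X_os, y_os)
print(Counter(y), Counter(y_os), Counter(y_bal))
```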

We develop efficient handling of solvation forces in the multiscale method of multiple time step molecular dynamics (MTS-MD) of a biomolecule steered by the solvation free energy (effective solvation forces) obtained from the 3D-RISM-KH molecular theory of solvation (three-dimensional reference interaction site model complemented with the Kovalenko-Hirata closure approximation). To reduce the computational expense, we calculate the effective solvation forces acting on the biomolecule by using advanced solvation force extrapolation (ASFE) at inner time steps while converging the 3D-RISM-KH integral equations only at large outer time steps. The idea of ASFE is to develop a discrete non-Eckart rotational transformation of atomic coordinates that minimizes the distances between the atomic positions of the biomolecule at different time moments. The effective solvation forces for the biomolecule in a current conformation at an inner time step are then extrapolated in the transformed subspace of those at outer time steps by using a modified least-squares fit applied to a relatively small number of the best force-coordinate pairs. The latter are selected from an extended set collecting the effective solvation forces obtained from 3D-RISM-KH at outer time steps over a broad time interval. The MTS-MD integration with effective solvation forces obtained by converging 3D-RISM-KH at outer time steps and applying ASFE at inner time steps is stabilized by employing the optimized isokinetic Nosé-Hoover chain (OIN) ensemble. Compared to the previous extrapolation schemes used in combination with the Langevin thermostat, the ASFE approach substantially improves the accuracy of evaluation of the effective solvation forces and, in combination with the OIN thermostat, enables a dramatic increase of the outer time steps. We demonstrate on a fully flexible model of alanine dipeptide in aqueous solution that the MTS-MD/OIN/ASFE/3D-RISM-KH multiscale method of molecular dynamics
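
Setting aside the non-Eckart alignment and best-pair selection, the core extrapolation is a least-squares combination of stored outer-step forces; a minimal sketch under that simplification (the ridge term and names are illustrative):

```python
import numpy as np

def extrapolate_forces(q_now, q_hist, f_hist, ridge=1e-8):
    """Extrapolate effective solvation forces at an inner time step as a
    linear combination of stored outer-step forces: solve (least squares)
    for weights w with  sum_k w_k q_hist[k] ~ q_now,  then apply the same
    weights to the stored forces.

    q_now  : (n_dof,) current atomic coordinates (assumed already aligned)
    q_hist : (n_hist, n_dof) coordinates retained from outer steps
    f_hist : (n_hist, n_dof) 3D-RISM-KH forces at those steps
    """
    A = q_hist @ q_hist.T + ridge * np.eye(len(q_hist))  # normal equations
    w = np.linalg.solve(A, q_hist @ q_now)
    return w @ f_hist                                    # extrapolated forces
```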

Classification-based pedestrian detection systems (PDSs) are currently a hot research topic in the field of intelligent transportation. A PDS detects pedestrians in real time on moving vehicles. A practical PDS demands not only high detection accuracy but also high detection speed. However, most of the existing classification-based approaches mainly seek high detection accuracy, while the detection speed is not purposely optimized for practical application. At the same time, the performance, particularly the speed, is primarily tuned based on experiments without theoretical foundations, leading to a long training procedure. This paper starts by measuring and optimizing detection speed, and then describes a practical classification-based pedestrian detection solution with high detection and training speed. First, an extended classification/detection speed metric, named feature-per-object (fpo), is proposed to measure the detection speed independently of execution. Then, an fpo minimization model with accuracy constraints is formulated based on a tree classifier ensemble, where the minimum fpo can guarantee the highest detection speed. Finally, the minimization problem is solved efficiently by using nonlinear fitting based on radial basis function neural networks. In addition, the optimal solution is used directly to guide classifier training; thus, the training can be accelerated greatly. Therefore, a rapid and accurate classification-based detection technique is proposed for the PDS. Experimental results on urban traffic videos show that the proposed method has a high detection speed with an acceptable detection rate and false-alarm rate for onboard detection; moreover, the training procedure is also very fast. PMID:20457550

Knock is one of the major constraints on improving the performance and thermal efficiency of spark ignition (SI) engines. It can also result in severe permanent engine damage under certain operating conditions. Based on the ensemble empirical mode decomposition (EEMD), this paper proposes a new approach to determining the knock characteristics of SI engines. By adding uniformly distributed, finite white Gaussian noise, the EEMD preserves signal continuity across scales and therefore alleviates the mode-mixing problem that occurs in the classic empirical mode decomposition (EMD). The feasibility of applying the EEMD to detect the knock signatures of a test SI engine via the pressure signal measured in the combustion chamber and the vibration signal measured on the cylinder head is investigated. Experimental results show that the EEMD-based method is able to detect the knock signatures from both the pressure signal and the vibration signal, even in the initial stage of knock. Finally, by comparing the results with those obtained by the short-time Fourier transform (STFT), the Wigner-Ville distribution (WVD) and the discrete wavelet transform (DWT), the superiority of the EEMD method in determining knock characteristics is demonstrated.
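
The noise-assisted averaging that defines EEMD is easy to sketch around a plain EMD routine; this version assumes the third-party PyEMD package (published on PyPI as EMD-signal) for the inner decomposition, with the trial count and noise level as placeholders:

```python
import numpy as np
from PyEMD import EMD  # assumed dependency: pip install EMD-signal

def eemd(signal, n_trials=100, noise_std=0.2, max_imf=8):
    """Ensemble EMD: add independent white Gaussian noise realizations,
    decompose each noisy copy with plain EMD, and average the IMFs.
    The added noise populates all scales uniformly, which is what
    alleviates the mode mixing of classic EMD."""
    emd = EMD()
    acc = np.zeros((max_imf, len(signal)))
    for _ in range(n_trials):
        noisy = signal + noise_std * signal.std() * np.random.randn(len(signal))
        imfs = emd(noisy, max_imf=max_imf)
        k = min(len(imfs), max_imf)
        acc[:k] += imfs[:k]
    return acc / n_trials          # ensemble-averaged IMFs
```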

Flooding is a widespread and devastating natural disaster worldwide. Floods that took place in the last decade in China were ranked the worst amongst recorded floods worldwide in terms of the number of human fatalities and economic losses (Munich Re-Insurance). Rapid economic development and population expansion into low-lying flood plains has worsened the situation. Current conventional flood prediction systems in China are suited neither to the perceptible climate variability nor to the rapid pace of urbanization sweeping the country. Flood prediction, from short-term (a few hours) to medium-term (a few days), needs to be revisited and adapted to changing socio-economic and hydro-climatic realities. The state of the art requires the implementation of multiple numerical weather prediction systems. The availability of twelve global ensemble weather prediction systems through the 'THORPEX Interactive Grand Global Ensemble' (TIGGE) offers a good opportunity for an effective state-of-the-art early forecasting system. A prototype of a Novel Flood Early Warning System (NEWS) using the TIGGE database is tested in the Huai River basin in east-central China. It is the first early flood warning system in China that uses the massive TIGGE database cascaded with river catchment models, the Xinanjiang hydrologic model and a 1-D hydraulic model, to predict river discharge and flood inundation. The NEWS algorithm is also designed to provide web-based services to a broad spectrum of end-users. The latter presents challenges, as both databases and proprietary codes reside in different locations and converge at dissimilar times. NEWS will thus make use of a ready-to-run grid system that makes distributed computing and data resources available in a seamless and secure way. The ability to run on different operating systems and to provide an interface accessible to a broad spectrum of end-users is an additional requirement. The aim is to achieve robust interoperability

We consider the problem of adaptively combining the 'multi-model ensemble' of General Circulation Models (GCMs) that inform the Intergovernmental Panel on Climate Change (IPCC), drawn from major laboratories around the world. This problem can be treated as an expert tracking problem in the online setting, where an algorithm maintains a set of weights over the experts (here the GCMs are the experts). At each time interval these weights are used to make a combined projection, and the weights are then updated based on the performance of the experts. In this work we focus on tracking the GCMs at different geographic locations and effectively incorporating spatial influence and correlations between these locations. We approach this multi-model ensemble problem using a pairwise Markov Random Field (MRF), where the state of each hidden variable is the identity of the best GCM at a specific location. Our MRF takes the form of a lattice over the Earth, with links between neighboring locations. To establish reasonable energy functions for the MRF, we first show that the Fixed-Share algorithm for expert tracking over time can be expressed as a simple MRF. By expressing Fixed-Share as an MRF, we identify the energy function that corresponds to the switching dynamics (how the best expert switches over time). Since an MRF is an undirected graph, this 'switching' energy function can be naturally applied to spatial links between variables as well. To calculate the marginal probabilities of the hidden variables (i.e., our new beliefs over the GCMs), we apply Loopy Belief Propagation (LBP) to the MRF. In LBP, each node sends messages to neighboring nodes about the sender's belief in the neighbor's state. Figure 1 shows our initial results from an online evaluation of GCM temperature hindcasts from the IPCC Phase 3 Coupled Model Intercomparison Project (CMIP3) archive. The red line shows the mean loss of our method versus the spatial switching rate. The right-most point on the graph
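
The Fixed-Share update referred to above has a compact form: a multiplicative-weights step on the per-expert losses followed by an alpha "share" step that redistributes weight uniformly, which is what lets the tracker follow a switching best expert. A minimal sketch (learning rate and switching rate are placeholders):

```python
import numpy as np

def fixed_share(losses, eta=1.0, alpha=0.05):
    """Fixed-Share expert tracking over a loss matrix.

    losses : (n_steps, n_experts) per-step losses (e.g. squared hindcast
             error of each GCM at one location)
    Returns the weight history, one weight vector per time step.
    """
    n_steps, n_experts = losses.shape
    w = np.full(n_experts, 1.0 / n_experts)
    history = []
    for t in range(n_steps):
        history.append(w.copy())
        v = w * np.exp(-eta * losses[t])         # multiplicative loss update
        v /= v.sum()
        w = (1 - alpha) * v + alpha / n_experts  # share step (switching prior)
    return np.array(history)
```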

As train loads and travel speeds have increased over time, railway axle bearings have become critical elements that require more efficient non-destructive inspection and fault diagnostics methods. This paper presents a novel and adaptive procedure based on ensemble empirical mode decomposition (EEMD) and the Hilbert marginal spectrum for multi-fault diagnostics of axle bearings. EEMD overcomes the restrictive assumptions about the data and the computational effort that limit the application of conventional signal processing techniques. The outputs of this adaptive approach are the intrinsic mode functions (IMFs), which are treated with the Hilbert transform in order to obtain the Hilbert instantaneous frequency spectrum and marginal spectrum. However, not all the IMFs obtained by the decomposition should be included in the Hilbert marginal spectrum. The IMF confidence index algorithm proposed in this paper is fully autonomous, overcoming the major limitation of manual selection by experienced users, and allows the development of on-line tools. The effectiveness of the improvement is proven by the successful diagnosis of an axle bearing with a single fault or multiple composite faults, e.g., outer ring fault, cage fault and pin roller fault. PMID:25970256
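
To make the Hilbert marginal spectrum step concrete, the following is a minimal sketch assuming the IMFs have already been obtained (e.g. from EEMD); the bin count and frequency-binning scheme are illustrative choices.

```python
# Minimal sketch: Hilbert marginal spectrum from a set of IMFs.
import numpy as np
from scipy.signal import hilbert

def hilbert_marginal_spectrum(imfs, fs, n_bins=256):
    """Accumulate instantaneous amplitude over time into frequency bins."""
    edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    spectrum = np.zeros(n_bins)
    for imf in imfs:
        analytic = hilbert(imf)                      # analytic signal
        amp = np.abs(analytic)                       # instantaneous amplitude
        phase = np.unwrap(np.angle(analytic))
        freq = np.diff(phase) * fs / (2.0 * np.pi)   # instantaneous frequency
        idx = np.clip(np.digitize(freq, edges) - 1, 0, n_bins - 1)
        np.add.at(spectrum, idx, amp[:-1])           # marginal: sum over time
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, spectrum / fs
```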

This talk presents a new model of global forecast error growth, applied to the forecast errors simulated by the ensemble prediction system (ENS) of the ECMWF. The proxy for forecast errors is the total spread of the ECMWF operational ensemble forecasts, obtained by decomposing the wind and geopotential fields in normal-mode functions. In this way, the ensemble spread can be quantified separately for the balanced and inertio-gravity (IG) modes for every forecast range. Ensemble reliability is defined for the balanced and IG modes by comparing the ensemble spread with the control analysis in each scale. The results show that initial uncertainties in the ECMWF ENS are largest in the tropical large-scale modes and that their spatial distribution is similar to the distribution of the short-range forecast errors. Initially the ensemble spread grows most in the smallest scales and in the synoptic range of the IG modes, but the overall growth is dominated by the increase of spread in balanced modes at synoptic and planetary scales in the midlatitudes. During the forecasts, the distribution of spread in the balanced and IG modes grows towards the climatological spread distribution characteristic of the analyses. The ENS system is found to be somewhat under-dispersive, which is associated with a lack of tropical variability, primarily the Kelvin waves. The new model of forecast error growth has three fitting parameters to parameterize the initial fast growth and a slower exponential error growth later on. The asymptotic values of the forecast errors are independent of the exponential growth rate. It is found that the errors due to unbalanced dynamics saturate in around 10 days, while the balanced and total errors saturate in 3 to 4 weeks. Reference: Žagar, N., R. Buizza, and J. Tribbia, 2015: A three-dimensional multivariate modal analysis of atmospheric predictability with application to the ECMWF ensemble. J. Atmos. Sci., 72, 4423-4444.
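
As an illustration of fitting a three-parameter error-growth curve, the sketch below fits a logistic form (initial error, growth rate, asymptote) to synthetic spread data; this is one common choice of functional form, not necessarily the exact model described in the talk.

```python
# Minimal sketch: three-parameter error-growth fit to synthetic spread data.
# The logistic form is an illustrative choice, not the reference's exact model.
import numpy as np
from scipy.optimize import curve_fit

def logistic_growth(t, e0, rate, e_inf):
    """Logistic error growth: fast initial growth, then saturation at e_inf."""
    return e_inf / (1.0 + (e_inf / e0 - 1.0) * np.exp(-rate * t))

t_days = np.arange(0, 31, dtype=float)               # forecast range in days
spread = logistic_growth(t_days, 0.5, 0.4, 10.0)     # synthetic "spread" data
spread += 0.1 * np.random.default_rng(1).standard_normal(t_days.size)

params, _ = curve_fit(logistic_growth, t_days, spread, p0=(0.5, 0.5, 8.0))
e0_fit, rate_fit, e_inf_fit = params
print(f"asymptotic error ~ {e_inf_fit:.2f}")         # asymptote is a separate
                                                     # parameter from the rate
```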

Determining the level of confidence in regional climate model projections could be very useful for designing climate change adaptation, particularly for vulnerable regions. The majority of previous research to evaluate models has been based on the mean state, but for confidence in projections the plausibility of the mechanisms for change is just as, if not more, important. In this study we demonstrate a methodology for process-based assessment of projections, whereby circulation changes accompanying future responses are examined and then compared to atmospheric dynamics during historical years in models and reanalyses. We apply this methodology to an ensemble of five global and regional model experiments and focus on West Africa, where these models project a strong drying trend. The analysis reveals that this drying is associated with anomalous subsidence in the upper atmosphere, and large warming of the Saharan heat low region, with potential feedback effects via the African easterly jet and West African monsoon. This mode occurs during dry years in the historical period, and dominates in the future experiments. However, the same mode is not found in dry years in reanalysis data, which casts doubt on the reasons for strong drying in these models. The regional models show a very similar response to their driving global models, and are therefore no more trustworthy in this case. This result underlines the importance of assessing model credibility on a case-by-case basis and implies that process-based methodologies should be applied to other model projections before their outputs are used to inform decision making.

Climate simulation codes, such as the Community Earth System Model (CESM), are especially complex and continually evolving. Their on-going state of development requires frequent software verification in the form of quality assurance, both to preserve the quality of the code and to instill model confidence. To formalize and simplify this previously subjective and computationally expensive aspect of the verification process, we have developed a new tool for evaluating climate consistency. Because an ensemble of simulations allows us to gauge the natural variability of the model's climate, our new tool uses an ensemble approach for consistency testing. In particular, an ensemble of CESM climate runs is created, from which we obtain a statistical distribution that can be used to determine whether a new climate run is statistically distinguishable from the original ensemble. The CESM Ensemble Consistency Test, referred to as CESM-ECT, is objective in nature and accessible to CESM developers and users. The tool has proven its utility in detecting errors in software and hardware environments and providing rapid feedback to model developers.
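
A hypothetical sketch in the spirit of such an ensemble consistency test (not the actual CESM-ECT algorithm): project a new run onto the principal components of the ensemble's variability and flag scores that fall far outside the ensemble spread.

```python
# Hypothetical PCA-based consistency test, illustrating the ensemble idea only.
import numpy as np

def consistency_test(ensemble, new_run, n_keep=10, n_sigma=3.0):
    """ensemble: (n_members, n_vars) summary statistics; new_run: (n_vars,)."""
    mean = ensemble.mean(axis=0)
    std = ensemble.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)              # guard constant variables
    z = (ensemble - mean) / std                     # standardize variables
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = (z @ vt.T)[:, :n_keep]                 # member PC scores
    new_scores = (((new_run - mean) / std) @ vt.T)[:n_keep]
    spread = scores.std(axis=0, ddof=1)             # ensemble spread per PC
    failing = np.abs(new_scores) > n_sigma * spread
    return failing.any(), np.nonzero(failing)[0]    # fail flag + failing PCs
```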

Successful treatment of tumors with motion-adaptive radiotherapy requires accurate prediction of respiratory motion, ideally with a prediction horizon larger than the latency of the radiotherapy system. Accurate prediction of respiratory motion is, however, a non-trivial task due to the presence of irregularities and intra-trace variabilities, such as baseline drift and temporal changes in the fundamental frequency pattern. In this paper, to enhance the accuracy of respiratory motion prediction, we propose a stacked regression ensemble framework that integrates heterogeneous respiratory motion prediction algorithms. We further address two crucial issues for developing a successful ensemble framework: (1) selection of appropriate prediction methods to ensemble (level-0 methods) among the best existing prediction methods; and (2) finding a suitable generalization approach that can successfully exploit the relative advantages of the chosen level-0 methods. The efficacy of the developed ensemble framework is assessed with real respiratory motion traces acquired from 31 patients undergoing treatment. Results show that the developed ensemble framework improves the prediction performance significantly compared to the best existing methods. PMID:27238760
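
A minimal sketch of a stacked regression ensemble on a lagged breathing trace; the level-0 regressors below are generic stand-ins for the paper's prediction methods, and the trace is synthetic.

```python
# Minimal sketch: stacked regression ensemble for horizon-ahead prediction.
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

def make_lagged(trace, n_lags=10, horizon=5):
    """Predict trace[t + horizon] from the n_lags samples ending at t."""
    X = np.array([trace[i:i + n_lags]
                  for i in range(len(trace) - n_lags - horizon)])
    y = trace[n_lags + horizon - 1:-1]
    return X, y

t = np.linspace(0, 60, 1500)                       # synthetic breathing trace
trace = np.sin(2 * np.pi * 0.25 * t)
trace += 0.05 * np.random.default_rng(2).standard_normal(t.size)
X, y = make_lagged(trace)

stack = StackingRegressor(                          # level-0 methods + level-1
    estimators=[("ridge", Ridge()),                 # generalizer (Ridge here)
                ("svr", SVR(C=1.0)),
                ("mlp", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000))],
    final_estimator=Ridge())
stack.fit(X[:1000], y[:1000])
print(stack.score(X[1000:], y[1000:]))              # held-out R^2
```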

Military and civil defense personnel are often involved in complex activities in a variety of outdoor environments. The choice of appropriate clothing ensembles represents an important strategy to establish the success of a military mission. The main aim of this study was to compare the known clothing insulation of the garment ensembles worn by soldiers during two winter outdoor field trials (hike and guard duty) with the estimated optimal clothing thermal insulations recommended to maintain thermoneutrality, assessed by using two different biometeorological procedures. The overall aim was to assess the applicability of such biometeorological procedures to weather forecast systems, thereby developing a comprehensive biometeorological tool for military operational forecast purposes. Military trials were carried out during winter 2006 in Pokljuka (Slovenia) by Slovene Armed Forces personnel. Gastrointestinal temperature, heart rate and environmental parameters were measured with portable data acquisition systems. The thermal characteristics of the clothing ensembles worn by the soldiers, namely thermal resistance, were determined with a sweating thermal manikin. Results showed that the clothing ensemble worn by the military was appropriate during guard duty but generally inappropriate during the hike. A general under-estimation of the biometeorological forecast model in predicting the optimal clothing insulation value was observed and an additional post-processing calibration might further improve forecast accuracy. This study represents the first step in the development of a comprehensive personalized biometeorological forecast system aimed at improving recommendations regarding the optimal thermal insulation of military garment ensembles for winter activities.

This paper presents a method for constructing prediction intervals for artificial neural network (ANN) rainfall-runoff models during calibration, with a consideration of generating ensemble predictions. A two-stage optimization procedure is envisaged in this study for construction of the prediction interval for the ANN output. In Stage 1, the ANN model is trained with a genetic algorithm (GA) to obtain the optimal set of weights and biases. In Stage 2, the possible variability of the ANN parameters (obtained in Stage 1) is optimized so as to create an ensemble of models that minimizes the residual variance of the ensemble mean while ensuring that as much of the measured data as possible falls within the estimated prediction interval. The width of the prediction interval is also minimized simultaneously. The method is demonstrated using a real-world case study of rainfall-runoff data for an Indian basin. The method was able to produce ensembles with a prediction interval (average width) of 26.49 m3/s, with 97.17% of the total observed data points lying within the interval in validation. One specific advantage of the method is that when the ensemble mean value is considered as a forecast, the peak flows are predicted with improved accuracy compared to traditional single-point-forecast ANNs.
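
The two quantities traded off in Stage 2, interval coverage and average width, can be computed from an ensemble of predictions as in this minimal sketch (the quantile choices are illustrative):

```python
# Minimal sketch: coverage and average width of an ensemble-derived
# prediction interval, the two objectives balanced in Stage 2.
import numpy as np

def interval_scores(ens_preds, obs, lower_q=2.5, upper_q=97.5):
    """ens_preds: (n_members, n_times) ensemble outputs; obs: (n_times,)."""
    lo = np.percentile(ens_preds, lower_q, axis=0)
    hi = np.percentile(ens_preds, upper_q, axis=0)
    coverage = np.mean((obs >= lo) & (obs <= hi))   # cf. ~97% in the study
    avg_width = np.mean(hi - lo)                    # cf. ~26.5 m3/s
    return coverage, avg_width
```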

Accurate determination of the thermodynamic properties of petroleum reservoir fluids is of great interest to many applications, especially in petroleum and chemical engineering. Molecular simulation has many appealing features, especially its requirement of fewer tuned parameters yet better predictive capability; however, it is well known that molecular simulation is very CPU expensive compared to equation-of-state approaches. We have recently introduced an efficient, thermodynamically consistent technique to rapidly regenerate Monte Carlo Markov Chains (MCMCs) at different thermodynamic conditions from existing data points that have been pre-computed with expensive classical simulation. This technique can speed up the simulation by more than a factor of a million, making the regenerated molecular simulation almost as fast as equation-of-state approaches. In this paper, this technique is first briefly reviewed and then numerically investigated in its capability to predict ensemble averages of primary quantities at thermodynamic conditions neighboring the original simulated MCMCs. Moreover, this extrapolation technique is extended to predict second derivative properties (e.g. heat capacity and fluid compressibility). The method works by reweighting and reconstructing the generated MCMCs in the canonical ensemble for Lennard-Jones particles. In this paper, the system's potential energy, pressure, isochoric heat capacity and isothermal compressibility were extrapolated from the original simulated points along isochores, isotherms and paths of changing temperature and density. Finally, an optimized set of Lennard-Jones parameters (ε, σ) for single-site models is proposed for methane, nitrogen and carbon monoxide.
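
The core reweighting idea can be sketched for the canonical ensemble as follows: averages at a neighboring inverse temperature are recovered from the stored configurations without re-simulation. This is a minimal illustration of the principle, not the authors' full regeneration scheme.

```python
# Minimal sketch: temperature reweighting of stored canonical (NVT) samples.
# <A> at beta_new is recovered from samples {A_i, U_i} generated at beta_old.
import numpy as np

def reweighted_average(A, U, beta_old, beta_new):
    """Boltzmann-reweighted ensemble average at a neighboring temperature."""
    logw = -(beta_new - beta_old) * U
    logw -= logw.max()                 # log-sum-exp shift for stability
    w = np.exp(logw)
    return np.sum(w * A) / np.sum(w)

# e.g. extrapolate the potential energy along an isochore:
# U_new = reweighted_average(U_samples, U_samples, 1.0, 0.95)
```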

An ensemble is a collection of related datasets, called members, built from a series of runs of a simulation or an experiment. Ensembles are large, temporal, multidimensional, and multivariate, making them difficult to analyze. Another important challenge is visualizing ensembles that vary both in space and time. Initial visualization techniques displayed ensembles with a small number of members, or presented an overview of an entire ensemble but without potentially important details. Recently, researchers have suggested combining these two directions, allowing users to choose subsets of members to visualize. This manual selection process places the burden on the user to identify which members to explore. We first introduce a static ensemble visualization system that automatically helps users locate interesting subsets of members to visualize. We next extend the system to support analysis and visualization of temporal ensembles. We employ 3D shape comparison, cluster tree visualization, and glyph-based visualization to represent different levels of detail within an ensemble. This strategy is used to provide two approaches for temporal ensemble analysis: (1) segment-based ensemble analysis, which captures important shape transition time-steps, clusters groups of similar members, and identifies common shape changes over time across multiple members; and (2) time-step-based ensemble analysis, which assumes ensemble members are aligned in time, combining similar shapes at common time-steps. Both approaches enable users to interactively visualize and analyze a temporal ensemble from different perspectives at different levels of detail. We demonstrate our techniques on an ensemble studying matter transition from hadronic gas to quark-gluon plasma during gold-on-gold particle collisions. PMID:26529728

Today, remote machine condition monitoring is popular due to the continuous advancement of wireless communication. The bearing is the most frequently and easily failed component in many rotating machines. To accurately identify the type of bearing fault, large amounts of vibration data need to be collected. However, the volume of transmitted data cannot be too high because the bandwidth of wireless communication is limited. To solve this problem, the data are usually compressed before being transmitted to a remote maintenance center. This paper proposes a novel signal compression method that can substantially reduce the amount of data that need to be transmitted without sacrificing the accuracy of fault identification. The proposed signal compression method is based on ensemble empirical mode decomposition (EEMD), which is an effective method for adaptively decomposing the vibration signal into different bands of signal components, termed intrinsic mode functions (IMFs). An optimization method was designed to automatically select appropriate EEMD parameters for the analyzed signal, and in particular to select the appropriate level of the added white noise in the EEMD method. An index termed the relative root-mean-square error was used to evaluate the decomposition performance under different noise levels to find the optimal level. After applying the optimal EEMD method to a vibration signal, the IMF relating to the bearing fault can be extracted from the original vibration signal. Compressing this signal component leaves a much smaller proportion of data samples to be retained for transmission and further reconstruction. The proposed compression method was also compared with the popular wavelet compression method. Experimental results demonstrate that the optimization of EEMD parameters can automatically find appropriate EEMD parameters for the analyzed signals, and that the IMF-based compression method provides a higher compression ratio while retaining the bearing defect information.

Content-based medical image retrieval (CBMIR) is a powerful resource to improve differential computer-aided diagnosis. The major problem with CBMIR applications is the semantic gap, a situation in which the system does not follow the users' sense of similarity. This gap can be bridged by adequate modeling of similarity queries, which ultimately depends on the combination of feature extractor methods and distance functions. In this study, such combinations are referred to as perceptual parameters, as they impact how images are compared. In a CBMIR, the perceptual parameters must be set manually by the users, which imposes a heavy burden on the specialists; otherwise, the system will follow a predefined sense of similarity. This paper presents a novel approach to endow a CBMIR with a proper sense of similarity, in which the system defines the perceptual parameters depending on the query element. The method employs an ensemble strategy, where an extreme learning machine acts as a meta-learner and identifies the most suitable perceptual parameters according to a given query image. These parameters define the search space for the similarity query that retrieves the most similar images. An instance-based learning classifier labels the query image following the query result set. As a proof-of-concept implementation, we integrated the approach into a mammogram CBMIR. For each query image, the resulting tool provided a complete second opinion, including lesion class, system certainty degree, and the set of most similar images. Extensive experiments on a large mammogram dataset showed that our proposal achieved a hit ratio up to 10% higher than the traditional CBMIR approach, without requiring external parameters from the users. Our database-driven solution was also up to 25% faster than traditional content retrieval approaches. PMID:26259520

An extended-range tropical cyclogenesis forecast model has been developed using the forecasts of global models available from the TIGGE portal. A scheme has been developed to detect the signatures of cyclogenesis in the global model forecast fields, i.e., the mean sea level pressure and surface winds (10 m horizontal winds). For this, a wind matching index was determined between the synthetic cyclonic wind fields and the forecast wind fields. Thresholds of 0.4 for the wind matching index and 1005 hPa for the pressure were determined to detect the cyclonic systems. The detected cyclonic systems in the study region are classified into different cyclone categories based on their intensity (maximum wind speed). Forecasts of up to 15 days from three global models, viz., ECMWF, NCEP and UKMO, have been used to predict cyclogenesis based on a multi-model ensemble approach. The occurrence of cyclonic events of different categories in all the forecast steps in the gridded region (10 × 10 km2) was used to estimate the probability of the formation of cyclogenesis. The probability of cyclogenesis was estimated by computing the grid score using the wind matching index from each model at each forecast step and convolving it with a Gaussian filter. The proposed method is used to predict the cyclogenesis of five named tropical cyclones formed during the year 2013 in the north Indian Ocean. Cyclogenesis of these systems was predicted 6-8 days in advance using the above approach. The mean lead prediction time for cyclogenesis events of the proposed model was found to be 7 days.
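
A minimal sketch of the final probability step under simplifying assumptions: binary detection flags are summed into a grid score and convolved with a Gaussian filter; the grid dimensions and the filter width are illustrative.

```python
# Minimal sketch: multi-model grid scores smoothed into a genesis-probability
# map; detection flags, grid size, and sigma are illustrative stand-ins.
import numpy as np
from scipy.ndimage import gaussian_filter

def genesis_probability(detections, sigma=2.0):
    """detections: (n_models, n_steps, ny, nx) binary detection flags."""
    score = detections.sum(axis=(0, 1)).astype(float)  # grid score
    smooth = gaussian_filter(score, sigma=sigma)       # Gaussian convolution
    return smooth / smooth.max() if smooth.max() > 0 else smooth

rng = np.random.default_rng(3)
det = rng.random((3, 15, 50, 50)) > 0.995   # 3 models, 15 forecast steps
prob = genesis_probability(det)             # normalized probability map
```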

In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurately predicting resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt several base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are investigated. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes fuzzy c-means combined with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting resource demands. PMID:25691896

Aims: A fast, non-invasive and observer-independent method to analyze the homogeneity and maturity of human pluripotent stem cell (hPSC) derived retinal pigment epithelial (RPE) cells is warranted to assess the suitability of hPSC-RPE cells for implantation or in vitro use. The aim of this work was to develop and validate methods to create ensembles of state-of-the-art texture descriptors and to provide a robust classification tool to separate three different maturation stages of RPE cells using phase contrast microscopy images. The same methods were also validated on a wide variety of biological image classification problems, such as histological or virus image classification. Methods: For image classification we used different texture descriptors, descriptor ensembles and preprocessing techniques. In addition, three new methods were tested. The first approach was an ensemble of preprocessing methods, used to create an additional set of images. The second was a region-based approach, where saliency detection and wavelet decomposition divide each image into two different regions, from which features were extracted through different descriptors. The third method was an ensemble of Binarized Statistical Image Features, based on different sizes and thresholds. A Support Vector Machine (SVM) was trained for each descriptor histogram and the set of SVMs combined by sum rule. The accuracy of the computer vision tool was verified in classifying the hPSC-RPE cell maturation level. Dataset and Results: The RPE dataset contains 1862 subwindows from 195 phase contrast images. The final descriptor ensemble outperformed the most recent stand-alone texture descriptors, obtaining, for the RPE dataset, an area under the ROC curve (AUC) of 86.49% with 10-fold cross validation and 91.98% with the leave-one-image-out protocol. The generality of the three proposed approaches was ascertained with 10 more biological image datasets, obtaining an average AUC greater than 97%. Conclusions: Here we
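
A minimal sketch of the sum-rule fusion described above: one SVM per descriptor histogram, with the per-class probabilities summed across descriptors (the descriptor matrices here are hypothetical stand-ins).

```python
# Minimal sketch: sum-rule fusion of per-descriptor SVMs.
import numpy as np
from sklearn.svm import SVC

def sum_rule_ensemble(descriptor_sets, y_train, test_sets):
    """descriptor_sets/test_sets: parallel lists of (n_samples, n_feat) arrays,
    one pair per texture descriptor."""
    fused = None
    for Xtr, Xte in zip(descriptor_sets, test_sets):
        clf = SVC(probability=True).fit(Xtr, y_train)
        proba = clf.predict_proba(Xte)          # per-descriptor posterior
        fused = proba if fused is None else fused + proba
    return fused.argmax(axis=1)                 # class with highest summed score
```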

Three highly sensitive and selective switches for monosaccharides were constructed from the anionic polyelectrolyte PPPSO3Na and cationic viologen quenchers (BBVs). The sensing responses of the three ensembles (PPPSO3Na/o-BBV, PPPSO3Na/m-BBV and PPPSO3Na/p-BBV) to seven common monosaccharides were determined by fluorescence spectra in pH 7.4 buffer solution. The results show that all three sensing ensembles exhibit high selectivity and sensitivity for D-fructose, with a reversible "on-off-on" fluorescence response. These results can provide a new mode for developing highly selective probes.

With increasing shares of installed wind power in Germany, accurate forecasts of wind speed and power become increasingly important for the grid integration of renewable energies. Applications such as grid management and trading also benefit from uncertainty information. This uncertainty information can be provided by ensemble forecasts. These forecasts often exhibit systematic errors such as biases and spread deficiencies. The errors can be reduced by statistical post-processing. We use forecast data from the regional numerical weather prediction model COSMO-DE EPS as input to regional wind power forecasts. In order to enhance the power forecast, we first calibrate the wind speed forecasts against the model analysis, so that some of the model's systematic errors can be removed. Wind measurements at every grid point are usually not available, and as we want to conduct grid zone forecasts, the model analysis is the best target for calibration. We use forecasts from the COSMO-DE EPS, a high-resolution ensemble prediction system with 20 forecast members. The model covers the region of Germany and its surroundings with a vertical resolution of 50 model levels and a horizontal resolution of 0.025 degrees (approximately 2.8 km). The forecast range is 21 hours, with model output available on an hourly basis. Thus, we use it for shortest-term wind power forecasts. The COSMO-DE EPS was originally designed with a focus on forecasts of convective precipitation. The COSMO-DE EPS wind speed forecasts at hub height were post-processed by non-homogeneous Gaussian regression (NGR; Thorarinsdottir and Gneiting, 2010), a calibration method that fits a truncated normal distribution to the ensemble wind speed forecasts. As the calibration target, the model analysis was used. The calibration is able to remove some deficits of the COSMO-DE EPS. In contrast to the raw ensemble members, the calibrated ensemble members no longer show the strong correlations with each other, and the spread-skill relationship

Tractography uses diffusion MRI to estimate the trajectory and cortical projection zones of white matter fascicles in the living human brain. There are many different tractography algorithms and each requires the user to set several parameters, such as curvature threshold. Choosing a single algorithm with specific parameters poses two challenges. First, different algorithms and parameter values produce different results. Second, the optimal choice of algorithm and parameter value may differ between different white matter regions or different fascicles, subjects, and acquisition parameters. We propose using ensemble methods to reduce algorithm and parameter dependencies. To do so we separate the processes of fascicle generation and evaluation. Specifically, we analyze the value of creating optimized connectomes by systematically combining candidate streamlines from an ensemble of algorithms (deterministic and probabilistic) and systematically varying parameters (curvature and stopping criterion). The ensemble approach leads to optimized connectomes that provide better cross-validated prediction error of the diffusion MRI data than optimized connectomes generated using a single-algorithm or parameter set. Furthermore, the ensemble approach produces connectomes that contain both short- and long-range fascicles, whereas single-parameter connectomes are biased towards one or the other. In summary, a systematic ensemble tractography approach can produce connectomes that are superior to standard single parameter estimates both for predicting the diffusion measurements and estimating white matter fascicles. PMID:26845558

Operational numerical weather prediction (NWP) systems occasionally exhibit "forecast skill dropouts", in which the forecast skill drops to an abnormally low level, due in part to the assimilation of flawed observational data. Recent studies have shown that a diagnostic technique called Ensemble Forecast Sensitivity to Observations (EFSO) can detect such observations (Kalnay et al. 2012; Ota et al. 2013, Tellus A). Based on this technique, a new quality control (QC) scheme called Proactive QC (PQC) has been proposed, which detects "flawed" observations using EFSO after just a 6-hour forecast, when the analysis at the next cycle becomes available for verification, and then repeats the analysis and forecast without using the detected observations (Hotta 2014). In Hotta (2014), it was shown using the JCSDA S4 Testbed that the 6-hr PQC reduces the 24-hour forecast errors from the detected skill dropout events. With such encouraging results we are performing preliminary experiments towards operational implementation. First, we show that offline PQC correction can significantly reduce forecast errors up to 5 days, and that the reduction and improved areal coverage can grow with synoptic weather disturbances for several days. Second, with an online PQC cycle experiment, the reduction of forecast error is shown to be even larger than in the offline version, since the effect can accumulate each time we perform a PQC correction. Finally, the operational center imposes a very tight schedule in order to deliver the products on time, so the computational cost has to be minimized for PQC to be implemented. To avoid performing the analysis twice, which is the most expensive part of PQC, we test the accuracy of the constant-K approximation, which assumes that the Kalman gain K does not change much, given that only a small subset of observations is rejected. In this presentation, we will demonstrate the performance and feasibility of PQC implementation in a real-time operational environment.

Chlorophyll a concentration is one of the important parameters for the characterization of water quality, reflecting the degree of eutrophication and algae content in the water body; it is also an important factor in determining water spectral reflectance, which makes it a key parameter in water quality remote sensing. Remote sensing quantitative retrieval of chlorophyll a concentration can provide new ideas and methods for the monitoring and evaluation of lake water quality. In this work, we developed a data assimilation scheme based on ensemble square root filters and three-dimensional numerical modeling of wind-driven circulation and pollutant transport to assimilate the concentration of chlorophyll a. We also conducted assimilation experiments using buoy observation data from May 20, 2010. We estimated the concentration of chlorophyll a in Taihu Lake, and then used this result to forecast the concentration of chlorophyll a. During the assimilation stage, the root mean square error was reduced from 1.58, 1.025, and 2.76 to 0.465, 0.276, and 1.01, respectively, and the average relative error was reduced from 0.2 to 0.05, 0.046, and 0.069, respectively. During the prediction stage, the root mean square error was reduced from 1.486, 1.143, and 2.38 to 0.017, 0.147, and 0.23, respectively, and the average relative error was reduced from 0.2 to 0.002, 0.025, and 0.019, respectively. The final results indicate that data assimilation can significantly improve the accuracy of the estimation and prediction of chlorophyll a concentration in Taihu Lake. PMID:23487919
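
For reference, here is a minimal sketch of a serial ensemble square root filter update for a single scalar observation (following Whitaker and Hamill, 2002); it illustrates the class of filter used, not the study's full three-dimensional assimilation scheme.

```python
# Minimal sketch: serial EnSRF update for one scalar observation.
import numpy as np

def ensrf_update(X, h, y_obs, r):
    """X: (n_state, n_members) ensemble; h: obs operator row; r: obs variance."""
    xm = X.mean(axis=1, keepdims=True)
    Xp = X - xm                                # ensemble perturbations
    n = X.shape[1]
    hx = h @ X                                 # observation-space ensemble
    hxm = hx.mean()
    hxp = hx - hxm
    phh = hxp @ hxp / (n - 1)                  # HPH^T (scalar)
    pxh = Xp @ hxp / (n - 1)                   # PH^T (vector)
    K = pxh / (phh + r)                        # Kalman gain
    alpha = 1.0 / (1.0 + np.sqrt(r / (phh + r)))
    xm_new = xm + np.outer(K, [y_obs - hxm])   # update ensemble mean
    Xp_new = Xp - alpha * np.outer(K, hxp)     # reduced gain: no perturbed obs
    return xm_new + Xp_new
```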

Public school music education in the USA remains wedded to large ensemble performance. Instruction tends to be teacher directed, relies on styles from the Western canon and exhibits little concern for musical interests of students. The idea that a fundamental purpose of education is the creation of a just society is difficult for many music…

Forecasting future extreme events under the present changing climate represents a difficult task. Currently there is a large number of ensembles of simulations for climate projections that take into account different models and scenarios. However, there is a need to reduce the size of the ensemble to make the interpretation of these simulations more manageable for impact studies or climate risk assessment. This can be achieved by developing subsampling strategies to identify a limited number of simulations that best represent the ensemble. In this study, cold waves are chosen to test different approaches for subsampling the available simulations. The definition of cold waves depends on the criteria used, but they are generally defined using a minimum temperature threshold, the duration of the cold spell, as well as their geographical extent. These climate indicators are not universal, highlighting the difficulty of directly comparing different studies. As part of the CLIPC European project, we use daily surface temperature data obtained from CMIP5 outputs as well as Euro-CORDEX simulations to predict future cold wave events in Europe. From these simulations a clustering method is applied to minimise the number of simulations required. Furthermore, we analyse the different uncertainties that arise from the different model characteristics and definitions of climate indicators. Finally, we will test whether the same subsampling strategy can be used for different climate indicators. This will facilitate the use of the subsampling results for a wide range of impact assessment studies.

Traditional Ensemble Kalman Filter (EnKF) data assimilation requires computationally intensive Monte Carlo (MC) sampling, which suffers from filter inbreeding unless the number of simulations is large. Recently we proposed an alternative EnKF groundwater-data assimilation method that obviates the need for sampling and is free of inbreeding issues. In our new approach, theoretical ensemble moments are approximated directly by solving a system of corresponding stochastic groundwater flow equations. Like MC-based EnKF, our moment equations (ME) approach allows Bayesian updating of system states and parameters in real-time as new data become available. Here we compare the performances and accuracies of the two approaches on two-dimensional transient groundwater flow toward a well pumping water in a synthetic, randomly heterogeneous confined aquifer subject to prescribed head and flux boundary conditions.

By studying co-crystal information on the interactions between PDE5 and its inhibitors, forty new tetrahydro-β-carboline-based analogues were synthesized and tested for their PDE5 inhibition. Some compounds were as active as tadalafil in inhibiting PDE5 and showed a better selectivity profile, particularly versus PDE11A; the nature of the terminal ring and its nitrogen substituent are the main determinants of selectivity. Ensemble docking confirmed the role of the H-loop closed conformer in activity, versus its occluded and open forms. Conformational studies showed the effect of the bulkiness of the terminal ring N-alkyl substituent on the formation of stable enzyme-ligand conformers. The difference in potencies of the hydantoin and piperazinedione analogues, together with the necessity of the C-5/C-6 R-absolute configuration, was revealed through molecular docking. PMID:23117589

Carbon nanotubes and nanotube heterojunctions have recently emerged as excellent candidates for nanoscale molecular electronic device components. Experimental measurements of the conductivity, rectifying behavior and conductivity-chirality correlation have also been made. While quasi-one-dimensional simple heterojunctions between nanotubes with different electronic behavior can be generated by introducing a pair of heptagon-pentagon defects in an otherwise all-hexagon graphene sheet, other complex 3- and 4-point junctions may require other mechanisms. Structural stability as well as the local electronic density of states of various nanotube junctions are investigated using a generalized tight-binding molecular dynamics (GTBMD) scheme that incorporates non-orthogonality of the orbitals. The junctions investigated include straight and small-angle heterojunctions of various chiralities and diameters, as well as more complex 'T' and 'Y' junctions which do not always obey the usual pentagon-heptagon pair rule. The study of the local density of states (LDOS) reveals many interesting features, the most prominent among them being the defect-induced states in the gap. The proposed three- and four-point junctions are among the smallest possible tunnel junctions made entirely of carbon atoms. Furthermore, the electronic behavior of nanotube-based device components can be tailored by doping with group III and V elements such as B and N, and BN nanotubes as wide-band-gap semiconductors have also been realized in experiments. Structural properties of heteroatomic nanotubes comprising C, B and N will be discussed.

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license. PMID:26687719

An integrated method combining a proper orthogonal decomposition based reduced-order model (ROM) and data assimilation is proposed for the real-time prediction of an unsteady flow field. In this paper, a particle filter (PF) and an ensemble Kalman filter (EnKF) are compared for data assimilation, and the difference in the predicted flow fields is evaluated focusing on the probability density function (PDF) of the model variables. The proposed method is demonstrated using identical twin experiments of an unsteady flow field around a circular cylinder at a Reynolds number of 1000. The PF and EnKF are employed to estimate the temporal coefficients of the ROM based on the observed velocity components in the wake of the circular cylinder. The prediction accuracy of ROM-PF is significantly better than that of ROM-EnKF due to the flexibility of the PF in representing a PDF compared to the EnKF. Furthermore, the proposed method reproduces the unsteady flow field several orders of magnitude faster than the reference numerical simulation based on the Navier-Stokes equations.
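
A minimal sketch of the POD step under standard assumptions: the basis is built from mean-subtracted snapshots via the SVD, and the temporal coefficients of the retained modes are what the PF or EnKF would estimate from wake observations.

```python
# Minimal sketch: POD basis and temporal coefficients from flow snapshots.
import numpy as np

def pod_basis(snapshots, n_modes):
    """snapshots: (n_dof, n_times) flow fields, one column per time step."""
    mean = snapshots.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    phi = U[:, :n_modes]                           # spatial POD modes
    coeffs = s[:n_modes, None] * Vt[:n_modes]      # temporal coefficients a(t)
    energy = (s[:n_modes] ** 2).sum() / (s ** 2).sum()  # captured variance
    return mean, phi, coeffs, energy               # field ~ mean + phi @ coeffs
```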

This paper presents an image classification model developed to classify images embedded in commercial real estate flyers. It is a component of a larger, multimodal system which uses the texts as well as the images in the flyers to automatically classify them by property type. The role of the image classifier in the system is to provide the genres of the embedded images (map, schematic drawing, aerial photo, etc.), which are then combined with the texts in the flyer for the overall classification. In this work, we used an ensemble learning approach and developed a model where the outputs of an ensemble of support vector machines (SVMs) are combined by a k-nearest neighbor (KNN) classifier. In this model, the classifiers in the ensemble are strong classifiers, each of which is trained to predict a given/assigned genre. Not only is our model intuitive, taking advantage of the mutual distinctness of the image genres, it is also scalable. We tested the model using over 3000 images extracted from online real estate flyers. The results showed that our model outperformed the baseline classifiers by a large margin.
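
A minimal sketch of the described architecture with synthetic data: one strong one-vs-rest SVM per genre, whose stacked decision scores feed a KNN classifier that makes the final genre call. The helper names and the use of decision-function scores are implementation assumptions.

```python
# Minimal sketch: per-genre SVMs combined by a KNN meta-classifier.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def fit_svm_knn(X_train, y_train, k=5):
    genres = np.unique(y_train)
    # one strong one-vs-rest SVM per assigned genre
    svms = [SVC().fit(X_train, (y_train == g).astype(int)) for g in genres]
    scores = np.column_stack([s.decision_function(X_train) for s in svms])
    knn = KNeighborsClassifier(n_neighbors=k).fit(scores, y_train)
    return svms, knn

def predict_svm_knn(svms, knn, X):
    scores = np.column_stack([s.decision_function(X) for s in svms])
    return knn.predict(scores)                 # KNN combines the SVM outputs
```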

Due to the large dimensionality of the state vector and the sparsity of observations, the initial conditions (IC) of water quality models are subject to large uncertainties. To reduce the IC uncertainties in operational water quality forecasting, an ensemble data assimilation (DA) procedure for the Hydrologic Simulation Program - Fortran (HSPF) model has been developed and evaluated for the Kumho River Subcatchment of the Nakdong River Basin in Korea. The procedure, referred to herein as MLEF-HSPF, uses the maximum likelihood ensemble filter (MLEF), which combines the strengths of variational assimilation (VAR) and the ensemble Kalman filter (EnKF). The control variables involved in the DA procedure include the bias correction factors for mean areal precipitation and mean areal potential evaporation, the hydrologic state variables, and the water quality state variables such as water temperature, dissolved oxygen (DO), biochemical oxygen demand (BOD), ammonium (NH4), nitrate (NO3), phosphate (PO4) and chlorophyll a (CHL-a). Due to the very large dimensionality of the inverse problem, accurately specifying the parameters for the DA procedure is a challenge. Systematic sensitivity analysis is carried out to identify the optimal parameter settings. To evaluate the robustness of MLEF-HSPF, we use multiple subcatchments of the Nakdong River Basin. In the evaluation, we focus on the performance of MLEF-HSPF in predicting extreme water quality events.

We present a simplified version of a repeater protocol in a cold neutral-atom ensemble with Rydberg excitations optimized for two-node entanglement generation and describe a protocol for quantum teleportation. Our proposal draws from previous proposals [B. Zhao et al., Phys. Rev. A 81, 052329 (2010), 10.1103/PhysRevA.81.052329; Y. Han et al., Phys. Rev. A 81, 052311 (2010), 10.1103/PhysRevA.81.052311] that described efficient and robust protocols for long-distance entanglement with many nodes. Using realistic experimental values, we predict an entanglement generation rate of ˜25 Hz and a teleportation rate of ˜5 Hz . Our predicted rates match the current state-of-the-art experiments for entanglement generation and teleportation between quantum memories. With improved efficiencies we predict entanglement generation and teleportation rates of ˜7.8 and ˜3.6 kHz, respectively, representing a two-order-of-magnitude improvement over the currently realized values. Cold-atom ensembles with Rydberg excitations are promising candidates for repeater nodes because collective effects in the ensemble can be used to deterministically generate a long-lived ground-state memory which may be efficiently mapped onto a directionally emitted single photon.

Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
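
A minimal sketch of input decimation under simple assumptions: for each class, keep the features most correlated with that class's indicator, train one base classifier per decimated subset, and combine by summing probability outputs. The correlation criterion and tree base learners are illustrative choices, not the article's exact recipe.

```python
# Minimal sketch: input-decimated ensemble with per-class feature subsets.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def input_decimation_ensemble(X, y, n_keep=10):
    members = []
    for c in np.unique(y):
        target = (y == c).astype(float)
        # feature relevance to this class: |correlation| with its indicator
        corr = np.abs(np.array([np.corrcoef(X[:, j], target)[0, 1]
                                for j in range(X.shape[1])]))
        subset = np.argsort(corr)[-n_keep:]          # top-correlated features
        clf = DecisionTreeClassifier().fit(X[:, subset], y)
        members.append((subset, clf))                # decoupled base classifier
    return members

def predict(members, X):
    probs = sum(clf.predict_proba(X[:, subset]) for subset, clf in members)
    classes = members[0][1].classes_
    return classes[probs.argmax(axis=1)]             # combine by sum rule
```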

Hierarchical organization of the free energy landscape (FEL) of native globular proteins has been widely accepted by the biophysics community. However, the FEL of native proteins is usually projected onto one or a few dimensions. Here we generated collectively 0.2 milliseconds of molecular dynamics simulation trajectories in explicit solvent for hen egg white lysozyme (HEWL), and carried out detailed conformational analysis based on backbone torsional degrees of freedom (DOF). Our results demonstrated that at microsecond and coarser temporal resolutions, the FEL of HEWL exhibits hub-like topology, with crystal structures occupying the dominant structural ensemble that serves as the hub of conformational transitions. However, at 100 ns and finer temporal resolutions, conformational substates of HEWL exhibit network-like topology, and crystal structures are associated with kinetic traps that are important but not dominant ensembles. Backbone torsional state transitions on time scales ranging from nanoseconds to beyond microseconds were found to be associated with various types of molecular interactions. Even at nanosecond temporal resolution, the number of conformational substates that are of statistical significance is quite limited. These observations suggest that detailed analysis of conformational substates at multiple temporal resolutions is both important and feasible. Transition state ensembles among various conformational substates at microsecond temporal resolution were observed to be considerably disordered. Lifetimes of these transition state ensembles are found to be nearly independent of the time scales of the participating torsional DOFs. PMID:26057625

A fully-coupled, physically-based land surface hydrologic model, Flux-PIHM, is developed by incorporating a land-surface scheme into the Penn State Integrated Hydrologic Model (PIHM). The land-surface scheme is mainly adapted from the Noah LSM, which is widely used in mesoscale atmospheric models and has undergone extensive testing. Because PIHM is capable of simulating lateral water flow and deep groundwater, Flux-PIHM is able to represent both the link between groundwater and the surface energy balance, as well as some of the land surface heterogeneities caused by topography. Flux-PIHM has been implemented and manually calibrated at the Shale Hills watershed (0.08 km2) in central Pennsylvania. Model predictions of discharge, soil moisture, water table depth, sensible and latent heat fluxes, and soil temperature show good agreement with observations. The discharge prediction is significantly better than state-of-the-art conceptual models implemented at similar watersheds. The ensemble Kalman filter (EnKF) provides a promising approach for physically-based land surface hydrologic model calibration. A Flux-PIHM data assimilation system is developed by incorporating the EnKF into Flux-PIHM for model parameter and state estimation. This is the first parameter estimation using the EnKF for a physically-based hydrologic model. Both synthetic and real data experiments are performed at the Shale Hills watershed to test the capability of the EnKF in parameter estimation. Six model parameters selected from a model parameter sensitivity test are estimated. In the synthetic experiments, synthetic observations of discharge, water table depth, soil moisture, land surface temperature, sensible and latent heat fluxes, and transpiration are assimilated into the system. Observations are assimilated every 72 hours in wet periods, and every 144 hours in dry periods. Results show that the EnKF is capable of accurately estimating model parameter values for Flux-PIHM. In the first set of experiments

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training. PMID:24316576

As one of the most adopted sequential data assimilation methods in many areas, especially those involving complex nonlinear dynamics, the ensemble Kalman filter (EnKF) has been under extensive investigation regarding its properties and efficiency. Compared to other variants of the Kalman filter (KF), EnKF is straightforward to implement, as it employs random ensembles to represent solution states. This, however, introduces sampling errors that affect the accuracy of EnKF in a negative manner. Though sampling errors can be easily reduced by using a large number of samples, in practice this is undesirable as each ensemble member is a solution of the system of state equations and can be time consuming to compute for large-scale problems. In this paper we present an efficient EnKF implementation via generalized polynomial chaos (gPC) expansion. The key ingredients of the proposed approach involve (1) solving the system of stochastic state equations via the gPC methodology to gain efficiency; and (2) sampling the gPC approximation of the stochastic solution with an arbitrarily large number of samples, at virtually no additional computational cost, to drastically reduce the sampling errors. The resulting algorithm thus achieves a high accuracy at reduced computational cost, compared to the classical implementations of EnKF. Numerical examples are provided to verify the convergence property and accuracy improvement of the new algorithm. We also prove that for linear systems with Gaussian noise, the first-order gPC Kalman filter method is equivalent to the exact Kalman filter.

Heavy rainfall-triggered landslides are often associated with flood events and cause additional loss of life and property. It is pertinent to build a robust coupled flash flood and landslide disaster early warning system for disaster preparedness and hazard management. In this study, we built an ensemble-based coupled flash flood and landslide disaster early warning system, aimed at operational use by the US National Weather Service, by integrating the Coupled Routing and Excess STorage (CREST) model and the Sacramento Soil Moisture Accounting Model (SAC-SMA) with the physically based SLope-Infiltration-Distributed Equilibrium (SLIDE) landslide prediction model. We further evaluated this ensemble-based prototype warning system by conducting multi-year simulations driven by the Multi-Radar Multi-Sensor (MRMS) rainfall estimates in North Carolina and Oregon. We comprehensively evaluated the predictive capabilities of this system against observed and reported flood and landslide events. We then evaluated the sensitivity of the coupled system to the simulated hydrological processes. Our results show that the system is generally capable of making accurate predictions of flash flood and landslide events in terms of their locations and times of occurrence. The occurrence of predicted landslides shows high sensitivity to total infiltration and soil water content, highlighting the importance of accurately simulating the hydrological processes for accurate forecasting of rainfall-triggered landslide events.

In previous work, we proposed a constructive methodology for temporal data learning supported by results and prescriptions related to the embedding theorem, using singular spectrum analysis both to reduce the effects of possible discontinuities in the signal and to implement an efficient ensemble method. In this paper we present new results on the application of this approach to forecasting the individual rainfall intensity series collected by 135 stations distributed in the Tiber basin. The average RMS error of the resulting forecasts is less than 3 mm of rain. PMID:12672433

While chemical shifts are invaluable for obtaining structural information from proteins, they also offer one of the rare ways to obtain information about protein dynamics. A necessary tool in transforming chemical shifts into structural and dynamic information is chemical shift prediction. In our previous work we developed a method for 4D prediction of protein (1)H chemical shifts in which molecular motions, the 4th dimension, were modeled using molecular dynamics (MD) simulations. Although the approach clearly improved the prediction, the X-ray structures and single NMR conformers used in the model cannot be considered fully realistic models of a protein in solution. In this work, NMR ensembles (NMRE) were used to expand the conformational space of proteins (e.g. side chains, flexible loops, termini), followed by MD simulations for each conformer to map the local fluctuations. Compared with the non-dynamic model, the NMRE+MD model gave 6-17% lower root-mean-square (RMS) errors for different backbone nuclei. The improved prediction indicates that NMR ensembles with MD simulations can be used to obtain a more realistic picture of protein structures in solution, and moreover underlines the importance of both short and long time-scale dynamics for the prediction. The RMS errors of the NMRE+MD model were 0.24, 0.43, 0.98, 1.03, 1.16 and 2.39 ppm for (1)Hα, (1)HN, (13)Cα, (13)Cβ, (13)CO and backbone (15)N chemical shifts, respectively. The model is implemented in the prediction program 4DSPOT, available at http://www.uef.fi/4dspot. PMID:22314705

It is difficult to model multi-frequency signals, such as the mechanical vibration and acoustic signals of a wet ball mill in the mineral grinding process. In this paper, these signals are decomposed into multi-scale intrinsic mode functions (IMFs) by the empirical mode decomposition (EMD) technique. A new adaptive multi-scale spectral feature selection approach based on a sphere criterion (SC) is applied to the frequency spectra of these IMFs. Candidate sub-models are constructed by partial least squares (PLS) with the selected features. Finally, the branch and bound based selective ensemble (BBSEN) algorithm is applied to select and combine these ensemble sub-models. This method can easily be extended to regression and classification problems with multi-time-scale signals. We successfully apply this approach to a laboratory-scale ball mill, in which shell vibration and acoustic signals are used to model mill load parameters. The experimental results demonstrate that this novel approach is more effective than other modeling methods based on multi-scale frequency spectral features.
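
A minimal sketch of the first stage (EMD of a multi-frequency signal and the per-IMF spectra that feed feature selection), assuming the third-party PyEMD package (pip name EMD-signal) and a synthetic stand-in signal; the SC-based feature selection, PLS sub-models, and BBSEN combination are not reproduced here:

```python
import numpy as np
from PyEMD import EMD   # third-party package, pip install EMD-signal (assumed)

fs = 1000.0                                  # sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic multi-frequency stand-in for a shell-vibration signal
x = (np.sin(2 * np.pi * 35 * t) + 0.5 * np.sin(2 * np.pi * 180 * t)
     + 0.1 * np.random.randn(t.size))

imfs = EMD().emd(x)                          # rows: IMFs, finest scale first
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)
spectra = np.abs(np.fft.rfft(imfs, axis=1))  # per-IMF spectra for feature selection
```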

The present work explores the feasibility of analyzing the relationship between diabetes mellitus and several element levels in hair/urine specimens by chemometrics. A dataset involving 211 specimens and eight element concentrations was used. The control group was divided into three age subsets in order to analyze the influence of age. The most obvious difference found was the effect of age on the levels of zinc and iron: the decline of iron concentration with age in hair was exactly mirrored by the opposite trend in urine. Principal component analysis (PCA) was used as a tool for a preliminary evaluation of the data. Both ensemble and single support vector machine (SVM) algorithms were used as the classification tools. On average, the accuracy, sensitivity and specificity of the ensemble SVM models were 99%, 100%, 99% and 97%, 89%, 99% for hair and urine samples, respectively. The findings indicate that hair samples are superior to urine samples. Even so, simultaneously analyzing hair and urine samples can provide more valuable information for the prevention, diagnosis, treatment and study of diabetes. PMID:24835087
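
A hedged sketch of an ensemble-SVM classifier of the kind described, using scikit-learn bagging over RBF-kernel SVMs; the data here are a placeholder with the paper's shape (211 specimens, 8 element levels), since the actual hair/urine measurements are not available:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data mimicking 211 specimens with 8 element concentrations
X, y = make_classification(n_samples=211, n_features=8, n_informative=5,
                           random_state=0)

base = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
# "estimator" is named "base_estimator" in scikit-learn < 1.2
ensemble = BaggingClassifier(estimator=base, n_estimators=25, random_state=0)
print(cross_val_score(ensemble, X, y, cv=5, scoring="accuracy").mean())
```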

A novel multi-frame particle image velocimetry (PIV) method, able to evaluate a fluid trajectory by means of an ensemble-averaged cross-correlation, is introduced. The method integrates the advantages of state-of-the-art time-resolved PIV (TR-PIV) methods to further enhance both robustness and dynamic range. The fluid trajectory follows a polynomial model with a prescribed order. A set of polynomial coefficients, which maximizes the ensemble-averaged cross-correlation value across the frames, is regarded as the most appropriate solution. To achieve a convergence of the trajectory in terms of polynomial coefficients, an ensemble-averaged cross-correlation map is constructed by sampling cross-correlation values near the predictor trajectory with respect to an imposed change of each polynomial coefficient. A relation between the given change and the corresponding cross-correlation maps, which could be calculated from the ordinary cross-correlation, is derived. A disagreement between the computational domain and the corresponding physical domain is compensated by introducing the Jacobian matrix based on the image deformation scheme in accordance with the trajectory. The increased cost of the convergence calculation, associated with the nonlinearity of the fluid trajectory, is moderated by means of a V-cycle iteration. To validate the enhancements of the present method, quantitative comparisons with state-of-the-art TR-PIV methods, e.g., the adaptive temporal interval, the multi-frame pyramid correlation and the fluid trajectory correlation, were carried out using synthetically generated particle image sequences. The performances of the tested methods are discussed in algorithmic terms. A high-rate TR-PIV experiment of a flow over an airfoil demonstrates the effectiveness of the present method. It is shown that the present method is capable of reducing random errors in both velocity and material acceleration while suppressing spurious temporal fluctuations due to measurement noise.

Atmospheric inversions can be used to assess biosphere-atmosphere CO2 surface exchanges, but variability among inverse flux estimates at regional scales remains significant. Atmospheric transport model errors are presumed to be one of the main contributors to this variability, but have not been quantified thoroughly. Our study aims to evaluate and quantify the transport errors in the Weather Research and Forecasting (WRF) mesoscale model, recently used to produce inverse flux estimates at the regional scale over the NACP Mid-Continental Intensive (MCI) domain. We evaluate transport errors with an ensemble of WRF simulations using different physical parameterizations (e.g., atmospheric boundary layer (ABL) schemes, land surface models (LSMs), and cumulus parameterizations (CP)). Modeled meteorological variables and atmospheric CO2 mixing ratios are compared to observations (e.g., radiosondes, wind profilers, AmeriFlux sites, and CO2 mixing ratio towers) available in the MCI region for summer of 2008. Comparisons to date include simulations using two different land surface models (Noah and Rapid Update Cycle (RUC)), three different ABL schemes (YSU, MYJ and MYNN) and two different cumulus parameterizations (Kain-Fritsch and Grell-3D). We examine using the ensemble as a proxy for the observed model-data mismatch. Then we present a study of the sensitivity of atmospheric conditions to the choice of physical parameterization, to identify the parameterization driving the model-to-model variability in atmospheric CO2 concentrations at the mesoscale over the MCI domain. For example, we show that, whereas the ABL depth is highly influenced by the choice of ABL scheme and LSM, the mean horizontal wind speed is mainly influenced by the LSM only. Finally, we evaluate the variability in space and time of transport errors and their impact in atmospheric CO2 concentrations. Future work will be to describe transport errors in the MCI regional atmospheric inversion based on the

Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the global minimum energy conformation (GMEC). This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem. PMID:26744898
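
To make the gap-free, increasing-energy enumeration idea concrete, here is a minimal heap-based sketch that yields assignments in nondecreasing total energy for the simplified case of fully independent residue choices; this is only an illustration, since BWM* additionally handles sparse pairwise energies through the branch-decomposition:

```python
import heapq

def enumerate_assignments(choices):
    """Yield (energy, assignment) pairs in nondecreasing total energy.

    choices[i] is a list of (energy, rotamer_id) options for residue i,
    treated as independent; a deliberate simplification of the sparse
    energy setting handled by BWM*.
    """
    opts = [sorted(c) for c in choices]
    n = len(opts)
    start = (0,) * n
    e0 = sum(o[0][0] for o in opts)
    heap, seen = [(e0, start)], {start}
    while heap:
        e, idx = heapq.heappop(heap)
        yield e, [opts[i][j][1] for i, j in enumerate(idx)]
        for i in range(n):
            j = idx[i] + 1
            if j < len(opts[i]):
                child = idx[:i] + (j,) + idx[i + 1:]
                if child not in seen:
                    seen.add(child)
                    delta = opts[i][j][0] - opts[i][j - 1][0]
                    heapq.heappush(heap, (e + delta, child))

# Example: two residues with two rotamers each
# for e, conf in enumerate_assignments([[(0.0, "A1"), (1.5, "A2")],
#                                       [(0.2, "B1"), (0.9, "B2")]]):
#     print(e, conf)
```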

In this work we study how the multi-element nature of light emitting diodes (LEDs) based on nanowire (NW) ensembles influences their current voltage (I-V) characteristics. We systematically address critical issues of the fabrication process that can result in significant fluctuations of the electrical properties among the individual NWs in such LEDs, paying particular attention to the planarization step. Electroluminescence (EL) maps acquired for two nominally identical NW-LEDs reveal that small processing variations can result in a large difference in the number of individual nano-devices emitting EL. The lower number of EL spots in one of the LEDs is caused by its inhomogeneous electrical properties. The I-V characteristics of this LED cannot be described well by the classical Shockley model. We are able to take into account the multi-element nature of such LEDs and fit the I-V characteristics in the forward bias regime by employing an ad hoc adjusted version of the Shockley equation. More specifically, we introduce a bias dependence of the ideality factor. The basic considerations of our model should remain valid also for other types of devices based on ensembles of interconnected p-n junctions with inhomogeneous electrical properties, regardless of the employed material system. PMID:27232449
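
A hedged sketch of fitting forward-bias I-V data with a bias-dependent ideality factor: the linear form n(V) = n0 + alpha*V below is a hypothetical parameterization chosen for illustration, not the paper's actual functional form, and V_data/I_data are placeholder names.

```python
import numpy as np
from scipy.optimize import curve_fit

KT_Q = 0.02585  # thermal voltage kT/q at 300 K, in volts

def shockley_variable_n(V, I0, n0, alpha):
    """Shockley-type law with bias-dependent ideality factor n(V) = n0 + alpha*V."""
    return I0 * (np.exp(V / ((n0 + alpha * V) * KT_Q)) - 1.0)

# V_data, I_data: measured forward-bias points (hypothetical arrays)
# popt, _ = curve_fit(shockley_variable_n, V_data, I_data,
#                     p0=(1e-12, 2.0, 0.5), maxfev=10000)
```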

The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using experimental NMR conformers as the input structural ensembles, LE4PD predicts quantitatively accurate results, with correlation coefficient ρ = 0.93 to NMR backbone relaxation measurements for the seven proteins. The NMR solution structure derived ensemble and predicted dynamical relaxation is compared with molecular dynamics simulation-derived structural ensembles and LE4PD predictions and is consistent in the time scale of the simulations. The use of the experimental NMR conformers frees the approach from computationally demanding simulations.

We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with hybrid OpenMP/MPI (open multiprocessing/message passing interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology, aimed at evaluating binding free energies in drug-receptor systems on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach on the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm make the code a promising tool for second generation high throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac . PMID:27231982

To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called the grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, the proposed model improves both prediction accuracy and the hit rates of directional prediction. The model involves three main steps: (i) decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) to simplify the complex data; (ii) individually predicting each IMF with support vector regression (SVR) optimized by GWO; and (iii) integrating all predicted IMFs into the final ensemble prediction with another GWO-optimized SVR. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods, and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models in terms of prediction accuracy and hit rates of directional prediction.
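
A skeleton of the decomposition-ensemble loop, assuming the third-party PyEMD package (whose CEEMDAN routine is used here as a stand-in for CEEMD) and scikit-learn; the GWO hyperparameter tuning and the second-stage SVR recombination are replaced by fixed hyperparameters and a plain sum, so this shows only the structure of the method:

```python
import numpy as np
from PyEMD import CEEMDAN          # pip install EMD-signal (assumed); CEEMD stand-in
from sklearn.svm import SVR

def lagged(x, p=7):
    """Lag matrix and one-step-ahead targets for a 1-D series."""
    X = np.column_stack([x[i:len(x) - p + i] for i in range(p)])
    return X, x[p:]

def forecast_next(series, p=7):
    """One-step-ahead decomposition-ensemble forecast, IMF components summed."""
    imfs = CEEMDAN().ceemdan(np.asarray(series, dtype=float))
    total = 0.0
    for imf in imfs:
        X, y = lagged(imf, p)
        svr = SVR(kernel="rbf", C=10.0).fit(X, y)     # fixed hyperparameters, no GWO
        total += svr.predict(imf[-p:].reshape(1, -1))[0]
    return total
```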

factors that will influence the outcome of the algorithm are the following: the choice of the hydrological model, the uncertainty model applied for ensemble generation, the general wetness of the catchment during which the error covariance is computed, etc. In this research the influence of the latter two is examined in more depth. Furthermore, the optimal network configuration resulting from the newly developed algorithm is compared to network configurations obtained by two other algorithms. The first algorithm is based on a temporal stability analysis of the modeled soil moisture in order to identify catchment-representative monitoring locations with regard to average conditions. The second algorithm involves the clustering of available spatially distributed data (e.g. land cover and soil maps) that is not obtained by hydrological modeling.

A quantum memory is a key component for quantum networks, which will enable the distribution of quantum information. Its successful development requires storage of single-photon light. Encoding photons with spatial shape through higher-dimensional states significantly increases their information-carrying capability and network capacity. However, constructing such quantum memories is challenging. Here we report the first experimental realization of a true single-photon-carrying orbital angular momentum stored via electromagnetically induced transparency in a cold atomic ensemble. Our experiments show that the non-classical pair correlation between trigger photon and retrieved photon is retained, and the spatial structure of input and retrieved photons exhibits strong similarity. More importantly, we demonstrate that single-photon coherence is preserved during storage. The ability to store spatial structure at the single-photon level opens the possibility for high-dimensional quantum memories. PMID:24084711

Radial basis function (RBF) surrogate models have been widely applied in engineering design optimization problems to approximate computationally expensive simulations. Ensemble of radial basis functions (ERBF) using the weighted sum of stand-alone RBFs improves the approximation performance. To achieve a good trade-off between the accuracy and efficiency of the modelling process, this article presents a novel efficient ERBF method to determine the weights through solving a quadratic programming subproblem, denoted ERBF-QP. Several numerical benchmark functions are utilized to test the performance of the proposed ERBF-QP method. The results show that ERBF-QP can significantly improve the modelling efficiency compared with several existing ERBF methods. Moreover, ERBF-QP also provides satisfactory performance in terms of approximation accuracy. Finally, the ERBF-QP method is applied to a satellite multidisciplinary design optimization problem to illustrate its practicality and effectiveness for real-world engineering applications.
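
A minimal sketch of determining ensemble weights through a small quadratic program, here solved with SciPy under the common convention that weights are nonnegative and sum to one; the paper's exact QP formulation may differ:

```python
import numpy as np
from scipy.optimize import minimize

def ensemble_weights(F, y):
    """Ensemble weights from a small quadratic program.

    F : (n_points, n_models) held-out predictions of each stand-alone RBF
    y : (n_points,) true responses
    Minimizes ||F w - y||^2 subject to sum(w) = 1 and w >= 0.
    """
    k = F.shape[1]
    res = minimize(
        lambda w: np.sum((F @ w - y) ** 2),
        np.full(k, 1.0 / k),                 # start from equal weights
        bounds=[(0.0, 1.0)] * k,
        constraints=({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},),
        method="SLSQP",
    )
    return res.x
```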

With the growing interest in the application of wind energy, the doubly fed induction generator (DFIG) plays an essential role in the industry nowadays. To deal with the increasing stochastic variations introduced by intermittent wind resources and responsive loads, dynamic state estimation (DSE) is introduced in power systems associated with DFIGs. However, this dynamic analysis can fail because the parameters of the DFIGs are not known accurately enough. To solve this problem, an ensemble Kalman filter (EnKF) method is proposed for the state estimation and parameter calibration tasks. In this paper, a DFIG is modeled and implemented with the EnKF method. A sensitivity analysis is presented regarding measurement noise, initial state errors and parameter errors. The results indicate that this EnKF method performs robustly in the state estimation and parameter calibration of DFIGs.

Molecular umbrellas are "amphomorphic" compounds that can produce a hydrophobic or a hydrophilic exterior when exposed to a hydrophobic or hydrophilic microenvironment, respectively. Such molecules are composed of two or more facial amphiphiles that are connected to a central scaffold. Molecular umbrellas that have been synthesized to date, using bile acids as umbrella "walls", polyamines such as spermidine and spermine as scaffold material, and L-lysine as "branches", have been found capable of transporting certain hydrophilic peptides, nucleotides, and oligonucleotides across liposomal membranes by passive diffusion. They have also been shown to increase the water solubility and hydrolytic stability of a hydrophobic drug, and to exhibit significant antiviral activity. The ability of a fluorescently labeled molecular umbrella to readily enter live HeLa cells suggests that such conjugates could find use as drug carriers. PMID:19053303

The small oligomers formed early during human islet amyloid polypeptide (hIAPP) aggregation are responsible for the cell death in Type II diabetes. Epigallocatechin gallate (EGCG), a green tea extract, was found to inhibit hIAPP fibrillation. However, the inhibition mechanism and the conformational distribution of the smallest hIAPP oligomer, the dimer, are largely unknown. Herein, we performed extensive replica exchange molecular dynamics simulations on the hIAPP dimer with and without EGCG molecules. Extended hIAPP dimer conformations, with a collision cross section value similar to that observed by ion mobility-mass spectrometry, were observed in our simulations. Notably, these dimers adopt a three-stranded antiparallel β-sheet and contain the previously reported β-hairpin amyloidogenic precursor. We find that EGCG binding strongly blocks both the inter-peptide hydrophobic and aromatic-stacking interactions responsible for inter-peptide β-sheet formation and the intra-peptide interactions crucial for β-hairpin formation, thus abolishing the three-stranded β-sheet structures and leading to the formation of coil-rich conformations. Hydrophobic, aromatic-stacking, cation-π and hydrogen-bonding interactions jointly contribute to the EGCG-induced conformational shift. This study provides, at the atomic level, the conformational ensemble of the hIAPP dimer and the molecular mechanism by which EGCG inhibits hIAPP aggregation. PMID:27620620

By performing molecular dynamics simulations of hydrate formation from a methane nano-bubble in liquid water at 250 K and 50 MPa, we report how different ensembles, namely the NPT, NVT, and NVE ensembles, affect the nucleation kinetics of methane hydrate. The nucleation trajectories are monitored using face-saturated incomplete cage analysis (FSICA) and the mutually coordinated guest (MCG) order parameter (OP). The nucleation rate and the critical nucleus are obtained using the mean first-passage time (MFPT) method based on the FS cages and the MCG-1 OPs, respectively. The fitting results of the MFPT show that hydrate nucleation and growth are coupled together, consistent with the cage adsorption hypothesis, which holds that cage adsorption of methane is a mechanism for both hydrate nucleation and growth. For the three ensembles, the hydrate nucleation rate is ordered as follows: NPT > NVT > NVE, while the sequence of hydrate crystallinity is exactly reversed. However, the largest critical nucleus appears in the NVT ensemble, rather than in the NVE ensemble. These results are helpful for choosing a suitable ensemble when studying hydrate formation via computer simulations, and they emphasize the importance of the degree of order of the critical nucleus. PMID:27222203
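
A sketch of the MFPT fitting step, assuming the sigmoidal form of Wedekind and co-workers for the mean first-passage time as a function of nucleus size; the array names are illustrative placeholders, not data from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def mfpt_model(n, tau_j, n_star, c):
    """Sigmoidal MFPT form: tau(n) = tau_j/2 * (1 + erf(c * (n - n_star)))."""
    return 0.5 * tau_j * (1.0 + erf(c * (n - n_star)))

# sizes: nucleus order parameter (e.g. MCG-1); times: measured MFPTs (hypothetical)
# popt, _ = curve_fit(mfpt_model, sizes, times, p0=(times.max(), sizes.mean(), 0.1))
# tau_j, n_star, c = popt   # steady-state rate J ~ 1/(tau_j * V); n_star = critical size
```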

We assess the potential forecast skill of a climate model-based approach for seasonal ensemble hydrologic and streamflow forecasting for the western United States. By using climate model ensemble forecasts and ensembles formed via the resampling of observations, we distinguish hydrologic forecast skill resulting from the predictable evolution of initial hydrologic conditions from that derived from the climate model forecasts. Monthly climate model ensembles of precipitation and temperature produced by the National Centers for Environmental Prediction global spectral model (GSM) are downscaled for use as forcings of the variable infiltration capacity (VIC) hydrologic model. VIC then simulates ensembles of streamflow and spatially distributed hydrologic variables such as snowpack, soil moisture, and runoff. The regional averages of the ensemble forcings and derived hydrologic variables were evaluated over five regions: the Pacific Northwest, California, the Great Basin, the Colorado River basin, and the upper Rio Grande River basin. The skill assessment focuses on a retrospective 21-year period (1979-1999) during which GSM retrospective forecast ensembles (termed hindcasts), created using procedures similar to those for GSM real-time forecasts, are available. The observational verification data set for the hindcasts was a retrospective hydroclimatology at 1/8°-1/4° consisting of gridded observations of temperature and precipitation and gridded hydrologic simulation results (for hydrologic variables and streamflow) based on the observed meteorological inputs. The GSM hindcast skill was assessed relative to that of a naive ensemble climatology forecast and to that of ensemble streamflow prediction (ESP) hindcasts, a forecast baseline sharing the same initial condition information as the GSM-based hindcasts. We found that the unconditional (all years) GSM hindcasts for regionally averaged variables provided practically no skill improvement over the ESP hindcasts and did not

The combination of ensemble predictions of Hs made by the US National Weather Service (NWS) and the US Navy Fleet Numerical Meteorology and Oceanography Center (FNMOC) has established the NFCENS, a probabilistic wave forecast system in operations at NCEP since 2011. Computed from 41 combined wave ensemble members, the new product outperforms deterministic and probabilistic forecasts and nowcasts of Hs issued separately at each forecast center, at all forecast ranges. The successful implementation of the NFCENS has brought new opportunities for collaboration with Environment Canada (EC). EC is in the process of adding new global wave model ensemble products to its existing suite of operational regional products. The planned upgrade to the current NFCENS wave multi-center ensemble includes the addition of 20 members from the Canadian WES. With this upgrade, the NFCENS will be renamed the North American Wave Ensemble System (NAWES). As part of the new system implementation, new higher-resolution grids and upgrades to model physics using recent advances in source-term parameterizations are being tested. We provide results of a first validation of NAWES relative to global altimeter data and buoy measurements of waves, as well as its ability to forecast waves during the 2012 North Atlantic Hurricane Sandy. A second line of research involving wave ensembles at the NWS is the implementation of a LETKF-based data assimilation system developed in collaboration with the Argentinian Navy Meteorological Service. The project involves an implementation of the 4D-LETKF in the NWS global wave ensemble forecast system GWES. The 4-D scheme initializes a full 81-member ensemble in a 6-hour cycle. The LETKF determines the analysis ensemble locally in the space spanned by the ensemble, as a linear combination of the background perturbations. Observations from three altimeters and one scatterometer were used. Preliminary results for a prototype system running at the NWS, including

The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html PMID:27337980

Niobate-based octahedral molecular sieves having significant activity for multivalent cations and a method for synthesizing such sieves are disclosed. The sieves have a net negatively charged octahedral framework, comprising niobium, oxygen, and octahedrally coordinated lower valence transition metals. The framework can be charge balanced by the occluded alkali cation from the synthesis method. The alkali cation can be exchanged for other contaminant metal ions. The ion-exchanged niobate-based octahedral molecular sieve can be backexchanged in acidic solutions to yield a solution concentrated in the contaminant metal. Alternatively, the ion-exchanged niobate-based octahedral molecular sieve can be thermally converted to a durable perovskite phase waste form.

Good quality cardiopulmonary resuscitation (CPR) is the mainstay of treatment for managing patients with out-of-hospital cardiac arrest (OHCA). Assessment of the quality of the CPR delivered is now possible through the electrocardiography (ECG) signal that can be collected by an automated external defibrillator (AED). This study evaluates a nonlinear analysis of the CPR given to asystole patients. The raw ECG signal is filtered using ensemble empirical mode decomposition (EEMD), and the CPR-related intrinsic mode functions (IMFs) are chosen for evaluation. In addition, sample entropy (SE), complexity index (CI), and detrended fluctuation analysis (DFA) are collated, and statistical analysis is performed using ANOVA. The primary outcome measure assessed is the patient survival rate after two hours. The CPR patterns of 951 asystole patients were analyzed for the quality of CPR delivered. No significant difference was observed in the CPR-related IMF peak-to-peak interval analysis for patients younger or older than 60 years of age, and similarly for the amplitude-difference evaluation with SE and DFA. However, a difference is noted for the CI (p < 0.05). The results show that the patient group younger than 60 years has a higher survival rate, associated with higher complexity of the CPR-IMF amplitude differences. PMID:27529068
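
For reference, a compact sample-entropy implementation (a common variant of SampEn(m, r) with the usual r = 0.2*std default; the paper's exact implementation details are not specified):

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) of a 1-D series (compact variant).

    Counts Chebyshev-distance template matches of length m and m+1,
    excluding self-matches; r defaults to 0.2 * std(x).
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()

    def matches(mm):
        w = np.lib.stride_tricks.sliding_window_view(x, mm)
        total = 0
        for i in range(len(w) - 1):
            d = np.max(np.abs(w[i + 1:] - w[i]), axis=1)
            total += np.count_nonzero(d <= r)
        return total

    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```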

This study assesses projected changes to drought characteristics in Alberta, Saskatchewan and Manitoba, the prairie provinces of Canada, using a multi-regional climate model (RCM) ensemble available through the North American Regional Climate Change Assessment Program. Simulations considered include those performed with six RCMs driven by National Centers for Environmental Prediction (NCEP) reanalysis II for the 1981-2003 period and those driven by four Atmosphere-Ocean General Circulation Models for the 1970-1999 and 2041-2070 periods (i.e. eleven current and the same number of corresponding future period simulations). Drought characteristics are extracted using two drought indices, namely the Standardized Precipitation Index (SPI) and the Standardized Precipitation Evapotranspiration Index (SPEI). Regional frequency analysis is used to project changes to selected 20- and 50-year regional return levels of drought characteristics for fifteen homogeneous regions covering the study area. In addition, multivariate analyses of drought characteristics, derived on the basis of 6-month SPI and SPEI values, are developed using the copula approach for each region. Analysis of multi-RCM ensemble-averaged projected changes to the mean and selected return levels of drought characteristics shows increases over the southern and south-western parts of the study area. Based on bi- and trivariate joint occurrence probabilities of drought characteristics, the southern regions along with the central regions are found to be highly drought vulnerable, followed by the southwestern and southeastern regions. Compared to the SPI-based analysis, the results based on SPEI suggest drier conditions over many regions in the future, indicating potential effects of rising temperatures on drought risks. These projections will be useful in the development of appropriate adaptation strategies for the water and agricultural sectors, which play an important role in the economy of the study area.

This study assesses projected changes to drought characteristics in Alberta, Saskatchewan and Manitoba, the prairie provinces of Canada, using a multi-regional climate model (RCM) ensemble available through the North American Regional Climate Change Assessment Program. Simulations considered include those performed with six RCMs for the 1981-2003 period driven by National Centers for Environmental Prediction reanalysis II and by four Atmosphere-Ocean General Circulation Models for the 1970-1999 and 2041-2070 periods (11 current-to-future period simulation pairs). Drought characteristics are extracted using two drought indices, namely the Standardized Precipitation Index (SPI), which is based solely on precipitation, and the Standardized Precipitation Evapotranspiration Index (SPEI), which is based on both precipitation and temperature in the form of evapotranspiration. Regional frequency analysis is used to project changes to selected 20- and 50-year regional return levels of drought for fifteen homogeneous regions. In addition, multivariate analyses of drought characteristics, derived on the basis of six-month SPI and SPEI values, are developed using the copula approach for each region. Analysis of multi-RCM ensemble-averaged projected changes to drought characteristics and various return levels shows increases over the southern, western and eastern parts of the study area. Based on bi- and trivariate joint occurrence probabilities of drought characteristics, the southern regions along with the central regions are found to be highly drought vulnerable, followed by the southwestern and southeastern regions. These projections will be useful in the development of appropriate adaptation strategies for the water and agricultural sectors, which play an important role in the economy of the study area.
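
For readers unfamiliar with the index, here is a minimal SPI sketch: fit a gamma distribution to aggregated precipitation totals and map cumulative probabilities to standard-normal quantiles (for SPEI, the water balance P minus PET replaces P, typically with a log-logistic fit). This is the textbook construction, not the exact implementation used in the studies above.

```python
import numpy as np
from scipy import stats

def spi(precip):
    """SPI for one station/month series of aggregated precipitation totals.

    Fits a gamma distribution to the nonzero totals, applies the usual
    mixed-distribution correction for zero months, then maps cumulative
    probabilities to standard-normal quantiles.
    """
    precip = np.asarray(precip, dtype=float)
    nonzero = precip[precip > 0]
    q = 1.0 - nonzero.size / precip.size           # probability of a zero month
    a, loc, scale = stats.gamma.fit(nonzero, floc=0)
    cdf = q + (1.0 - q) * stats.gamma.cdf(precip, a, loc=loc, scale=scale)
    return stats.norm.ppf(np.clip(cdf, 1e-6, 1.0 - 1e-6))
```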

A peptide-based ensemble for the detection of cyanide ions in 100% aqueous solutions was designed on the basis of the copper binding motif. 7-Nitro-2,1,3-benzoxadiazole-labeled tripeptide (NBD-SSH, NBD-SerSerHis) formed the ensemble with Cu(2+), leading to a change in the color of the solution from yellow to orange and a complete decrease of fluorescence emission. The ensemble (NBD-SSH-Cu(2+)) sensitively and selectively detected a low concentration of cyanide ions in 100% aqueous solutions by a colorimetric change as well as a fluorescent change. The addition of cyanide ions instantly removed Cu(2+) from the ensemble (NBD-SSH-Cu(2+)) in 100% aqueous solutions, resulting in a color change of the solution from orange to yellow and a "turn-on" fluorescent response. The detection limits for cyanide ions were lower than the maximum allowable level of cyanide ions in drinking water set by the World Health Organization. The peptide-based ensemble system is expected to be a potential and practical way for the detection of submicromolar concentrations of cyanide ions in 100% aqueous solutions. PMID:26320594

Mimicking receptor flexibility during receptor-ligand binding is a challenging task in computational drug design, since it is associated with a large increase in the conformational search space. In the present study, we have devised an in silico design strategy incorporating receptor flexibility in virtual screening to identify potential lead compounds as inhibitors for flexible proteins. We have considered BACE1 (β-secretase), a key target protease from a therapeutic perspective for Alzheimer's disease, as the highly flexible receptor. The protein undergoes significant conformational transitions from open to closed form upon ligand binding, which makes it a difficult target for inhibitor design. We have designed a hybrid structure-activity model containing both ligand-based descriptors and energetic descriptors obtained from molecular docking, based on a dataset of structurally diverse BACE1 inhibitors. An ensemble of receptor conformations has been used in the docking study, further improving the prediction ability of the model. The model, which shows significant prediction ability as judged by several statistical parameters, has been used to screen an in-house 3-D structural library of 731 phytochemicals. Twenty-four highly potent, novel BACE1 inhibitors with predicted activity (Ki) ≤ 50 nM have been identified. Detailed analysis reveals the pharmacophoric features these novel inhibitors require to inhibit BACE1. PMID:25088750

Drought prediction at monthly to seasonal time scales is of critical importance to disaster mitigation, agricultural planning, and multi-purpose reservoir management. Starting in December 2012, the NOAA Climate Prediction Center (CPC) has been providing operational Standardized Precipitation Index (SPI) Outlooks using the North American Multi-Model Ensemble (NMME) forecasts, to support CPC's monthly drought outlooks and briefing activities. The current NMME system consists of six model forecasts from U.S. and Canadian modeling centers, including the CFSv2, CM2.1, GEOS-5, CCSM3.0, CanCM3, and CanCM4 models. In this study, we assess the predictive skill of meteorological drought using real-time NMME forecasts for the period from May 2012 to May 2014. The ensemble SPI forecasts are the equally weighted mean of the six model forecasts. Two performance measures, the anomaly correlation coefficient and the root-mean-square error against observations, are used to evaluate forecast skill. Similar to the assessment based on NMME retrospective forecasts, the predictive skill of monthly-mean precipitation (P) forecasts is generally low after the second month, and errors vary among models. Although P forecast skill is not large, SPI predictive skill is high and the differences among models are small. The skill mainly comes from the P observations appended to the model forecasts, a factor that also explains the similarity of SPI prediction among the six models. Still, NMME SPI ensemble forecasts have higher skill than those based on individual models or persistence, and the 6-month SPI forecasts are skillful out to four months. The three major drought events that occurred during the 2012-2014 period, the 2012 Central Great Plains drought, the 2013 Upper Midwest flash drought, and the 2013-2014 California drought, are used as examples to illustrate the system's strengths and limitations. For precipitation-driven drought events, such as the 2012 Central Great Plains drought

The immune system is a tight network of different types of cells and molecules. The coordinated action of these elements mounts a precise immune response against tumor cells. However, tumor cells present several escape mechanisms, leading to tumor progression. This paper reviews several cellular and molecular events involved in the regulation of the immune response against tumor cells. The interaction of several molecules, such as MHC, TcR, adhesins, tumor antigens and cytokines, is discussed, as well as the most recent knowledge about escape mechanisms and immunotherapy. PMID:7502157

In order to analyze the effect of engine vibration on the cab noise of construction machinery in multiple frequency bands, a new method based on ensemble empirical mode decomposition (EEMD) and spectral correlation analysis is proposed. Firstly, the intrinsic mode functions (IMFs) of the vibration and noise signals are obtained by the EEMD method, and the IMFs that share the same frequency bands are selected. Secondly, the spectral correlation coefficients between the selected IMFs are calculated, yielding the main frequency bands in which engine vibration has a significant impact on cab noise. Thirdly, the dominant frequencies are picked out and analyzed by spectral analysis. The results show that the main frequency bands and dominant frequencies in which engine vibration seriously impacts cab noise can be identified effectively by the proposed method, which provides effective guidance for the noise reduction of construction machinery.
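
A minimal sketch of the second step, computing spectral correlation coefficients between IMF pairs that have already been obtained by EEMD (the decomposition itself is omitted here; array names are illustrative):

```python
import numpy as np

def spectral_correlations(imfs_vib, imfs_noise):
    """Correlate amplitude spectra of paired IMFs from two signals.

    imfs_vib, imfs_noise : arrays of shape (n_imfs, n_samples) produced by
    EEMD of the engine-vibration and cab-noise signals, sampled identically.
    """
    Sv = np.abs(np.fft.rfft(imfs_vib, axis=1))
    Sn = np.abs(np.fft.rfft(imfs_noise, axis=1))
    k = min(len(Sv), len(Sn))
    return np.array([np.corrcoef(Sv[i], Sn[i])[0, 1] for i in range(k)])
```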

This paper describes the structure of dynamic neuronal ensembles (DNEs). DNEs represent a new paradigm for learning, based on biological neural networks that use variable structures. We present a computational neural element that demonstrates biological neuron functionality such as neurotransmitter feedback, absolute refractory period, and multiple output potentials. More specifically, we develop a network of neural elements that have the ability to dynamically strengthen, weaken, add and remove interconnections. We demonstrate that the DNE is capable of performing dynamic modifications to neuron connections and exhibiting biological neuron functionality. In addition to its applications for learning, DNEs provide an excellent environment for testing and analysis of biological neural systems. An example of habituation and hyper-sensitization in biological systems, using a neural circuit from a snail, is presented and discussed. This paper provides an insight into the DNE paradigm using models developed and simulated in DEVS.

Coordination polymers serving as molecular magnetic refrigerants have been attracting great interest. In particular, coordination cluster compounds that demonstrate clear advantages as cryogenic magnetic refrigerants have attracted increasing attention in the last five years. Herein, we mainly focus on depicting aspects of the syntheses, structures, and magnetothermal properties of coordination clusters that serve as magnetic refrigerants on account of the magnetocaloric effect. The documented molecular magnetic refrigerants are classified into two primary categories according to the types of metal centers, namely, homo- and heterometallic clusters. Every section is further divided into several subgroups based on metal nuclearity and dimensionality, including discrete molecular clusters and those with extended structures constructed from molecular clusters. The objective is to present a broad overview of recent progress in coordination-cluster-based molecular magnetic refrigerants and provide a tutorial for researchers who are interested in the field. PMID:27381662

Cells generate and experience mechanical forces that may shape tissues and regulate signaling pathways in a variety of physiological or pathological situations. How forces propagate and transduce signals at the molecular level is poorly understood. The advent of FRET-based Molecular Tension Microscopy (MTM) now makes it possible to measure mechanical forces at a molecular scale, with molecular specificity, in situ, and thereby to better understand the mechanical architecture of cells and tissues and the pathways of mechanotransduction. In this review, we first set out the basic principles of FRET-based MTM and its various incarnations. We describe the different ways of measuring FRET and their advantages and drawbacks. Then, across the range of proteins, cells and organisms to which it has been applied, we review the tests developed to validate the approach and how molecular tension has been related to cell function, and conclude with possible developments and offshoots. PMID:26210398
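
The physical basis can be stated in two lines: FRET efficiency falls off with the sixth power of donor-acceptor distance, so tension that stretches a calibrated linker registers as a drop in efficiency. A minimal sketch of the standard Foerster relation, with a typical but probe-specific Foerster radius assumed:

```python
import numpy as np

def fret_efficiency(r_nm, R0_nm=5.0):
    """Foerster relation: efficiency vs donor-acceptor distance.

    R0_nm is probe-specific; 5 nm is only a typical order of magnitude.
    """
    return 1.0 / (1.0 + (r_nm / R0_nm) ** 6)

# A tension sensor maps measured E back to linker extension, and extension to
# force via the linker's calibration curve (probe-specific, assumed known).
print(fret_efficiency(np.linspace(2.0, 10.0, 5)))
```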

A signal processing methodology is proposed in this paper for the effective reconstruction of ultrasonic signals in coarse-grained, highly scattering austenitic stainless steel. The proposed methodology comprises Ensemble Empirical Mode Decomposition (EEMD) processing of the ultrasonic signals and application of a signal minimisation algorithm to selected Intrinsic Mode Functions (IMFs) obtained by EEMD. The methodology is applied to ultrasonic signals obtained from austenitic stainless steel specimens of different grain sizes, with and without defects. The influence of probe frequency and signal data length on the EEMD decomposition is also investigated. For a particular sampling rate and probe frequency, the same range of IMFs can be used to reconstruct the ultrasonic signal, irrespective of grain size, over the 30-210 μm range investigated in this study. The methodology is successfully employed for the detection of defects in 50 mm thick coarse-grained austenitic stainless steel specimens. A signal-to-noise ratio improvement of better than 15 dB is observed for the ultrasonic signal obtained from a 25 mm deep flat-bottom hole in a 200 μm grain size specimen. For ultrasonic signals obtained from defects at different depths, a minimum of 7 dB additional enhancement in SNR is achieved compared to the sum-of-selected-IMFs approach. The application of the minimisation algorithm to the EEMD-processed signal proves effective for adaptive signal reconstruction with improved signal-to-noise ratio. The methodology was further employed for successful imaging of defects in a B-scan. PMID:25488024

Face recognition algorithms are generally trained for matching high-resolution images, and they perform well on test data of similar resolution. However, the performance of such systems degrades when low-resolution face images captured in unconstrained settings, such as video from surveillance cameras, are matched with high-resolution gallery images. The primary challenge here is to extract discriminating features from the limited biometric content of low-resolution images and match them to information-rich high-resolution face images. The problem of cross-resolution face matching is further aggravated when there is limited labeled positive data for training face recognition algorithms. In this paper, the problem of cross-resolution face matching is addressed where low-resolution images are matched with a high-resolution gallery. A co-transfer learning framework is proposed, which is a cross-pollination of the transfer learning and co-training paradigms, and is applied to cross-resolution face matching. The transfer learning component transfers the knowledge learnt while matching high-resolution face images during training to the matching of low-resolution probe images with the high-resolution gallery during testing. The co-training component facilitates this transfer of knowledge by assigning pseudo-labels to unlabeled probe instances in the target domain. The amalgamation of these two paradigms in the proposed ensemble framework enhances the performance of cross-resolution face recognition. Experiments on multiple face databases show the efficacy of the proposed algorithm in comparison with existing algorithms and a commercial system. In addition, several high-profile real-world cases are used to demonstrate the usefulness of the proposed approach in addressing tough challenges. PMID:25314702

Carbon monoxide is a key component in tropospheric chemistry: it affects the oxidative capacity through its reaction with OH and is a precursor of tropospheric ozone. One year of multispectral retrievals of CO partial columns obtained from the MOPITT instrument has been assimilated into the Community Atmosphere Model with Chemistry (CAM-Chem). The assimilation is carried out using an Ensemble Adjustment Kalman Filter algorithm within the Data Assimilation Research Testbed (DART) package. Two assimilation experiments have been performed: 1) assimilation of meteorological observations and 2) joint assimilation of meteorological observations and MOPITT CO. We first evaluate the assimilation performance by investigating skill scores and other statistics for the two experiments, comparing against independent CO datasets from surface (WDCGG), aircraft (MOZAIC-IAGOS), and FTS (NDACC) measurements. Our results clearly demonstrate an overall improvement in the represented spatio-temporal magnitude and variability of CO abundance in CAM-Chem. We then investigate the response of CAM-Chem to changes in CO fields (via CO assimilation), focusing mainly on the oxidative capacity (i.e., OH distribution, methane lifetime) and CO chemical production and loss (i.e., regional to global budgets). This is carried out by analyzing the mean 6-hourly forecast adjustments as reflected between the two experiments. We show that changes in CO directly impact OH abundance, with subsequent nonlinear responses in CO chemical production (CO from methane and VOCs) and CO loss. This is clearly evident in NOx-limited regions (e.g., the Southern Hemisphere, remote sites). Such analysis has direct implications for the consistency of inverse modeling estimates of CO sources through improved representation of the chemical response (including full chemistry) in atmospheric chemistry models and through multi-species constraints.

In spite of the critical role of river discharge in land surface hydrology, global gauging networks are sparse and have even been in decline. Over the past decade, researchers have been trying to better estimate river discharge using remote sensing techniques to complement existing in-situ gage networks. The upcoming Surface Water and Ocean Topography (SWOT) mission will directly provide simultaneous spatial mapping of inundation area (A) and inland water surface elevation (WSE) for rivers, lakes, wetlands, and reservoirs, both temporally (dh/dt) and spatially (dh/dx), with the Ka-band Radar INterferometer (KaRIN). With these observations, the SWOT mission will provide measurements of water storage changes in terrestrial surface water bodies. However, because SWOT will measure WSE, not the true depth to the river bottom, the cross-sectional channel bathymetry will not be fully measured. Thus, estimating bathymetry is important in order to produce accurate estimates of river discharge from SWOT data. In previous work, a local ensemble Kalman filter (LEnKF) was used to estimate river bathymetry, given synthetic SWOT observations and WSE predictions by the LISFLOOD-FP hydrodynamic model. However, the accuracy of the estimated bathymetry was strongly affected by severe bias in the boundary inflows, which the data assimilation did not account for. Here, we focus on correcting the forecast bias in the LEnKF scheme to improve the bathymetry estimates. To correct the forecast bias and improve the accuracy, we combine the LEnKF scheme with the continuity and momentum equations. To evaluate the reanalysis approach, the bathymetry error is assessed by comparison with the true values and with previous work. In addition, we examine the sensitivity of the estimated river discharge to the bathymetry estimate.

In recent years, the increasing volatility of the gold price has received growing attention from academia and industry alike. Due to the complexity and significant fluctuations observed in the gold market, however, most current approaches have failed to produce robust and consistent modeling and forecasting results. Ensemble Empirical Mode Decomposition (EEMD) and Independent Component Analysis (ICA) are data analysis methods that can deal with nonlinear and non-stationary time series. This study introduces a new methodology that combines the two methods and applies it to gold price analysis. It comprises three steps: firstly, the original gold price series is decomposed into several Intrinsic Mode Functions (IMFs) by EEMD. Secondly, the IMFs are further processed, with unimportant ones re-grouped, to reconstruct a new set of data called Virtual Intrinsic Mode Functions (VIMFs). Finally, ICA is used to decompose the VIMFs into statistically Independent Components (ICs). The decomposition results reveal that the gold price series can be represented by a linear combination of the ICs. Furthermore, the economic meanings of the ICs are analyzed and discussed in detail, according to their trends and transformation coefficients. The analyses not only identify the underlying driving factors but also examine in depth how these factors affect the gold price; regression analysis was conducted to verify the findings. Results from the empirical studies of the gold markets show that EEMD-ICA serves as an effective technique for gold price analysis from a new perspective.
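
A sketch of the EEMD-ICA chain, assuming the third-party PyEMD package (pip name EMD-signal) and scikit-learn's FastICA; the intermediate regrouping of minor IMFs into "virtual IMFs" is simplified away, so this shows only the backbone of the methodology:

```python
import numpy as np
from PyEMD import EEMD                    # pip install EMD-signal (assumed)
from sklearn.decomposition import FastICA

def eemd_ica(series, n_components=3, seed=0):
    """Decompose a price series with EEMD, then extract independent components.

    Using the IMFs directly skips the paper's regrouping into virtual IMFs;
    n_components must not exceed the number of IMFs obtained.
    """
    imfs = EEMD(trials=100).eemd(np.asarray(series, dtype=float))
    ica = FastICA(n_components=n_components, random_state=seed)
    ics = ica.fit_transform(imfs.T)       # rows: time, columns: ICs
    return imfs, ics
```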

This study conducted 24- to 72-h multi-model ensemble forecasts to explore the tracks and intensities (central mean sea level pressure) of tropical cyclones (TCs). Forecast data for the northwestern Pacific basin in 2010 and 2011 were selected from the China Meteorological Administration, European Centre for Medium-Range Weather Forecasts (ECMWF), Japan Meteorological Agency, and National Centers for Environmental Prediction datasets of the Observing System Research and Predictability Experiment Interactive Grand Global Ensemble project. The Kalman filter was employed to conduct the TC forecasts, along with the ensemble mean and super-ensemble for comparison. The following results were obtained: (1) The statistical-dynamic Kalman filter, in which recent observations are given more importance and model weighting coefficients are adjusted over time, produced quite different results from those of the super-ensemble. (2) The Kalman filter reduced the TC mean absolute track forecast error by approximately 50, 80 and 100 km in the 24-, 48- and 72-h forecasts, respectively, compared with the best individual model (ECMWF). The intensity forecasts were also improved by the Kalman filter to some extent in terms of average intensity deviation (AID) and correlation coefficients with reanalysis intensity data. Overall, the Kalman filter technique performed better than the individual models, the ensemble mean, and the super-ensemble in 3-day forecasts. The implication of this study is that this technique appears to be a very promising statistical-dynamic method for multi-model ensemble forecasts of TCs.
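
The abstract does not give the filter's equations. One common way to let "recent observations be given more importance" while "model weighting coefficients are adjusted over time" is to model the combination weights of the member models as a random walk and update them recursively with a Kalman filter; the sketch below expresses that generic idea, not the study's exact formulation.

```python
import numpy as np

def kf_model_weights(forecasts, verifications, q=1e-4, r=1.0):
    """Track time-varying combination weights for m models.

    forecasts     : (T, m) past forecasts (e.g. one TC track component).
    verifications : (T,) verifying analyses.
    q, r          : random-walk and observation noise variances.
    Returns the (T, m) history of weights; the last row combines new forecasts.
    """
    T, m = forecasts.shape
    w = np.full(m, 1.0 / m)              # start from equal weights
    P = np.eye(m)                        # weight-error covariance
    history = np.empty((T, m))
    for t in range(T):
        h = forecasts[t]                 # "observation operator" is the forecast row
        P = P + q * np.eye(m)            # random walk: older data are forgotten
        s = h @ P @ h + r                # innovation variance
        k = P @ h / s                    # Kalman gain
        w = w + k * (verifications[t] - h @ w)
        P = P - np.outer(k, h) @ P
        history[t] = w
    return history
```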

We present the application of interactive three-dimensional (3-D) visualization of ensemble weather predictions to forecasting warm conveyor belt situations during aircraft-based atmospheric research campaigns. Motivated by forecast requirements of the T-NAWDEX-Falcon 2012 (THORPEX - North Atlantic Waveguide and Downstream Impact Experiment) campaign, a method to predict 3-D probabilities of the spatial occurrence of warm conveyor belts (WCBs) has been developed. Probabilities are derived from Lagrangian particle trajectories computed on the forecast wind fields of the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble prediction system. Integration of the method into the 3-D ensemble visualization tool Met.3D, introduced in the first part of this study, facilitates interactive visualization of WCB features and derived probabilities in the context of the ECMWF ensemble forecast. We investigate the sensitivity of the method with respect to trajectory seeding and grid spacing of the forecast wind field. Furthermore, we propose a visual analysis method to quantitatively analyse the contribution of ensemble members to a probability region and, thus, to assist the forecaster in interpreting the obtained probabilities. A case study, revisiting a forecast case from T-NAWDEX-Falcon, illustrates the practical application of Met.3D and demonstrates the use of 3-D and uncertainty visualization for weather forecasting and for planning flight routes in the medium forecast range (3 to 7 days before take-off).

To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures. As is commonly observed in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction, and these residual free energy barriers can greatly degrade sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimensional subset of the target system; in that case the "Hamiltonian lagging" problem, in which necessary structural relaxation falls behind the motion of the collective variable, is likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW-based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments. PMID:19548709

At present, there is no consensus understanding of the origin of photoluminescence in carbon nanoparticles, particularly the so-called carbon dots. By providing a comparative analysis of spectroscopic studies in solution and at the single-molecule level, we demonstrate that these particles behave collectively as fixed single dipoles and are probably quantum emitter entities. Their spectral and lifetime heterogeneity in solutions is explained by variation of the local chemical environment within and around the luminescence centers. Hence, the carbon dots possess a unique hybrid combination of fluorescence properties peculiar to dye molecules, their conjugates and semiconductor nanocrystals. It is proposed that their optical properties are due to the generation of H-aggregate-type excitonic states whose coherence spreads over the whole nanoparticle. PMID:27399599

To visualize a bird's-eye view of an ensemble of mitochondrial genome sequences for various species, we recently developed a novel method of mapping a biological sequence ensemble into three-dimensional (3D) vector space. First, we represented a biological sequence of a species s by a word-composition vector x(s), where its length |x(s)| represents the sequence length, its unit vector x(s)/|x(s)| represents the relative composition of the K-tuple words through the sequence, and the dimension, N = 4^K, is the number of all possible words of length K. Second, we mapped the vector x(s) to the 3D position vector y(s), based on the two following simple principles: (1) |y(s)| = |x(s)|, and (2) the angle between y(s) and y(t) maximally correlates with the angle between x(s) and x(t). The mitochondrial genome sequences for 311 species, including 177 Animalia, 85 Fungi and 49 Green plants, were mapped into 3D space by using K = 7. The mapping was successful because the angles between vectors before and after the mapping highly correlated with each other (correlation coefficients were 0.92-0.97). Interestingly, the Animalia kingdom is distributed along a single arc belt (just like the Milky Way on a celestial globe), and the Fungi and Green plant kingdoms are distributed in a similar arc belt. These two arc belts intersect at their respective middle regions and form a cross structure just like a jet aircraft fuselage and its wings. This new mapping method will allow researchers to intuitively interpret the visual information presented in the maps in a highly effective manner. PMID:22776549
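
Computing the word-composition vector x(s) is straightforward; the sketch below builds it for a DNA sequence under the scaling convention above, so that |x(s)| equals the sequence length. The function name and the handling of ambiguous bases are illustrative choices, not the authors' code.

```python
from itertools import product
import numpy as np

def composition_vector(seq, k=7):
    """Word-composition vector x(s) of dimension N = 4**k, scaled so that
    its Euclidean norm |x(s)| equals the sequence length."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    x = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])      # skip words with ambiguous bases
        if j is not None:
            x[j] += 1
    return x * (len(seq) / np.linalg.norm(x))

x = composition_vector("ACGT" * 500)     # toy sequence of 2000 bases
print(x.shape, np.linalg.norm(x))        # (16384,) 2000.0
```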

Algorithms for identifying and tracking extra-tropical cyclonic features that were initially developed at the Met Office have now been applied to the ECMWF ensemble, and expanded. A wide range of real-time web-based forecast products are being created from the output. These products assist with day-to-day forecasting, and in particular can provide alerts regarding the likelihood of extreme weather in the next 15 days. We rely here on the fact that cyclonic features are the primary synoptic-scale cause of extreme rainfall over large catchments, extreme snowfall and extreme windstorms. Moreover, prolonged heat in summer, protracted cold in winter and drought can all relate to the absence of cyclonic activity, so the products can also be a useful tool for anticipating these hazards. After providing a brief overview of how the identification and tracking work, the bulk of the talk will illustrate the new products that are becoming available, and will show, with examples from the last 12 months, how these can be used to forecast extreme weather. Products include storm-track strike probabilities for different thresholds, feature-specific plume diagrams, objective front spaghetti plots, and cyclonic feature 'dalmatian' plots showing various cyclone attributes. There are also clickable links to quickly visualise, in synoptic chart format, those ensemble members that depict the most extreme evolutions. There is potential to develop similar products, in an aggregated way, from reanalysis data, and from climate model simulations of present and future climate. Intercomparison of data from these three categories could then provide policy-makers with a clear-cut reference point for anticipating changes in extremes. This opportunity will be discussed in the context of the IMILAST project.

Recent droughts and the continuing water wars between the states of Georgia, Alabama and Florida have made agricultural producers more aware of the importance of managing their irrigation systems more efficiently. Many southeastern states are beginning to consider laws that will require monitoring and regulation of water used for irrigation. Recently, Georgia suspended issuing irrigation permits in some areas of the southwestern portion of the state to try to limit the amount of water being used for irrigation. However, even in southern Georgia, which receives on average between 23 and 33 inches of rain during the growing season, irrigation can significantly impact crop yields. In fact, studies have shown that when fields do not receive rainfall at the most critical stages in the life of cotton, yields for irrigated fields can be up to twice those of non-irrigated fields. This leads to the motivation for this study, which is to produce a forecast tool that will enable producers to make more efficient irrigation management decisions. We will use the ECMWF (European Centre for Medium-Range Weather Forecasts) EPS (Ensemble Prediction System) precipitation forecasts for the grid points included in the 1° x 1° lat/lon square surrounding the point of interest. We will then apply quantile-to-quantile (q-to-q) bias corrections to the forecasts. Once we have applied the bias corrections, we will use the checkbook method of irrigation scheduling to determine the probability of receiving the required amount of rainfall for each week of the growing season. These forecasts will be used during a field trial conducted at the CM Stripling Irrigation Research Park in Camilla, Georgia. This research will compare differences in yield and water use among the standard checkbook method of irrigation, which uses no precipitation forecast knowledge, the weather.com forecast, a dry land plot, and the ensemble-based forecasts mentioned above.
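
Quantile-to-quantile bias correction maps each forecast value onto the observed climatology through the rank it occupies in the model's own hindcast climatology. A minimal sketch follows; the array names are hypothetical and the study's exact implementation is not given in the abstract.

```python
import numpy as np

def q_to_q_correct(forecast, hindcast, observed):
    """Replace each forecast value by the observed value at the same
    empirical quantile that the forecast occupies in the hindcasts."""
    hindcast = np.sort(hindcast)
    observed = np.sort(observed)
    quantiles = np.searchsorted(hindcast, forecast) / len(hindcast)
    return np.quantile(observed, np.clip(quantiles, 0.0, 1.0))
```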

Ensemble run simulations are becoming increasingly widespread. In this work, we couple particle advection with pathline analysis to visualize and reveal the differences among the flow fields of ensemble runs. Our method first constructs a variation field using a Lagrangian-based distance metric. The variation field characterizes the variation between the vector fields of the ensemble runs by extracting and visualizing the variation of pathlines within the ensemble. Parallelism in a MapReduce style is leveraged to handle data processing and computing at scale. Using our prototype system, we demonstrate how scientists can effectively explore and investigate differences within ensemble simulations. PMID:24051840
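
A minimal version of the variation field can be written directly in NumPy: trace pathlines from the same seed points through every run's vector field, then measure the per-seed spread of the runs' trajectories. The metric below (mean deviation from the ensemble-mean trajectory) is an assumed stand-in for the paper's Lagrangian-based distance metric, which the abstract does not specify.

```python
import numpy as np

def variation_field(pathlines):
    """pathlines : (n_runs, n_seeds, n_steps, 3) positions of pathlines
    traced from identical seed points in each ensemble run.
    Returns one variation value per seed point."""
    mean_traj = pathlines.mean(axis=0)                    # (n_seeds, n_steps, 3)
    dev = np.linalg.norm(pathlines - mean_traj, axis=-1)  # per run, seed, step
    return dev.mean(axis=(0, 2))                          # average over runs and time
```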

Satellite imagery has proved useful for obtaining information on water levels in flood events. Microwave frequencies are generally more useful for flood detection than visible-band sensors because of their all-weather day-night capability. Specifically, the future SWOT mission, with Ka-band interferometry, will be able to provide direct Water Level Observations (WLOs), and current and future Synthetic Aperture Radar (SAR) sensors can provide information on flood extent, which, when intersected with a Digital Elevation Model (DEM) of the floodplain, provides indirect WLOs. By either means, satellite-based WLOs can be assimilated into a hydrodynamic model to decrease forecast uncertainty and further to estimate river discharge into the flooded domain and model parameters. However, studies on the assimilation of real satellite-based WLOs into flood models are still sparse. For 2D high-resolution flood modelling, data assimilation (DA) techniques based on Monte Carlo implementations of the Kalman filter (ensemble Kalman filters, EnKFs) provide a minimum variance estimator. The performance of ensemble techniques depends on the quality of the observations to be assimilated and on the correctness of the several covariance matrices involved, which serve to convey the observation information (innovations) to the rest of the studied domain. Here we evaluate how some of the particularities of flood models may hamper the straightforward implementation of EnKFs for operational assimilation of satellite-based WLOs. Specifically, the filter may become hyper-sensitive to observations in minor tributaries, and the specific network connectivity of braided flooded domains (e.g. converging tributaries or urban domains) indicates that straightforward spatial localization (Euclidean distance-based covariance moderation) is simply not sound. Here we discuss these problems by assimilating real WLOs obtained from a 7-image sequence from the COSMO-SkyMed (CSK) constellation X-band SAR, in a
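
For reference, the "Euclidean distance-based covariance moderation" that the record argues is unsound for braided networks is usually implemented with the Gaspari-Cohn fifth-order taper, which damps ensemble covariances smoothly to zero beyond twice a localization radius. The sketch below is the standard taper, included only to make the critiqued baseline concrete; it is not the study's code.

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn taper: 1 at distance 0, exactly 0 beyond 2*c.
    Ensemble covariances are multiplied elementwise by this factor."""
    r = np.abs(np.asarray(d, dtype=float)) / c
    t = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r <= 2.0)
    rn, rf = r[near], r[far]
    t[near] = -0.25*rn**5 + 0.5*rn**4 + 0.625*rn**3 - 5/3*rn**2 + 1.0
    t[far] = (rf**5/12 - 0.5*rf**4 + 0.625*rf**3 + 5/3*rf**2
              - 5.0*rf + 4.0 - 2.0/(3.0*rf))
    return t
```

In a braided or urban network, two points can be close in Euclidean distance yet hydraulically disconnected, which is precisely why this taper can pass innovations where they do not belong.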

Based on the specific binding of Cu(2+) ions to 11-mercaptoundecanoic acid (11-MUA)-protected AuNCs with intense orange-red emission, we have proposed and constructed a novel fluorescent nanomaterial-metal ion ensemble in a non-fluorescent off-state. Subsequently, an AuNCs@11-MUA-Cu(2+) ensemble-based fluorescent chemosensor, which is amenable to convenient, sensitive, selective, turn-on and real-time assay of acetylcholinesterase (AChE), could be developed by using acetylthiocholine (ATCh) as the substrate. The sensing ensemble solution exhibits a marked fluorescence enhancement in the presence of AChE and ATCh: AChE hydrolyzes its active substrate ATCh into thiocholine (TCh), and TCh then captures Cu(2+) from the ensemble, accompanied by the conversion of the AuNCs from the fluorescence off-state to the on-state. AChE activity could be detected down to 0.05 mU/mL, with a good linear range from 0.05 to 2.5 mU/mL. The proposed fluorescence assay can be utilized to evaluate AChE activity quantitatively in real biological samples, and furthermore to screen inhibitors of AChE. To the best of our knowledge, the present study reports the first analytical proposal for sensing AChE activity in real time using a fluorescent nanomaterial-Cu(2+) ensemble or focusing on Cu(2+)-triggered fluorescence quenching/recovery. This strategy opens a new avenue for exploring the biosensing applications of fluorescent AuNCs, and presents the prospect of the AuNCs@11-MUA-Cu(2+) ensemble as a versatile enzyme activity assay platform by means of other appropriate substrates/analytes. PMID:26141104

Hierarchical self-assembly centered on metallacyclic scaffolds greatly facilitates the construction of mechanically interlocked structures. The formation of two [3]catenanes and one [4]molecular necklace is presented, utilizing the orthogonality of coordination-driven self-assembly and crown ether-based cryptand/paraquat derivative complexation. The threaded [3]catenanes and the [4]molecular necklace were fabricated from ten and nine total molecular components (four and three unique species in solution), respectively. In all cases single supramolecular ensembles were obtained, attesting to the high degree of structural complexity made possible via self-assembly approaches. PMID:25996900

A detailed characterisation of the molecular determinants of membrane binding by α-synuclein (αS), a 140-residue protein whose aggregation is associated with Parkinson's disease, is of fundamental significance in clarifying the manner in which the balance between functional and dysfunctional processes is regulated for this protein. Despite its biological relevance, the structural nature of the membrane-bound state of αS remains elusive, in part because of the intrinsically dynamic nature of the protein and also because of the difficulties in studying this state in a physiologically relevant environment. In the present study we have used solid-state NMR and restrained MD simulations to refine the structure and topology of the N-terminal region of αS bound to the surface of synaptic-like membranes. This region is of fundamental importance in the binding mechanism of αS, as it acts to anchor the protein to lipid bilayers. The results enabled the identification of the key elements for the biological properties of αS in its membrane-bound state. PMID:27273030

The development of metal oxide-based molecular wires is important for fundamental research and potential practical applications. However, examples of these materials are rare. Here we report an all-inorganic transition metal oxide molecular wire prepared by disassembly of larger crystals. The wires are composed of molybdenum(VI) with either tellurium(IV) or selenium(IV): {(NH4)2[XMo6O21]}n (X = tellurium(IV) or selenium(IV)). The ultrathin molecular nanowires with widths of 1.2 nm grow to micrometre-scale crystals and are characterized by single-crystal X-ray analysis, Rietveld analysis, scanning electron microscopy, X-ray photoelectron spectroscopy, ultraviolet-visible spectroscopy, thermal analysis and elemental analysis. The crystals can be disassembled into individual molecular wires through cation exchange and subsequent ultrasound treatment, as visualized by atomic force microscopy and transmission electron microscopy. The ultrathin molecular wire-based material exhibits high activity as an acid catalyst, and the band gap of the molecular wire-based crystal is tunable by heat treatment. PMID:26139011

Weblogs have greatly changed the ways humans communicate. Affective analysis of blog posts is valuable for many applications such as text-to-speech synthesis or computer-assisted recommendation. Traditional emotion recognition in text, based on single-label classification, cannot satisfy the higher requirements of affective computing. In this paper, the automatic identification of sentence emotion in weblogs is modeled as a multi-label text categorization task. Experiments are carried out on 12273 blog sentences from the Chinese emotion corpus Ren_CECps with 8-dimension emotion annotation. The ensemble algorithm RAKEL is used to recognize dominant emotions from the writer's perspective. Our emotion feature, which uses a detailed intensity representation for word emotions, outperforms the other main features such as the word frequency feature and the traditional lexicon-based feature. In order to deal with relatively complex sentences, we integrate grammatical characteristics of punctuation, disjunctive connectives, modification relations and negation into the features. This achieves increases of 13.51% and 12.49% in micro-averaged F1 and macro-averaged F1, respectively, compared to the traditional lexicon-based feature. The results show that a multi-dimension emotion representation with grammatical features can efficiently classify sentence emotion in a multi-label problem.
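
RAKEL draws random size-k subsets of the label space, trains one label-powerset classifier per subset, and averages the per-label votes. The sketch below is a simplified RAKEL-style implementation on top of scikit-learn; the base learner, k, and ensemble size are illustrative assumptions, and the paper's features and corpus are not reproduced.

```python
import numpy as np
from random import Random
from sklearn.linear_model import LogisticRegression

class RakelLike:
    """Simplified RAKEL-style multi-label ensemble (illustrative only)."""

    def __init__(self, m=10, k=3, seed=0):
        self.m, self.k, self.rng = m, k, Random(seed)

    def fit(self, X, Y):                      # Y: (n_samples, n_labels), binary
        self.n_labels = Y.shape[1]
        self.members = []
        for _ in range(self.m):
            subset = self.rng.sample(range(self.n_labels), self.k)
            keys = [tuple(row) for row in Y[:, subset]]   # label powerset
            classes = sorted(set(keys))
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X, [classes.index(key) for key in keys])
            self.members.append((subset, classes, clf))
        return self

    def predict(self, X):
        votes = np.zeros((len(X), self.n_labels))
        hits = np.zeros(self.n_labels)
        for subset, classes, clf in self.members:
            decoded = np.array([classes[c] for c in clf.predict(X)])
            votes[:, subset] += decoded       # add this member's 0/1 votes
            hits[subset] += 1
        return votes / np.maximum(hits, 1) > 0.5   # average-vote threshold
```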

Theories about generalization error with ensembles are mainly based on the diversity concept, which promotes resorting to many members of different properties to support mutually agreeable decisions. Kuncheva (2004) proposed the Multi Level Diversity Model (MLDM) to promote diversity in model ensembles, combining different data subsets, input subsets, models, and parameters, and including a combiner level in order to optimize the final ensemble. This work tests the hypothesis that ensembles of Neural Network (NN) structures minimise the generalization error. We used the MLDM to evaluate two different scenarios: (i) ensembles from a single NN architecture, and (ii) a super-ensemble built by a combination of sub-ensembles of many NN architectures. The time series used correspond to the 12 basins of the MOdel Parameter Estimation eXperiment (MOPEX) project that were used by Duan et al. (2006) and Vos (2013) as a benchmark. Six architectures are evaluated: FeedForward NN (FFNN) trained with the Levenberg-Marquardt algorithm (Hagan et al., 1996), FFNN trained with SCE (Duan et al., 1993), Recurrent NN trained with a complex method (Weins et al., 2008), Dynamic NARX NN (Leontaritis and Billings, 1985), Echo State Network (ESN), and leak integrator neuron (L-ESN) (Lukosevicius and Jaeger, 2009). Each architecture separately performs an Input Variable Selection (IVS) according to a forward stepwise selection (Anctil et al., 2009) using mean square error as the objective function. Post-processing by Predictor Stepwise Selection (PSS) of the super-ensemble has been done following the method proposed by Brochero et al. (2011). IVS results showed that the lagged stream flow, lagged precipitation, and Standardized Precipitation Index (SPI) (McKee et al., 1993) were the most relevant variables: they were respectively selected as one of the first three variables in 66, 45, and 28 of the 72 scenarios. A relationship between aridity index (Arora, 2002) and NN

Recent experiments have shown that when specific biomolecular interactions are confined to one surface of a microcantilever beam, changes in intermolecular nanomechanical forces provide sufficient differential torque to bend the cantilever beam. This has been used to detect single base pair mismatches during DNA hybridization, as well as prostate specific antigen (PSA) at concentrations and conditions that are clinically relevant for prostate cancer diagnosis. Since cantilever motion originates from the free energy change induced by specific biomolecular binding, this technique now offers a common platform for label-free quantitative analysis of protein-protein binding, DNA hybridization, DNA-protein interactions, and, in general, receptor-ligand interactions. Current work is focused on developing “universal microarrays” of microcantilever beams for high-throughput multiplexed bioassays.

Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. PMID:26032515
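
The eeGLOBAL-style quantities reduce to distance statistics once the two ensembles share the same atoms and have been superimposed on a common core. The sketch below is a plain-NumPy illustration of that idea under those assumptions; it is not the ENSEMBLATOR's actual code or interface.

```python
import numpy as np

def per_atom_variation(ens_a, ens_b):
    """ens_a, ens_b : (n_models, n_atoms, 3) superimposed coordinates with
    identical atom ordering. Returns per-atom: the two intra-ensemble
    variations, the inter-ensemble variation, and the closest approach."""
    intra_a = np.linalg.norm(ens_a - ens_a.mean(0), axis=-1).mean(0)
    intra_b = np.linalg.norm(ens_b - ens_b.mean(0), axis=-1).mean(0)
    # All cross pairs of models between the two ensembles, per atom
    d = np.linalg.norm(ens_a[:, None] - ens_b[None, :], axis=-1)
    return intra_a, intra_b, d.mean(axis=(0, 1)), d.min(axis=(0, 1))
```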

Accurate forecasting of stock market volatility is an important issue in portfolio risk management. In this paper, an ensemble system for forecasting stock market volatility is presented. It is composed of three different models that hybridize the exponential generalized autoregressive conditional heteroscedasticity (EGARCH) process and an artificial neural network trained with the backpropagation algorithm (BPNN), to forecast stock market volatility under normal, t-Student, and generalized error distribution (GED) assumptions separately. The goal is to design an ensemble system in which each single hybrid model is capable of capturing normality, excess skewness, or excess kurtosis in the data, so as to achieve complementarity. The performance of each EGARCH-BPNN model and of the ensemble system is evaluated by the closeness of the volatility forecasts to realized volatility. Based on mean absolute error and mean squared error, the experimental results show that the proposed ensemble model, which captures normality, skewness, and kurtosis in the data, is more accurate than the individual EGARCH-BPNN models in forecasting S&P 500 intra-day volatility on one- and five-minute time horizons.
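
The GARCH side of such an ensemble can be condensed with the arch package, fitting one EGARCH(1,1) model per error distribution and averaging the one-step-ahead volatility forecasts. This is a simplified stand-in: the paper feeds these components to a backpropagation neural network rather than averaging them, and the equal weights below are an assumption.

```python
import numpy as np
from arch import arch_model            # pip install arch

def egarch_ensemble_forecast(returns):
    """Average one-step-ahead volatility forecasts of EGARCH(1,1) models
    fitted under normal, Student's t, and GED error distributions."""
    forecasts = []
    for dist in ("normal", "t", "ged"):
        res = arch_model(returns, vol="EGARCH", p=1, o=1, q=1,
                         dist=dist).fit(disp="off")
        f = res.forecast(horizon=1)
        forecasts.append(np.sqrt(f.variance.values[-1, 0]))
    return float(np.mean(forecasts))   # naive equal-weight combination
```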

A de-noising method for electrocardiogram (ECG) signals based on ensemble empirical mode decomposition (EEMD) and wavelet threshold de-noising theory is proposed in this paper. We decomposed noisy ECG signals with EEMD to obtain a series of intrinsic mode functions (IMFs), then selected IMFs and reconstructed them to de-noise the ECG. The processed ECG signals were filtered again with the wavelet transform using an improved threshold function. In the experiments, the MIT-BIH ECG database was used to evaluate the performance of the proposed method, comparing it against de-noising based on EEMD alone and on the wavelet transform with the improved threshold function alone, in terms of signal-to-noise ratio (SNR) and mean square error (MSE). The results showed that the ECG waveforms de-noised with the proposed method were smooth and the amplitudes of the ECG features did not attenuate. In conclusion, the method discussed in this paper can de-noise ECG signals while preserving the characteristics of the original signal. PMID:25219236
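
The wavelet stage can be sketched with PyWavelets; the universal soft threshold below is a standard placeholder for the paper's improved threshold function, and the wavelet and decomposition level are illustrative choices.

```python
import numpy as np
import pywt                            # pip install PyWavelets

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Soft-threshold wavelet de-noising of a 1-D trace such as an ECG."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Robust noise estimate from the finest detail coefficients (MAD)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(signal)))   # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(signal)]
```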

In order to guarantee the stable operation of shearers and promote the construction of automatic coal mining working faces, an online cutting pattern recognition method with high accuracy and speed, based on Improved Ensemble Empirical Mode Decomposition (IEEMD) and a Probabilistic Neural Network (PNN), is proposed. An industrial microphone is installed on the shearer and the cutting sound is collected as the recognition criterion, overcoming the disadvantages of traditional detectors: large size, contact measurement, and low identification rates. To avoid end-point effects and remove undesirable intrinsic mode function (IMF) components from the initial signal, IEEMD is applied to the sound. End-point continuation based on the practical storage data is performed first to overcome the end-point effect. Next, the average correlation coefficient, calculated from the correlation of the first IMF with the others, is introduced to select the essential IMFs. Then the energy and standard deviation of the remaining IMFs are extracted as features, and the PNN is applied to classify the cutting patterns. Finally, a simulation example, with an accuracy of 92.67%, and an industrial application prove the efficiency and correctness of the proposed method. PMID:26528985

Considering the low prediction accuracy for positive samples and the poor overall classification performance caused by unbalanced sample data of MicroRNA (miRNA) targets, we propose in this paper a support vector machine (SVM)-integration of under-sampling and weight (SVM-IUSM) algorithm, an under-sampling method based on ensemble learning. The algorithm adopts the SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the integrated miRNA target classifier is obtained by combining multiple weak classifiers through a voting mechanism. The experiments revealed that the SVM-IUSM, compared with other algorithms on unbalanced dataset collections, could not only improve the prediction accuracy for positive targets and the overall classification performance, but also enhance the generalization ability of the miRNA target classifier. PMID:27382743

Although compelling evidence suggests that the genetic etiology of complex diseases could be heterogeneous in subphenotype groups, little attention has been paid to phenotypic heterogeneity in genetic association analysis of complex diseases. Simply ignoring phenotypic heterogeneity in association analysis could result in attenuated estimates of genetic effects and low power of association tests if subphenotypes with similar clinical manifestations have heterogeneous underlying genetic etiologies. To facilitate the family-based association analysis allowing for phenotypic heterogeneity, we propose a clustered multiclass likelihood-ratio ensemble (CMLRE) method. The proposed method provides an alternative way to model the complex relationship between disease outcomes and genetic variants. It allows for heterogeneous genetic causes of disease subphenotypes and can be applied to various pedigree structures. Through simulations, we found CMLRE outperformed the commonly adopted strategies in a variety of underlying disease scenarios. We further applied CMLRE to a family-based dataset from the International Consortium to Identify Genes and Interactions Controlling Oral Clefts (ICOC) to investigate the genetic variants and interactions predisposing to subphenotypes of oral clefts. The analysis suggested that two subphenotypes, nonsyndromic cleft lip without palate (CL) and cleft lip with palate (CLP), shared similar genetic etiologies, while cleft palate only (CP) had its own genetic mechanism. The analysis further revealed that rs10863790 (IRF6), rs7017252 (8q24), and rs7078160 (VAX1) were jointly associated with CL/CLP, while rs7969932 (TBK1), rs227731 (17q22), and rs2141765 (TBK1) jointly contributed to CP. PMID:27321816

A compound fault signal usually contains multiple characteristic signals and strong confounding noise, which makes it difficult to separate weak fault signals by conventional means such as FFT-based envelope detection, the wavelet transform, or empirical mode decomposition individually. In order to improve the diagnosis of compound faults in rolling bearings via signal separation, the present paper proposes a new method to identify compound faults from measured mixed signals, based on the ensemble empirical mode decomposition (EEMD) method and the independent component analysis (ICA) technique. With this approach, a vibration signal is first decomposed into intrinsic mode functions (IMFs) by EEMD to obtain multichannel signals. Then, according to a cross-correlation criterion, the corresponding IMFs are selected as the input matrix for ICA. Finally, the compound faults can be separated effectively by executing the ICA method, which makes the fault features more easily extracted and more clearly identified. Experimental results validate the effectiveness of the proposed method in compound fault separation, which works not only for the outer race defect, but also for the roller defect and the unbalance fault of the experimental system. PMID:25289644

The scientific community's major conceptual notion of structural biology has recently shifted in emphasis from the classical structure-function paradigm due to the emergence of intrinsically disordered proteins (IDPs). As opposed to their folded cousins, these proteins are defined by the lack of a stable 3D fold and a high degree of inherent structural heterogeneity that is closely tied to their function. Due to their flexible nature, solution techniques such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) spectroscopy and fluorescence resonance energy transfer (FRET) are particularly well-suited for characterizing their biophysical properties. Computationally derived structural ensembles based on such experimental measurements provide models of the conformational sampling displayed by these proteins, and they may offer valuable insights into the functional consequences of inherent flexibility. The Protein Ensemble Database (http://pedb.vib.be) is the first openly accessible, manually curated online resource storing the ensemble models, the protocols used during the calculation procedure, and the underlying primary experimental data derived from SAXS and/or NMR measurements. By making this previously inaccessible data freely available to researchers, this novel resource is expected to promote the development of more advanced modelling methodologies, facilitate the design of standardized calculation protocols, and consequently lead to a better understanding of how function arises from the disordered state. PMID:26387108

We present a global chemical data assimilation system using a global atmosphere model, the Community Atmosphere Model (CAM3) with simplified chemistry, and the Data Assimilation Research Testbed (DART) assimilation package. DART is a community software facility for assimilation studies using the ensemble Kalman filter approach. Here, we apply the assimilation system to constrain global tropospheric carbon monoxide (CO) by assimilating meteorological observations of temperature and horizontal wind velocity and satellite CO retrievals from the Measurement of Pollution in the Troposphere (MOPITT) satellite instrument. We verify the system performance using independent CO observations taken on board the NSF/NCAR C-130 and NASA DC-8 aircraft during the April 2006 part of the Intercontinental Chemical Transport Experiment (INTEX-B). Our evaluations show that MOPITT data assimilation provides significant improvements in terms of capturing the observed CO variability relative to no MOPITT assimilation (i.e. the correlation improves from 0.62 to 0.71, significant at 99% confidence). The assimilation provides evidence of a median CO loading of about 150 ppbv at 700 hPa over the NE Pacific during April 2006. This is marginally higher than the modeled CO with no MOPITT assimilation (~140 ppbv). Our ensemble-based estimates of model uncertainty also show model overprediction over the source region (i.e. China) and underprediction over the NE Pacific, suggesting model errors that cannot be readily explained by emissions alone. These results have important implications for improving regional chemical forecasts and for inverse modeling of CO sources, and further demonstrate the utility of the assimilation system in comparing non-coincident measurements, e.g. comparing satellite retrievals of CO with in-situ aircraft measurements. The work described above also brought to light several shortcomings of the data assimilation approach for CO profiles. Because of the limited vertical

We used observed climate data, an ensemble of four GCM-RCM combinations (global and regional climate models) and the water balance model mGROWA to estimate present and future groundwater recharge for the intensively-used Thau lagoon catchment in southern France. In addition to a highly resolved soil map, soil moisture distributions obtained from SAR-images (Synthetic Aperture Radar) were used to derive the spatial distribution of soil parameters covering the full simulation domain. Doing so helped us to assess the impact of different soil parameter sources on the modelled groundwater recharge levels. Groundwater recharge was simulated in monthly time steps using the ensemble approach and analysed in its spatial and temporal variability. The soil parameters originating from both sources led to very similar groundwater recharge rates, proving that soil parameters derived from SAR images may replace traditionally used soil maps in regions where soil maps are sparse or missing. Additionally, we showed that the variance in different GCM-RCMs influences the projected magnitude of future groundwater recharge change significantly more than the variance in the soil parameter distributions derived from the two different sources. For the period between 1950 and 2100, climate change impacts based on the climate model ensemble indicated that overall groundwater recharge will possibly show a low to moderate decrease in the Thau catchment. However, as no clear trend resulted from the ensemble simulations, reliable recommendations for adapting the regional groundwater management to changed available groundwater volumes could not be derived. PMID:26190446

Water shortage and climate change are the most important issues for sustainable agricultural and water resources development. Given the importance of water availability in crop production, the present study focused on risk assessment of the climate change impact on agricultural water requirements in southwest Iran, under two emission scenarios (A2 and B1) for the future period (2025-2054). A multi-model ensemble framework based on the mean observed temperature-precipitation (MOTP) method and a combined probabilistic approach using the Long Ashton Research Station Weather Generator (LARS-WG) and change factor (CF) methods have been used for downscaling to manage the uncertainty of the outputs of 14 general circulation models (GCMs). The results showed increasing temperature in all months and irregular changes in precipitation (either increasing or decreasing) in the future period. In addition, the calculated annual net water requirement for all crops affected by climate change showed an increase of between 4 and 10 %. Furthermore, the required water demand volume is also expected to increase. The largest and smallest expected increases in water demand volume are about 13 % and 5 % under the A2 and B1 scenarios, respectively. Considering these results and the limited water resources in the study area, it is crucial to undertake water resources planning in order to reduce the negative effects of climate change. Therefore, adaptation scenarios addressing the crop pattern and water consumption under climate change should be taken into account.

Ensemble Streamflow Prediction (ESP) provides an efficient tool for seasonal hydrological forecasts. In this study, we propose a new modification of the input data series for an ESP system used to predict runoff volume with a lead time of one month. These series are not represented by short historical weather datasets but by longer generated synthetic weather data series. Before their submission to the hydrological model, their number is restricted by relations among observed meteorological variables (average monthly precipitation and temperature) and large-scale climatic patterns and indices (e.g. the North Atlantic Oscillation, sea level pressure values and two geopotential heights). This modification was tested over a four-year period on a river basin in central Europe. The LARS-WG weather generator proved to be a suitable tool for extending the historical weather records. The modified ESP approach proved to be more efficient in the majority of months compared both to the original ESP method and to a reference forecast (based on the probability distribution of historical discharges). The improvement over traditional ESP was most obvious in the narrower forecast interval of the expected runoff volume. The inefficient forecasts of the modified ESP scheme (compared to traditional ESP) were conditioned by an insufficient restriction of the input synthetic weather datasets by the climate forecast.

This study presents a novel procedure based on ensemble empirical mode decomposition (EEMD) and an optimized support vector machine (SVM) for multi-fault diagnosis of rolling element bearings. The vibration signal is adaptively decomposed into a number of intrinsic mode functions (IMFs) by EEMD. Two types of features, the EEMD energy entropy and the singular values of the matrix whose rows are IMFs, are extracted. The EEMD energy entropy is used to specify whether the bearing has faults or not. If the bearing has faults, the singular values are input to a multi-class SVM optimized by the inter-cluster distance in the feature space (ICDSVM) to specify the fault type. The proposed method was tested on a system with an electric motor that has two rolling bearings, with 8 normal working conditions and 48 fault working conditions. Five groups of experiments were done to evaluate the effectiveness of the proposed method. The results show that the proposed method outperforms the other methods both mentioned in this paper and published in the literature.

As train loads and travel speeds have increased over time, railway axle bearings have become critical elements requiring more efficient non-destructive inspection and fault diagnostics methods. This paper presents a novel and adaptive procedure based on ensemble empirical mode decomposition (EEMD) and the Hilbert marginal spectrum for multi-fault diagnostics of axle bearings. EEMD overcomes the restrictive assumptions about the data and the computational effort that limit the application of many signal processing techniques. The outputs of this adaptive approach are the intrinsic mode functions, which are treated with the Hilbert transform in order to obtain the Hilbert instantaneous frequency spectrum and marginal spectrum. However, not all the IMFs obtained by the decomposition should be included in the Hilbert marginal spectrum. The IMF confidence index algorithm proposed in this paper is fully autonomous, overcoming the major limitation of manual selection by an experienced user, and allows the development of on-line tools. The effectiveness of the improvement is proven by the successful diagnosis of an axle bearing with a single fault or multiple composite faults, e.g., outer ring fault, cage fault and pin roller fault. PMID:25970256
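
To make the marginal spectrum step concrete, the sketch below computes it from a set of already-selected IMFs using SciPy's analytic-signal transform; the frequency binning is an illustrative choice, and the paper's IMF confidence-index selection is assumed to have been applied beforehand.

```python
import numpy as np
from scipy.signal import hilbert

def marginal_spectrum(imfs, fs, n_bins=256):
    """Hilbert marginal spectrum: instantaneous amplitude of each IMF
    accumulated over time into instantaneous-frequency bins."""
    edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    spectrum = np.zeros(n_bins)
    for imf in imfs:
        analytic = hilbert(imf)
        amp = np.abs(analytic)[:-1]
        phase = np.unwrap(np.angle(analytic))
        freq = np.diff(phase) * fs / (2.0 * np.pi)    # instantaneous frequency
        idx = np.clip(np.digitize(freq, edges) - 1, 0, n_bins - 1)
        np.add.at(spectrum, idx, amp)                 # integrate amplitude over time
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, spectrum / fs
```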

To increase the accuracy of ocean predictions in the Yellow and East China Sea (YES), satellite-borne sea surface temperature (SST) data have been assimilated into an operational ocean modeling system by applying an ensemble Kalman filter (EnKF). As the observed SST was assimilated continuously into the model over time, the ensemble spread decreased and the efficiency of the data assimilation degenerated. To increase the reliability of the system and the ensemble spread, model uncertainties were represented stochastically by perturbing the model tendency parameters such as eddy viscosity, bottom drag coefficient and light attenuation depth, as well as the atmospheric forcing. Data assimilation experiments were performed with forcing from a regional atmospheric model from September 2011 to February 2012. The assimilation results with and without the stochastically perturbed model parameters and atmospheric forcing were compared. The ensemble with the perturbations has larger spread and smaller root-mean-square deviation (RMSD) in temperature compared with the ensemble without the perturbations. The SST RMSD relative to another supplementary SST dataset was reduced from 0.91 to 0.81 °C over the YES. The assimilation of the SST data improved the simulated SST compared with the observations at the ocean buoy stations, and also brought the subsurface temperature profiles closer to the observed ones. The assimilation experiments showed that a stochastic representation of the model errors through perturbations of the model parameters and atmospheric forcing increases the spread of the ensemble and improves the structure of the background error covariance, which enhances the performance of the ensemble modeling system in the YES.

There is increasing evidence that protein dynamics and conformational changes can play an important role in modulating biological function. As a result, experimental and computational methods are being developed, often synergistically, to study the dynamical heterogeneity of a protein or other macromolecules in solution. Thus, methods such as molecular dynamics simulations or ensemble refinement approaches have provided conformational ensembles that can be used to understand protein function and biophysics. These developments have in turn created a need for algorithms and software that can be used to compare structural ensembles in the same way as the root-mean-square deviation is often used to compare static structures. Although a few such approaches have been proposed, they can be difficult to implement efficiently, hindering broader application and further development. Here, we present an easily accessible software toolkit, called ENCORE, which can be used to compare conformational ensembles generated either from simulations alone or synergistically with experiments. ENCORE implements three previously described methods for ensemble comparison, each of which can be used to quantify the similarity between conformational ensembles by estimating the overlap between the probability distributions that underlie them. We demonstrate the kinds of insights that can be obtained by providing examples of three typical use-cases: comparing ensembles generated with different molecular force fields, assessing convergence in molecular simulations, and calculating differences and similarities in structural ensembles refined with various sources of experimental data. We also demonstrate efficient computational scaling for typical analyses, and robustness against both the size and sampling of the ensembles. ENCORE is freely available and extendable, integrates with the established MDAnalysis software package, reads ensemble data in many common formats, and can work with large
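
As a usage illustration: ENCORE ships with MDAnalysis as the MDAnalysis.analysis.encore module. Assuming that integration, and with hypothetical topology and trajectory file names, the harmonic ensemble similarity comparison can be run roughly as follows.

```python
import MDAnalysis as mda
import MDAnalysis.analysis.encore as encore

# Two conformational ensembles of the same system, e.g. from two force fields
ens_a = mda.Universe("protein.pdb", "forcefield_A.xtc")
ens_b = mda.Universe("protein.pdb", "forcefield_B.xtc")

# Harmonic ensemble similarity: overlap of Gaussian distributions fitted
# to the conformations of each ensemble (one of ENCORE's three measures)
similarity, details = encore.hes([ens_a, ens_b])
print(similarity)        # symmetric matrix of pairwise ensemble similarities
```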

Background: Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NN) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity. Results: We assess the scoring accuracies of two new ensemble NN SFs based on bagging (BgN-Score) and boosting (BsN-Score), as well as those of conventional SFs, in the context of the 2007 PDBbind benchmark that encompasses a diverse set of high-quality protein families. We find that BgN-Score and BsN-Score have a more than 25% better Pearson's correlation coefficient (0.804 and 0.816 vs. 0.644) between predicted and measured binding affinities compared to that achieved by a state-of-the-art conventional SF. In addition, these ensemble NN SFs are also at least 19% more accurate (0.804 and 0.816 vs. 0.675) than SFs based on a single neural network that has traditionally been used in drug discovery applications. We further find that ensemble models based on NNs surpass SFs based on the decision-tree ensemble technique Random Forests. Conclusions: Ensemble NN SFs, BgN-Score and BsN-Score, are the most accurate in predicting binding affinity of protein-ligand complexes among the considered SFs. Moreover, their accuracies are even higher
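
For orientation, bagged and boosted ensembles of small neural networks can be assembled with scikit-learn as below. This is a generic stand-in under stated assumptions, not the BgN-Score/BsN-Score code: the feature matrix is synthetic, the network size is arbitrary, and on scikit-learn versions before 1.2 the keyword is base_estimator rather than estimator.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 36))   # stand-in physicochemical/geometrical features
y = rng.normal(size=200)         # stand-in measured binding affinities

base = MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000)

bgn_like = BaggingRegressor(estimator=base, n_estimators=10).fit(X, y)   # bagging
bsn_like = AdaBoostRegressor(estimator=base, n_estimators=10).fit(X, y)  # boosting

print(bgn_like.predict(X[:3]), bsn_like.predict(X[:3]))
```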

Climate simulation codes, such as the Community Earth System Model (CESM), are especially complex and continually evolving. Their ongoing state of development requires frequent software verification in the form of quality assurance to both preserve the quality of the code and instill model confidence. To formalize and simplify this previously subjective and computationally expensive aspect of the verification process, we have developed a new tool for evaluating climate consistency. Because an ensemble of simulations allows us to gauge the natural variability of the model's climate, our new tool uses an ensemble approach for consistency testing. In particular, an ensemble of CESM climate runs is created, from which we obtain a statistical distribution that can be used to determine whether a new climate run is statistically distinguishable from the original ensemble. The CESM ensemble consistency test, referred to as CESM-ECT, is objective in nature and accessible to CESM developers and users. The tool has proven its utility in detecting errors in software and hardware environments and providing rapid feedback to model developers.

Dynamic groundwater-river water exchange between the Columbia River and the Hanford 300 Area has substantial influence on flow and transport processes and biogeochemical cycles at the site. Existing research efforts have shown that the groundwater-river water interaction zone is a heterogeneous and highly dynamic region exhibiting variability over a range of space and time scales. Because well-based information alone is insufficient to characterize the spatially variable subsurface properties within this interaction zone, we have installed a large-scale (300 m by 300 m) 3-dimensional electrical resistivity tomography (ERT) array to monitor river water intrusion and retreat at a temporal resolution of four images per day, using a novel time-lapse ERT imaging methodology that explicitly accommodates the sharp, transient bulk conductivity contrast at the water table. The 4-dimensional electrical geophysical data are incorporated into ensemble-based data assimilation algorithms (e.g., ensemble Kalman filter and ensemble smoother) to statistically estimate the heterogeneous permeability field at the groundwater-river water interaction zone, which is critical for modeling flow and biogeochemical transport processes at the site. A new high-performance computing capability has been developed to couple the ERT imaging code E4D (Johnson et al., 2010) with the site-scale flow and transport code, PFLOTRAN (Hammond et al., 2012), which serves as the forward simulator of the hydrogeophysical data assimilation. The joint, parallel, multi-physics code is able to simulate well-based pressure and pore-fluid conductivity measurements, as well as spatially continuous ERT measurements collected throughout the experiment. The data assimilation framework integrates both the well-based point measurements and the spatially continuous ERT measurements in a sequential Bayesian manner. Our study demonstrates the effectiveness of ERT data for large-scale characterization of subsurface
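
The analysis step of a stochastic ensemble Kalman filter, as used in such assimilation frameworks, can be written in a few lines of numpy; the state, observation operator, and error variance below are illustrative placeholders rather than the Hanford configuration.

```python
# Stochastic EnKF analysis step with perturbed observations (numpy only).
import numpy as np

rng = np.random.default_rng(2)
n_ens, n_state, n_obs = 100, 500, 30
X = rng.normal(size=(n_state, n_ens))        # forecast ensemble (e.g. log-permeability)
obs_idx = rng.choice(n_state, size=n_obs, replace=False)
d = rng.normal(size=n_obs)                   # observations (e.g. ERT-derived)
r = 0.1                                      # observation error variance

A = X - X.mean(axis=1, keepdims=True)        # state anomalies
HX = X[obs_idx, :]                           # observed part of each member
HA = HX - HX.mean(axis=1, keepdims=True)     # observation-space anomalies
P_dd = HA @ HA.T / (n_ens - 1) + r * np.eye(n_obs)
P_xd = A @ HA.T / (n_ens - 1)                # state/observation cross-covariance
D = d[:, None] + np.sqrt(r) * rng.normal(size=(n_obs, n_ens))  # perturbed obs
X = X + P_xd @ np.linalg.solve(P_dd, D - HX)  # Kalman analysis update
print(X.mean(axis=1)[:5])                    # posterior ensemble mean (first cells)
```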

An ensemble is a collection of related datasets. Each dataset, or member, of an ensemble is normally large, multidimensional, and spatio-temporal. Ensembles are used extensively by scientists and mathematicians, for example, by executing a simulation repeatedly with slightly different input parameters and saving the results in an ensemble to see how parameter choices affect the simulation. To draw inferences from an ensemble, scientists need to compare data both within and between ensemble members. We propose two techniques to support ensemble exploration and comparison: a pairwise sequential animation method that visualizes locally neighboring members simultaneously, and a screen door tinting method that visualizes subsets of members using screen space subdivision. We demonstrate the capabilities of both techniques, first using synthetic data, then with simulation data of heavy ion collisions in high-energy physics. Results show that both techniques are capable of supporting meaningful comparisons of ensemble data. PMID:22347540

In recent years, increasing attention has been paid to the rising incidence of endometrial carcinoma, especially in the postmenopausal period. Alongside routine diagnostic methods, which provide information on the location and progression of the disease, there are morphological methods that characterize quite accurately the development of this type of cancer and its prognosis. Moreover, the information accumulated in recent years about the molecular profile of this type of cancer has made it possible to introduce a number of new drugs directed against so-called molecular 'targets' in the neoplastic process. A significant proportion of cases show a response, raising hope for the development of more successful formulations and target-based therapy. In this review, we present and discuss the role of certain molecular markers as potential indicators of prognosis and development, as well as in guiding targeted treatment of endometrial carcinoma. PMID:25909140

Cold temperature and associated extremes often adversely impact human health and the environment and disrupt economic activities during winter over Canada. This study investigates projected changes in winter (December to March) cold extreme days (i.e., cold nights, cold days, frost days, and ice days) and cold spells over Canada based on 11 regional climate model (RCM) simulations for the future 2040-2069 period with respect to the current 1970-1999 period. These simulations, available from the North American Regional Climate Change Assessment Program, were obtained with six different RCMs, when driven by four different Atmosphere-Ocean General Circulation Models, under the Special Report on Emissions Scenarios A2 scenario. Based on the reanalysis boundary conditions, the RCM simulations reproduce spatial patterns of observed mean values of the daily minimum and maximum temperatures and inter-annual variability of the number of cold nights over the different Canadian climatic regions considered in the study. A comparison of current and future period simulations suggests decreases in the frequency of cold extreme events (i.e., cold nights, cold days and cold spells) and in selected return levels of maximum duration of cold spells over the entire study domain. Important regional differences are noticed, as the simulations generally indicate smaller decreases in the characteristics of extreme cold events over western Canada compared to the other regions. The analysis also suggests an increase in the frequency of midwinter freeze-thaw events, due mainly to a decrease in the number of frost days and ice days for all Canadian regions. In particular, densely populated southern and coastal Canadian regions will require in-depth studies to facilitate appropriate adaptation strategies, as these regions are clearly expected to experience large increases in the frequency of freeze-thaw events.

Drought is among the costliest natural hazards worldwide and extreme drought events in recent years have caused huge losses to various sectors. Drought prediction is therefore critically important for providing early warning information to aid decision making to cope with drought. Due to the complicated nature of drought, it has been recognized that the univariate drought indicator may not be sufficient for drought characterization and hence multivariate drought indices have been developed for drought monitoring. Alongside the substantial effort in drought monitoring with multivariate drought indices, it is of equal importance to develop a drought prediction method with multivariate drought indices to integrate drought information from various sources. This study proposes a general framework for multivariate multi-index drought prediction that is capable of integrating complementary prediction skills from multiple drought indices. The Multivariate Ensemble Streamflow Prediction (MESP) is employed to sample from historical records for obtaining statistical prediction of multiple variables, which is then used as inputs to achieve multivariate prediction. The framework is illustrated with a linearly combined drought index (LDI), which is a commonly used multivariate drought index, based on climate division data in California and New York in the United States with different seasonality of precipitation. The predictive skill of LDI (represented with persistence) is assessed by comparison with the univariate drought index and results show that the LDI prediction skill is less affected by seasonality than the meteorological drought prediction based on SPI. Prediction results from the case study show that the proposed multivariate drought prediction outperforms the persistence prediction, implying a satisfactory performance of multivariate drought prediction. The proposed method would be useful for drought prediction to integrate drought information from various sources

In order to reduce the uncertainty of offline land surface model (LSM) simulations of land evapotranspiration (ET), we used ensemble simulations based on three meteorological forcing datasets [Princeton, ITPCAS (Institute of Tibetan Plateau Research, Chinese Academy of Sciences), Qian] and four LSMs (BATS, VIC, CLM3.0 and CLM3.5), to explore the trends and spatiotemporal characteristics of ET, as well as the spatiotemporal pattern of ET in response to climate factors over mainland China during 1982-2007. The results showed that the various simulations of each member and their arithmetic mean (Ens Mean) could capture the spatial distribution and seasonal pattern of ET sufficiently well, although they exhibited more significant spatial and seasonal variation in ET compared with observation-based ET estimates (Obs MTE). For the mean annual ET, we found that BATS forced by Princeton forcing overestimated the annual mean ET compared with Obs MTE for most of the basins in China, whereas VIC forced by Princeton forcing showed underestimations. By contrast, the Ens Mean was closer to Obs MTE, although the results were underestimated over Southeast China. Furthermore, both Obs MTE and the Ens Mean exhibited a significant increasing trend during 1982-98; whereas after 1998, when the last big El Niño event occurred, the Ens Mean tended to decrease significantly between 1999 and 2007, although the change was not significant for Obs MTE. Changes in air temperature and shortwave radiation played key roles in the long-term variation in ET over the humid area of China, but precipitation mainly controlled the long-term variation in ET in the arid and semi-arid areas of China.

In this paper, a glucose- and pH-responsive release system based on polymeric-network-capped mesoporous silica nanoparticles (MSN) is presented. The poly(acrylic acid) (PAA) brush on MSN was obtained through the surface-initiated atom transfer radical polymerization (SI-ATRP) of t-butyl acrylate and the subsequent hydrolysis of the ester bond. The PAA was then glycosylated with glucosamine to obtain P(AA-AGA). To block the pores of the silica, the P(AA-AGA) chains were cross-linked through the formation of boronate esters between 4,4-(ethylenedicarbamoyl)phenylboronic acid (EPBA) and the hydroxyl groups of P(AA-AGA). The boronate esters dissociated in the presence of glucose or under acidic conditions, which led to the opening of the mesoporous channels and the release of loaded guest molecules. The rate of release could be tuned by varying the pH or the concentration of glucose in the environment. The combination of the two stimuli exhibited a markedly enhanced release capacity under mildly acidic conditions (pH 6.0). PMID:25735191

As instrumental world music ensembles such as steel pan, mariachi, gamelan and West African drums are becoming more the norm than the exception in North American school music programs, there are other world music ensembles just starting to gain popularity in particular parts of the United States. The kulintang ensemble, a drum and gong ensemble…

Extreme hot spells can have significant impacts on human society and ecosystems, and therefore it is important to assess how these extreme events will evolve in a changing climate. In this study, the impact of climate change on hot days, hot spells, and heat waves, over 10 climatic regions covering Canada, based on 11 regional climate model (RCM) simulations from the North American Regional Climate Change Assessment Program for the June to August summer period is presented. These simulations were produced with six RCMs driven by four Atmosphere-Ocean General Circulation Models (AOGCM), for the A2 emission scenario, for the current 1970-1999 and future 2040-2069 periods. Two types of hot days, namely HD-1 and HD-2, defined respectively as days with only daily maximum temperature (Tmax) and both Tmax and daily minimum temperature (Tmin) exceeding their respective thresholds (i.e., period-of-record 90th percentile of Tmax and Tmin values), are considered in the study. Analogous to these hot days, two types of hot spells, namely HS-1 and HS-2, are identified as spells of consecutive HD-1 and HD-2 type hot days. In the study, heat waves are defined as periods of three or more consecutive days with Tmax above the 32 °C threshold. Results suggest future increases in the number of both types of hot days and hot spell events for the 10 climatic regions considered. However, the projected changes show high spatial variability and are highly dependent on the RCM and driving AOGCM combination. Extreme hot spell events such as HS-2 type hot spells of longer duration are expected to experience relatively larger increases compared to hot spells of moderate duration, implying considerable heat-related environmental and health risks. Regionally, the Great Lakes, West Coast, Northern Plains, and Maritimes regions are found to be more affected due to increases in the frequency and severity of hot spells and/or heat wave characteristics, requiring more in-depth studies for these regions.

We describe a combined 2D/3D approach for the superposition of flexible chemical structures, which is based on recent progress in the efficient identification of common subgraphs and a gradient-based torsion space optimization algorithm. The simplicity of the approach is reflected in its generality and computational efficiency: the suggested approach neither requires precalculated statistics on the conformations of the molecules nor does it make simplifying assumptions on the topology of the molecules being compared. Furthermore, graph-based molecular alignment produces alignments that are consistent with the chemistry of the molecules as well as their general structure, as it depends on both the local connectivities between atoms and the overall topology of the molecules. We validate this approach on benchmark sets taken from the literature and show that it leads to good results compared to computationally and algorithmically more involved methods. The results suggest that, for most practical purposes, graph-based molecular alignment is a viable alternative to molecular field alignment with respect to structural superposition and leads to structures of comparable quality in a fraction of the time. PMID:17381175

Exposure to high concentrations of fine particulate matter (PM₂.₅) can cause serious health problems because PM₂.₅ contains microscopic solid or liquid droplets that are sufficiently small to be inhaled deep into human lungs. Thus, daily prediction of PM₂.₅ levels is notably important for regulatory plans that inform the public and restrict social activities in advance when harmful episodes are foreseen. A hybrid EEMD-GRNN (ensemble empirical mode decomposition-general regression neural network) model based on data preprocessing and analysis is first proposed in this paper for one-day-ahead prediction of PM₂.₅ concentrations. The EEMD part is utilized to decompose the original PM₂.₅ data into several intrinsic mode functions (IMFs), while the GRNN part is used for the prediction of each IMF. The hybrid EEMD-GRNN model is trained using input variables obtained from a principal component regression (PCR) model to remove redundancy. These input variables accurately and succinctly reflect the relationships between PM₂.₅ and both air quality and meteorological data. The model is trained with data from January 1 to November 1, 2013 and is validated with data from November 2 to November 21, 2013 in Xi'an, China. The experimental results show that the developed hybrid EEMD-GRNN model outperforms a single GRNN model without EEMD, a multiple linear regression (MLR) model, a PCR model, and a traditional autoregressive integrated moving average (ARIMA) model. The hybrid model, with fast and accurate results, can be used to develop rapid air quality warning systems. PMID:25089688
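
A minimal sketch of the hybrid scheme, assuming the PyEMD package (distributed as EMD-signal) for the EEMD step and a hand-rolled Gaussian-kernel regressor standing in for the GRNN; the series, lag depth, and bandwidth are toy choices.

```python
# Toy EEMD-GRNN pipeline: decompose, forecast each IMF, sum the forecasts.
import numpy as np
from PyEMD import EEMD

def grnn_predict(X_train, y_train, X_test, sigma=0.5):
    # A GRNN is a Gaussian kernel-weighted average of training targets.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(3)
pm25 = np.sin(np.linspace(0, 20, 300)) + 0.3 * rng.normal(size=300)  # mock series
imfs = EEMD(trials=20).eemd(pm25)       # intrinsic mode functions (plus residue)

lag, forecast = 3, 0.0
for imf in imfs:
    # rows are [imf[t-3], imf[t-2], imf[t-1]], target is imf[t]
    X = np.column_stack([imf[i:len(imf) - lag + i] for i in range(lag)])
    y = imf[lag:]
    forecast += grnn_predict(X[:-1], y[:-1], imf[None, -lag:])[0]
print("one-day-ahead estimate:", round(float(forecast), 3))
```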

Uncertainty analysis is starting to be widely acknowledged as an integral part of hydrological modeling. The conventional treatment of uncertainty analysis in hydrologic modeling is to assume a deterministic model structure and treat its associated parameters as imperfectly known, thereby neglecting the uncertainty associated with the model structure. In this paper, a modeling framework that can explicitly account for the effect of model structure uncertainty is proposed. The modeling framework is based on initially generating different realizations of the original data set using a non-parametric bootstrap method, and then exploiting the ability of self-organizing algorithms, namely genetic programming, to evolve their own model structure for each of the resampled data sets. The resulting ensemble of models is then used to quantify the uncertainty associated with the model structure. The performance of the proposed modeling framework is analyzed with regard to its ability to characterize the evapotranspiration process at the Southwest Sand Storage facility, located near Ft. McMurray, Alberta. Eddy-covariance-measured actual evapotranspiration is modeled as a function of net radiation, air temperature, ground temperature, relative humidity, and wind speed. Investigating the relation between model complexity, prediction accuracy, and uncertainty, two sets of experiments were carried out by varying the level of mathematical operators that can be used to define the predictand-predictor relationship. While the first set uses just the additive operators, the second set uses both the additive and the multiplicative operators to define the predictand-predictor relationship. The results suggest that increasing the model complexity may lead to better prediction accuracy but at the expense of increased uncertainty. Compared to the model parameter uncertainty, the relative contribution of model structure uncertainty to the predictive uncertainty of a model is
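
The bootstrap-plus-genetic-programming ensemble can be sketched with scikit-learn resampling and gplearn's symbolic regressor; the predictors and "measured" ET below are synthetic stand-ins, and the restricted function set mirrors the additive/multiplicative operator experiments described above.

```python
# Toy version of the structure-uncertainty framework: one GP model per
# bootstrap realization; ensemble spread approximates structural uncertainty.
import numpy as np
from sklearn.utils import resample
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))            # net radiation, Tair, Tground, RH, wind
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=200)  # mock ET

models = []
for seed in range(10):
    Xb, yb = resample(X, y, random_state=seed)          # bootstrap realization
    gp = SymbolicRegressor(population_size=300, generations=10,
                           function_set=("add", "sub", "mul"),
                           random_state=seed)
    models.append(gp.fit(Xb, yb))

preds = np.array([m.predict(X) for m in models])        # ensemble predictions
print("mean structural st.dev.:", round(float(preds.std(axis=0).mean()), 4))
```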

One of the key recommendations of the WCRP Global Drought Information System (GDIS) workshop is to develop an experimental real-time global monitoring and prediction system. While great advances have been made in global drought monitoring based on satellite observations and model reanalysis data, global drought forecasting has lagged behind, in part due to the limited skill of both climate forecast models and global hydrologic predictions. Having worked on drought monitoring and forecasting over the USA for more than a decade, the Princeton land surface hydrology group is now developing an experimental global drought early warning system based on multiple climate forecast models and a calibrated global hydrologic model. In this presentation, we will test its capability in seasonal forecasting of meteorological, agricultural and hydrologic droughts over global major river basins, using precipitation, soil moisture and streamflow forecasts respectively. Based on the joint probability distribution between observations from Princeton's global drought monitoring system and model hindcasts and real-time forecasts from the North American Multi-Model Ensemble (NMME) project, we (i) bias correct the monthly precipitation and temperature forecasts from multiple climate forecast models, (ii) downscale them to a daily time scale, and (iii) use them to drive the calibrated VIC model to produce global drought forecasts at a 1-degree resolution. A parallel run using the ESP forecast method, which is based on resampling historical forcings, is also carried out for comparison. Analysis is being conducted over global major river basins, with multiple drought indices that have different time scales and characteristics. The meteorological drought forecast does not have uncertainty from hydrologic models and can be validated directly against observations - making the validation an 'apples-to-apples' comparison. Preliminary results for the evaluation of meteorological drought onset
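
Step (i) of this workflow is a bias correction of model forecasts against observations; empirical quantile mapping, shown below on synthetic monthly precipitation, is one plausible way to implement it (the authors' joint-distribution approach may differ in detail).

```python
# Empirical quantile-mapping bias correction sketch (numpy only).
import numpy as np

def quantile_map(forecast, model_clim, obs_clim):
    # percentile of each forecast value within the model climatology ...
    ranks = np.searchsorted(np.sort(model_clim), forecast) / len(model_clim)
    ranks = np.clip(ranks, 0.0, 1.0)
    # ... mapped onto the observed climatology at the same percentile
    return np.quantile(obs_clim, ranks)

rng = np.random.default_rng(5)
obs = rng.gamma(2.0, 40.0, size=360)        # observed monthly precipitation (mm)
hindcast = rng.gamma(2.0, 55.0, size=360)   # model hindcast: wet-biased
raw_fcst = rng.gamma(2.0, 55.0, size=12)    # real-time forecast to correct
print(quantile_map(raw_fcst, hindcast, obs).round(1))
```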

This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles built from random partitions of the predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models, the proposed method can handle a huge database without the constraints usually needed for analyzing high-dimensional data, and the random partition can improve prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and its performance, including overall prediction accuracy, sensitivity, and specificity for each category, is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model, and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and the random multinomial logit model. PMID:23611203
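
A compact sketch of the random-partition ensemble, assuming synthetic data: predictors are split into mutually exclusive random subsets, one multinomial logit model is fitted per subset, and class probabilities are averaged (scikit-learn standing in for the R implementation described above).

```python
# Random-partition ensemble of multinomial logistic regression models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X, y = make_classification(n_samples=400, n_features=60, n_informative=20,
                           n_classes=3, random_state=0)

perm = rng.permutation(X.shape[1])
subsets = np.array_split(perm, 6)            # 6 mutually exclusive feature blocks
models = [LogisticRegression(max_iter=1000).fit(X[:, s], y) for s in subsets]

# ensemble prediction = average of per-block class probabilities
proba = np.mean([m.predict_proba(X[:, s]) for m, s in zip(models, subsets)], axis=0)
print("ensemble accuracy:", round(float((proba.argmax(1) == y).mean()), 3))
```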

Irrigated agriculture is one of the biggest consumers of water in Europe, especially in southern regions, where it accounts for up to 70% of the total water consumption. The EU Common Agricultural Policy, combined with the Water Framework Directive, requires farmers and irrigation managers to substantially increase the efficiency of agricultural water use over the next decade. Ensemble numerical weather predictions can be valuable data for developing operational advisory irrigation services. We propose a stochastic ensemble-based model providing spatial and temporal estimates of crop water requirements, implemented within an advisory service offering detailed maps of irrigation water requirements and crop water consumption estimates, to be used by water irrigation managers and farmers. The stochastic model combines estimates of crop potential evapotranspiration retrieved from ensemble numerical weather forecasts (COSMO-LEPS, 16 members, 7 km resolution) and canopy parameters (LAI, albedo, fractional vegetation cover) derived from high resolution satellite images in the visible and near infrared wavelengths. The service provides users with daily estimates of crop water requirements for lead times up to five days. The temporal evolution of the crop potential evapotranspiration is simulated with autoregressive models. An ensemble Kalman filter is employed for updating model states by assimilating both ground-based meteorological variables (where available) and numerical weather forecasts. The model has been applied in the Campania region (Southern Italy), where a satellite-assisted irrigation advisory service has been operating since 2006. This work presents the results of the system performance for one year of experimental service. The results suggest that the proposed model can be an effective support for a sustainable use and management of irrigation water, under conditions of water scarcity and drought. Since the evapotranspiration term represents a staple

probabilistic component to the FF-EWS. As a first step, we have incorporated the uncertainty in rainfall estimates and forecasts based on an ensemble of equiprobable rainfall scenarios. The presented study focused on a number of rainfall events, and the performance of the FF-EWS was evaluated in terms of its ability to produce probabilistic hazard warnings for decision-making support.

A major problem in structure-based virtual screening applications is the appropriate selection of a single or even multiple protein structures to be used in the virtual screening process. A priori, it is unknown which protein structure(s) will perform best in a virtual screening experiment. We investigated the performance of ensemble docking, as a function of ensemble size, for eight targets of pharmaceutical interest. Starting from single protein structure docking results, up to 500,000 combinations of protein structures were generated for each ensemble size, and, for each ensemble, pose prediction and virtual screening results were derived. Comparing single- and multiple-protein-structure results suggests improvements when the worst and average performance over all single protein structures is set against the worst and average performance over all protein ensembles of size two or greater, respectively. We identified several key factors affecting ensemble docking performance, including the sampling accuracy of the docking algorithm, the choice of the scoring function, and the similarity of database ligands to the cocrystallized ligands of ligand-bound protein structures in an ensemble. Due to these factors, the prospective selection of optimum ensembles is a challenging task, as shown by a reassessment of published ensemble selection protocols. PMID:22482774
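
The core bookkeeping of such an experiment can be sketched as follows: given a (here synthetic) ligand-by-structure matrix of docking scores, each ensemble scores a ligand by its best member score, and screening performance is summarized by ROC AUC. The full study enumerates far larger ensembles than the size-2 combinations shown.

```python
# Toy ensemble-docking evaluation over all 2-structure ensembles.
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n_lig, n_struct = 300, 6
active = rng.random(n_lig) < 0.1                      # 10% true binders
scores = rng.normal(size=(n_lig, n_struct)) - 1.5 * active[:, None]  # lower = better

best_auc, best_pair = 0.0, None
for pair in combinations(range(n_struct), 2):         # all ensembles of size 2
    ens_score = scores[:, pair].min(axis=1)           # best score over members
    auc = roc_auc_score(active, -ens_score)           # lower score = more active
    if auc > best_auc:
        best_auc, best_pair = auc, pair
print("best 2-structure ensemble:", best_pair, "AUC:", round(best_auc, 3))
```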

The formation of effective and precise linkages in bottom-up or top-down processes is important for the development of self-assembled materials. Self-assembly through molecular recognition events is a powerful tool for producing functionalized materials. Photoresponsive molecular recognition systems can permit the creation of photoregulated self-assembled macroscopic objects. Here we demonstrate that macroscopic gel assembly can be highly regulated through photoisomerization of an azobenzene moiety that interacts differently with two host molecules. A photoregulated gel assembly system is developed using polyacrylamide-based hydrogels functionalized with azobenzene (guest) or cyclodextrin (host) moieties. Reversible adhesion and dissociation of the host gel from the guest gel may be controlled by photoirradiation. The differential affinities of α-cyclodextrin or β-cyclodextrin for the trans-azobenzene and cis-azobenzene are employed in the construction of a photoswitchable gel assembly system. PMID:22215078

Electrophysiological properties of neurons as the basic cellular elements of the central nervous system and their synaptic connections are well characterized down to a molecular level. However, the behavior of complex noisy networks formed by these constituents usually cannot simply be derived from the knowledge of its microscopic parameters. As a consequence, cooperative phenomena based on the interaction of neurons were postulated. This is a report on a study of global network spike activity as a function of synaptic interaction. We performed experiments in dissociated cultured hippocampal neurons and, for comparison, simulations of a mathematical model closely related to electrophysiology. Numeric analyses revealed that at a critical level of synaptic connectivity the firing behavior undergoes a phase transition. This cooperative effect depends crucially on the interaction of numerous cells and cannot be attributed to the spike threshold of individual neurons. In the experiment a drastic increase in the firing level was observed upon increase of synaptic efficacy by lowering of the extracellular magnesium concentration, which is compatible with our theoretical predictions. This "on-off" phenomenon demonstrates that even in small neuronal ensembles collective behavior can emerge which is not explained by the characteristics of single neurons. PMID:8542966

A modification of ensemble Monte Carlo uninformative variable elimination (EMCUVE) is proposed, which does not involve the use of random variables, with the aim of improving the performance of partial least squares (PLS) regression models, increasing the consistency of results and reducing processing time by selecting the most informative variables in a spectral dataset. The proposed method (ensemble Monte Carlo variable selection - EMCVS) and its robust version (REMCVS) were compared to PLS models and to the existing EMCUVE method using three near infrared (NIR) datasets, i.e. prediction of n-butanol in a five-solvent mixture, moisture in corn and glucosinolates in rapeseed. The proposed methods were more consistent, produced models with better predictive accuracy (lower root mean squared error of prediction) and required less computation time than the conventional EMCUVE method on these datasets. In this application, the proposed method was applied to PLS regression coefficients, but it may, in principle, be used on any regression vector.
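
A sketch of the Monte Carlo variable-selection idea using scikit-learn's PLS on synthetic spectra: repeated random subsampling yields a distribution of regression coefficients per wavelength, a UVE-style mean-to-deviation ratio ranks their reliability, and the model is refitted on the top-ranked variables. The specific statistics of EMCVS/REMCVS are not reproduced here.

```python
# Monte Carlo variable selection for PLS regression (illustrative).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(8)
X = rng.normal(size=(120, 200))                 # 120 NIR spectra x 200 wavelengths
y = X[:, 40:45].sum(axis=1) + 0.1 * rng.normal(size=120)   # mock analyte values

coefs = []
for _ in range(200):                            # Monte Carlo ensemble of submodels
    idx = rng.choice(len(y), size=80, replace=False)
    pls = PLSRegression(n_components=5).fit(X[idx], y[idx])
    coefs.append(np.ravel(pls.coef_))
coefs = np.array(coefs)

stability = np.abs(coefs.mean(axis=0)) / coefs.std(axis=0)  # reliability per variable
keep = np.argsort(stability)[-20:]              # 20 most informative wavelengths
final = PLSRegression(n_components=5).fit(X[:, keep], y)
print("selected variables:", np.sort(keep))
```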

Colloidal semiconductor nanocrystals are among the best candidates for realizing a nano-structured single photon source at room temperature. In this paper we present a new and efficient optical method to assess the quality of a sample of nanocrystals as single-photon emitters, by an ensemble measurement of photoluminescence. We relate the ensemble photoluminescence measurements to the photon statistics of single emitters by a simple theoretical model. As an example we compare two different kinds of CdSe/CdS dot-in-rods, showing a similar degree of single photon emission when observed on a selection of single nanocrystals. The results are compared with anti-bunching measurements realized on single nanocrystals of the two kinds.

In this study, we addressed the application of Artificial Neural Networks (ANN) in the context of Hydrological Ensemble Prediction Systems (HEPS). Such systems have become popular in the past years as a tool to include the forecast uncertainty in the decision-making process. HEPS fundamentally relies on the uncertainty cascade model [4] for uncertainty representation. Analogously, the machine learning community has proposed models of multiple classifier systems that take into account the variability in datasets, input space, model structures, and parametric configuration [3]. This approach is based primarily on the well-known "no free lunch" theorem [1]. Consequently, we propose a framework based on two separate but complementary topics: data stratification and input variable selection (IVS). Thus, we promote an ANN prediction stack in which each predictor is trained on an input space defined by applying IVS to a different stratified sub-sample. All this, added to the inherent variability of classical ANN optimization, leads us to our ultimate goal: diversity in the prediction, defined as the complementarity of the individual predictors. The application of stratification to the 12 basins used in this study, which originate from the second and third workshops of the MOPEX project [2], shows that the informativeness of the data is far more important than the quantity used for ANN training. Additionally, the input space variability leads to ANN stacks that outperform an ANN stack model trained with 100% of the available information but with a random selection of the dataset used in the early stopping method (scenario R100P). The results show that, from a deterministic view, the main advantage lies in the efficient selection of the training information, which is an equally important concept for the calibration of conceptual hydrological models. On the other hand, the diversity achieved is reflected in a substantial improvement in the scores that define the
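
A toy rendering of the diversity recipe: each member network gets its own subsample (random here, stratified in the study) and its own selected input variables, and the stack prediction is the member mean. All data and sizes are placeholders.

```python
# ANN prediction stack with per-member subsampling and input variable selection.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
X = rng.normal(size=(600, 12))                  # candidate hydrometeorological inputs
y = X[:, 0] + 0.5 * X[:, 3] + 0.2 * rng.normal(size=600)   # mock streamflow

members = []
for seed in range(8):
    Xs, _, ys, _ = train_test_split(X, y, train_size=0.6, random_state=seed)
    ivs = SelectKBest(f_regression, k=5).fit(Xs, ys)        # input variable selection
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000,
                       random_state=seed).fit(ivs.transform(Xs), ys)
    members.append((ivs, net))

stack = np.mean([net.predict(ivs.transform(X)) for ivs, net in members], axis=0)
print("stack RMSE:", round(float(np.sqrt(((stack - y) ** 2).mean())), 3))
```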

Recently, extreme weather occurrences associated with climate change have gradually increased in frequency, causing unprecedented major weather-related disasters. General Circulation Models (GCMs) are the basic tool used for modelling climate. However, the discrepancy between the spatio-temporal scale at which the models deliver output and the scales that are generally required for applied studies has led to the development of various downscaling methods. Stochastic downscaling methods have been used extensively to generate long-term weather sequences from finite observed records. A primary objective of this study is to develop a forecasting scheme which is able to make use of a multimodel ensemble of different GCMs. This study employed a Nonstationary Hidden Markov Chain Model (NHMM) as its main tool for downscaling seasonal ensemble forecasts over a 3-month period, providing daily forecasts. In particular, this study uses MMEs from the APEC Climate Center (APCC) as a predictor. Our results showed that the proposed downscaling scheme can provide skillful forecasts as inputs for hydrologic modeling, which in turn may improve water resources management. An application to the Nakdong watershed in South Korea illustrates how the proposed approach can lead to potentially reliable information for water resources management. Acknowledgement: This research was supported by a grant (13SCIPA01) from the Smart Civil Infrastructure Research Program funded by the Ministry of Land, Infrastructure and Transport (MOLIT) of the Korea government and the Korea Agency for Infrastructure Technology Advancement (KAIA). Keywords: Climate Change, GCM, Hidden Markov Chain Model, Multi-Model Ensemble

The molecular mechanism of a reaction is embedded in its transition path ensemble, the complete collection of reactive trajectories. Utilizing the information in the transition path ensemble alone, we developed a novel metric, which we termed the emergent potential energy, for distinguishing reaction coordinates from bath modes. The emergent potential energy can be understood as the average energy cost of making a displacement of a coordinate in the transition path ensemble. Whereas displacing a bath mode incurs essentially no cost, displacing the reaction coordinate is energetically costly. Based on some general assumptions about the behavior of reaction and bath coordinates in the transition path ensemble, we proved theoretically with statistical mechanics that the emergent potential energy can serve as a benchmark of reaction coordinates, and we demonstrated its effectiveness by applying it to a prototypical system of biomolecular dynamics. Using the emergent potential energy as guidance, we developed a committor-free and intuition-independent method for identifying reaction coordinates in complex systems. We expect this method to be applicable to a wide range of reaction processes in complex biomolecular systems.
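
The diagnostic can be mimicked on a toy system: over configurations drawn to mimic a transition path ensemble, average the energy change for a small displacement along each candidate coordinate; the double-well coordinate should cost visibly more than the harmonic bath mode. The potential and sampling below are fabrications for illustration only.

```python
# Toy emergent-potential-energy estimate for two candidate coordinates.
import numpy as np

def U(q):
    # toy 2D potential: double well in q[..., 0], harmonic bath in q[..., 1]
    return (q[..., 0] ** 2 - 1.0) ** 2 + 0.5 * q[..., 1] ** 2

rng = np.random.default_rng(10)
tpe = rng.normal(scale=[0.3, 1.0], size=(5000, 2))   # mock transition-path configs

delta = 0.1
for i, name in enumerate(["q0 (reaction coordinate)", "q1 (bath mode)"]):
    shift = np.zeros(2)
    shift[i] = delta
    cost = np.mean(U(tpe + shift) - U(tpe))          # average displacement cost
    print(name, "->", round(float(cost), 4))
```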

Approximation surrogates are used to substitute the numerical simulation model within optimization algorithms in order to reduce the computational burden of the coupled simulation-optimization methodology. The practical utility of surrogate-based simulation-optimization has been limited mainly by the uncertainty in surrogate model simulations. We develop a surrogate-based coupled simulation-optimization methodology for deriving optimal extraction strategies for coastal aquifer management that considers the predictive uncertainty of the surrogate model. Optimization models considering two conflicting objectives are solved using a multiobjective genetic algorithm. The objectives of maximizing the pumping from production wells and minimizing the barrier well pumping for hydraulic control of saltwater intrusion are considered. The density-dependent flow and transport simulation model FEMWATER is used to generate input-output patterns of groundwater extraction rates and resulting salinity levels. The nonparametric bootstrap method is used to generate different realizations of this data set. These realizations are used to train different surrogate models using genetic programming for predicting salinity intrusion in coastal aquifers. The predictive uncertainty of these surrogate models is quantified, and an ensemble of surrogate models is used in the multiple-realization optimization model to derive the optimal extraction strategies. The multiple realizations refer to the salinity predictions using different surrogate models in the ensemble. Optimal solutions are obtained for different reliability levels of the surrogate models. The solutions are compared against those obtained using a chance-constrained optimization formulation and a single-surrogate-based model. The ensemble-based approach is found to provide reliable solutions for coastal aquifer management while retaining the advantage of surrogate models in reducing computational burden.

Nucleic acid amplification technologies (NAATs) represent powerful tools in clinical microbiology, particularly in areas where traditional culture-based methods alone prove insufficient. A notable advantage is in reducing the time from taking samples to reporting results. This, and the specificity and sensitivity imparted by NAATs, can help to improve patient care. Both thermal and isothermal NAATs have been adapted to aid diagnosis in clinical laboratories. Current molecular diagnostic assays are generally high-tech, and are expensive to buy and perform. Easy-to-use NAATs are beginning to appear, not only facilitating acceptable throughput in clinical laboratories, but also allowing tests to move out of the laboratory, closer to the point of care. Demand for simpler, miniaturized equipment and assays, and the trend toward personalized medicine, are leading towards the development of fully integrated automation and home-use kits. The integration of diverse disciplines, such as genomics, molecular biology, microelectromechanical systems, microfluidics, microfabrication, and organic chemistry, is behind the emerging DNA microarray technology. The development of DNA microchips allows the simultaneous detection of potentially thousands of target sequences, not only favoring high throughput, but also offering the potential for genotyping patient subsets with respect to their response to particular drug types (pharmacogenomics). It is envisaged that the future of probe-based technologies will see the development of fully integrated assays and devices suitable for nonskilled users. PMID:15148419

Protein-based biopolymers have become a promising class of materials for both biomedical and pharmaceutical applications, as they have well-defined molecular weights, monomer compositions, as well as tunable chemical, biological, and mechanical properties. Using standard molecular biology tools, it is possible to design and construct genes encoding artificial proteins or protein-based polymers containing multiple repeats of amino acid sequences. This article reviews some of the traditional methods used for constructing DNA duplexes encoding these repeat-containing genes, including monomer generation, concatemerization, iterative oligomerization, and seamless cloning. A facile and versatile method, called modules of degenerate codons (MDC), which uses PCR and codon degeneracy to overcome some of the disadvantages of traditional methods, is introduced. Re-engineering of the random coil spacer domain of a bioactive protein, WPT2-3R, is used to demonstrate the utility of the MDC method. MDC re-constructed coding sequences facilitate further manipulations, such as insertion, deletion, and swapping of various sequence modules. A summary of some promising emerging techniques for synthesizing repetitive sequence-containing artificial proteins is also provided. PMID:16827576

The purpose of this study was to identify an optimal surfactant-enhanced aquifer remediation (SEAR) strategy for aquifers contaminated by dense non-aqueous phase liquid (DNAPL), based on an ensemble-of-surrogates optimization technique. A saturated heterogeneous medium contaminated by nitrobenzene was selected as a case study. A new kind of surrogate-based SEAR optimization employing an ensemble surrogate (ES) model together with a genetic algorithm (GA) is presented. Four methods, namely radial basis function artificial neural network (RBFANN), kriging (KRG), support vector regression (SVR), and kernel extreme learning machines (KELM), were used to create four individual surrogate models, which were then compared. The comparison enabled us to select the two most accurate models (KELM and KRG) to establish an ES model of the SEAR simulation model, and the developed ES model was then compared with the four stand-alone surrogate models. The results showed that the average relative error of the average nitrobenzene removal rates between the ES model and the simulation model for 20 test samples was 0.8%, indicating a high approximation accuracy and showing that the ES model provides more accurate predictions than the stand-alone surrogate models. A nonlinear optimization model was then formulated for the minimum cost, and the developed ES model was embedded into this optimization model as a constraint. GA was used to solve the optimization model and provide the optimal SEAR strategy. The developed ensemble-surrogate optimization approach was effective in seeking a cost-effective SEAR strategy for heterogeneous DNAPL-contaminated sites. This research is expected to enrich and develop the theory and techniques for the analysis of remediation strategy optimization of DNAPL-contaminated aquifers.
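
The loop can be sketched with stand-in surrogates (a Gaussian process for KRG and an SVR in place of KELM), an averaged ensemble predictor, and scipy's differential evolution playing the role of the GA; the "simulator", costs, and constraint level are invented for illustration.

```python
# Ensemble-surrogate optimization sketch: train two surrogates, average them,
# then search for the cheapest pumping strategy meeting a removal constraint.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.svm import SVR
from scipy.optimize import differential_evolution

rng = np.random.default_rng(11)
pump = rng.uniform(0, 1, size=(80, 3))              # extraction rates at 3 wells
removal = 1 - np.exp(-pump.sum(axis=1))             # mock simulator: removal rate

surrogates = [GaussianProcessRegressor().fit(pump, removal),
              SVR(C=10.0).fit(pump, removal)]

def ensemble(x):
    # ensemble surrogate = mean prediction of the two stand-alone surrogates
    return float(np.mean([s.predict(np.atleast_2d(x))[0] for s in surrogates]))

def cost(x):
    # pumping cost plus a penalty if predicted removal falls below 90%
    return x.sum() + 100.0 * max(0.0, 0.9 - ensemble(x))

res = differential_evolution(cost, bounds=[(0, 1)] * 3, seed=0, maxiter=50)
print("optimal rates:", res.x.round(3), "predicted removal:", round(ensemble(res.x), 3))
```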

Flooding and flash flooding are the most costly weather-related natural hazards in the United States and the world. Heavy rainfall-triggered landslides are often associated with flash flood events and cause additional loss of life and property. Therefore, it is important to understand the linkage and interaction between flash flood events and landslides. It is also pertinent to build a robust coupled flash flood and landslide disaster early warning system for disaster preparedness and hazard management. In this study, we built a coupled flash flood and landslide disaster early warning system, which is intended for operational use by the US National Weather Service, based on an existing ensemble framework by extending the model ensemble and coupling a set of distributed hydrologic models, the Coupled Routing and Excess STorage (CREST) model and the SACramento Soil Moisture Accounting (SAC-SMA) model, with two physically based landslide prediction models, the SLope-Infiltration-Distributed Equilibrium (SLIDE) model and the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability (TRIGRS) model. We tested this prototype warning system by conducting multi-year simulations driven by the Multi-Radar Multi-Sensor (MRMS) rainfall estimates at selected basins across the United States. We then comprehensively evaluated the predictive capabilities of this system against observed and reported flood and landslide events. Our results show that the system is generally capable of making accurate predictions of flash flood and landslide events in terms of their locations and time of occurrence. The recently developed ensemble framework also enables us to quantify the uncertainty of the predictions and the probabilities of anticipated disaster events.

The fluorescence bioimaging potential, both in vitro and in vivo, of a yellow emissive triazole-based molecular marker has been investigated and demonstrated. Three different kinds of cells, viz. Bacillus thuringiensis, Candida albicans, and Tecoma stans pollen grains, were used to investigate the intracellular zinc imaging potential of 1 (in vitro studies). Fluorescence imaging of the translocation of zinc through the stem of the small herb Peperomia pellucida, which has a transparent stem, proved the in vivo bioimaging capability of 1. This approach will enable screening of the cell permeability and biostability of newly developed probes. Similarly, the current method for the detection and localization of zinc in Gram seed sprouts could be an easy alternative to existing analytical methods for investigating the efficiency of various strategies applied to increase the zinc content of cereal crops. The probe-zinc ensemble has been efficiently applied for detecting phosphate-based biomolecules. PMID:24725748

The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though simple, these techniques are often of limited applicability. Recently, the use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier which has higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that its members form a mixture of experts whose individual outputs are combined into the final classification. Our methodology leverages this aspect of ensemble learning by training collections of weak decision-tree-based classifiers to identify regions of change in SAR images collected over a region in the Staten Island, New York area during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.
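
A condensed version of the pipeline on synthetic bitemporal imagery: a log-ratio difference image provides the per-pixel feature, and a bagged ensemble of deliberately weak decision trees performs the change classification (the `estimator` keyword assumes scikit-learn 1.2 or later).

```python
# Toy ensemble change detection on a synthetic bitemporal SAR-like scene.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(12)
before = rng.gamma(4.0, 1.0, size=(64, 64))          # pre-event intensity
changed = np.zeros((64, 64), dtype=bool)
changed[20:40, 20:40] = True                         # ground-truth change region
after = before.copy()
after[changed] *= 3.0                                # simulated change signature
after *= rng.gamma(4.0, 0.25, size=(64, 64))         # multiplicative speckle

log_ratio = np.log(after / before).ravel()[:, None]  # classic SAR change feature
labels = changed.ravel()

weak = DecisionTreeClassifier(max_depth=2)           # deliberately weak learner
ens = BaggingClassifier(estimator=weak, n_estimators=50, random_state=0)
ens.fit(log_ratio, labels)
print("training accuracy:", round(float(ens.score(log_ratio, labels)), 3))
```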

In anthropogenically heavily impacted river catchments, such as the Lusatian river catchments of the Spree and Schwarze Elster (Germany), the robust assessment of possible impacts of climate change on the regional water resources is of high relevance for the development and implementation of suitable climate change adaptation strategies. Large uncertainties inherent in future climate projections may, however, reduce the willingness of regional stakeholders to develop and implement suitable adaptation strategies to climate change. This study provides an overview of different possibilities to consider uncertainties in climate change impact assessments by means of (1) an ensemble-based modelling approach and (2) the incorporation of measured and simulated meteorological trends. The ensemble-based modelling approach consists of the meteorological output of four climate downscaling approaches (DAs) (two dynamical and two statistical DAs (113 realisations in total)), which drive different model configurations of two conceptually different hydrological models (HBV-light and WaSiM-ETH). Three near-natural subcatchments of the Spree and Schwarze Elster river catchments serve as the study area. The objective of incorporating measured meteorological trends into the analysis was twofold: measured trends can (i) serve as a means to validate the results of the DAs and (ii) be regarded as a harbinger of the future direction of change. Moreover, regional stakeholders seem to have more trust in measurements than in modelling results. In order to evaluate the nature of the trends, both gradual (Mann-Kendall test) and step changes (Pettitt test) are considered, as well as both temporal and spatial correlations in the data. The results of the ensemble-based modelling chain show that, depending on the type (dynamical or statistical) of DA used, opposing trends in precipitation, actual evapotranspiration and discharge are simulated in the scenario period (2031-2060). While the statistical DAs

New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl's regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org. PMID:26888907

Molecular imaging is an emerging discipline which plays critical roles in diagnosis and therapeutics. It visualizes and quantifies markers that are aberrantly expressed during disease origin and development. Protein molecules remain one major class of imaging probes, and the options have been widely diversified due to recent advances in protein engineering techniques. Antibodies are part of the immune system and interact with target antigens with high specificity and affinity. They have long been investigated as imaging probes and were coupled with imaging motifs such as radioisotopes for that purpose. However, the relatively large size of antibodies leads to a half-life that is too long for common imaging purposes. Besides, it may also cause a poor tissue penetration rate and thus compromise some medical applications. It is under this context that various engineered protein probes, essentially antibody fragments, protein scaffolds, and natural ligands, have been developed. Compared to intact antibodies, they possess a more compact size, shorter clearance time, and better tumor penetration. One major challenge of using protein probes in molecular imaging is the loss of biological activity resulting from random labeling. Site-specific modification, however, allows conjugation to happen in a stoichiometric fashion with little perturbation of protein activity. The present review will discuss protein-based probes with a focus on their application and related site-specific conjugation strategies in tumor imaging. PMID:20232092

The phrase “corneal endothelial dystrophies” embraces a group of bilateral corneal conditions that are characterized by a non-inflammatory and progressive degradation of corneal endothelium. Corneal endothelial cells exhibit a high pump site density and, along with barrier function, are responsible for maintaining the cornea in its natural state of relative dehydration. Gradual loss of endothelial cells leads to an insufficient water outflow, resulting in corneal edema and loss of vision. Since the pathologic mechanisms remain largely unknown, the only current treatment option is surgical transplantation when vision is severely impaired. In the past decade, important steps have been taken to understand how endothelial degeneration progresses on the molecular level. Studies of affected multigenerational families and sporadic cases identified genes and chromosomal loci, and revealed either Mendelian or complex disorder inheritance patterns. Mutations have been detected in genes that carry important structural, metabolic, cytoprotective, and regulatory functions in corneal endothelium. In addition to genetic predisposition, environmental factors like oxidative stress were found to be involved in the pathogenesis of endotheliopathies. This review summarizes and crosslinks the recent progress on deciphering the molecular bases of corneal endothelial dystrophies. PMID:21855542

The application of ensemble-based algorithms for history matching reservoir models has been steadily increasing over the past decade. However, the majority of implementations in reservoir engineering have dealt only with production history matching. During geologic sequestration, the injection of large quantities of CO2 into the subsurface may alter the stress/strain field, which in turn can lead to surface uplift or subsidence. Therefore, it is essential to couple multiphase flow and geomechanical response in order to predict and quantify the uncertainty of CO2 plume movement for long-term, large-scale CO2 sequestration projects. In this work, we simulate and estimate the properties of a reservoir that is being used to store CO2 as part of the In Salah Capture and Storage project in Algeria. The CO2 is separated from produced natural gas and is re-injected into the downdip aquifer portion of the field from three long horizontal wells. The field observation data include ground surface deformations (uplift) measured using satellite-based radar (InSAR), injection well locations, and CO2 injection rate histories provided by the operators. We implement variations of the ensemble Kalman filter and ensemble smoother algorithms for assimilating both injection rate data and geomechanical observations (surface uplift) into the reservoir model. The preliminary estimation results for horizontal permeability and material properties such as Young's modulus and Poisson's ratio are consistent with available measurements and previous studies of this field. Moreover, the existence of high-permeability channels (fractures) within the reservoir, especially in the regions around the injection wells, is confirmed. These estimation results can be used to accurately and efficiently predict and quantify the uncertainty in the movement of the CO2 plume.

We present a new method for conducting fully flexible-cell molecular dynamics simulations in the isothermal-isobaric ensemble based on Langevin equations of motion. The stochastic coupling to all particle and cell degrees of freedom is introduced in a correct way, in the sense that the stationary configurational distribution is proved to be consistent with that of the isothermal-isobaric ensemble. In order to apply the proposed method in computer simulations, a second-order symmetric numerical integration scheme is developed by Trotter's splitting of the single-step propagator. Moreover, a practical guide for choosing working parameters is suggested for user-specified thermo- and baro-coupling time scales. The method and software implementation are carefully validated by a numerical example.
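
For orientation, the sketch below implements a standard BAOAB Langevin step for the particle degrees of freedom only; the paper's scheme additionally couples the cell degrees of freedom and derives its second-order integrator from a Trotter splitting, which this toy omits.

```python
# One BAOAB Langevin step (thermostat only, no barostat or cell coupling).
import numpy as np

def baoab_step(q, p, force, dt, mass, gamma, kT, rng):
    c1 = np.exp(-gamma * dt)                     # friction decay over one step
    c2 = np.sqrt(kT * mass * (1 - c1 ** 2))      # fluctuation amplitude
    p = p + 0.5 * dt * force(q)                  # B: half kick
    q = q + 0.5 * dt * p / mass                  # A: half drift
    p = c1 * p + c2 * rng.normal(size=p.shape)   # O: Ornstein-Uhlenbeck bath
    q = q + 0.5 * dt * p / mass                  # A: half drift
    p = p + 0.5 * dt * force(q)                  # B: half kick
    return q, p

rng = np.random.default_rng(13)
q, p = rng.normal(size=(10, 3)), np.zeros((10, 3))
harmonic = lambda q: -q                          # toy force field
for _ in range(1000):
    q, p = baoab_step(q, p, harmonic, 0.01, 1.0, 1.0, 1.0, rng)
print("kinetic temperature ~", round(float((p ** 2).mean()), 2))  # should approach kT
```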

Free energy path sampling plays an essential role in computational understanding of chemical reactions, particularly those occurring in enzymatic environments. Among a variety of molecular dynamics simulation approaches, the generalized ensemble sampling strategy is uniquely attractive for the fact that it not only can enhance the sampling of rare chemical events but also can naturally ensure consistent exploration of environmental degrees of freedom. In this review, we plan to provide a tutorial-like tour on an emerging topic: generalized ensemble sampling of enzyme reaction free energy path. The discussion is largely focused on our own studies, particularly ones based on the metadynamics free energy sampling method and the on-the-path random walk path sampling method. We hope that this mini presentation will provide interested practitioners some meaningful guidance for future algorithm formulation and application study. PMID:27498634

This paper describes the simple construction of a unique class of supramolecular ensembles formed by electrostatic self-assembly between charged conjugated polymers and fluorophore-coupled glycoligands (glycoprobes) for the selective fluorogenic detection of receptor proteins at both the molecular and cellular levels. We show that positively and negatively charged diazobenzene-containing poly(p-phenylethynylenes) (PPEs) can be used to form stable fluorogenic probes with fluorescein-based (negatively charged) and rhodamine B-based (positively charged) glycoprobes by electrostatic interaction. The structures of the ensembles have been characterized by spectroscopic and microscopic techniques. The supramolecular probes formed show quenched fluorescence in an aqueous buffer solution, which can be specifically recovered, in a concentration-dependent manner, through competitive complexation with a selective protein receptor, over a range of other unselective proteins. The ensembles also show selective fluorescence enhancement with a live cell that expresses the glycoligand receptor, but not with a control cell without receptor expression. PMID:27159586

In this work, we apply a detailed all-atom model with a transferable knowledge-based potential to study the folding kinetics of the Formin-binding protein FBP28, a canonical three-stranded β-sheet WW domain. Replica exchange Monte Carlo (REMC) simulations starting from random coils find a native-like lowest-energy structure (Cα RMSD of 2.68 Å). We also study the folding kinetics of the FBP28 WW domain by performing a large number of ab initio Monte Carlo folding simulations. Using these trajectories, we examine the order of formation of the two β-hairpins, the folding mechanism of each individual β-hairpin, and the transition state ensemble (TSE) of the FBP28 WW domain, and compare our results with experimental data and previous computational studies. To obtain detailed structural information on the folding dynamics viewed as an ensemble process, we perform a clustering analysis procedure based on graph theory. Further, a rigorous Pfold analysis is used to obtain representative samples of the TSE, showing good quantitative agreement between experimental and simulated Φ values. Our analysis shows that the turn structure between the first and second β-strands is a partially stable structural motif that forms before entry into the TSE, and that there exist two major folding pathways for the FBP28 WW domain, which differ in the order and mechanism of hairpin formation. PMID:21365688
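
The Pfold (committor) analysis mentioned above admits a compact generic sketch: from each candidate structure one launches many short, independently seeded trajectories and records the fraction that reaches the folded state first; structures with Pfold near 0.5 belong to the TSE. In the sketch below, `run_short_trajectory` is a hypothetical function returning True if a trajectory folds before unfolding; it is not code from this study.

```python
def pfold(structure, run_short_trajectory, n_trials=100):
    """Estimate the committor: fraction of short trajectories that fold first."""
    n_folded = sum(run_short_trajectory(structure) for _ in range(n_trials))
    return n_folded / n_trials

# candidates with committor values near one half are assigned to the TSE
def select_tse(candidates, run_short_trajectory, lo=0.4, hi=0.6):
    return [s for s in candidates
            if lo <= pfold(s, run_short_trajectory) <= hi]
```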

By employing 11-mercaptoundecanoic acid (11-MUA) as both reducing agent and protecting ligand, we present a straightforward one-pot preparation of fluorescent Ag/Au bimetallic nanoclusters (AgAuNCs@11-MUA) from AgNO3 and HAuCl4 in alkaline aqueous solution at room temperature. The fluorescence of AgAuNCs@11-MUA is selectively quenched by Cu(2+) ions, and the nonfluorescent off-state of the as-prepared AgAuNCs@11-MUA-Cu(2+) ensemble can be effectively switched on upon the addition of histidine or cysteine. By incorporating Ni(2+) ions and N-ethylmaleimide, this phenomenon is further exploited as an integrated logic gate, and a specific fluorescence turn-on assay for selectively and sensitively sensing histidine and cysteine has been designed based on the original noncovalent AgAuNCs@11-MUA-Cu(2+) ensemble. Under optimal conditions, histidine and cysteine can be detected in the concentration ranges of 0.25-9 and 0.25-7 μM, with detection limits of 87 and 111 nM (S/N = 3), respectively. Furthermore, we demonstrate that the proposed AgAuNCs@11-MUA-based fluorescent assay can be successfully applied to the analysis of biological fluid samples. PMID:25761537

New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org PMID:26888907

The Global Modeling and Assimilation Office is preparing to upgrade its three-dimensional variational system to a hybrid approach in which the ensemble is generated using a square-root ensemble Kalman filter (EnKF) and the variational problem is solved using the Grid-point Statistical Interpolation system. As in most EnKF applications, we found it necessary to employ a combination of multiplicative and additive inflation to compensate for sampling and modeling errors, respectively; to maintain the small-member ensemble solution close to the variational solution, we also found it necessary to re-center the members of the ensemble about the variational analysis. During tuning of the filter we found re-centering and additive inflation to play a considerably larger role than expected, particularly in a dual-resolution context when the variational analysis is run at higher resolution than the ensemble. This led us to consider a hybrid strategy in which the members of the ensemble are generated by simply converting the variational analysis to the resolution of the ensemble and applying additive inflation, thus bypassing the EnKF. Comparisons of this so-called filter-free hybrid procedure with an EnKF-based hybrid procedure and a control non-hybrid, traditional scheme show both hybrid strategies to provide equally significant improvement over the control; more interestingly, the filter-free procedure was found to give qualitatively similar results to the EnKF-based procedure.

Fluorescence lifetime is a powerful contrast mechanism for in vivo molecular imaging. In this chapter, we describe instrumentation and methods to optimally exploit lifetime contrast using a time domain fluorescence tomography system. The key features of the system are the use of point excitation in free space using ultrashort laser pulses and non-contact detection using a gated, intensified CCD camera. The surface boundaries of the imaging volume are acquired using a photogrammetric camera integrated with the imaging system and implemented in theoretical models of light propagation in biological tissue. The time domain data are optimally analyzed using a lifetime-based tomography approach, which is based on extracting a tomographic set of lifetimes and decay amplitudes from the long-time decay portion of the time domain data. This approach improves the ability to locate in vivo targets with a resolution better than conventional optical methods. The application of time domain lifetime multiplexing and tomography is illustrated using phantoms and a tumor-bearing mouse model of breast adenocarcinoma. In the latter application, the time domain approach allows improved detection of fluorescent protein signals from intact nude mice in the presence of background autofluorescence. This feature has potential applications for longitudinal pre-clinical evaluation of drug treatment response, as well as for addressing fundamental questions related to tumor physiology and metastasis. PMID:21153381

A number of reports have appeared in recent years on label-free detection of nucleic acid sequences. However, most of these studies deal with ensemble measurements and therefore lack molecular-level resolution. These assays have usually employed ssDNA sensor probes and have often suffered from problems of irreproducibility and poor sequence selectivity. Herein, the applicability of surface-anchored single-stranded locked nucleic acid (ssLNA) probes has been assessed for the detection of target DNA sequences, as an alternative to the DNA-based assay. Importantly, the effectiveness of the LNA-based assay in identifying different types of single nucleobase mismatches has been tested. Since the duplex melting temperature is an indicator of duplex stability, the ensemble on-surface Tm values of the surface-confined LNA-DNA duplexes have been compared to the duplex unbinding force values obtained from atomic force spectroscopy (AFS) experiments. A common mismatch discrimination pattern elicited by both the ensemble and the molecular-level AFS approach could be identified. Apart from quantitative delineation of the different types of mismatches, the label-free AFS analysis confirms different degrees of efficiency of the purine and pyrimidine bases present on the LNA backbone in discriminating different nucleobase mismatch types. Importantly, the LNA-based AFS analysis can distinguish between disease-relevant gene fragments, e.g., a multidrug-resistant Mycobacterium tuberculosis (MTB) mutation, and the wild type. Since LNA probes are nuclease-resistant, these findings could pave the way to diagnostic applications of the LNA-based AFS assay. PMID:27124266

A method for estimating gas permeability through a zeolite membrane, using a molecular simulation technique and a theoretical permeation model, is presented. The estimate of permeability is derived from a combination of an adsorption isotherm and a self-diffusion coefficient based on the adsorption-diffusion model. The adsorption isotherm and self-diffusion coefficients needed for the estimation were calculated using conventional Monte Carlo and molecular dynamics simulations. The calculated self-diffusion coefficient was converted to the mutual diffusion coefficient and the permeability estimated using the Fickian equation. The method was applied to the prediction of the permeabilities of methane and ethylene in silicalite at 301 K. Calculated permeabilities were larger than the experimental values by more than an order of magnitude. However, the anisotropic permeability was consistent with the experimental data and with results obtained using a grand canonical ensemble molecular dynamics technique (Pohl et al., Mol. Phys. 1996, 89(6), 1725-1731).

Most genomic variants associated with phenotypic traits or disease do not fall within gene coding regions, but in regulatory regions, rendering their interpretation difficult. We collected public data on epigenetic marks and transcription factor binding in human cell types and used it to construct an intuitive summary of regulatory regions in the human genome. We verified it against independent assays for sensitivity. The Ensembl Regulatory Build will be progressively enriched when more data is made available. It is freely available on the Ensembl browser, from the Ensembl Regulation MySQL database server and in a dedicated track hub. PMID:25887522

A new water-soluble cycloruthenated complex Ru(bthiq)(dcbpy)2+ (1, Hbthiq = 1-(2-benzo[b]thiophenyl)isoquinoline, dcbpy = 4,4‧-dicarboxylate-2,2‧-bipyridine) was designed and synthesized to form a mercuric ensemble (1-Hg2+) for the visual detection of iodide anions. The binding constant of 1-Hg2+ is calculated to be 2.40 × 104 M-1, which is lower than that of HgI2. Therefore, the addition of I- to an aqueous solution of 1-Hg2+ leads to a significant color change from yellow to deep red through the release of 1. The results showed that iodide anions can be easily detected by the naked eye. The detection limit for iodide is calculated as 0.77 μM. In addition, an easily prepared test strip of 1-Hg2+ was successfully obtained for the detection of iodide anions.

A new approach is presented for data assimilation using the ensemble adjustment Kalman filter (EAKF) technique for surface measurements of carbon monoxide in a single tracer version of the community air quality model. An implementation of the EAKF known as the Data Assimilation Research Testbed at the National Center for Atmospheric Research was used for developing the model. Three different sets of numerical experiments were performed to test the effectiveness of the procedure and the range of key parameters used in implementing the procedure. The model domain includes much of the northeastern United States. The first two numerical experiments use idealized measurements derived from defined model runs, and the last test uses measurements of carbon monoxide from approximately 220 Air Quality System monitoring sites over the northeastern United States, maintained by the U.S. Environmental Protection Agency. In each case, the proposed method provided better results than the method without data assimilation.

Upper-level lows (ULLs) are closed, cyclonically circulating eddies isolated from the main westerly flow in the middle and upper troposphere. They are also sometimes called "cold drops" because the air within an upper-level low is colder than its surroundings. The cold air usually does not show up at the surface, meaning the vertical temperature gradient is large, which in turn causes instability and heavy storms, especially during summer. A ULL's diameter is a few hundred kilometers, so it resembles a miniature cyclone. Our former studies focused mainly on cold-drop statistics and meteorology, as well as a few case studies. Since ULLs occur rarely, we developed a new ULL-recognition process to increase the number of available samples. In the current study, we first gathered 150 days on which cold drops occurred during the past 15 years. Six meteorological parameters were investigated: 500 hPa height, 500 hPa temperature, temperature advection, 300 hPa wind speed, potential temperature on the 2-potential-vorticity-unit surface, and isentropic potential vorticity on the 315 K potential temperature level. Interactions among these variables were investigated in depth for all of the above-mentioned ULL cases. The predictability of ULL intensity and geographical position was assessed in both deterministic and ensemble models. To support operational activity at the Hungarian Meteorological Service, a new ensemble plume containing 500 hPa temperature, 300 hPa potential temperature, isentropic potential temperature at the 315 K level, and 300 hPa wind speed was developed.

Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as specificity determination for host bacteria, bacteriophage replication and transcription. Accurate identification of bacteriophage virion proteins from bacteriophage protein sequences is significant for understanding the complex virulence mechanisms in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for bacteriophage virion protein prediction from bacteriophage protein sequences is put forward with hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). In 10-fold cross-validation on the training dataset, the presented method achieves a satisfactory prediction result with a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and a Matthews correlation coefficient (MCC) of 0.701. To evaluate the prediction performance objectively, an independent testing dataset is used to evaluate the proposed method. Encouragingly, our proposed method performs better than previous studies, with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and an MCC of 0.662 on the independent testing dataset. These results suggest that the proposed method can be a potential candidate for bacteriophage virion protein prediction, which may provide a useful tool to find novel antibacterial drugs and to understand the relationship between bacteriophages and host bacteria. For the convenience of experimental scientists, a user-friendly and publicly accessible web-server for the proposed ensemble method has been established. PMID:26370987
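
A minimal sketch of this kind of hybrid-feature ensemble, assuming the descriptor blocks have already been computed (the random matrices below are placeholders for real CTD, PseAAC and PSSM features, and the member classifiers are illustrative choices, not necessarily those of the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# placeholder blocks standing in for CTD / PseAAC / PSSM descriptors
X_ctd, X_pseaac, X_pssm = (rng.random((100, d)) for d in (21, 50, 400))
y = rng.integers(0, 2, 100)          # 1 = virion protein, 0 = non-virion
X = np.hstack([X_ctd, X_pseaac, X_pssm])

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",                   # average member probabilities
)
ensemble.fit(X, y)
```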

A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data, sorting the data, evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
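
The distinctive element here is that candidate splits are scored on a random sample of the rows rather than on the full dataset. A minimal sketch of such a sampled split evaluation (illustrative, not the system's actual code):

```python
import numpy as np

def gini(labels):
    """Gini impurity of an integer label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)

def evaluate_split(feature, labels, threshold, sample_size, rng):
    """Score a candidate split using only a random sample of the data."""
    idx = rng.choice(len(labels), size=min(sample_size, len(labels)),
                     replace=False)
    f, l = feature[idx], labels[idx]
    left, right = l[f <= threshold], l[f > threshold]
    if len(left) == 0 or len(right) == 0:
        return np.inf                # degenerate split: reject
    # weighted impurity of the two children; lower is better
    return (len(left) * gini(left) + len(right) * gini(right)) / len(l)
```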

Cysteine (Cys) and histidine (His) both play indispensable roles in many important biological activities. An elevated Cys level can result in Alzheimer's and cardiovascular diseases. Likewise, His plays a significant role in the growth and repair of tissues as well as in controlling the transmission of metal elements in biological systems. Therefore, it is meaningful to detect Cys and His simultaneously. In this work, a novel terbium(III) coordination polymer-Cu(II) ensemble (Tb(3+)/GMP-Cu(2+)) was proposed. Guanosine monophosphate (GMP) can self-assemble with Tb(3+) to form a supramolecular Tb(3+) coordination polymer (Tb(3+)/GMP), which can serve as a time-resolved probe. The fluorescence of Tb(3+)/GMP is quenched upon the addition of Cu(2+), and the fluorescence of the as-prepared Tb(3+)/GMP-Cu(2+) ensemble is restored in the presence of Cys or His. By incorporating N-ethylmaleimide and Ni(2+) as masking agents, Tb(3+)/GMP-Cu(2+) was further exploited as an integrated logic system, and a specific time-resolved fluorescent "turn-on" assay for simultaneously sensing His and Cys was designed. It can also be used with plasma samples, showing great potential to meet the needs of practical applications. PMID:27343597

Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
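
The benefit of complementary members is easy to quantify in the idealized case of independent errors: a majority vote of 2k+1 members, each wrong with probability p < 1/2, errs with a binomial tail probability that shrinks rapidly with committee size. A quick check under that independence assumption:

```python
from math import comb

def majority_error(n_members, p):
    """P(majority vote is wrong) for n_members independent members,
    each erring with probability p."""
    need = n_members // 2 + 1        # wrong votes needed for a wrong majority
    return sum(comb(n_members, i) * p**i * (1 - p)**(n_members - i)
               for i in range(need, n_members + 1))

print(majority_error(1, 0.3))    # 0.30   (single model)
print(majority_error(11, 0.3))   # ~0.078
print(majority_error(21, 0.3))   # ~0.026
```

In practice members are correlated, which is exactly why ensemble design emphasizes complementarity.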

We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations. PMID:26723635
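
Schematically, and in generic notation rather than a verbatim transcription of the paper, the posterior over candidate ensemble distributions p(x) combines a data-fit term with an entropic prior:

```latex
P[p \mid D] \;\propto\; \exp\!\left(-\tfrac{1}{2}\,\chi^{2}[p]\right)\,
\exp\!\left(\theta\, S[p]\right),
\qquad
\chi^{2}[p] \;=\; \sum_{i}
\frac{\left(\langle O_{i}\rangle_{p} - O_{i}^{\mathrm{exp}}\right)^{2}}{\sigma_{i}^{2}}
```

Here S[p] is the relative entropy of p with respect to a reference ensemble and θ balances prior against data. The linear-scaling result quoted above then says that in an N-replica simulation the restraint on replica-averaged observables must carry a prefactor proportional to N for the sampled ensemble to converge to the maximum of this posterior as N grows.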

We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.

In this study, a time-delay estimation method based on the Ensemble Local Mean Decomposition (ELMD) method and the high-order ambiguity function (HAF) is proposed for locating natural gas pipeline leaks. The leakage signals were decomposed using ELMD, and numerous production functions (PFs) were obtained. An adaptive selection method based on Kullback-Leibler (K-L) divergence was proposed to process these PF components and choose the characteristic PFs that contain most of the leakage information. The HAF was employed to analyze the instantaneous parameters of the characteristic PFs and calculate the difference in arrival time of the characteristic frequencies. From the time difference and the signal propagation speed, the natural gas pipeline leak location can be determined. The experimental results show that the proposed method can locate leaks with higher accuracy than the cross-correlation method.
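
Once the arrival-time difference has been extracted, the leak position follows from elementary kinematics. A minimal illustration for two sensors a distance L apart with signal speed v (symbols and numbers are hypothetical, not taken from the paper):

```python
def leak_position(delta_t, length, speed):
    """Distance of the leak from sensor 1.

    delta_t : arrival time at sensor 1 minus arrival time at sensor 2 (s)
    length  : sensor separation along the pipeline (m)
    speed   : propagation speed of the leak signal (m/s)
    """
    return 0.5 * (length + speed * delta_t)

# example: sensors 1000 m apart, 420 m/s signal, sensor 2 hears the
# leak 0.5 s earlier -> leak sits 605 m from sensor 1
print(leak_position(0.5, 1000.0, 420.0))
```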

The design of the prototype protective ensemble was finalized. Prototype ensembles were fabricated and then subjected to a series of qualification tests based upon the protective ensemble performance standards (PEPS) requirements. Engineering drawings and purchase specifications were prepared for the new protective ensemble.

Over the last decade, many studies demonstrated that spatial information on the distributed physiogeographical characteristics and hydrological responses of river basins can be gained from remote sensing observations. Moreover, the advent of new satellite constellations and technologies enables the supply and processing of multi-mission satellite data at a temporal frequency that starts to become compatible with operational water resources management requirements. Nonetheless, the time continuity that is crucial in monitoring applications cannot be obtained by the sole use of remote sensing observations. The information that may be extracted from discrete Earth observation data has to be used as time-varying state or flux data in flood forecasting systems. In this framework, the near all-weather, 24-hour capabilities of imaging radars overcome the limitations in collecting data during flood events that affect sensors operating in the visible and thermal portions of the electromagnetic spectrum, making this technique very suitable for the spatial characterization of floods. Moreover, through the integration of radar imagery of flood events with high-precision digital elevation models, distributed inundation depths with associated uncertainty are extracted from remote sensing observations. This paper focuses on the sequential assimilation of SAR-derived water stages into a modelling sequence where the output of hydrologic models (rainfall-runoff models) serves as input to 1-D hydraulic models, and investigates the reliability and usefulness of systematic remote sensing of floods for operational forecasting studies. A thorough statistical analysis of both remote-sensing-derived and simulated water stages represents a prerequisite for performing such assimilation studies. By using perturbed model parameters, initial conditions and meteorological forcings, an ensemble of hydraulic model applications is generated. The methodology consists of adjusting the water

This paper proposes a novel approach for improving the accuracy of statistical prediction methods in spatially normalized analysis. This is achieved by incorporating registration uncertainty into an ensemble learning scheme. A probabilistic registration method is used to estimate a distribution of probable mappings between subject and atlas space. This allows the estimation of the distribution of spatially normalized feature data, e.g., grey matter probability maps. From this distribution, samples are drawn for use as training examples. This allows the creation of multiple predictors, which are subsequently combined using an ensemble learning approach. Furthermore, extra testing samples can be generated to measure the uncertainty of prediction. This is applied to separating subjects with Alzheimer's disease from normal controls using a linear support vector machine on a region of interest in magnetic resonance images of the brain. We show that our proposed method leads to an improvement in discrimination using voxel-based morphometry and deformation tensor-based morphometry over bootstrap aggregating, a common ensemble learning framework. The proposed approach also generates more reasonable soft-classification predictions than bootstrap aggregating. We expect that this approach could be applied to other statistical prediction tasks where registration is important. PMID:23288332

Publications on macromolecularly imprinted polymers have increased rapidly in recent years with the development of water-based polymer systems. The macroporous structure of cryogels has allowed the use of these materials in different applications, particularly in affinity purification and molecular imprinting-based methods. Due to their high selectivity, specificity, efficient mass transfer and good reproducibility, molecularly imprinted cryogels (MICs) have become attractive to researchers for the separation and purification of proteins. In this review, the recent developments in affinity-based cryogels and molecularly imprinted cryogels for protein purification are reviewed comprehensively. PMID:26454622

The sustainability of future water resources is of paramount importance and is affected by many factors, including population, wealth and climate. Inherent in current methods to estimate these factors in the future is the uncertainty of their prediction. In this study, we integrate a large ensemble of scenarios—internally consistent across economics, emissions, climate, and population—to develop a risk portfolio of water stress over a large portion of Asia that includes China, India, and Mainland Southeast Asia in a future with unconstrained emissions. We isolate the effects of socioeconomic growth from the effects of climate change in order to identify the primary drivers of stress on water resources. We find that water needs related to socioeconomic changes, which are currently small, are likely to increase considerably in the future, often overshadowing the effect of climate change on levels of water stress. As a result, there is a high risk of severe water stress in densely populated watersheds by 2050, compared to recent history. There is strong evidence to suggest that, in the absence of autonomous adaptation or societal response, a much larger portion of the region’s population will live in water-stressed regions in the near future. Lastly, tools and studies such as these can effectively investigate large-scale system sensitivities and can be useful in engaging and informing decision makers.

Characterizing precipitation seasonality and variability in the face of future uncertainty is important for a well-informed climate change adaptation strategy. Using the Colwell index of predictability and monthly normalized precipitation data from the Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-model ensembles, this study identifies spatial hotspots of changes in precipitation predictability in the United States under various climate scenarios. Over the historic period (1950–2005), the recurrent pattern of precipitation is highly predictable in the East and along the coastal Northwest, and is less so in the arid Southwest. Comparing the future (2040–2095) to the historic period, larger changes in precipitation predictability are observed under Representative Concentration Pathways (RCP) 8.5 than those under RCP 4.5. Finally, there are region-specific hotspots of future changes in precipitation predictability, and these hotspots often coincide with regions of little projected change in total precipitation, with exceptions along the wetter East and parts of the drier central West. Therefore, decision-makers are advised to not rely on future total precipitation as an indicator of water resources. Changes in precipitation predictability and the subsequent changes on seasonality and variability are equally, if not more, important factors to be included in future regional environmental assessment.

The sustainability of future water resources is of paramount importance and is affected by many factors, including population, wealth and climate. Inherent in current methods to estimate these factors in the future is the uncertainty of their prediction. In this study, we integrate a large ensemble of scenarios--internally consistent across economics, emissions, climate, and population--to develop a risk portfolio of water stress over a large portion of Asia that includes China, India, and Mainland Southeast Asia in a future with unconstrained emissions. We isolate the effects of socioeconomic growth from the effects of climate change in order to identify the primary drivers of stress on water resources. We find that water needs related to socioeconomic changes, which are currently small, are likely to increase considerably in the future, often overshadowing the effect of climate change on levels of water stress. As a result, there is a high risk of severe water stress in densely populated watersheds by 2050, compared to recent history. There is strong evidence to suggest that, in the absence of autonomous adaptation or societal response, a much larger portion of the region's population will live in water-stressed regions in the near future. Tools and studies such as these can effectively investigate large-scale system sensitivities and can be useful in engaging and informing decision makers. PMID:27028871

The potential of using a dynamical-statistical method for long-lead drought prediction was investigated. In particular, the APEC Climate Center one-tier multimodel ensemble (MME) was downscaled for predicting the standardized precipitation evapotranspiration index (SPEI) over 60 stations in South Korea. SPEI depends on both precipitation and temperature, and can incorporate the effect of global warming on the balance between precipitation and evapotranspiration. It was found that the one-tier MME has difficulty in capturing the local temperature and rainfall variations over extratropical land areas, and has no skill in predicting SPEI during boreal winter and spring. On the other hand, temperature and precipitation predictions were substantially improved in the downscaled MME. In conjunction with variance inflation, the downscaled MME can give reasonably skillful 6-month-lead forecasts of SPEI for the winter-to-spring period. Our results could lead to more reliable predictions of hydrological extremes for policymakers and stakeholders in the water management sector, and to better mitigation and climate adaptation.

The sustainability of future water resources is of paramount importance and is affected by many factors, including population, wealth and climate. Inherent in current methods to estimate these factors in the future is the uncertainty of their prediction. In this study, we integrate a large ensemble of scenarios—internally consistent across economics, emissions, climate, and population—to develop a risk portfolio of water stress over a large portion of Asia that includes China, India, and Mainland Southeast Asia in a future with unconstrained emissions. We isolate the effects of socioeconomic growth from the effects of climate change in order to identify the primary drivers of stress on water resources. We find that water needs related to socioeconomic changes, which are currently small, are likely to increase considerably in the future, often overshadowing the effect of climate change on levels of water stress. As a result, there is a high risk of severe water stress in densely populated watersheds by 2050, compared to recent history. There is strong evidence to suggest that, in the absence of autonomous adaptation or societal response, a much larger portion of the region’s population will live in water-stressed regions in the near future. Tools and studies such as these can effectively investigate large-scale system sensitivities and can be useful in engaging and informing decision makers. PMID:27028871

Characterizing precipitation seasonality and variability in the face of future uncertainty is important for a well-informed climate change adaptation strategy. Using the Colwell index of predictability and monthly normalized precipitation data from the Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-model ensembles, this study identifies spatial hotspots of changes in precipitation predictability in the United States under various climate scenarios. Over the historic period (1950–2005), the recurrent pattern of precipitation is highly predictable in the East and along the coastal Northwest, and is less so in the arid Southwest. Comparing the future (2040–2095) to the historic period, larger changes in precipitation predictability are observed under Representative Concentration Pathways (RCP) 8.5 than those under RCP 4.5. Finally, there are region-specific hotspots of future changes in precipitation predictability, and these hotspots often coincide with regions of little projected change in total precipitation, with exceptions along the wetter East and parts of the drier central West. Therefore, decision-makers are advised to not rely on future total precipitation as an indicator of water resources. Changes in precipitation predictability and the subsequent changes on seasonality and variability are equally, if not more, important factors to be included in future regional environmental assessment. PMID:27425819

Characterizing precipitation seasonality and variability in the face of future uncertainty is important for a well-informed climate change adaptation strategy. Using the Colwell index of predictability and monthly normalized precipitation data from the Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-model ensembles, this study identifies spatial hotspots of changes in precipitation predictability in the United States under various climate scenarios. Over the historic period (1950-2005), the recurrent pattern of precipitation is highly predictable in the East and along the coastal Northwest, and is less so in the arid Southwest. Comparing the future (2040-2095) to the historic period, larger changes in precipitation predictability are observed under Representative Concentration Pathways (RCP) 8.5 than those under RCP 4.5. Finally, there are region-specific hotspots of future changes in precipitation predictability, and these hotspots often coincide with regions of little projected change in total precipitation, with exceptions along the wetter East and parts of the drier central West. Therefore, decision-makers are advised to not rely on future total precipitation as an indicator of water resources. Changes in precipitation predictability and the subsequent changes on seasonality and variability are equally, if not more, important factors to be included in future regional environmental assessment. PMID:27425819

Given the changing climate, advance information on hydrological extremes such as droughts will help in planning for disaster mitigation and facilitate better decision making for water availability management. A precipitation deficit on long time scales beyond 6 months has impacts on hydrological sectors such as groundwater, streamflow, and reservoir storage. The potential of using a dynamical-statistical method for long-lead drought prediction was investigated. In particular, the APEC Climate Center (APCC) 1-Tier multi-model ensemble (MME) was downscaled for predicting the standardized precipitation evapotranspiration index (SPEI) over 60 stations in South Korea. SPEI depends on both precipitation and temperature, and can incorporate the impact of global warming on the balance between precipitation and evapotranspiration. It was found that the 1-Tier MME has difficulty in capturing the local temperature and rainfall variations over extratropical land areas, and has no skill in predicting SPEI during boreal winter and spring. On the other hand, temperature and precipitation predictions were substantially improved in the downscaled MME (DMME). In conjunction with variance inflation, the DMME can give reasonably skillful six-month-lead forecasts of SPEI for the winter-to-spring period. The results could potentially improve hydrological extreme predictions using meteorological forecasts for policymakers and stakeholders in the water management sector and support better climate adaptation.

The paper describes a probabilistic prediction scheme for the intraseasonal oscillation of the Indian summer monsoon (ISM) in the extended range (ER, ~3-4 weeks) using a self-organizing map (SOM)-based technique. SOM is used to derive a set of patterns through empirical model reduction. An ensemble method of forecasting is then developed for these reduced modes based on the principle of analogue prediction. A total of 900 ensemble members is created by varying one parameter at a time - the length of the observation sample, the number of patterns, the number of lags, or the number of input variables - while keeping the others constant. The deterministic correlation skill at the fourth pentad lead (15-20 days) from the current model is 0.47 (for the development period, 1951-1999) and 0.43 (for the hindcast period, 2000-2011) over the monsoon zone of India. This method effectively takes care of the stochastic uncertainties associated with a deterministic prediction scheme and provides better guidance to the user community. A large part of the uncertainty in the model's prediction skill is related to the interannual variability of the prediction skill for active-break spells. The model has problems in forecasting the unusually long active/break spells during the monsoon season, especially during September. Forecasts from certain initial conditions are less predictable than those from others. We describe some probable mechanisms from the literature for such problems in the model. This study will provide a benchmark for evaluating dynamical models' skill in predicting the ISM on the ER time scale in the future.

Oil dispersed in the water column remains sheltered from wind forcing, so that an altered drift path is a key consequence of using chemical dispersants. In this study, ensemble simulations were conducted based on 7 years of simulated atmospheric and marine conditions, evaluating 2,190 hypothetical spills from each of 636 cells of a regular grid covering the inner German Bight (SE North Sea). Each simulation compares two idealized setups assuming either undispersed or fully dispersed oil. Differences are summarized in a spatial map of probabilities that chemical dispersant applications would help prevent oil pollution from entering intertidal coastal areas of the Wadden Sea. High probabilities of success overlap strongly with coastal regions between 10 m and 20 m water depth, where the use of chemical dispersants for oil spill response is a particularly contentious topic. The present study prepares the ground for a more detailed net environmental benefit analysis (NEBA) accounting also for toxic effects.

We assess hydroclimatic projections for the Murray-Darling Basin (MDB) using an ensemble of 39 Intergovernmental Panel on Climate Change AR4 climate model runs based on the A1B emissions scenario. The raw model output for precipitation, P, was adjusted using a quantile-based bias correction approach. We found that the projected change, ΔP, between two 30 year periods (2070-2099 less 1970-1999) was little affected by bias correction. The range for ΔP among models was large (~±150 mm yr-1) with all-model run and all-model ensemble averages (4.9 and -8.1 mm yr-1) near zero, against a background climatological P of ~500 mm yr-1. We found that the time series of actually observed annual P over the MDB was indistinguishable from that generated by a purely random process. Importantly, nearly all the model runs showed similar behavior. We used these facts to develop a new approach to understanding variability in projections of ΔP. By plotting ΔP versus the variance of the time series, we could easily identify model runs with projections for ΔP that were beyond the bounds expected from purely random variations. For the MDB, we anticipate that a purely random process could lead to differences of ±57 mm yr-1 (95% confidence) between successive 30 year periods. This is equivalent to ±11% of the climatological P and translates into variations in runoff of around ±29%. This sets a baseline for gauging modeled and/or observed changes.
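
The ±57 mm yr-1 figure is just the sampling variability of the difference between two independent 30 year means of a white-noise series: with interannual standard deviation σ, the 95% bound is 1.96·σ·√(2/30). A quick arithmetic check, where the σ value is an assumption chosen to be consistent with the quoted bound (the abstract does not state it):

```python
import math

sigma = 112.6   # assumed std dev of annual P (mm/yr); not given in the abstract
n = 30          # years in each averaging period
bound = 1.96 * sigma * math.sqrt(2.0 / n)
print(f"95% bound on difference of two {n}-yr means: +/-{bound:.0f} mm/yr")
# -> +/-57 mm/yr, i.e. about 11% of a climatological P of ~500 mm/yr
```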

In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal component analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis, on both real and synthetic datasets.
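
A minimal sketch of the input-decimation idea under stated assumptions: score features by a simple per-class discrimination measure (here, absolute correlation with a one-vs-rest class indicator, an illustrative stand-in for the article's criterion), train one classifier per class-specific subset, and combine by majority vote.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def input_decimation_ensemble(X, y, n_keep=10):
    """One classifier per class, trained on that class's top features."""
    models = []
    for c in np.unique(y):
        indicator = (y == c).astype(float)
        scores = np.abs([np.corrcoef(X[:, j], indicator)[0, 1]
                         for j in range(X.shape[1])])
        subset = np.argsort(scores)[-n_keep:]   # most discriminative features
        clf = LogisticRegression(max_iter=1000).fit(X[:, subset], y)
        models.append((subset, clf))
    return models

def predict(models, X):
    """Majority vote over the decimated-input classifiers (integer labels)."""
    votes = np.stack([clf.predict(X[:, subset]) for subset, clf in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```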

In many cases the stability of a protein has to be increased to permit its biotechnological use. Rational methods of protein stabilization based on optimizing electrostatic interactions have produced some successful predictions. However, the precise calculation of stabilization energies remains challenging, one reason being that the electrostatic effects on the unfolded state are often neglected. We have explored here the feasibility of incorporating Poisson-Boltzmann model electrostatic calculations performed on representations of the unfolded state as large ensembles of geometrically optimized conformations calculated using the ProtSA server. Using a data set of 80 electrostatic mutations experimentally tested in two-state proteins, the predictive performance of several such models has been compared to that of a simple model that considers an unfolded structure of non-interacting residues. The unfolded ensemble models, while showing correlation between the predicted stabilization values and the experimental ones, are worse than the simple model, suggesting that the ensembles do not capture the energetics of the unfolded state well. A more attainable goal is classifying potential mutations as either stabilizing or non-stabilizing, rather than accurately calculating their stabilization energies. To implement a fast classification method that can assist in selecting stabilizing mutations, we have used a much simpler electrostatic model based only on the native structure and have determined its precision using different stabilizing energy thresholds. The binary classifier developed finds 7 true stabilizing mutants out of every 10 proposed candidates and can be used as a robust tool to propose stabilizing mutations. PMID:26530878

This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each: (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate the ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions, either by (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed, and performance metrics are provided for evaluation purposes.
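
The consensus-function step can be illustrated generically: build a co-association matrix recording how often each pair of points is grouped together across the individual partitions, then cluster that similarity to obtain the ensemble partition. A Python sketch (the toolbox itself is Matlab, and k-means here stands in for its kcenters algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# two well-separated Gaussian blobs as synthetic data
X = rng.normal(size=(200, 2)) + np.repeat([[0.0, 0.0], [5.0, 5.0]], 100, axis=0)

# (2) individual partitions from random initializations
partitions = [KMeans(n_clusters=2, n_init=1, random_state=s).fit_predict(X)
              for s in range(25)]

# (3) co-association matrix: fraction of partitions co-clustering each pair
co = np.mean([np.equal.outer(p, p) for p in partitions], axis=0)

# final clustering of the consensus similarity (average-linkage hierarchical)
dist = squareform(1.0 - co, checks=False)
labels = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
```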

A novel approach for representing the intramolecular polarizability as a continuum dielectric is introduced to account for molecular electronic polarization. It is shown, using a finite-difference solution to the Poisson equation, that the Electronic Polarization from Internal Continuum (EPIC) model yields accurate gas-phase molecular polarizability tensors for a test set of 98 challenging molecules composed of heteroaromatics, alkanes and diatomics. The electronic polarization originates from a high intramolecular dielectric that produces polarizabilities consistent with B3LYP/aug-cc-pVTZ and experimental values when surrounded by a vacuum dielectric. In contrast to other approaches to modeling electronic polarization, this simple model avoids the polarizability catastrophe and accurately calculates molecular anisotropy with the use of very few fitted parameters and without resorting to auxiliary sites or anisotropic atomic centers. On average, the unsigned errors in the average polarizability and anisotropy compared to B3LYP are 2% and 5%, respectively. The correlation between the polarizability components from B3LYP and this approach leads to an R2 of 0.990 and a slope of 0.999. Even the F2 anisotropy, shown to be a difficult case for existing polarizability models, can be reproduced within 2% error. In addition to providing new parameters for a rapid method directly applicable to the calculation of polarizabilities, this work extends the widely used Poisson equation to areas where accurate molecular polarizabilities matter. PMID:23646034

Biological systems are characterized by a large number of diverse interactions. Interaction maps have been used to abstract those interactions at all biological scales ranging from food webs at the ecosystem level down to protein interaction networks at the molecular scale.

discharge is more affected by parameters from the whole upstream drainage area. Understanding model output variance behavior will have a direct impact on the design and performance of the ensemble-based data assimilation platform, for which uncertainties are also modeled by variances. It will help to select more objectively RRM parameters to correct.

Accurate estimation of turbulent heat fluxes is important for water resources planning and management, irrigation scheduling, and weather forecasting. Land surface models (LSMs) can be used to simulate turbulent heat fluxes over large-scale domains. However, the application of LSMs is hindered by the high uncertainty in model parameters and state variables. In this study, a dual-pass ensemble-based data assimilation (DA) approach is developed to estimate turbulent heat fluxes. Initially, the common land model (CoLM) is used as the LSM (open-loop), and thereafter the ensemble Kalman filter is employed to optimize the CoLM parameters and variables. The first pass of the DA scheme optimizes vegetation parameters of CoLM (which are related to the leaf stomatal conductance) on a weekly basis by assimilating MODIS land surface temperature (LST) data. The second pass optimizes the soil moisture state of CoLM on a daily basis by assimilating soil moisture observations from a cosmic-ray instrument. The ultimate goal is to improve turbulent heat flux estimates from CoLM by optimizing its vegetation parameters and soil moisture state via assimilation of LST and soil moisture data into the proposed DA system. The DA approach is tested over a wet and densely vegetated site, called Daman, in northwest China. Results indicate that the CoLM (open-loop) model typically underestimates latent heat flux and overestimates sensible heat flux. By assimilating LST in the first pass, the turbulent heat fluxes are improved compared to those of the open-loop run. These fluxes become even more accurate upon assimilation of soil moisture in the second pass of the DA approach. These findings illustrate that the introduced DA approach can successfully extract the information in LST and soil moisture data to optimize the CoLM parameters and states and improve the turbulent heat flux estimates.

Neuronal ensembles are coactive groups of neurons that may represent building blocks of cortical circuits. These ensembles could be formed by Hebbian plasticity, whereby synapses between coactive neurons are strengthened. Here we report that repetitive activation with two-photon optogenetics of neuronal populations from ensembles in the visual cortex of awake mice builds neuronal ensembles that recur spontaneously after being imprinted and do not disrupt preexisting ones. Moreover, imprinted ensembles can be recalled by single-cell stimulation and remain coactive on consecutive days. Our results demonstrate the persistent reconfiguration of cortical circuits by two-photon optogenetics into neuronal ensembles that can perform pattern completion. PMID:27516599

We describe a molecular automaton, called MAYA, which encodes a version of the game of tic-tac-toe and interactively competes against a human opponent. The automaton is a Boolean network of deoxyribozymes that incorporates 23 molecular-scale logic gates and one constitutively active deoxyribozyme arrayed in nine wells (3x3) corresponding to the game board. To make a move, MAYA carries out an analysis of the input oligonucleotide keyed to a particular move by the human opponent and indicates a move by fluorescence signaling in a response well. The cycle of human player input and automaton response continues until there is a draw or a victory for the automaton. The automaton cannot be defeated because it implements a perfect strategy. PMID:12923549

The calculation of thermochemical data requires accurate molecular energies and heat capacities. Traditional methods rely upon the standard harmonic normal mode analysis to calculate the vibrational and rotational contributions. We utilize path integral Monte Carlo (PIMC) to go beyond the harmonic analysis and calculate the vibrational and rotational contributions to ab initio energies. This is an application and extension of a method previously developed in our group.
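
For a single degree of freedom, the identity underlying PIMC is the primitive-approximation path integral, which maps the quantum partition function onto a classical ring polymer of P beads (exact as P → ∞):

```latex
Z \;\approx\; \left(\frac{mP}{2\pi\beta\hbar^{2}}\right)^{P/2}
\int dx_{1}\cdots dx_{P}\,
\exp\!\left\{-\beta\sum_{i=1}^{P}\left[
\frac{mP}{2\beta^{2}\hbar^{2}}\,\left(x_{i+1}-x_{i}\right)^{2}
+\frac{V(x_{i})}{P}\right]\right\},
\qquad x_{P+1}\equiv x_{1}
```

Sampling this ring-polymer distribution with Monte Carlo captures anharmonic and rotational quantum effects that the harmonic normal mode analysis misses.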

Precipitation is one of the most difficult weather variables to predict in hydrometeorological applications. In order to assess the uncertainty inherent in deterministic numerical weather prediction (NWP), meteorological services around the globe develop ensemble prediction systems (EPS) based on high-resolution NWP systems. With non-hydrostatic model dynamics and without parameterization of deep moist convection, high-resolution NWP models are able to describe convective processes in more detail and provide more realistic mesoscale structures. However, precipitation forecasts are still affected by displacement errors, systematic biases and fast error growth on small scales. Probabilistic guidance can be achieved from an ensemble setup which accounts for model error and uncertainty of initial and boundary conditions. The German Meteorological Service (Deutscher Wetterdienst, DWD) provides such an ensemble system based on the German-focused limited-area model COSMO-DE. With a horizontal grid-spacing of 2.8 km, COSMO-DE is the convection-permitting high-resolution part of the operational model chain at DWD. The COSMO-DE-EPS consists of 20 realizations of COSMO-DE, driven by initial and boundary conditions derived from 4 global models and 5 perturbations of model physics. Ensemble systems like COSMO-DE-EPS are often limited with respect to ensemble size due to the immense computational costs. As a consequence, they can be biased and exhibit insufficient ensemble spread, and probabilistic forecasts may be not well calibrated. In this study, probabilistic quantitative precipitation forecasts are derived from COSMO-DE-EPS and evaluated at more than 1000 rain gauges located all over Germany. COSMO-DE-EPS is a frequently updated ensemble system, initialized 8 times a day. We use the time-lagged approach to inexpensively increase ensemble spread, which results in more reliable forecasts especially for extreme precipitation events. Moreover, we will show that statistical

This paper is the first part in a series of two articles and presents a data-driven wildfire simulator for forecasting wildfire spread scenarios, at a reduced computational cost that is consistent with operational systems. The prototype simulator features the following components: a level-set-based fire propagation solver FIREFLY that adopts a regional-scale modeling viewpoint, treats wildfires as surface propagating fronts, and uses a description of the local rate of fire spread (ROS) as a function of environmental conditions based on Rothermel's model; a series of airborne-like observations of the fire front positions; and a data assimilation algorithm based on an ensemble Kalman filter (EnKF) for parameter estimation. This stochastic algorithm partly accounts for the non-linearities between the input parameters of the semi-empirical ROS model and the fire front position, and is sequentially applied to provide a spatially-uniform correction to wind and biomass fuel parameters as observations become available. A wildfire spread simulator combined with an ensemble-based data assimilation algorithm is therefore a promising approach to reduce uncertainties in the forecast position of the fire front and to introduce a paradigm-shift in the wildfire emergency response. In order to reduce the computational cost of the EnKF algorithm, a surrogate model based on a polynomial chaos (PC) expansion is used in place of the forward model FIREFLY in the resulting hybrid PC-EnKF algorithm. The performance of EnKF and PC-EnKF is assessed on synthetically-generated simple configurations of fire spread to provide valuable information and insight on the benefits of the PC-EnKF approach as well as on a controlled grassland fire experiment. The results indicate that the proposed PC-EnKF algorithm features similar performance to the standard EnKF algorithm, but at a much reduced computational cost. In particular, the re-analysis and forecast skills of data assimilation strongly relate

We present a simulation and data analysis technique to investigate first-order phase transitions and the associated transition barriers. The simulation technique is based on the real microcanonical ensemble where the sum of kinetic and potential energy is kept constant. The method is tested for the droplet condensation-evaporation transition in a Lennard-Jones system with up to 2048 particles at fixed density, using simple Metropolis-like sampling combined with a replica-exchange scheme. Our investigation of the microcanonical ensemble properties reveals that the associated transition barrier is significantly lower than in the canonical counterpart. Along the line of investigating the microcanonical ensemble behavior, we develop a framework for general ensemble evaluations. This framework is based on a clear separation between system-related and ensemble-related properties, which can be exploited to specifically tailor artificial ensembles suitable for first-order phase transitions.

We present a simulation and data analysis technique to investigate first-order phase transitions and the associated transition barriers. The simulation technique is based on the real microcanonical ensemble where the sum of kinetic and potential energy is kept constant. The method is tested for the droplet condensation-evaporation transition in a Lennard-Jones system with up to 2048 particles at fixed density, using simple Metropolis-like sampling combined with a replica-exchange scheme. Our investigation of the microcanonical ensemble properties reveals that the associated transition barrier is significantly lower than in the canonical counterpart. In the course of investigating the microcanonical ensemble behavior, we develop a framework for general ensemble evaluations. This framework is based on a clear separation between system-related and ensemble-related properties, which can be exploited to specifically tailor artificial ensembles suitable for first-order phase transitions. PMID:27627238

Discusses topics essential to good classroom management for ensemble music teachers. Explores the importance of planning and preparation, good teaching practice within the classroom, and using an effective discipline plan to deal with any behavior problems in the classroom. Includes a bibliography of further resources. (CMK)

Protective garment ensemble with internally-mounted environmental-control unit contains its own air supply. Alternatively, a remote environmental-control unit or an air line is attached at the umbilical quick disconnect. The unit uses liquid air that is vaporized to provide both breathing air and cooling. The totally enclosed garment protects against toxic substances.

Carbon molecular sieves are used extensively in gas chromatography for the separation of permanent gases and light hydrocarbons. Carbon molecular sieves also find commercial application for the manufacture of pure hydrogen from hydrogen-rich gases such as coke-oven gas, and for the separation of air by the pressure-swing adsorption technique. The objective of this investigation was to prepare carbons from Maghara coal, recently available on the commercial market. Coal-based carbons, if they possess molecular sieve properties, are superior to molecular sieve carbons from agricultural by-products because they have more satisfactory mechanical properties.

This paper is the first part in a series of two articles and presents a data-driven wildfire simulator for forecasting wildfire spread scenarios, at a reduced computational cost that is consistent with operational systems. The prototype simulator features the following components: an Eulerian front propagation solver FIREFLY that adopts a regional-scale modeling viewpoint, treats wildfires as surface propagating fronts, and uses a description of the local rate of fire spread (ROS) as a function of environmental conditions based on Rothermel's model; a series of airborne-like observations of the fire front positions; and a data assimilation (DA) algorithm based on an ensemble Kalman filter (EnKF) for parameter estimation. This stochastic algorithm partly accounts for the nonlinearities between the input parameters of the semi-empirical ROS model and the fire front position, and is sequentially applied to provide a spatially uniform correction to wind and biomass fuel parameters as observations become available. A wildfire spread simulator combined with an ensemble-based DA algorithm is therefore a promising approach to reduce uncertainties in the forecast position of the fire front and to introduce a paradigm shift in the wildfire emergency response. In order to reduce the computational cost of the EnKF algorithm, a surrogate model based on a polynomial chaos (PC) expansion is used in place of the forward model FIREFLY in the resulting hybrid PC-EnKF algorithm. The performance of EnKF and PC-EnKF is assessed on synthetically generated simple configurations of fire spread to provide valuable information and insight on the benefits of the PC-EnKF approach, as well as on a controlled grassland fire experiment. The results indicate that the proposed PC-EnKF algorithm features similar performance to the standard EnKF algorithm, but at a much reduced computational cost. In particular, the re-analysis and forecast skills of DA strongly relate to the spatial and temporal…
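
The surrogate idea in the hybrid algorithm can be illustrated in a drastically simplified form: fit a cheap polynomial to a small design of forward-model runs once, then evaluate it inside the filter loop. A one-parameter ordinary polynomial stands in for the PC expansion here, and the forward model is a hypothetical placeholder.

```python
import numpy as np

# Hypothetical expensive forward model (stands in for FIREFLY): the fire-front
# position at one observation time as a function of a spread-rate parameter.
def forward(p):
    return p * 3.0 + 0.05 * p**2

# Build the surrogate once from a small design of forward-model runs...
design = np.linspace(0.5, 4.0, 9)
coeffs = np.polyfit(design, [forward(p) for p in design], deg=3)
surrogate = np.poly1d(coeffs)

# ...then evaluate it in place of the forward model inside the filter loop,
# where thousands of evaluations may be needed per assimilation cycle.
ens = np.random.default_rng(2).normal(2.0, 0.5, size=1000)
cheap = surrogate(ens)                         # surrogate predictions, whole ensemble
print(np.max(np.abs(cheap - forward(ens))))    # surrogate error on this toy problem
```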

The south peninsular part of India gets the maximum amount of rainfall during the northeast monsoon (NEM) season [October to December (OND)], which is the primary source of water for agricultural activities in this region. A nonlinear method, viz. the extreme learning machine (ELM), has been employed on general circulation model (GCM) products to make a multi-model ensemble (MME) based estimation of NEM rainfall (NEMR). The ELM is basically an improved learning algorithm for the single-hidden-layer feed-forward neural network (SLFN) architecture. The 27-year (1982-2008) lead-1 hindcast runs (using initial conditions of September to forecast the mean rainfall of OND) from seven GCMs have been used to make the MME. The improvement of the proposed method with respect to other regular MMEs (the simple arithmetic mean of GCMs (EM) and a singular value decomposition based multiple linear regression MME) has been assessed through several skill metrics, such as spread distribution, multiplicative bias, prediction errors, the yield of prediction, Pearson's and Kendall's correlation coefficients, and Willmott's index of agreement. The efficiency of the ELM-estimated rainfall is established by all the stated skill scores. The performance of ELM in nine extreme NEMR years, of which four are characterized by deficit rainfall and five are identified as excess, is also examined. It is found that the ELM captures these extremes reasonably well compared to the other MME approaches.
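
The ELM itself is compact enough to sketch in full: hidden-layer weights are drawn at random and only the output weights are solved for, in closed form, by least squares. The toy MME setup below (synthetic GCM hindcasts, hidden-layer size) is illustrative, not the study's configuration.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, rng=None):
    """Train a basic extreme learning machine: random hidden layer,
    least-squares output weights (a generic sketch of the method)."""
    if rng is None:
        rng = np.random.default_rng(0)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                  # output weights, closed form
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy MME use: predictors are seven synthetic GCM hindcasts, target is observed rainfall
rng = np.random.default_rng(3)
gcms = rng.normal(size=(27, 7))                   # 27 years x 7 models
obs = gcms @ rng.normal(size=7) + 0.1 * rng.normal(size=27)
W, b, beta = elm_fit(gcms[:20], obs[:20], n_hidden=30, rng=rng)
print(elm_predict(gcms[20:], W, b, beta) - obs[20:])   # out-of-sample errors
```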

For most purposes the information gathered from an ensemble forecast is the ensemble mean and its uncertainty. The ensemble spread is commonly used as a measure of the uncertainty. We propose a method to assess whether the ensemble spread is a good measure of uncertainty and to bring forward an underlying spread-skill relationship. Forecasting the uncertainty should be probabilistic in nature. This implies that, if only the ensemble spread is available, a probability density function (PDF) for the uncertainty forecast must be reconstructed based on one parameter. Different models are introduced for the composition of such PDFs and evaluated for different spread-error metrics. The uncertainty forecast can then be verified based on probabilistic skill scores. For a perfectly reliable forecast the spread-error relationship is strongly heteroscedastic, since the error can take a wide range of values proportional to the ensemble spread. This makes a proper statistical assessment of the spread-skill relation intricate. However, it is shown that a logarithmic transformation of both spread and error alleviates the heteroscedasticity. A linear regression analysis can then be performed to check whether the flow-dependent spread is a realistic indicator of the uncertainty and to what extent ensemble underdispersion or overdispersion depends on the ensemble spread. The methods are tested on ensemble forecasts of wind and geopotential height from the European Centre for Medium-Range Weather Forecasts (ECMWF) over Europe and Africa. A comparison is also made with spread-skill analysis based on binning methods.
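
A toy illustration of why the log transform helps (all data synthetic): for a perfectly reliable ensemble the error magnitude scales with the spread, so a fit on raw values is dominated by the largest spreads, while a fit in log-log space is well behaved.

```python
import numpy as np

# Synthetic perfectly-reliable pairs: the error magnitude is proportional
# to the spread, so the raw relation is strongly heteroscedastic.
rng = np.random.default_rng(4)
spread = rng.lognormal(mean=0.0, sigma=0.5, size=2000)
error = np.abs(spread * rng.normal(size=2000))    # |error| scales with spread

# Log-transform both sides, then an ordinary linear fit is meaningful.
ls, le = np.log(spread), np.log(error + 1e-12)
slope, intercept = np.polyfit(ls, le, deg=1)
print(f"slope {slope:.2f} (close to 1 indicates spread tracks the error)")
```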

The performance of brain-machine interfaces (BMIs) that continuously control upper limb neuroprostheses may benefit from distinguishing periods of posture and movement so as to prevent inappropriate movement of the prosthesis. Few studies, however, have investigated how decoding behavioral states and detecting the transitions between posture and movement could be used autonomously to trigger a kinematic decoder. We recorded simultaneous neuronal ensemble and local field potential (LFP) activity from microelectrode arrays in primary motor cortex (M1) and dorsal (PMd) and ventral (PMv) premotor areas of two male rhesus monkeys performing a center-out reach-and-grasp task, while upper limb kinematics were tracked with a motion capture system with markers on the dorsal aspect of the forearm, hand, and fingers. A state decoder was trained to distinguish four behavioral states (baseline, reaction, movement, hold), while a kinematic decoder was trained to continuously decode hand end point position and 18 joint angles of the wrist and fingers. LFP amplitude most accurately predicted transition into the reaction (62%) and movement (73%) states, while spikes most accurately decoded arm, hand, and finger kinematics during movement. Using an LFP-based state decoder to trigger a spike-based kinematic decoder [r = 0.72, root mean squared error (RMSE) = 0.15] significantly improved decoding of reach-to-grasp movements from baseline to final hold, compared with either a spike-based state decoder combined with a spike-based kinematic decoder (r = 0.70, RMSE = 0.17) or a spike-based kinematic decoder alone (r = 0.67, RMSE = 0.17). Combining LFP-based state decoding with spike-based kinematic decoding may be a valuable step toward the realization of BMI control of a multifingered neuroprosthesis performing dexterous manipulation. PMID:23536714
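
The gating idea can be sketched generically: a state classifier driven by one feature stream triggers a continuous decoder driven by another. The sketch below uses scikit-learn on random stand-in data; the feature definitions, bin counts, and model choices are placeholders, not the decoders used in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
T = 400
lfp = rng.normal(size=(T, 16))            # LFP amplitude features per time bin
spikes = rng.poisson(3.0, size=(T, 64))   # spike counts per time bin
states = rng.integers(0, 4, size=T)       # 0 baseline, 1 reaction, 2 movement, 3 hold
kin = rng.normal(size=(T, 21))            # end point (3) + 18 joint angles

state_dec = LinearDiscriminantAnalysis().fit(lfp, states)       # LFP-based state decoder
kin_dec = Ridge().fit(spikes[states == 2], kin[states == 2])    # spike-based kinematics

# At run time: decode the state from LFP, and only emit kinematic output
# while the movement state is detected (zeros otherwise).
s_hat = state_dec.predict(lfp)
output = np.where((s_hat == 2)[:, None], kin_dec.predict(spikes), 0.0)
print(output.shape)
```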

Intrinsically disordered proteins (IDPs) are a class of proteins that do not exhibit well-defined three-dimensional structures. The absence of structure is intrinsic to their amino acid sequences, which are characterized by low hydrophobicity and high net charge per residue compared to folded proteins. Contradicting the classic structure-function paradigm, IDPs are capable of interacting with high specificity and affinity, often acquiring order in complex with protein and nucleic acid binding partners. This phenomenon is evident during cellular activities involving IDPs, which include transcriptional and translational regulation, cell cycle control, signal transduction, molecular assembly, and molecular recognition. Although approximately 30% of eukaryotic proteomes are intrinsically disordered, the nature of IDP conformational ensembles remains unclear. In this dissertation, we describe relationships connecting characteristics of IDP conformational ensembles to their primary structures and solution conditions. Using molecular simulations and fluorescence experiments on a set of base-rich IDPs, we find that net charge per residue segregates conformational ensembles along a globule-to-coil transition. Speculatively generalizing this result, we propose a phase diagram that predicts an IDP's average size and shape based on sequence composition and use it to generate hypotheses for a broad set of intrinsically disordered regions (IDRs). Simulations reveal that acid-rich IDRs, unlike their oppositely charged base-rich counterparts, exhibit disordered globular ensembles despite intra-chain repulsive electrostatic interactions. This apparent asymmetry is sensitive to the simulation parameters used to represent alkali and halide salt ions, suggesting that solution conditions modulate IDP conformational ensembles. We refine the ion parameters using a calibration procedure that relies exclusively on crystal lattice properties. Simulations with these parameters recover swollen…

A method to simulate a dual-resolution ensemble for molecular systems is introduced. The dual-resolution system is characterized by an atomistic Hamiltonian and coarse coordinates connected by linear springs to this atomistic system. A 'dragging' update scheme based on an idea of Neal (Neal, R. M. Taking Bigger Metropolis Steps by Dragging Fast Variables; Technical Report; University of Toronto: Toronto, Canada, October, 2004; http://arxiv.org/PS_cache/math/pdf/0502/0502099v1.pdf ) is proposed. It is theoretically proven that the scheme correctly samples the dual ensemble. As a proof of principle we show that in a one-dimensional barrier crossing simulation, the relaxation speeds up by a factor of 80. In an asymmetric two-dimensional barrier crossing problem, the speedup is a factor of 20. The application to molecular simulations is discussed. PMID:26605463

The radio occultation (RO) technique using signals from the Global Navigation Satellite System (GNSS), in particular from the Global Positioning System (GPS) so far, is currently widely used to observe the atmosphere for applications such as numerical weather prediction and global climate monitoring. The ionosphere is a major error source in RO measurements at stratospheric altitudes, and a linear ionospheric correction of dual-frequency RO bending angles is commonly used to remove the first-order ionospheric effect. However, the residual ionospheric error (RIE) can still be significant, so that it needs to be further mitigated for high-accuracy applications, especially above about 30 km altitude where the RIE is most relevant compared to the magnitude of the neutral atmospheric bending angle. Quantification and careful analysis for a better understanding of the RIE are therefore important for enabling benchmark-quality stratospheric RO retrievals. Here we present such an analysis of bending angle RIEs covering the stratosphere and mesosphere, using quasi-realistic end-to-end simulations for a full-day ensemble of RO events. Based on the ensemble simulations we assessed the variation of bending angle RIEs, both biases and standard deviations, with solar activity, latitudinal region, and with or without the assumption of ionospheric spherical symmetry and co-existing observing system errors. We find that the bending angle RIE biases in the upper stratosphere and mesosphere, and in all latitudinal zones from low to high latitudes, have a clear negative tendency and a magnitude increasing with solar activity, which is in line with recent empirical studies based on real RO data, although we find smaller bias magnitudes, which deserve further study in the future. The maximum RIE biases are found at low latitudes during daytime, where they amount to between -0.03 and -0.05 μrad; the smallest are found at high latitudes (0 to -0.01 μrad; quiet space weather and winter conditions)…

The radio occultation (RO) technique using signals from the Global Navigation Satellite System (GNSS), in particular from the Global Positioning System (GPS) so far, is now widely used to observe the atmosphere for applications such as numerical weather prediction and global climate monitoring. The ionosphere is a major error source in RO measurements at stratospheric altitudes, and a linear ionospheric correction of dual-frequency RO bending angles is commonly used to remove the first-order ionospheric effect. However, the residual ionospheric error (RIE) can still be significant, so that it needs to be further mitigated for high-accuracy applications, especially above about 30 km altitude where the RIE is most relevant compared to the magnitude of the neutral atmospheric bending angle. Quantification and careful analysis for a better understanding of the RIE are therefore important for enabling benchmark-quality stratospheric RO retrievals. Here we present such an analysis of bending angle RIEs covering the stratosphere and mesosphere, using quasi-realistic end-to-end simulations for a full-day ensemble of RO events. Based on the ensemble simulations we assessed the variation of bending angle RIEs, both biases and standard deviations (SDs), with solar activity, latitudinal region, and with or without the assumption of ionospheric spherical symmetry and of co-existing observing system errors. We find that the bending angle RIE biases in the upper stratosphere and mesosphere, and in all latitudinal zones from low to high latitudes, have a clear negative tendency and a magnitude increasing with solar activity, in line with recent empirical studies based on real RO data. The maximum RIE biases are found at low latitudes during daytime, where they amount to between -0.03 and -0.05 μrad, the smallest at high latitudes (0 to -0.01 μrad; quiet space weather and winter conditions). Ionospheric spherical symmetry or asymmetries about the RO event location have only a minor influence on…

The capability of an ensemble Kalman filter (EnKF) to simultaneously estimate multiple parameters in a physically-based land surface hydrologic model using multivariate field observations is tested at a small watershed (0.08 km2). Multivariate, high temporal resolution, in situ measurements of discharge, water table depth, soil moisture, and sensible and latent heat fluxes encompassing five months of 2009 are assimilated. It is found that, for five out of the six parameters, the EnKF estimated parameter values from different test cases converge strongly, and the estimates after convergence are close to the manually calibrated parameter values. The EnKF estimated parameters and manually calibrated parameters yield similar model performance, but the EnKF sequential method significantly decreases the time and labor required for calibration. The results demonstrate that, given a limited number of multi-state, site-specific observations, an automated sequential calibration method (EnKF) can be used to optimize physically-based land surface hydrologic models.

Multiscale analysis provides an algorithm for the efficient simulation of macromolecular assemblies. This algorithm involves the coevolution of a quasiequilibrium probability density of atomic configurations and the Langevin dynamics of spatial coarse-grained variables denoted order parameters (OPs) characterizing nanoscale system features. In practice, implementation of the probability density involves the generation of constant OP ensembles of atomic configurations. Such ensembles are used to construct thermal forces and diffusion factors that mediate the stochastic OP dynamics. Generation of all-atom ensembles at every Langevin timestep is computationally expensive. Here, multiscale computation for macromolecular systems is made more efficient by a method that self-consistently folds in ensembles of all-atom configurations constructed in an earlier step, history, of the Langevin evolution. This procedure accounts for the temporal evolution of these ensembles, accurately providing thermal forces and diffusions. It is shown that the efficiency and accuracy of the OP-based simulations are increased via the integration of this historical information. Accuracy improves with the square root of the number of historical timesteps included in the calculation. As a result, CPU usage can be decreased by a factor of 3-8 without loss of accuracy. The algorithm is implemented into our existing force-field based multiscale simulation platform and demonstrated via the structural dynamics of viral capsomers. PMID:22978601

Multiscale analysis provides an algorithm for the efficient simulation of macromolecular assemblies. This algorithm involves the coevolution of a quasiequilibrium probability density of atomic configurations and the Langevin dynamics of spatial coarse-grained variables denoted order parameters (OPs) characterizing nanoscale system features. In practice, implementation of the probability density involves the generation of constant OP ensembles of atomic configurations. Such ensembles are used to construct thermal forces and diffusion factors that mediate the stochastic OP dynamics. Generation of all-atom ensembles at every Langevin time step is computationally expensive. Here, multiscale computation for macromolecular systems is made more efficient by a method that self-consistently folds in ensembles of all-atom configurations constructed in an earlier step, history, of the Langevin evolution. This procedure accounts for the temporal evolution of these ensembles, accurately providing thermal forces and diffusions. It is shown that the efficiency and accuracy of the OP-based simulations are increased via the integration of this historical information. Accuracy improves with the square root of the number of historical timesteps included in the calculation. As a result, CPU usage can be decreased by a factor of 3-8 without loss of accuracy. The algorithm is implemented into our existing force-field based multiscale simulation platform and demonstrated via the structural dynamics of viral capsomers. PMID:22978601

Although significant progress has been made in understanding the correlation between large-scale atmospheric circulation patterns and regional streamflow anomalies, there is a general perception that seasonal climate forecasts are not being used to the fullest extent possible for optimal water resources management. Possible contributing factors are limited knowledge and understanding of climate processes and prediction capabilities, noise in climate signals and inaccuracies in forecasts, and hesitancy on the part of water managers to apply new information or methods that could expose them to greater liability. This work involves a decision support model based on streamflow ensembles developed for the Lower Colorado River Authority in Central Texas. Predictive skill is added to ensemble forecasts that are based on climatology by conditioning the ensembles on observable climate indicators, including streamflow (persistence), soil moisture, land surface temperatures, and large-scale recurrent patterns such as the El Niño-Southern Oscillation, the Pacific Decadal Oscillation, and the North Atlantic Oscillation. A Bayesian procedure for updating ensemble probabilities is outlined, and various skill scores are reviewed for evaluating forecast performance. Verification of the ensemble forecasts using a resampling procedure indicates a small but potentially significant improvement in forecast skill that could be exploited in seasonal water management decisions. The ultimate goal of this work will be the explicit incorporation of climate forecasts in reservoir operating rules and estimation of the value of the forecasts.
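
The conditioning step is essentially a Bayes update of the member weights. A minimal sketch (trace count, prior, and likelihood values are hypothetical):

```python
import numpy as np

def update_trace_weights(prior_w, likelihoods):
    """Bayes update of ensemble-trace probabilities given an observed climate
    indicator: posterior proportional to prior times likelihood (a generic
    sketch of the conditioning step, not the authors' exact procedure)."""
    post = np.asarray(prior_w) * np.asarray(likelihoods)
    return post / post.sum()

# climatological ensemble of 5 historical streamflow traces, equally likely a priori
prior = np.full(5, 0.2)
# hypothetical likelihood of the observed El Niño index under each trace's climate state
like = np.array([0.9, 0.7, 0.3, 0.1, 0.05])
print(update_trace_weights(prior, like))
```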

Mathematical models are powerful tools for epidemiology and can be used to compare control actions. However, different models and model parameterizations may provide different predictions of outcomes. In other fields of research, ensemble modeling has been used to combine multiple projections. We explore the possibility of applying such methods to epidemiology by adapting Bayesian techniques developed for climate forecasting. We exemplify the implementation with single-model ensembles based on different parameterizations of the Warwick model run for the 2001 United Kingdom foot-and-mouth disease outbreak and compare the efficacy of different control actions. This allows us to investigate the effect that discrepancy among projections based on different modeling assumptions has on the ensemble prediction. A sensitivity analysis showed that the choice of prior can have a pronounced effect on the posterior estimates of quantities of interest, in particular for ensembles with large discrepancy among projections. However, by using a hierarchical extension of the method we show that prior sensitivity can be circumvented. We further extend the method to include a priori beliefs about different modeling assumptions and demonstrate that the effect of this can have different consequences depending on the discrepancy among projections. We propose that the method is a promising analytical tool for ensemble modeling of disease outbreaks. PMID:25927892

QCDml is an XML-based markup language designed for sharing QCD configurations and ensembles world-wide via the International Lattice Data Grid (ILDG). Based on the latest release, we present key ingredients of the QCDml in order to provide some starting points for colleagues in this community to markup valuable configurations and submit them to the ILDG.

Reconstructing phylogenies from nucleotide sequences is a challenge for students because it strongly depends on evolutionary models and computer tools that are frequently updated. We present here an inquiry-based course aimed at learning how to trace a phylogeny based on sequences existing in public databases. Computer tools are freely available…

The pioneering work of Adleman (1994) demonstrated that DNA molecules in test tubes can be manipulated to perform a certain type of mathematical computation. This has stimulated a theoretical interest in the possibility of constructing DNA-based molecular computers. To gauge the practicality of realizing such microscopic computers, it was thought necessary to learn as much as possible from the biology of the living cell--presently the only known DNA-based molecular computer in existence. Here the recently developed theoretical model of the living cell (the Bhopalator) and its associated theories (e.g. cell language), principles, laws and concepts (e.g. conformons, IDS's) are briefly reviewed and summarized in the form of a set of five laws of 'molecular semiotics' (synonyms include 'microsemiotics', 'cellular semiotics', or 'cytosemiotics'), the study of signs mediating measurement, computation, and communication on the cellular and molecular levels. Hopefully, these laws will find practical applications in designing DNA-based computing systems. PMID:10636037

The Wang-Sheeley-Arge (WSA)-Enlil-cone modeling system is used for making routine arrival time forecasts of Earth-directed "halo" coronal mass ejections (CMEs), since they typically produce the most geoeffective events. A major objective of this work is to better understand the sensitivity of the WSA-Enlil modeling results to input model parameters and how these parameters contribute to the overall model uncertainty and performance. We present ensemble modeling results for a simple halo CME event that occurred on 15 February 2011 and a succession of three halo CME events that occurred on 2-4 August 2011. During this period the Solar TErrestrial RElations Observatory (STEREO) A and B spacecraft viewed the CMEs over the solar limb, thereby providing more reliable constraints on the initial CME geometries during the manual cone fitting process. To investigate the sensitivity of the modeled CME arrival times to small variations in the input cone properties, for each CME event we create an ensemble of numerical simulations based on multiple sets of cone parameters. We find that the accuracy of the modeled arrival times not only depends on the initial input CME geometry, but also on the reliable specification of the background solar wind, which is driven by the input maps of the photospheric magnetic field. As part of the modeling ensemble, we simulate the CME events using the traditional daily updated maps as well as those produced by the Air Force Data Assimilative Photospheric Flux Transport (ADAPT) model, which provide a more instantaneous snapshot of the photospheric field distribution. For the August 2011 events, in particular, we find that the accuracy of the arrival time predictions also depends on whether the cone parameters for all three CMEs are specified in a single WSA-Enlil simulation. The inclusion/exclusion of one or two of the preceding CMEs affects the solar wind conditions through which the succeeding CME propagates.

Human DNA methyltransferase 1 (hDNMT1) is responsible for preserving DNA methylation patterns that play important regulatory roles in differentiation and development. Misregulation of DNA methylation has thus been linked to many syndromes, lifestyle diseases, and cancers. Developing specific inhibitors of hDNMT1 is an important challenge in the area, since the currently targeted cofactor and substrate binding sites share structural features with various proteins. In this work, we generated a structural model of the active form of hDNMT1 and identified that the 5-methylcytosine (5-mC) binding site of hDNMT1 is structurally unique to the protein. This site has previously been demonstrated to be critical for methylation activity. We further performed multiple nanosecond-timescale atomistic molecular dynamics simulations of the structural model, followed by virtual screening of the Asinex database to identify inhibitors targeting the 5-mC site. Two compounds were discovered that inhibited hDNMT1 in vitro, one of which also showed inhibition in vivo, corroborating the screening procedure. This study thus identifies and attempts to validate for the first time a unique site of hDNMT1 that could be harnessed for rationally designing highly selective and potent hypomethylating agents. PMID:26850820

A new methodology is proposed for the efficient determination of Green's functions and eigenstates for quantum systems of two or more dimensions. For a given Hamiltonian, the best possible separable approximation is obtained from the set of all Hilbert space operators. It is shown that this determination itself, as well as the solution of the resultant approximation, are problems of reduced dimensionality for most systems of physical interest. Moreover, the approximate eigenstates constitute the optimal separable basis, in the sense of self-consistent field theory. These distorted waves give rise to a Born series with optimized convergence properties. Analytical results are presented for an application of the method to the two-dimensional shifted harmonic oscillator system. The primary interest, however, is quantum reactive scattering in molecular systems. For numerical calculations, the use of distorted waves corresponds to numerical preconditioning. The new methodology therefore gives rise to an optimized preconditioning scheme for the efficient calculation of reactive and inelastic scattering amplitudes, especially at intermediate energies. This scheme is particularly suited to discrete variable representations (DVRs) and iterative sparse matrix methods commonly employed in such calculations. State-to-state and cumulative reactive scattering results obtained via the optimized preconditioner are presented for the two-dimensional collinear H + H₂ → H₂ + H system. Computational time and memory requirements for this system are drastically reduced in comparison with other methods, and results are obtained for previously prohibitive energy regimes.

This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include: (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The k-centers algorithm can then be used to generate individual data partitions by either (a) subsampling the data and clustering each subsample, or (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed, and performance metrics are provided for evaluation purposes.
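
For readers outside Matlab, the pipeline of areas (2) and (3) can be sketched in a few lines of Python: individual k-means partitions are combined into a co-association (similarity) matrix, and a final clustering of that matrix yields the consensus partition. The use of k-means and average linkage here is illustrative, not the toolbox's exact choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in (0.0, 3.0, 6.0)])

# (2) individual partitions from randomly initialized k-means runs
labels = [KMeans(3, n_init=1, random_state=s).fit_predict(X) for s in range(10)]

# (3) consensus: co-association matrix = fraction of runs grouping i with j,
# followed by a final clustering of that similarity matrix
co = np.mean([(l[:, None] == l[None, :]).astype(float) for l in labels], axis=0)
dist = squareform(1.0 - co, checks=False)          # condensed distance form
consensus = fcluster(linkage(dist, "average"), t=3, criterion="maxclust")
print(np.bincount(consensus))                      # sizes of the consensus clusters
```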

Following the heatstroke prevention guideline by the Ministry of Health, Labor, and Welfare of Japan, "safe hours" for heavy and light labor are estimated based on hourly wet-bulb globe temperature (WBGT) obtained from the three-member ensemble multi-period (the 2000s, 2030s, 2050s, 2070s, and 2090s) climate projections using a dynamical downscaling approach. Our target cities are Tokyo and Osaka, Japan. The results show that most of the current climate daytime hours are "light labor safe", but these hours are projected to decrease by 30-40% by the end of the twenty-first century. A 60-80% reduction is projected for heavy labor hours, resulting in less than 2 hours available for safe performance of heavy labor. The number of "heavy labor restricted days" (days with minimum daytime WBGT exceeding the safe level threshold for heavy labor) is projected to increase from ~5 days in the 2000s to nearly two-thirds of the days in August in the 2090s.

Following the heatstroke prevention guideline by the Ministry of Health, Labor, and Welfare of Japan, "safe hours" for heavy and light labor are estimated based on hourly wet-bulb globe temperature (WBGT) obtained from the three-member ensemble multi-period (the 2000s, 2030s, 2050s, 2070s, and 2090s) climate projections using a dynamical downscaling approach. Our target cities are Tokyo and Osaka, Japan. The results show that most of the current climate daytime hours are "light labor safe", but these hours are projected to decrease by 30-40% by the end of the twenty-first century. A 60-80% reduction is projected for heavy labor hours, resulting in less than 2 hours available for safe performance of heavy labor. The number of "heavy labor restricted days" (days with minimum daytime WBGT exceeding the safe level threshold for heavy labor) is projected to increase from ~5 days in the 2000s to nearly two-thirds of the days in August in the 2090s. PMID:25935576
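
For orientation, the WBGT underlying such estimates is, for outdoor conditions, the standard ISO 7243 weighting of natural wet-bulb, globe, and dry-bulb temperatures. A minimal sketch follows; the 28.0 °C heavy-labor threshold is a hypothetical placeholder, not the guideline's value.

```python
def wbgt_outdoor(t_nwb, t_globe, t_air):
    """Outdoor wet-bulb globe temperature (ISO 7243 weighting):
    0.7*natural wet-bulb + 0.2*globe + 0.1*dry-bulb, all in deg C."""
    return 0.7 * t_nwb + 0.2 * t_globe + 0.1 * t_air

# hypothetical threshold: hours with WBGT below it count as "safe" for heavy labor
SAFE_WBGT_HEAVY = 28.0
hourly = [wbgt_outdoor(*obs) for obs in [(24.0, 35.0, 30.0), (27.0, 45.0, 34.0)]]
print([w < SAFE_WBGT_HEAVY for w in hourly])   # per-hour heavy-labor safety flags
```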

Background Microarray is a powerful technology enabling the monitoring of tens of thousands of genes in a single experiment. Most microarrays now use oligo-sets. The design of the oligo-nucleotides is time-consuming and error-prone. Genome-wide microarray oligo-sets are designed using as large a set of transcripts as possible in order to monitor as many genes as possible. Depending on the genome sequencing state and on the assembly state, the knowledge of the existing transcripts can be very different. This knowledge evolves with the different genome builds and gene builds. Once the design is done, the microarrays are often used for several years. The biologists working in EADGENE expressed the need for up-to-date annotation files for the oligo-sets they share, including information about the orthologous genes of model species, the Gene Ontology, the corresponding pathways and the chromosomal location. Results The results of SigReannot on a chicken micro-array used in the EADGENE project, compared to the initial annotations, show that 23% of the oligo-nucleotide gene annotations were not confirmed, 2% were modified and 1% were added. The interest of this up-to-date annotation procedure is demonstrated through the analysis of real data previously published. Conclusion SigReannot uses the oligo-nucleotide design procedure criteria to validate the probe-gene link and the Ensembl transcripts as the reference for annotation. It therefore produces a high-quality annotation based on reference gene sets. PMID:19615116

A direct extraction method of the tumor response based on ensemble empirical mode decomposition (EEMD) is proposed for early breast cancer detection by ultra-wideband (UWB) microwave imaging. With this approach, image reconstruction for tumor detection can be realized with only the signals extracted from the as-detected waveforms. The calibration process executed in previous research to obtain reference waveforms, which stand for signals detected from a tumor-free model, is not required. The correctness of the method is verified by successfully detecting a 4 mm tumor located inside the glandular region of one breast model and at the interface between the gland and the fat in another. The reliability of the method is checked by distinguishing a tumor buried in glandular tissue whose dielectric constant is 35. The feasibility of the method is confirmed by showing the correct tumor information in both simulation and experimental results for a realistic 3-D printed breast phantom. PMID:26552095
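
A hedged sketch of the EEMD step on a synthetic waveform, assuming the third-party PyEMD package (installed as EMD-signal); the toy signal and the choice of the first IMF as the extracted response are illustrative, not the paper's processing chain.

```python
import numpy as np
from PyEMD import EEMD   # third-party package; pip install EMD-signal

# synthetic stand-in for a received UWB waveform: a weak fast "tumor"
# component buried in a stronger low-frequency background
t = np.linspace(0, 1, 512)
signal = (np.sin(2 * np.pi * 3 * t)
          + 0.3 * np.sin(2 * np.pi * 40 * t) * np.exp(-((t - 0.2) / 0.05) ** 2))

eemd = EEMD(trials=50)            # ensemble of noise-assisted EMD runs
imfs = eemd.eemd(signal, t)       # intrinsic mode functions, fast to slow
response = imfs[0]                # a fast IMF as a crude proxy for the tumor response
print(imfs.shape, response.std())
```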

Generalized-ensemble simulations, such as replica exchange and serial generalized-ensemble methods, are powerful simulation tools to enhance sampling of free energy landscapes in systems with high energy barriers. In these methods, sampling is enhanced through instantaneous transitions of replicas, i.e., copies of the system, between different ensembles characterized by some control parameter associated with thermodynamical variables (e.g., temperature or pressure) or collective mechanical variables (e.g., interatomic distances or torsional angles). An interesting evolution of these methodologies has been proposed by replacing the conventional instantaneous (trial) switches of replicas with noninstantaneous switches, realized by varying the control parameter in a finite time and accepting the final replica configuration with a Metropolis-like criterion based on the Crooks nonequilibrium work (CNW) theorem. Here we review these techniques, focusing on their connection with the CNW theorem in the framework of Markovian processes. An outcome of this report is the derivation of the acceptance probability for noninstantaneous switches in serial generalized-ensemble simulations, where we show that explicit knowledge of the time dependence of the weight factors entering such simulations is not necessary. A generalized relationship of the CNW theorem is also provided in terms of the underlying equilibrium probability distribution at a fixed control parameter. Illustrative calculations on a toy model are performed with serial generalized-ensemble simulations, especially focusing on the different behavior of instantaneous and noninstantaneous replica transition schemes. PMID:26565367

Generalized-ensemble simulations, such as replica exchange and serial generalized-ensemble methods, are powerful simulation tools to enhance sampling of free energy landscapes in systems with high energy barriers. In these methods, sampling is enhanced through instantaneous transitions of replicas, i.e., copies of the system, between different ensembles characterized by some control parameter associated with thermodynamical variables (e.g., temperature or pressure) or collective mechanical variables (e.g., interatomic distances or torsional angles). An interesting evolution of these methodologies has been proposed by replacing the conventional instantaneous (trial) switches of replicas with noninstantaneous switches, realized by varying the control parameter in a finite time and accepting the final replica configuration with a Metropolis-like criterion based on the Crooks nonequilibrium work (CNW) theorem. Here we review these techniques, focusing on their connection with the CNW theorem in the framework of Markovian processes. An outcome of this report is the derivation of the acceptance probability for noninstantaneous switches in serial generalized-ensemble simulations, where we show that explicit knowledge of the time dependence of the weight factors entering such simulations is not necessary. A generalized relationship of the CNW theorem is also provided in terms of the underlying equilibrium probability distribution at a fixed control parameter. Illustrative calculations on a toy model are performed with serial generalized-ensemble simulations, especially focusing on the different behavior of instantaneous and noninstantaneous replica transition schemes.
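
A toy sketch of a noninstantaneous switch with a work-based acceptance test: the control parameter is driven along a finite-time path, the accumulated work enters a Metropolis-like criterion, and the weight difference dg between ensembles is a hypothetical input. This is a generic CNW-style protocol on a one-dimensional system, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(7)
beta = 1.0

def U(x, lam):                    # toy potential controlled by parameter lambda
    return 0.5 * lam * x * x

def metropolis_sweep(x, lam, n=20, step=0.5):
    for _ in range(n):
        xn = x + rng.uniform(-step, step)
        if rng.random() < np.exp(-beta * (U(xn, lam) - U(x, lam))):
            x = xn
    return x

def noninstantaneous_switch(x, lam_path):
    """Drive lambda along a finite-time path, accumulating the work: each
    increment adds U(x, lam_new) - U(x, lam_old) at fixed x, with
    equilibration moves in between."""
    W = 0.0
    for old, new in zip(lam_path[:-1], lam_path[1:]):
        W += U(x, new) - U(x, old)      # work of the parameter increment
        x = metropolis_sweep(x, new)    # relax at the new parameter value
    return x, W

# trial switch from lambda=1 to lambda=4 in 10 increments; accept the final
# configuration with a Metropolis-like criterion (dg set to 0 here)
x0, dg = metropolis_sweep(0.0, 1.0, n=200), 0.0
x1, W = noninstantaneous_switch(x0, np.linspace(1.0, 4.0, 11))
accept = rng.random() < np.exp(min(0.0, -beta * W + dg))
print(f"W = {W:.3f}, accepted: {accept}")
```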

Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and the maximum entropy model, or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California; and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.

Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and the maximum entropy model, or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California; and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. PMID:20136746
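
A minimal sketch of the ensemble idea with three of the five model families (MARS and Maxent have no standard scikit-learn implementation, so gradient boosting stands in for boosted regression trees); the data and predictors are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 6))                 # topographic/climatic predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(n_estimators=200, random_state=0),
          GradientBoostingClassifier(random_state=0)]
for m in models:
    m.fit(X, y)

# unweighted average of predicted presence probabilities = a simple ensemble model
probs = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
print(probs[:5])
```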

Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for the rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature-selection-based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of the individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high-performance ensemble algorithms are usually composed of the predictors that together cover most of the available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from an AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors. Conclusions We…
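
The core of the design, a logistic regression over a reduced subset of individual predictor outputs, can be sketched as follows; the scores, labels, and kept subset are hypothetical stand-ins, and the learned coefficients only loosely play the role of the paper's contribution scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical scores from three individual localization predictors on n proteins,
# plus a binary true-compartment label (binary for simplicity).
rng = np.random.default_rng(9)
n = 500
pred_scores = rng.random(size=(n, 3))               # columns = individual predictors
y = (0.8 * pred_scores[:, 0] + 0.2 * rng.random(n) > 0.5).astype(int)

# The minimalist-ensemble idea: a logistic regression over a *selected subset*
# of predictor outputs, rather than all of them.
lr = LogisticRegression().fit(pred_scores[:, :2], y)   # keep only 2 of 3 predictors
print(lr.coef_, lr.score(pred_scores[:, :2], y))
```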

Two structure determination methods, based on the molecular dynamics flexible fitting (MDFF) paradigm, are presented that resolve sub-5 Å cryo-electron microscopy (EM) maps with either single structures or ensembles of such structures. The methods, denoted cascade MDFF and resolution exchange MDFF, sequentially re-refine a search model against a series of maps of progressively higher resolutions, which ends with the original experimental resolution. Application of sequential re-refinement enables MDFF to achieve a radius of convergence of ~25 Å demonstrated with the accurate modeling of β-galactosidase and TRPV1 proteins at 3.2 Å and 3.4 Å resolution, respectively. The MDFF refinements uniquely offer map-model validation and B-factor determination criteria based on the inherent dynamics of the macromolecules studied, captured by means of local root mean square fluctuations. The MDFF tools described are available to researchers through an easy-to-use and cost-effective cloud computing resource on Amazon Web Services. PMID:27383269

Two structure determination methods, based on the molecular dynamics flexible fitting (MDFF) paradigm, are presented that resolve sub-5 Å cryo-electron microscopy (EM) maps with either single structures or ensembles of such structures. The methods, denoted cascade MDFF and resolution exchange MDFF, sequentially re-refine a search model against a series of maps of progressively higher resolutions, which ends with the original experimental resolution. Application of sequential re-refinement enables MDFF to achieve a radius of convergence of ~25 Å demonstrated with the accurate modeling of β-galactosidase and TRPV1 proteins at 3.2 Å and 3.4 Å resolution, respectively. The MDFF refinements uniquely offer map-model validation and B-factor determination criteria based on the inherent dynamics of the macromolecules studied, captured by means of local root mean square fluctuations. The MDFF tools described are available to researchers through an easy-to-use and cost-effective cloud computing resource on Amazon Web Services. DOI: http://dx.doi.org/10.7554/eLife.16105.001 PMID:27383269

Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic for Neural Network classifiers. Introducing a secondary output unit that receives different…

Supramolecular polymers, polymeric systems beyond the molecule, have attracted more and more attention from scientists due to their applications in various fields, including stimuli-responsive materials, healable materials, and drug delivery. Due to their good selectivity and convenient enviro-responsiveness, crown ether-based molecular recognition motifs have been actively employed to fabricate supramolecular polymers with interesting properties and novel applications in recent years. In this tutorial review, we classify supramolecular polymers based on their differences in topology and cover recent advances in the marriage between crown ether-based molecular recognition and polymer science. PMID:22012256

Melanoma is the deadliest type of skin cancer, yet it is the most treatable kind when diagnosed early. The early prognosis of melanoma is a challenging task for both clinicians and dermatologists. Given the importance of early diagnosis, and in order to assist dermatologists, we propose an automated framework based on ensemble learning methods and dermoscopy images to differentiate melanoma from dysplastic and benign lesions. The evaluation of our framework on the recent public dermoscopy benchmark (the PH2 dataset) indicates the potential of the proposed method. Our evaluation, using only global features, revealed that ensembles such as random forest perform better than a single learner. Using a random forest ensemble and a combination of color and texture features, our framework achieved the highest sensitivity of 94% and specificity of 92%.

We theoretically study the properties of the optimal size distribution in the ensemble of hollow gold nanoshells (HGNs) that exhibits the best performance in in vivo biomedical applications. For the first time, to the best of our knowledge, we analyze the dependence of the optimal geometric means of the nanoshells’ thicknesses and core radii on the excitation wavelength and the type of human tissue, while assuming a lognormal fit to the size distribution in a real HGN ensemble. Regardless of the tissue type, short-wavelength, near-infrared lasers are found to be the most effective in both absorption- and scattering-based applications. We derive approximate analytical expressions enabling one to readily estimate the parameters of the optimal distribution for which an HGN ensemble exhibits the maximum efficiency of absorption or scattering inside a human tissue irradiated by a near-infrared laser. PMID:23537206

Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or boosting) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe the first experiences with a novel randomized tree induction method that uses a sub-sample of instances at a node to determine the split. The empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost to boosting and bagging.
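
One plausible reading of the split rule, sketched below: at each node, a random subsample of the node's instances is drawn, and the (feature, threshold) pair minimizing Gini impurity on that subsample is chosen. The names and the impurity choice are assumptions, not the paper's exact procedure.

```python
import numpy as np

def randomized_split(X, y, sample_size, rng):
    """Choose a split for a tree node from a random subsample of the
    node's instances."""
    idx = rng.choice(len(y), size=min(sample_size, len(y)), replace=False)
    Xs, ys = X[idx], y[idx]
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(Xs[:, j]):
            left, right = ys[Xs[:, j] <= t], ys[Xs[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # size-weighted Gini impurity of the two children, on the subsample
            gini = sum(len(s) * (1.0 - ((np.bincount(s) / len(s)) ** 2).sum())
                       for s in (left, right)) / len(ys)
            if gini < best[2]:
                best = (j, t, gini)
    return best  # (feature, threshold, impurity) found on the subsample

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 4))
y = (X[:, 2] > 0.3).astype(int)
print(randomized_split(X, y, sample_size=40, rng=rng))
```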

Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or arcing) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe our first experiences with a novel randomized tree induction method that uses a subset of samples at a node to determine the split. Our empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost.

We describe a method for precise estimation of the polarization of a mesoscopic spin ensemble by using its coupling to a single two-level system. Our approach requires a minimal number of measurements on the two-level system for a given measurement precision. We consider the application of this method to the case of nuclear-spin ensemble defined by a single electron-charged quantum dot: we show that decreasing the electron spin dephasing due to nuclei and increasing the fidelity of nuclear-spin-based quantum memory could be within the reach of present day experiments.

The objectives of the study are to gain a better understanding of the characteristics of iceberg motion and the factors controlling iceberg drift, and to develop an iceberg ensemble drift forecast system to be operated by the Canadian Atmospheric Environment Service. An extensive review of field and theoretical studies on iceberg behaviour and the factors controlling iceberg motion has been carried out. Long-term and short-term behaviour of icebergs is critically examined. A quantitative assessment of the effects of the factors controlling iceberg motion is presented. The study indicated that wind and currents are the primary driving forces; the Coriolis force and ocean surface slope also have significant effects. As for waves, only the higher waves have a significant effect. Iceberg drift is also affected by iceberg size characteristics. Based on the findings of the study, a comprehensive computerized forecast system to predict the drift of iceberg ensembles off Canada's east coast has been designed. The expected accuracy of the forecast system is discussed, and recommendations are made for future improvements to the system.

This review describes the rationale, early stage development, and initial human application of neural interface systems (NISs) for humans with paralysis. NISs are emerging medical devices designed to allow persons with paralysis to operate assistive technologies or to reanimate muscles based upon a command signal that is obtained directly from the brain. Such systems require the development of sensors to detect brain signals, decoders to transform neural activity signals into a useful command, and an interface for the user. We review initial pilot trial results of an NIS that is based on an intracortical microelectrode sensor that derives control signals from the motor cortex. We review recent findings showing, first, that neurons engaged by movement intentions persist in motor cortex years after injury or disease to the motor system, and second, that signals derived from motor cortex can be used by persons with paralysis to operate a range of devices. We suggest that, with further development, this form of NIS holds promise as a useful new neurotechnology for those with limited motor function or communication. We also discuss the additional potential for neural sensors to be used in the diagnosis and management of various neurological conditions and as a new way to learn about human brain function. PMID:17272345

We developed advanced techniques for the growth of self-assembled quantum dots (QDs) for fabricating a broadband light source that can be applied to optical coherence tomography (OCT). Four QD ensembles and strain-reducing layers (SRLs) were grown in selective areas on a wafer by the use of a 90° rotational metal mask. The SRL thickness was varied to achieve appropriate shifts in the peak wavelength of the QD emission spectrum of up to 120 nm. The four-color QD ensembles were expected to have a broad bandwidth of more than 160 nm due to the combination of excited-state emissions when introduced in a current-induced broadband light source such as a superluminescent diode (SLD). Furthermore, a desired shape of the SLD spectrum can be obtained by controlling the injection current applied to each QD ensemble. The broadband and spectrum-shape-controlled light source is promising for high-resolution and low-noise OCT systems.

The increased exposure of human populations to heat stress is one of the likely consequences of global warming, and it has detrimental effects on health and labor capacity. Here, we consider the evolution of heat stress under climate change using 21 general circulation models (GCMs). Three heat stress indicators, based on both temperature and humidity conditions, are used to investigate present-day model biases and spreads in future climate projections. Present-day estimates of heat stress indicators from observational data show that humid tropical areas tend to experience more frequent heat stress than other regions do, with a total heat stress frequency of 250-300 d yr⁻¹. The most severe heat stress is found in the Sahel and south India. Present-day GCM simulations tend to underestimate heat stress over the tropics due to dry and cold model biases. The model-based estimates are in better agreement with observations in mid to high latitudes, but this is due to compensating errors in humidity and temperature. The severity of heat stress is projected to increase by the end of the century under the climate change scenario RCP8.5, reaching unprecedented levels in some regions compared with observations. An analysis of the different factors contributing to the total spread of projected heat stress shows that the spread is primarily driven by the choice of GCMs rather than the choice of indicators, even when the simulated indicators are bias-corrected. This supports the utility of the multi-model ensemble approach to assess the impacts of climate change on heat stress.

This study embeds a multilevel Monte Carlo sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel numerical integration of that SDE. Finally, the resulting multilevel EnKF is proved to asymptotically outperform EnKF in terms of computational cost versus approximation accuracy. The theoretical results are illustrated numerically.
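
The multilevel idea can be sketched independently of the filtering context: estimate the finest-level expectation through a telescoping sum of a coarse estimate plus coupled coarse/fine corrections, with the coarse path of each pair consuming the summed Brownian increments of the fine path. The SDE and sample counts below are toy choices, not the paper's setup.

```python
import numpy as np

def euler_paths(n_paths, n_steps, T=1.0, x0=1.0, rng=None):
    """Euler-Maruyama paths of dX = -X dt + 0.5 dW; returns per-path X(T)."""
    if rng is None:
        rng = np.random.default_rng(0)
    dt = T / n_steps
    x = np.full(n_paths, x0)
    for _ in range(n_steps):
        x += -x * dt + 0.5 * np.sqrt(dt) * rng.normal(size=n_paths)
    return x

def mlmc_estimate(levels, samples_per_level, rng):
    """Multilevel estimate of E[X(T)] via the telescoping sum
    E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}], coarse/fine pairs sharing noise."""
    est = euler_paths(samples_per_level[0], 2 ** levels[0], rng=rng).mean()
    for l, n in zip(levels[1:], samples_per_level[1:]):
        nf = 2 ** l                         # fine steps; coarse grid has nf/2 steps
        dt_f = 1.0 / nf
        xf = np.full(n, 1.0)
        xc = np.full(n, 1.0)
        for k in range(nf):
            dw = np.sqrt(dt_f) * rng.normal(size=n)
            xf += -xf * dt_f + 0.5 * dw
            if k % 2 == 1:                  # coarse step uses the summed fine noise
                xc += -xc * (2 * dt_f) + 0.5 * (dw + dw_prev)
            dw_prev = dw
        est += (xf - xc).mean()             # correction term at this level
    return est

rng = np.random.default_rng(11)
print(mlmc_estimate(levels=[2, 3, 4], samples_per_level=[4000, 1000, 250], rng=rng))
```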

A low cost detector for particles in molecular beam experiments is presented which can easily be mounted in a molecular beam apparatus. The detector is based on microfabricated cantilevers, which can be employed either as single sensors or as sensor arrays. The single cantilever technique has been used to measure the absolute number of atoms coming out of a pulsed laser vaporization cluster source. The particles are detected by the shift of the thermally excited resonance frequency of the cantilever due to the cluster deposition. We have determined with the single cantilever the ratio of neutral to ionized clusters and we have investigated the cluster generation at different source conditions. In addition to this, a microfabricated cantilever array has been used to measure molecular beam profiles, which opens new possibilities for molecular beam deflection experiments.

Molecular partitioning based on the kinetic energy density is performed for a number of chemical species that show non-nuclear attractors (NNAs) in the gradient maps of their electron density. It is found that the NNAs are removed under this partitioning and, although the virial theorem is not valid for all of the basins obtained in the commonly used AIM approach, all of the atoms obtained using the new approach obey this theorem. A comparison is also made between some atomic topological parameters obtained from the new partitioning approach and those calculated from the electron density partitioning.

Background: Classification of endometrial carcinomas (ECs) by morphologic features is inconsistent and yields limited prognostic and predictive information. A new system for classification based on the molecular categories identified in The Cancer Genome Atlas is proposed. Methods: Genomic data from The Cancer Genome Atlas (TCGA) support classification of endometrial carcinomas into four prognostically significant subgroups; we used the TCGA data set to develop surrogate assays that could replicate the TCGA classification, but without the need for the labor-intensive and cost-prohibitive genomic methodology. Combinations of the most relevant assays were carried forward and tested on a new independent cohort of 152 endometrial carcinoma cases, and molecular vs. clinical risk group stratification was compared. Results: Replication of the TCGA survival curves was achieved with statistical significance using multiple different molecular classification models (16 tested in total). Internal validation supported carrying forward a classifier based on the following components: mismatch repair protein immunohistochemistry, POLE mutational analysis, and p53 immunohistochemistry as a surrogate for 'copy-number' status. The proposed molecular classifier was associated with clinical outcomes, as were stage, grade, lymph-vascular space invasion, nodal involvement, and adjuvant treatment. In multivariable analysis, both molecular classification and clinical risk groups were associated with outcomes, but they differed greatly in the composition of cases within each category, with half of the POLE and mismatch repair loss subgroups residing within the clinically defined 'high-risk' group. Combining the molecular classifier with clinicopathologic features or risk groups provided the highest C-index for discrimination of outcome survival curves. Conclusions: Molecular classification of ECs can be achieved using clinically applicable methods on formalin-fixed paraffin-embedded samples, and provides

A novel three-terminal hot-electron device, the induced base transistor (IBT), has been fabricated by molecular beam epitaxy. A two-dimensional electron gas induced by the applied collector field in an undoped GaAs quantum well is used as the base of the IBT. A common-base current gain α as high as 0.96 has been achieved under a collector bias of 2.5 V and an emitter current of 3 mA.
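For context, standard transistor algebra (not stated in the record itself) converts the reported common-base gain into an equivalent common-emitter gain: β = α/(1 − α) = 0.96/0.04 = 24.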

Knowledge of protein-protein interactions and their binding sites is indispensable for in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods for identifying protein-protein binding sites (PPBSs) in a timely fashion based on sequence information alone, because information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests targeting experiment-confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, the approach of using wavelets to express protein/peptide sequences might be key to grasping the problem's essence, fully consistent with the findings that many important biological functions of proteins can be elucidated from their low-frequency internal motions. To maximize convenience for experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved. PMID:26797600

Plasmonic-based electrochemical impedance spectroscopy (P-EIS) is developed to investigate molecular binding on surfaces. Its basic principle relies on the sensitive dependence of surface plasmon resonance (SPR) signal on surface charge density, which is modulated by applying an AC potential to a SPR chip surface. The AC component of the SPR response gives the electrochemical impedance, and the DC component provides the conventional SPR detection. The plasmonic-based impedance measured over a range of frequency is in quantitative agreement with the conventional electrochemical impedance. Compared to the conventional SPR detection, P-EIS is sensitive to molecular binding taking place on the chip surface, and less sensitive to bulk refractive index changes or non-specific binding. Moreover, this new approach allows for simultaneous SPR and surface impedance analysis of molecular binding processes. PMID:22122514

Inorganic nanoparticles, including semiconductor quantum dots, iron oxide nanoparticles, and gold nanoparticles, have been developed as contrast agents for diagnostics by molecular imaging. Compared to traditional contrast agents, nanoparticles offer several advantages: their optical and magnetic properties can be tailored by engineering the composition, structure, size, and shape; their surfaces can be modified with ligands to target specific biomarkers of disease; the contrast enhancement provided can be equivalent to millions of molecular counterparts; and they can be integrated with a combination of different functions for multi-modal imaging. Here, we review recent advances in the development of contrast agents based on inorganic nanoparticles for molecular imaging, touching on contrast enhancement, surface modification, tissue targeting, clearance, and toxicity. As research efforts intensify, contrast agents based on inorganic nanoparticles that are highly sensitive, target-specific, and safe to use are expected to enter clinical applications in the near future. PMID:21074494

An approach for simulating bionanosystems such as viruses and ribosomes is presented. This calibration-free approach is based on an all-atom description of bionanosystems, a universal interatomic force field, and a multiscale perspective. The supramillion-atom nature of these bionanosystems prohibits the use of a direct molecular dynamics approach for phenomena such as viral structural transitions or self-assembly that develop over milliseconds or longer. A key element of these multiscale systems is the cross-talk between, and consequent strong coupling of, processes over many scales in space and time. Thus, overall nanoscale features of these systems control the relative probability of atomistic fluctuations, while the latter mediate the average forces and diffusion coefficients that induce the dynamics of these nanoscale features. This feedback loop is overlooked in typical coarse-grained methods. We elucidate the role of interscale cross-talk and overcome bionanosystem simulation difficulties with (1) automated construction of order parameters (OPs) describing suprananometer-scale structural features, (2) construction of OP-dependent ensembles describing the statistical properties of atomistic variables that ultimately contribute to the entropies driving the dynamics of the OPs, and (3) the derivation of a rigorous equation for the stochastic dynamics of the OPs. As the OPs capture hydrodynamic modes in the host medium, "long-time tails" in the correlation functions yielding the generalized diffusion coefficients do not emerge. Since the atomic-scale features of the system are treated statistically, several ensembles are constructed that reflect various experimental conditions. Attention is paid to the proper use of the Gibbs hypothesized equivalence of long-time and ensemble averages to accommodate the varying experimental conditions. The theory provides a basis for a practical, quantitative bionanosystem modeling approach that preserves the cross-talk across scales.

The fully analytic energy gradient has been developed and implemented for the restricted open-shell Hartree–Fock (ROHF) method based on the fragment molecular orbital (FMO) theory for systems that have multiple open-shell molecules. The accuracy of the analytic ROHF energy gradient is compared with the corresponding numerical gradient, illustrating the accuracy of the analytic gradient. The ROHF analytic gradient is used to perform molecular dynamics simulations of an unusual open-shell system, liquid oxygen, and mixtures of oxygen and nitrogen. These molecular dynamics simulations provide some insight about how triplet oxygen molecules interact with each other. Timings reveal that the method can calculate the energy gradient for a system containing 4000 atoms in only 6 h. Therefore, it is concluded that the FMO-ROHF method will be useful for investigating systems with multiple open shells.

We use the supersymmetric formalism to derive an integral formula for the density of states of the Gaussian Orthogonal Ensemble, and then apply saddle-point analysis to give a new derivation of the 1/N-correction to Wigner's law. This extends the work of Disertori on the Gaussian Unitary Ensemble. We also apply our method to the interpolating ensembles of Mehta–Pandey.
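A quick numerical check of the leading-order result that the 1/N correction modifies: sample a GOE matrix and compare its empirical spectral density to Wigner's semicircle. The normalization convention and bin count below are illustrative choices, not taken from the record.

import numpy as np

rng = np.random.default_rng(1)
N = 400

# Sample a GOE matrix, normalized so the spectrum converges to the
# semicircle on [-2, 2]; entries of A are i.i.d. standard normals.
A = rng.normal(size=(N, N))
H = (A + A.T) / np.sqrt(2 * N)
eigs = np.linalg.eigvalsh(H)

# Wigner's law: rho(x) = sqrt(4 - x^2) / (2*pi). The record's subject is
# the O(1/N) correction to this curve, which a crude histogram at N = 400
# only hints at near the spectral edges.
hist, edges = np.histogram(eigs, bins=40, range=(-2, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
semicircle = np.sqrt(np.clip(4 - centers**2, 0, None)) / (2 * np.pi)
print("max |empirical - semicircle| =", float(np.abs(hist - semicircle).max()))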

The Sankofa Drum and Dance Ensemble is a Ghanaian drum and dance ensemble that focuses on music in the Ewe tradition. It is based in an elementary school in the Greater Toronto Area and consists of students in Grade 4 through Grade 8. Students in the ensemble study Ghanaian traditional Ewe drumming and dancing in the oral tradition. Nine students…

A laboratory experiment is described in which students measure the amount of cetirizine in allergy-treatment tablets based on molecular recognition. The basis of recognition is competition of cetirizine with phenolphthalein to form an inclusion complex with ß-cyclodextrin. Phenolphthalein is pink under basic conditions, whereas its complex form…

The Molecular Probe Data Base (MPDB) was designed to collect and make information on synthetic oligonucleotides available on-line. This paper briefly describes its purpose, contents and structure, forms and mode of data distribution. Particular emphasis is given to recent data extension and system enhancements that have been carried out in order to simplify access to MPDB for unskilled users. PMID:8332523

This paper analyzes the hardware and software features that would be desirable in a computer-based semantic network system for representing biology knowledge. It then describes in detail a prototype network of molecular biology knowledge that has been developed using Filevision software and a Macintosh computer. The prototype contains about 100…

Over the past decades, the numerical weather prediction community has witnessed a paradigm shift from deterministic to probabilistic forecasting and state estimation (Buizza and Leutbecher, 2015; Buizza et al., 2008), in an attempt to quantify the uncertainties associated with initial-condition and model errors. An important benefit of a probabilistic framework is the improved prediction of extreme events. However, one may ask to what extent such model estimates contain information on the occurrence probability of extreme events and how this information can be optimally extracted. Different approaches have been proposed and applied to real-world systems which, based on extreme value theory, allow the estimation of extreme-event probabilities conditional on forecasts and state estimates (Ferro, 2007; Friederichs, 2010). Using ensemble predictions generated with a model of low dimensionality, a thorough investigation is presented quantifying the change in predictability of extreme events associated with ensemble post-processing and other influencing factors, including the finite ensemble size, lead time, model assumptions, and the use of different covariates (ensemble mean, maximum, spread, ...) for modeling the tail distribution. Tail modeling is performed by deriving extreme-quantile estimates using a peaks-over-threshold representation (generalized Pareto distribution) or quantile regression, as sketched below. Common ensemble post-processing methods aim to improve mostly the ensemble mean and spread of a raw forecast (Van Schaeybroeck and Vannitsem, 2015). Conditional tail modeling, on the other hand, is a post-processing step in itself, focusing on the tails only. Therefore, it is unclear how applying ensemble post-processing prior to conditional tail modeling impacts the skill of extreme-event predictions. This work investigates this question in detail. Buizza, Leutbecher, and Isaksen, 2008: Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System, Q. J. R. Meteorol
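A minimal peaks-over-threshold sketch of the generalized Pareto tail model mentioned above: fit the distribution of exceedances over a high threshold and read off an extreme quantile. The synthetic data, threshold choice, and quantile level are all illustrative assumptions.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(2)

# Synthetic heavy-ish-tailed data stand in for an ensemble-derived covariate
x = rng.gumbel(loc=0.0, scale=1.0, size=5000)
u = np.quantile(x, 0.95)        # threshold choice is itself a judgment call
exc = x[x > u] - u

# Fit a generalized Pareto distribution to the exceedances (location fixed at 0)
shape, loc, scale = genpareto.fit(exc, floc=0.0)
p_exceed = np.mean(x > u)       # empirical probability of exceeding u

# Unconditional 99.9% quantile via the POT representation:
# P(X > q) = P(X > u) * (1 - F_GPD(q - u))
p = 0.999
q = u + genpareto.ppf(1 - (1 - p) / p_exceed, shape, loc=0.0, scale=scale)
print(f"estimated 99.9% quantile: {q:.2f}")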

Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. Implicit in many of these techniques is the concept of randomization that generates different classifiers. In this paper, we focus on ensembles of decision trees that are created using a randomized procedure based on histograms. Techniques such as histograms, which discretize continuous variables, have long been used in classification to convert the data into a form suitable for processing and to reduce the compute time. The approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram, as sketched below. The experimental results with public domain data show that ensembles generated using this approach are competitive in accuracy and superior in computational cost to other ensemble techniques such as boosting and bagging.
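A sketch of the split-randomization step, assuming a regression-style variance impurity as a stand-in for whatever classification criterion the paper uses: find the best histogram bin boundary, then draw the actual split point uniformly from the interval around it. The function name and data are invented.

import numpy as np

rng = np.random.default_rng(3)

def random_histogram_split(x, y, bins=32):
    """Pick a split point at random inside the interval around the best
    histogram bin boundary, in the spirit of the randomized-tree idea."""
    edges = np.histogram_bin_edges(x, bins=bins)
    best_gain, best_i = -np.inf, 1
    parent = np.var(y) * len(y)
    for i in range(1, len(edges) - 1):
        left, right = y[x <= edges[i]], y[x > edges[i]]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = parent - (np.var(left) * len(left) + np.var(right) * len(right))
        if gain > best_gain:
            best_gain, best_i = gain, i
    # Randomize within the interval spanned by the neighbouring boundaries
    return rng.uniform(edges[best_i - 1], edges[best_i + 1])

x = rng.normal(size=500)
y = (x > 0.3).astype(float) + rng.normal(scale=0.1, size=500)
print("randomized split near 0.3:", random_histogram_split(x, y))

Repeating the draw for each tree in the ensemble yields different trees from the same histogram, which is where the diversity comes from.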

An ensemble-based approach is applied to better estimate source impacts on fine particulate matter (PM2.5) and quantify uncertainties in various source apportionment (SA) methods. The approach combines source impacts from applications of four individual SA methods: three receptor-based models and one chemical transport model (CTM). The receptor models used are the chemical mass balance methods CMB-LGO (Chemical Mass Balance-Lipschitz global optimizer) and CMB-MM (molecular markers) as well as a factor analytic method, Positive Matrix Factorization (PMF). The CTM used is the Community Multiscale Air Quality (CMAQ) model. New source impact estimates and uncertainties in these estimates are calculated in a two-step process. First, an ensemble average is calculated for each source category using results from applying the four individual SA methods. The root mean square error (RMSE) of each method with respect to the average is calculated for each source category; the RMSE is then taken to be the updated uncertainty for each individual SA method. Second, these new uncertainties are used to re-estimate ensemble source impacts and uncertainties. The approach is applied to data from daily PM2.5 measurements at the Atlanta, GA, Jefferson Street (JST) site in July 2001 and January 2002. The procedure provides updated uncertainties for the individual SA methods that are calculated in a consistent way across methods. Overall, the ensemble has lower relative uncertainties than the individual SA methods. Calculated CMB-LGO uncertainties tend to decrease from initial estimates, while PMF and CMB-MM uncertainties increase. Estimated CMAQ source impact uncertainties are comparable to those of other SA methods for gasoline vehicles and SOC but are larger than those of other methods for other sources. In addition to providing improved estimates of source impact uncertainties, the ensemble estimates do not have unrealistic extremes as compared to individual SA methods and avoid zero impact
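A toy version of the two-step procedure with invented impact values: average the four methods, take each method's RMSE about that average as its updated uncertainty, and then re-estimate the ensemble. The re-estimation step here uses inverse-variance weighting, which is one plausible reading rather than the paper's exact scheme.

import numpy as np

# Rows: days; columns: methods (CMB-LGO, CMB-MM, PMF, CMAQ). Values invented.
impacts = np.array([[2.1, 1.7, 2.5, 3.0],
                    [1.4, 1.1, 1.8, 2.2],
                    [2.8, 2.3, 3.1, 3.9]])

avg = impacts.mean(axis=1, keepdims=True)             # step 1: ensemble average
rmse = np.sqrt(((impacts - avg) ** 2).mean(axis=0))   # updated per-method uncertainty

w = 1.0 / rmse**2                                     # step 2: reweighted ensemble
ensemble = (impacts * w).sum(axis=1) / w.sum()
print("updated method RMSEs:", rmse.round(2))
print("re-estimated ensemble impacts:", ensemble.round(2))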

Predicting root zone (0-100 cm) soil moisture (RZSM) content at the catchment scale is essential for drought and flood predictions, irrigation planning, weather forecasting, and many other applications. Satellites, such as NASA's Soil Moisture Active Passive (SMAP), can estimate near-surface (0-5 cm) soil moisture content globally at coarse spatial resolutions. We develop a hierarchical Ensemble Kalman Filter (EnKF) data assimilation modeling system to downscale satellite-based near-surface soil moisture and to estimate RZSM content across the Shale Hills Critical Zone Observatory at a 1-m resolution in combination with ground-based soil moisture sensor data. In this example, a simple infiltration model within the EnKF system has been parameterized for six soil-terrain units to forecast daily RZSM content in the catchment from 2009-2012 based on AMSRE. LiDAR-derived terrain variables define intra-unit RZSM variability using a novel covariance localization technique. This method also allows the mapping of uncertainty in our RZSM estimates for each time step. A catchment-wide satellite-to-surface downscaling parameter, which nudges the satellite measurement closer to in situ near-surface data, is also calculated for each time step. We find significant differences in predicted root zone moisture storage for different terrain units across the experimental time period. Root mean square error from a cross-validation analysis of RZSM predictions using an independent dataset of catchment-wide in situ Time-Domain Reflectometry (TDR) measurements ranges from 0.060-0.096 cm3 cm-3, and the RZSM predictions are significantly (p < 0.05) correlated with TDR measurements [r = 0.47-0.68]. The predictive skill of this data assimilation system is similar to that of the Penn State Integrated Hydrologic Modeling (PIHM) system. Uncertainty estimates are significantly (p < 0.05) correlated with cross-validation error during wet and dry conditions, but more so in dry summer seasons. Developing an
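A generic stochastic EnKF analysis step (perturbed observations), sketched for a toy problem where one noisy near-surface observation updates three root-zone states. This is the textbook update, not the hierarchical, covariance-localized scheme of the study; all names and values are illustrative.

import numpy as np

rng = np.random.default_rng(4)

def enkf_update(X, y, H, R):
    """Stochastic EnKF analysis step.

    X: (n_state, n_ens) forecast ensemble; y: observation vector;
    H: linear observation operator; R: observation error covariance."""
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)        # ensemble anomalies
    HA = H @ A
    P_yy = HA @ HA.T / (n_ens - 1) + R           # innovation covariance
    P_xy = A @ HA.T / (n_ens - 1)                # state-obs cross covariance
    K = P_xy @ np.linalg.inv(P_yy)               # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)

# Toy: update 3 root-zone moisture states from 1 noisy surface observation
X = rng.normal(0.30, 0.05, size=(3, 50))         # prior ensemble (vol. moisture)
H = np.array([[1.0, 0.0, 0.0]])                  # observe the surface layer only
Xa = enkf_update(X, np.array([0.22]), H, np.array([[0.01**2]]))
print("analysis mean:", Xa.mean(axis=1).round(3))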

This paper describes a knowledge-based system for molecular diagnostics, and its application to fully automated diagnosis of X-linked genetic disorders. Molecular diagnostic information is used in clinical practice for determining genetic risks, such as carrier determination and prenatal diagnosis. Initially, blood samples are obtained from related individuals, and PCR amplification is performed. Linkage-based molecular diagnosis then entails three data analysis steps. First, for every individual, the alleles (i.e., DNA composition) are determined at specified chromosomal locations. Second, the flow of genetic material among the individuals is established. Third, the probability that a given individual is either a carrier of the disease or affected by the disease is determined. The current practice is to perform each of these three steps manually, which is costly, time consuming, labor-intensive, and error-prone. As such, the knowledge-intensive data analysis and interpretation supersede the actual experimentation effort as the major bottleneck in molecular diagnostics. By examining the human problem solving for the task, we have designed and implemented a prototype knowledge-based system capable of fully automating linkage-based molecular diagnostics in X-linked genetic disorders, including Duchenne Muscular Dystrophy (DMD). Our system uses knowledge-based interpretation of gel electrophoresis images to determine individual DNA marker labels, a constraint satisfaction search for consistent genetic flow among individuals, and a blackboard-style problem solver for risk assessment. We describe the system's successful diagnosis of DMD carrier and affected individuals from raw clinical data.

The notion of representative statistical ensembles, correctly representing statistical systems, is strictly formulated. This notion allows for a proper description of statistical systems, avoiding inconsistencies in theory. As an illustration, a Bose-condensed system is considered. It is shown that a self-consistent treatment of the latter, using a representative ensemble, always yields a conserving and gapless theory.

States that bass players should be allowed to play chamber music because it is an essential component to all string students' musical development. Expounds that bassists can successfully enjoy chamber music through participation in a bass ensemble. Gives suggestions on how to form a bass ensemble and on the repertoire of music. (CMK)

The quest for a molecular rectifier is among the major challenges of molecular electronics. We introduce three simple rules to design an efficient rectifying molecule and demonstrate its functioning at the theoretical level, relying on the NEGF-DFT technique. The design rules notably require both the introduction of asymmetric anchoring moieties and a decoupling bridge. They lead to a new rectification mechanism based on the compression and control of the HOMO/LUMO gap by the electrode Fermi levels, arising from a pinning effect. Significant rectification ratios up to 2 orders of magnitude are theoretically predicted as the mechanism opposes resonant to nonresonant tunneling. PMID:25706442

Enzymes are widely used for the synthesis of pharmaceuticals, agrochemicals, and food additives because they can catalyze high enantioselective transformations. In order to construct selective enzymes by protein engineering, it is important to understand the molecular basis of enzyme-substrate interactions that contribute to enantioselectivity. The haloalkane dehalogenase DbjA showed high enantioselectivity for two racemic mixtures: α-bromoesters and β-bromoalkanes. Thermodynamic analysis, protein crystallography, and computer simulations indicated that DbjA carries two bases for the enantiodiscrimination of each racemic mixture. This study helps us understand the molecular basis of the enantioselectivity and opens up new possibilities for constructing enantiospecific biocatalysts through protein engineering.

An electronic shift-register memory at the molecular level is described. The memory elements are based on a chain of electron-transfer molecules and the information is shifted by photoinduced electron-transfer reactions. This device integrates designed electronic molecules onto a very large scale integrated (silicon microelectronic) substrate, providing an example of a 'molecular electronic device' that could actually be made. The design requirements for such a device and possible synthetic strategies are discussed. Devices along these lines should have lower energy usage and enhanced storage density.

We discuss the geometry of trees endowed with a causal structure using the conventional framework of equilibrium statistical mechanics. We show how this ensemble is related to popular growing network models. In particular, we demonstrate that for a class of affine attachment kernels the two models are identical, but they can differ substantially for other choices of weights. We show that causal trees exhibit condensation even for asymptotically linear kernels. We derive general formulae describing the degree distribution, the ancestor-descendant correlation, and the probability that a randomly chosen node lives at a given geodesic distance from the root. It is shown that the Hausdorff dimension dH of the causal networks is generically infinite.

Microchips, constructed with a variety of microfabrication technologies (photolithography, micropatterning, microjet printing, light-directed chemical synthesis, laser stereochemical etching, and microcontact printing), are being applied to molecular biology. The new microchip-based analytical devices promise to solve the analytical problems faced by many molecular biologists (eg, contamination, low throughput, and high cost). They may revolutionize molecular biology and its application in clinical medicine, forensic science, and environmental monitoring. A typical biochemical analysis involves three main steps: (1) sample preparation, (2) biochemical reaction, and (3) detection (either separation or hybridization may be involved) accompanied by data acquisition and interpretation. The construction of a miniaturized analyzer will therefore necessarily entail the miniaturization and integration of all three of these processes. The literature related to the miniaturization of these three processes indicates that the greatest emphasis so far is on the investigation and development of methods for the detection of nucleic acid, followed by the optimization of a biochemical reaction, such as the polymerase chain reaction. The first step, sample preparation, has received little attention. In this review, the state of the art of microchip-based, miniaturized analytical processes (eg, sample preparation, biochemical reaction, and detection of products) is outlined and the applications of microchip-based devices in the molecular diagnosis of genetic diseases are discussed. PMID:10462559

Based on multiple parallel short molecular dynamics simulation trajectories, we designed the reweighted ensemble dynamics (RED) method to more efficiently sample complex (biopolymer) systems and to explore their hierarchical metastable states. Here we present an improvement that suppresses the statistical errors of RED, discuss a few key points in its practical application, and provide schemes for selecting the basis functions and determining the free parameter of the method. We illustrate these improvements in two toy models and in the solvated alanine dipeptide. The results show that RED enables us to capture the topology of multiple-state transition networks, to detect diffusion-like dynamical behavior in an entropy-dominated system, and to identify solvent effects in solvated peptides. The illustrations serve as general examples for applying RED to more complex biopolymer systems. Project supported by the National Natural Science Foundation of China (Grant No. 11175250).

Design of elementary molecular logic gates is key to performing complicated Boolean calculations. Herein, we report a strategy for constructing a DNA-based OR gate by using the mechanism of sequence recognition and the principle of fluorescence resonance energy transfer (FRET). In this system, the gate is entirely composed of single DNA strands (A, B, and C) and the inputs are molecular beacon probes (MB1 and MB2). Changes in fluorescence intensity confirm the realization of the OR logic operation, and electrophoresis experiments verify these results. Our successful application of DNA to perform the binary operation demonstrates that DNA can serve as an efficient biomaterial for designing molecular logic gates and devices. PMID:22278176

A tight binding model is used to investigate photoinduced tunneling current through a molecular bridge coupled to two semiconductor electrodes. A quantum master equation is developed within a non-Markovian theory based on second-order perturbation theory with respect to the molecule-semiconductor electrode coupling. The spectral functions are generated using a one dimensional alternating bond model, and the coupling between the molecule and the electrodes is expressed through a corresponding correlation function. Since the molecular bridge orbitals are inside the bandgap between the conduction and valence bands, charge carrier tunneling is inhibited in the dark. Subject to the dipole interaction with the laser field, virtual molecular states are generated via the absorption and emission of photons, and new tunneling channels open. Interesting phenomena arising from memory are noted. Such a phenomenon could serve as a switch.
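A minimal tight-binding illustration of a one dimensional alternating bond model like the one named above: alternating hoppings open a band gap, which is what blocks dark tunneling when the bridge orbitals sit inside it. The chain length and hopping values are illustrative assumptions, not the paper's parameters.

import numpy as np

# Dimerized 1D chain with alternating hoppings t1, t2; the gap between the
# "valence" and "conduction" bands mimics a gapped electrode spectrum.
n_sites, t1, t2 = 100, 1.0, 0.6
H = np.zeros((n_sites, n_sites))
for i in range(n_sites - 1):
    H[i, i + 1] = H[i + 1, i] = -(t1 if i % 2 == 0 else t2)

eigs = np.linalg.eigvalsh(H)
gap = eigs[n_sites // 2] - eigs[n_sites // 2 - 1]
print(f"band gap ~ {gap:.3f}  (expected ~ 2|t1 - t2| = {2*abs(t1 - t2):.3f})")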

The paper presents an open source system that allows the user to interact with a 3D molecular viewer using associated hand gestures for rotating, scaling and panning the rendered model. The novelty of this approach is that the entire application is browser-based and doesn't require installation of third party plug-ins or additional software components in order to visualize the supported chemical file formats. This kind of solution is suitable for instruction of users in less IT oriented environments, like medicine or chemistry. For rendering various molecular geometries our team used GLmol (a molecular viewer written in JavaScript). The interaction with the 3D models is made with Leap Motion controller that allows real-time tracking of the user's hand gestures. The first results confirmed that the resulting application leads to a better way of understanding various types of translational bioinformatics related problems in both biomedical research and education. PMID:27350455

This study introduces a highly accurate data-driven method to predict streamflow frequency statistics based on known drainage area characteristics which yields insights into the dominant controls of regional streamflow. The model is enhanced by explicit consideration of human interference in local hydrology. The basic idea is to use decision trees (i.e., regression trees) to regionalize the dataset and create a model tree by fitting multi-linear equations to the leaves of the regression tree. We improve model accuracy and obtain a measure of variable importance by creating an ensemble of randomized model trees using bootstrap aggregation (i.e., bagging). The database used to induce the models is built from public domain drainage area characteristics for 715 USGS stream gages (455 in Texas and 260 in Illinois). The database includes information on natural characteristics such as precipitation, soil type and slope, as well as anthropogenic ones including land cover, human population and water use. Model accuracy was evaluated using cross-validation and several performance metrics. During the validation, the gauges that are withheld from the analysis represent ungauged watersheds. The proposed method outperforms standard regression models such as the method of residuals for predictions in ungauged watersheds. Importantly, out-of-bag variable importance combined with models for 17 points along the flow duration curve (FDC) (i.e., from 0% to 100% exceedance frequency) yields insight into the dominant controls of regional streamflow. The most discriminant variables for high flows are drainage area and seasonal precipitation. Discriminant variables for low flows are more complex and model accuracy is improved with base-flow data, which is particularly difficult to obtain for ungauged sites. Consideration of human activities, such as percent urban and water use, is also shown to improve accuracy of low flow predictions. Drainage area characteristics, especially
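A self-contained sketch of the bootstrap-aggregation step, with plain regression trees standing in for the paper's model trees (which fit multi-linear equations at the leaves). The synthetic features loosely mimic drainage area characteristics, and the out-of-bag score plays the role of the cross-validation diagnostics; everything here is illustrative.

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)

# Synthetic "drainage area characteristics" -> "flow quantile" data
X = rng.normal(size=(600, 4))           # e.g. area, precip, slope, urban %
y = 2.0 * X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.3, size=600)

# Bagged ensemble of randomized trees; oob_score gives a built-in
# held-out accuracy estimate from the bootstrap resampling
model = BaggingRegressor(DecisionTreeRegressor(max_depth=5),
                         n_estimators=50, oob_score=True, random_state=0)
model.fit(X, y)
print(f"out-of-bag R^2: {model.oob_score_:.3f}")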

Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, outperform non-AUC-maximizing metalearning methods with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree. PMID:27227721
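A sketch of an AUC-maximizing metalearner formulated as a nonlinear optimization over combination weights; a derivative-free method is used because the AUC is piecewise constant in the weights. The base-learner predictions are simulated, and the real Super Learner implementation differs in detail.

import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)

# Simulated labels and cross-validated predictions from 3 fake base learners
y = rng.integers(0, 2, size=300)
Z = np.column_stack([y + rng.normal(scale=s, size=300)
                     for s in (0.8, 1.0, 1.5)])

def neg_auc(w):
    w = np.abs(w) / np.abs(w).sum()     # crude projection onto the simplex
    return -roc_auc_score(y, Z @ w)

# Nelder-Mead handles the non-smooth objective without gradients
res = minimize(neg_auc, x0=np.ones(3) / 3, method="Nelder-Mead")
w = np.abs(res.x) / np.abs(res.x).sum()
print("metalearner weights:", w.round(3), " ensemble AUC:", round(-res.fun, 3))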

This chapter is devoted to the hierarchical QSAR technology (HiT QSAR) based on simplex representation of molecular structure (SiRMS) and its application to different QSAR/QSPR tasks. The essence of this technology is a sequential solution of the QSAR paradigm (using the information obtained in the previous steps) by a series of enhanced models based on molecular structure description, in a specific order from 1D to 4D. In effect, it is a system of continually improved solutions. Different approaches for estimating domain applicability are implemented in HiT QSAR. In the SiRMS approach, every molecule is represented as a system of different simplexes (tetratomic fragments with fixed composition, structure, chirality, and symmetry). The level of detail of the simplex descriptors increases consecutively from the 1D to the 4D representation of the molecular structure. The advantages of the approach presented are the ability to solve QSAR/QSPR tasks for mixtures of compounds, the absence of the "molecular alignment" problem, consideration of different physical-chemical properties of atoms (e.g., charge, lipophilicity), the high adequacy and good interpretability of the obtained models, and clear routes for molecular design. The efficiency of HiT QSAR was demonstrated by comparison with the most popular modern QSAR approaches on two representative examination sets. Examples of successful application of HiT QSAR to various QSAR/QSPR investigations at different levels (1D-4D) of molecular structure description are also highlighted. The reliability of the developed QSAR models as predictive virtual screening tools and their ability to serve as the basis of directed drug design was validated by subsequent synthetic, biological, and other experiments. HiT QSAR is realized as a suite of computer programs termed the "HiT QSAR" software, which includes powerful statistical capabilities and a number of useful utilities.

α-Thalassemia is one of the most common hereditary disorders worldwide. Currently, molecular diagnostics is the only available tool to achieve an accurate diagnosis. The purpose of this study was to characterize the molecular bases of these syndromes in our environment and to establish genotype-phenotype associations. Through a combination of different molecular techniques and fluorescent in situ hybridization (FISH), we were able to find α-thalassemic mutations in 145 of the 184 patients studied (78.8%) with hematological parameters compatible with α-thalassemia. Deletions of the α-globin genes were the major molecular cause of the disease, and the most frequent mutation was -α(3.7), found in homozygous and heterozygous genotypes. In patients with α° phenotypes, other prevalent mutations were (_MED) and (_CAL/CAMP). The description of a sub-telomeric deletion in a patient with α-thalassemia and mental retardation was also achieved. β-thalassemic mutations in the heterozygous state were found in 7.6% of the patients, who presented α-thalassemic clinical features (microcytosis and Hb A₂ levels below 3.5%). Hematologic profiles for the α+ and α° genotypes were established for adult and pediatric patients. This work will hopefully provide guidelines for the detection of possible α-thalassemic carriers. It also highlights the collaborative work of hematologists, the biochemical and molecular biology laboratory, and geneticists in order to provide appropriate genetic counseling. PMID:25919868

Molecular crowding is a new approach to stabilizing binding sites and improving molecular recognition. In this work, the concept was applied to the preparation of imprinted monolithic columns for CEC. The imprinted monolithic column was synthesized using a mixture of d-zopiclone (d-ZOP) (template), methacrylic acid, ethylene glycol dimethacrylate, and poly(methyl methacrylate) (PMMA) (molecular crowding agent). The resulting PMMA-based imprinted capillary was able to separate ZOP enantiomers in CEC mode. The resolution of the enantiomer separation achieved on the d-ZOP-imprinted monolithic column was up to 2.09. The effects of polymerization factors, such as the template-monomer molar ratio, the functional monomer-cross-linker molar ratio, and the composition of the porogen, on the imprinting performance of the resulting molecularly imprinted polymer (MIP) monolithic column were systematically investigated. The effects of chromatographic parameters, including pH, acetonitrile content, and salt concentration, on the chiral separation were also studied. The results indicated that the addition of PMMA produced MIPs with superior retention properties and excellent selectivity for d-ZOP, as compared to MIPs prepared without the crowding-inducing agent. The results revealed that molecular crowding is an effective method for the preparation of a highly efficient MIP stationary phase for chiral separation in CEC. PMID:25404035

An important challenge for scientific research is the production of artificial systems able to mimic the recognition mechanisms occurring at the molecular level in living systems. A valid contribution in this direction has resulted from the development of molecular imprinting. By means of this technology, selective molecular recognition sites are introduced into a polymer, thus conferring bio-mimetic properties on it. The potential applications of these systems include affinity separations, medical diagnostics, drug delivery, catalysis, etc. Recently, bio-sensing systems using molecularly imprinted membranes, a special form of imprinted polymers, have received the attention of scientists in various fields. In these systems, imprinted membranes are used as bio-mimetic recognition elements which are integrated with a transducer component. The direct and rapid determination of an interaction between the recognition element and the target analyte (template) was an encouraging factor for the development of such systems as alternatives to traditional bio-assay methods. Owing to their high stability, sensitivity, and specificity, membrane-based bio-mimetic sensors are used for environmental, food, and clinical applications. This review deals with the development of molecularly imprinted polymers and their different preparation methods. The application of these membranes as bio-mimetic sensor devices over the last decades is also reported. PMID:25196110

Ras proteins are classical members of the small GTPases that function as molecular switches by alternating between inactive GDP-bound and active GTP-bound states. Ras activation is regulated by guanine nucleotide exchange factors that catalyze the exchange of GDP for GTP, and inactivation is mediated by GTPase-activating proteins that accelerate the intrinsic GTP hydrolysis rate by orders of magnitude. In this review, we focus on data that have accumulated over the past few years pertaining to the conformational ensembles and the allosteric regulation of Ras proteins and their interpretation from our conformational landscape standpoint. The Ras ensemble embodies all states, including the ligand-bound conformations, the activated (or inactivated) allosterically modulated states, post-translationally modified states, mutational states, transition states, and nonfunctional states serving as a reservoir for emerging functions. The ensemble is shifted by distinct mutational events, cofactors, post-translational modifications, and different membrane compositions. A better understanding of Ras biology can contribute to therapeutic strategies. PMID:26815308

The directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In our paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We also demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel(R) Xeon Phi(TM) coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.

Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.

We present a generalization of the giant molecular cloud identification problem based on cluster analysis. The method we designed, SCIMES (Spectral Clustering for Interstellar Molecular Emission Segmentation), considers the dendrogram of emission in the broader framework of graph theory and utilizes spectral clustering to find discrete regions with similar emission properties. For Galactic molecular cloud structures, we show that the characteristic volume and/or integrated CO luminosity are useful criteria to define the clustering, yielding emission structures that closely reproduce 'by-eye' identification results. SCIMES performs best on well-resolved, high-resolution data, making it complementary to other available algorithms. Using 12CO(1-0) data for the Orion-Monoceros complex, we demonstrate that SCIMES provides robust results against changes in the dendrogram-construction parameters, noise realizations, and degraded resolution. By comparing SCIMES with other cloud decomposition approaches, we show that our method is able to identify all canonical clouds of the Orion-Monoceros region, avoiding the overdivision within high-resolution survey data that represents a common limitation of several decomposition algorithms. The Orion-Monoceros objects exhibit hierarchies and size-line width relationships typical of the turbulent gas in molecular clouds, although 'the Scissors' region deviates from this common description. SCIMES represents a significant step forward in moving away from pixel-based cloud segmentation towards a more physically oriented approach, where virtually all properties of the ISM can be used for the segmentation of discrete objects.
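A toy illustration of the spectral-clustering step at the heart of the method: build a Gaussian affinity between structures from their properties and cut the resulting graph. The two invented property columns stand in for volume and integrated CO luminosity; the real method operates on dendrogram structures of the emission, which this sketch does not construct.

import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(7)

# Two synthetic groups of "structures" in a 2D property space
props = np.vstack([rng.normal([1.0, 1.0], 0.15, size=(20, 2)),   # cloud A
                   rng.normal([3.0, 2.5], 0.15, size=(20, 2))])  # cloud B

# Gaussian affinity in property space plays the role of the graph edge weights
d2 = ((props[:, None, :] - props[None, :, :]) ** 2).sum(-1)
affinity = np.exp(-d2 / (2 * 0.5**2))

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print("cluster sizes:", np.bincount(labels))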

The interrelationships of major clades within the Arthropoda remain one of the most contentious issues in systematics, which has traditionally been the domain of morphologists. A growing body of DNA sequences and other types of molecular data has revitalized study of arthropod phylogeny and has inspired new considerations of character evolution. Novel hypotheses such as a crustacean-hexapod affinity were based on analyses of single or few genes and limited taxon sampling, but have received recent support from mitochondrial gene order, and eye and brain ultrastructure and neurogenesis. Here we assess relationships within Arthropoda based on a synthesis of all well sampled molecular loci together with a comprehensive data set of morphological, developmental, ultrastructural and gene-order characters. The molecular data include sequences of three nuclear ribosomal genes, three nuclear protein-coding genes, and two mitochondrial genes (one protein coding, one ribosomal). We devised new optimization procedures and constructed a parallel computer cluster with 256 central processing units to analyse molecular data on a scale not previously possible. The optimal 'total evidence' cladogram supports the crustacean-hexapod clade, recognizes pycnogonids as sister to other euarthropods, and indicates monophyly of Myriapoda and Mandibulata.

All-component molecular dynamics studies were used to probe a library of oseltamivir molecularly imprinted polymer prepolymerization mixtures. Polymers included one of five functional monomers (acrylamide, hydroxyethylmethacrylate, methacrylic acid, 2-(trifluoromethyl)acrylic acid, 4-vinylpyridine) and one of three porogens (acetonitrile, chloroform, methanol) combined with the crosslinking agent ethylene glycol dimethacrylate and the initiator 2,2'-azobis(2-methylpropionitrile). Polymers were characterized by nitrogen gas sorption measurements and SEM, and affinity studies were performed using radioligand binding in various media. In agreement with the predictions made from the simulations, polymers prepared in acetonitrile using either methacrylic or trifluoromethacrylic acid demonstrated the highest affinities for oseltamivir. Further, the ensemble of interactions observed in the methanol system provided an explanation for the morphology of polymers prepared in this solvent. The materials developed here offer potential for use in solid-phase extraction or for catalysis. The results illustrate the strength of this in silico strategy as a potential prognostic tool in molecularly imprinted polymer design. PMID:27043914

Approaches that combine experimental data and computational molecular dynamics (MD) to determine atomic resolution ensembles of biomolecules require the measurement of abundant experimental data. NMR residual dipolar couplings (RDCs) carry rich dynamics information; however, difficulties in modulating the overall alignment of nucleic acids have limited the ability to fully extract this information. We present a strategy for modulating RNA alignment that is based on introducing variable dynamic kinks in terminal helices. With this strategy, we measured seven sets of RDCs in a cUUCGg apical loop and used this rich data set to test the accuracy of a 0.8 μs MD simulation computed using the Amber ff10 force field, as well as to determine an atomic resolution ensemble. The MD-generated ensemble quantitatively reproduces the measured RDCs, but selection of a sub-ensemble was required to satisfy the RDCs within error. The largest discrepancies between the RDC-selected and MD-generated ensembles are observed for the most flexible loop residues and the backbone angles connecting the loop to the helix, with the RDC-selected ensemble exhibiting more uniform dynamics. Comparison of the RDC-selected ensemble with NMR spin relaxation data suggests that the dynamics occurs on the ps-ns time scales, as verified by measurements of R(1ρ) relaxation-dispersion data. The RDC-satisfying ensemble samples many conformations adopted by the hairpin in crystal structures, indicating that intrinsic