Grant aim

Obtaining quantitative relationships between plant actions (especially antioxidant activity) and the genotype, chemical composition, and chemical structure of biologically potent compounds from plant parts (leaves, flowers, fruits, etc.), and obtaining quantitative measures for assessing biodiversity and bioconservation potential, in support of decision problems.

Grant activities

Month

Activity

01

Collecting the information available in the literature at the international level

02

03

Collecting the information available in the literature at the national and regional (Transylvania region) level

04

05

Creating a database: defining the data types and the manner of storage

06

07

Creating a database: establishing the relationships among the information

08

09

Design of the queries

10

11

Using the database; searching for data

12

13

Runs of 'training versus test' experiments

14

15

Crossover analysis

16

17

Intensive measures analysis

18

19

Extensive measures analysis

20

21

Relating intensive measures with experimental data

22

23

Relating extensive measures with experimental data

24

25

Capitalization of the knowledge

26

27

Grant results

Month

Result summary

01

The information in the literature was collected and analyzed. Two works were selected for further study:

Different databases, of which one is of interest: Herbalgram (http://cms.herbalgram.org/), referring to 'usturoi' (garlic) - Garlic.html;

Another database (not specific to Romania): http://herb.umd.umich.edu/ - A Database of Foods, Drugs, Dyes and Fibers of Native American Peoples, Derived from Plants.

ATLAS OF SEEDS AND FRUITS OF CENTRAL AND EAST-EUROPEAN FLORA: The Carpathian Mountains Region (Vít Bojnanský and Agáta Fargašová, Springer 2007, 1046 pages) - http://springerlink.com/content/978-1-4020-5361-0/#section=700133 - a book that was later acquired in electronic version from Springer.

The three-dimensional structure of chemical compounds has major implications for their biological activity. The complex nature of the 3D structure of the chemical compounds was therefore studied in more depth. The chemical structures were stored as text files.
A table was created for storing the chemical compounds, with the following fields: Id (identifier), CID (PubChem ID), Names (text), Structure (3D, in full), and Structure (3D, hydrogen depleted).
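The storage described above could be sketched as follows. This is a minimal illustration only, assuming an SQLite back end; the table name, the column names, and the helper functions are hypothetical, since the report gives only the field labels.

```python
import sqlite3

# Hypothetical schema: only the field labels (Id, CID, Names, Structure)
# come from the report; names and types below are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE compounds (
        id INTEGER PRIMARY KEY,   -- Id (identifier)
        cid INTEGER UNIQUE,       -- CID (PubChem ID)
        names TEXT,               -- Names (text)
        structure_full TEXT,      -- 3D structure, all atoms, stored as text
        structure_no_h TEXT       -- 3D structure, hydrogen depleted
    )
    """
)


def add_compound(conn, cid, names, structure_full, structure_no_h):
    """Insert one compound and return its row identifier."""
    cur = conn.execute(
        "INSERT INTO compounds (cid, names, structure_full, structure_no_h) "
        "VALUES (?, ?, ?, ?)",
        (cid, names, structure_full, structure_no_h),
    )
    conn.commit()
    return cur.lastrowid


def find_by_cid(conn, cid):
    """Return (id, cid, names) for a PubChem ID, or None if absent."""
    cur = conn.execute(
        "SELECT id, cid, names FROM compounds WHERE cid = ?", (cid,)
    )
    return cur.fetchone()
```

Keeping both structure variants as text columns mirrors the choice of storing the chemical structure as a text file.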

06

The relation between chemical structure and biological activity was considered. A study was conducted on the potency of chlorophylls in converting solar energy into chemical energy. 3D models of the chlorophyll structures were obtained and related to the solar energy conversion efficiency.

A series of plants from the opposite case, without chlorophylls, was also studied: algae from the Prototheca genus. Fifteen nucleotide sequences were downloaded from the NucCore database for the species blaschkeae, cutis, moriformis, stagnora, ulmea, and wickerhamii (the last one as a complete genome). An analysis of the gene sequences was conducted using alignment of literals (see image below).

As can be seen above, a formula was obtained for alignments in which at least all literals are used. The formula (see image above) contains two terms, of which the second is the number of states of a Bose-Einstein condensate (see for instance: Einstein A, 1925. Quantum theory of the monatomic ideal gas. Meeting reports of the Prussian Academy of Sciences 1:3-10), and thus it seems that sequence alignment is a factorization of the condensate matter.
The results of the sequence alignment are given below (see image below).

The column 'probabilitate' (probability) in the figure above gives the exact probability of an alignment by chance, based on the sequences included in the study.
As can be seen, for about half of these pairs alignment by chance is rejected, and thus it can be said that these species have a common ancestor.

08

A study was conducted on the use of 2x2 contingency tables (observed vs. model) and their linkage measures (see image below).
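As an illustration of what a linkage measure on a 2x2 contingency table can look like, the sketch below computes the chi-square statistic and the phi coefficient. These two measures are an assumption chosen for the example; the report does not state which measures the study used. The cells are labelled a, b (first row) and c, d (second row).

```python
import math


def _margin_product(a, b, c, d):
    """Product of the four marginal totals of the 2x2 table."""
    product = (a + b) * (c + d) * (a + c) * (b + d)
    if product == 0:
        raise ValueError("every row and column must have a non-zero total")
    return product


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / _margin_product(a, b, c, d)


def phi_coefficient(a, b, c, d):
    """Phi coefficient: signed association between the two binary variables."""
    return (a * d - b * c) / math.sqrt(_margin_product(a, b, c, d))
```

The two quantities are tied by phi squared = chi-square / n, so either one can serve as the reported measure.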

A wider analysis was conducted on the contingency of the effect of essential oil extracts (mixtures of compounds) from plant species on bacteria species. The possibility of factorizing the effect was explored.
A paper later capitalized on the research results (image below).

As can be seen from the picture below, factorizing the effect reveals a different spread of the factors' influence: the spread of the 'plant' factor is much wider than that of the 'bacteria' factor.

10

Storing the phylogeny of a certain plant can become a 'hard problem', as can be seen from the study conducted on this subject. Using the same Prototheca genus, a study regarding its classification was conducted. As the literature shows, different classifications exist (see image below, from the resulting paper).

The two classification systems used for comparison were stored locally in two databases (see images below).

The same problem of plant extracts (or the effect of mixing chemical compounds) was analyzed in order to reveal a similarity based on chemical composition (derived from plant metabolism).
The analysis was capitalized in a paper (see image below).

An important result was obtained: a classification based on metabolism, which turns out to be totally different from the one based on phylogeny (see image below).

12

By using a recent feature of Google Chrome (its implementation of the so-called HTML5 standard, released by Google in August 2011 and used here in November 2011), a well-known and time-consuming old problem was solved: uploading multiple files to a server in 'one click' (or maybe two), collectively selected and uploaded rather than file by file. This is a very important problem because, when working with chemical compounds, the compounds must be found in different databases, downloaded locally, checked, optimized, and uploaded to a local database for analysis. This procedure should be done in one click, but security restrictions in previous versions of the HTML language did not allow multiple files to be selected for upload.
The image below shows the script developed using this feature.

A valuable database containing a large number of experimental measurements was found: Dr. Duke's Phytochemical and Ethnobotanical Databases (http://www.ars-grin.gov/duke/). This database has been used to extract useful information. A series of steps was followed in order to access the data in a relational database manner. These steps are:

All these programs are available online (http://l.academicdirect.org/Horticulture/GAs/62371/Duke_db/), and the image below is a snapshot of the portal created:

14

As a continuation of the analysis conducted in the previous month, the identification of groups of data in databases such as Duke's was the subject of investigation. In these particular cases, when a large block of data is available, searching for linearities is possible. A procedure for finding these linearities was developed. Two online applications are available, together with the results of their analysis of Duke's database:

For the identified blocks of data meeting the defined criteria (for instance, for a simple linear dependence at least 6 pairs of data should be considered), further analysis can be conducted using commercial software such as Statistica or Excel.
The image below contains such an analysis, based on the filter (Organism='(ipr mus)')and(Measurement=''). It concerns intraperitoneal administration (into the peritoneal cavity) in mouse (of about 25 g weight) and relates the lowest lethal dose (LDlo) to the 50% survival chance lethal dose (LD50); the model is expressed on logarithmic scales for both variables.
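A model of this kind (both variables on a logarithmic scale) can be sketched as an ordinary least-squares line fitted to log10(LDlo) and log10(LD50). The function names are hypothetical and base-10 logarithms are an assumption; the report does not state the base or the software settings used.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_log(ldlo_values, ld50_values, min_pairs=6):
    """Fit log10(LD50) = slope * log10(LDlo) + intercept.

    min_pairs reflects the 'at least 6 pairs' criterion quoted in the text.
    """
    if len(ldlo_values) != len(ld50_values):
        raise ValueError("the two series must be paired")
    if len(ldlo_values) < min_pairs:
        raise ValueError("too few pairs for a simple linear dependence")
    log_x = [math.log10(v) for v in ldlo_values]
    log_y = [math.log10(v) for v in ld50_values]
    return fit_line(log_x, log_y)
```

A slope close to 1 on the log-log scale would mean that LD50 is roughly proportional to LDlo, with the intercept giving the logarithm of the proportionality factor.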

15

Data treatment was studied in order to obtain the coefficients for crossover. The following list enumerates the steps:

Obtaining the pairs of experimental data;

Obtaining the Training+Test superset (in which both values of each pair have a numerical measurement);

Searching for outliers; an example of outlier identification is depicted below; note that outlier detection and normalization of the data can be regarded as paired procedures (one without the other makes no sense);

Checking the normality of the data, using the Chi-Square, Anderson-Darling, and Kolmogorov-Smirnov tests;

Splitting the data into Training and Test sets in a ratio of Training:Test = 2:1;

Using the Training set for training (obtaining the regression coefficients);

Using the Test data for testing the model (calculating the correlation under the assumption of the model obtained from training);
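The last three steps of the list can be sketched as below. This is a simplified illustration, not the project's implementation: it assumes a simple linear model fitted by ordinary least squares, a random 2:1 split with a fixed seed, and the Pearson correlation between predicted and observed Test values; all function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def split_train_test(pairs, seed=0):
    """Shuffle the pairs and split them in the ratio Training:Test = 2:1."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = (2 * len(shuffled)) // 3
    return shuffled[:n_train], shuffled[n_train:]


def train_and_test(pairs, seed=0):
    """Fit on the Training set, then correlate predictions with Test values."""
    train, test = split_train_test(pairs, seed)
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    predicted = [slope * x + intercept for x, _ in test]
    r_test = pearson_r(predicted, [y for _, y in test])
    return slope, intercept, r_test
```

A single random split gives only one of many possible estimates, which is why averaging over splits is the natural next step.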

16

In order to give a true estimate of the crossover when only a part (M < N) of a set of paired data (of size N) is taken, all possible extractions should be made; the average result is then a true estimate of the cross-validation. To do this, the algorithm described in the paper [Phillip J. CHASE, 1970. Algorithm 382: Combinations of M out of N Objects [G6]. Communications of the Association for Computing Machinery 13(6):368-368] was used to implement the successive draws of M elements from a set of N. The procedure has been further tested on the full data (15 pairs) and the normalized data (14 pairs) from Duke's database. The results are given in the next image.
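The idea of averaging over all possible extractions can be sketched as follows. Note the substitution: the sketch uses Python's itertools.combinations instead of Chase's Algorithm 382; it enumerates the same M-element subsets, possibly in a different order. The linear model, the Pearson correlation as the test statistic, and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def exhaustive_cross_validation(pairs, m):
    """Average Test correlation over every choice of m Training pairs.

    Returns (average correlation, number of subsets evaluated).
    Requires at least 2 Training and 2 Test pairs per subset.
    """
    n = len(pairs)
    total = 0.0
    count = 0
    for train_idx in combinations(range(n), m):
        chosen = set(train_idx)
        train = [pairs[i] for i in train_idx]
        test = [pairs[i] for i in range(n) if i not in chosen]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        predicted = [slope * x + intercept for x, _ in test]
        total += pearson_r(predicted, [y for _, y in test])
        count += 1
    return total / count, count
```

The number of subsets grows as the binomial coefficient C(N, M), so the exhaustive average is practical only for small data sets such as the 14 to 15 pairs mentioned above.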

The results show that perpendicular offsets perform worse in the presence of outliers.

17

Intensive measures of diversity were analyzed. Two measures were selected for further analysis: the Rényi family of entropies (left in the image below), as representative of observed diversity, and Fisher's alpha (right in the image below), as representative of estimated diversity [Refs: Rényi, A. 1961. On measures of entropy and information. Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, 547-561; Fisher, RA. 1943. Part 3. A theoretical distribution for the apparent abundance of different species. Journal of Animal Ecology 12:54-58].

where N = sum(Ni), S = count(Ni), and Ni is the number of individuals observed for each species separately.
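Since the formulas themselves are only available as an image, the sketch below assumes the standard textbook definitions: the Rényi entropy of order q, H_q = ln(sum(p_i^q)) / (1 - q) with p_i = Ni / N (Shannon entropy in the limit q = 1), and Fisher's alpha as the solution of S = alpha * ln(1 + N / alpha). The function names and the bisection solver are illustrative choices.

```python
import math


def renyi_entropy(counts, q):
    """Rényi entropy of order q (natural logarithm) from species counts Ni."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # The limit q -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)


def fisher_alpha(counts):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    n = sum(counts)                      # N = sum(Ni)
    s = sum(1 for c in counts if c > 0)  # S = count(Ni)
    if s >= n:
        raise ValueError("alpha is finite only when S < N")

    def excess(alpha):
        return alpha * math.log(1.0 + n / alpha) - s

    lo, hi = 1e-9, 1.0
    while excess(hi) < 0:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a perfectly even sample the Rényi entropy equals ln(S) for every order q, which gives a quick sanity check of an implementation.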

18

Mobility. A series of conclusions was drawn from the study visit to Germany and Holland, given below:

The 'German' solution to competitiveness lies with the local government (the State), which offers partnership instruments to private owners (banks and investment funds), public bodies (local authorities), institutes of education and research (including universities), and entrepreneurs. These instruments are projects defining research and innovation policies (the so-called 'clusters'), consultancy and dissemination projects (through jointly owned limited liability companies), and knowledge transfer projects in innovation (managed and funded by the first two instruments);

The 'Dutch' solution to competitiveness is robust national financing of activities aimed mainly at shortening innovation and technology transfer. It is addressed mostly to small and medium-sized investors through a two-step grant process; the success rate in passing from phase 1 funding (approximately 10% of total funding) to phase 2 funding is between 30% and 50%;

Both solutions aim at attracting the private sector to research and innovation by funding the innovation component almost exclusively.

19

Along with Fisher's method, other methods of diversity estimation were studied, namely:

The rarefaction method for estimating diversity from the sample was implemented (see image below).

The method was used to draw the rarefaction curve for the sample of flowers collected from the field (v:14; r:10; o:9; y:10; w:1). The analysis showed a nice curve, but at the cost of very intensive calculation (see image below).

21

The same study from the previous month was implemented again using the result from combinatorics that gives the number of rarefied colors. The implemented code is given below:

With this result, a much easier and faster rarefaction curve was obtained (see below):
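The original code is not reproduced here. As a stand-in, the sketch below assumes that the combinatorial result is the classical hypergeometric expectation, E[S_n] = sum over i of (1 - C(N - Ni, n) / C(N, n)), which gives the expected number of distinct colors in a random subsample of n individuals. The function names are hypothetical; the counts are those of the flower sample quoted above.

```python
from math import comb


def expected_species(counts, n):
    """Expected number of distinct classes in a subsample of size n.

    Uses E[S_n] = sum_i (1 - C(N - Ni, n) / C(N, n)); this formula is an
    assumption about which combinatorial result the report refers to.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and N")
    all_draws = comb(total, n)
    return sum(1 - comb(total - c, n) / all_draws for c in counts)


def rarefaction_curve(counts):
    """Expected richness for every subsample size from 0 to N."""
    return [expected_species(counts, n) for n in range(sum(counts) + 1)]


# Flower sample from the text: v:14; r:10; o:9; y:10; w:1 (N = 44).
flower_counts = [14, 10, 9, 10, 1]
curve = rarefaction_curve(flower_counts)
```

Each point of the curve needs only one binomial coefficient per color, which is consistent with the much faster calculation reported compared with drawing subsamples explicitly.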

22

Entropy measures (Rényi) were used to compare genera based on chemical composition. The results are given below:

The use of families of molecular descriptors is one way to relate chemical structure to biological activity. A study of the distribution of the correlation coefficients was conducted in order to identify the type of distribution for the set of descriptors providing an agreement between the chemical structure and the observed property, using molecular descriptors obtained via the MDFV methodology [Bolboaca SD, Jäntschi L, 2009. Comparison of QSAR Performances on Carboquinone Derivatives, TheScientificWorldJOURNAL 9(10):1148-1166. DOI: 10.1100/tsw.2009.131]. The toxicity was measured at different stages of development for the species Arbacia punctulata, Dinophilus gyrociliatus, Sciaenops ocellatus, Opossum shrimp and Ulva fasciata. A total of 24 observed biological activities for 8 compounds served in this investigation [U.S. Geological Survey, Marine Ecotoxicology Research Station, Texas A&M University-Corpus Christi, Center for Coastal Studies. Development of marine sediment toxicity for ordnance compounds and toxicity identification evaluation studies at select naval facilities. http://web.ead.anl.gov/ecorisk/issue/pdf/tox_marine_sed.pdf]. The results show a partition of distribution functions as below.

A study continuing the research from the PhD thesis in Horticulture (2010) was conducted in order to estimate the moments of evolution under different selection and survival strategies. The study reveals that the relative moments of evolution are shaped by a one-parameter degeneration of the log-Pearson type III distribution. The results obtained on a given data sample allowed the parameters of these distributions to be extracted (see image below).

Further capitalization of the knowledge from the study conducted in the project concerned the distribution of seed sizes (data from the purchased book describing the species of the Transylvania region). A very nice picture of the seed size distribution was obtained (see image below).

Further capitalization of the knowledge from the study conducted in the project concerned the effect of leverage and of influential observations on the quality of structure-activity relationships.
The study shows that the Di model has the biggest change relative to the initial model (see image below).