2.
• Scientists and plant breeders want a few hundred germplasm accessions to evaluate for a particular trait.
• How does the scientist select a small subset likely to have the useful trait?
• More than 560 000 wheat accessions are held in genebanks worldwide.
Slide adapted from a slide by Ken Street, ICARDA (FIGS team)

4.
• The scientist or the breeder needs a smaller subset to cope with the field screening experiments.
• A common approach is to create a so-called core collection.
Sir Otto H. Frankel (1900-1998) proposed that a limited or "core collection" could be established from an existing collection. With minimum similarity between its entries, the core collection is of limited size and chosen to represent the genetic diversity of a large collection, a crop, a wild species or group of species (1984).

5.
• Given that the trait property you are looking for is relatively rare:
• Perhaps as rare as a unique allele for one single landrace cultivar...
• Getting what you want is largely a question of LUCK!
Slide adapted from a slide by Ken Street, ICARDA (FIGS team)
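To put a number on "luck", here is a minimal sketch that assumes accessions are drawn uniformly at random without replacement. It computes the chance that a random subset contains at least one accession carrying the trait. The function name and the subset size of 300 are illustrative assumptions; 560 000 is the worldwide wheat total quoted on an earlier slide.

```python
from math import comb

def prob_at_least_one(total, carriers, subset):
    """P(a random subset of `subset` accessions holds >= 1 of `carriers`)."""
    # Hypergeometric complement: 1 - C(total - carriers, subset) / C(total, subset)
    return 1 - comb(total - carriers, subset) / comb(total, subset)

# A single unique landrace among 560 000 accessions, 300 screened at random
p_single = prob_at_least_one(total=560_000, carriers=1, subset=300)
```

With a single carrier the expression reduces to subset / total, so the odds stay tiny unless the subset is chosen with some prior knowledge.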

7.
Objective of this study:
– Explore climate data as a prediction model for “pre-screening” of crop traits BEFORE full scale field trials.
– Identify landraces with a higher probability of holding an interesting trait property.

8.
• Primitive crops and traditional landraces are a source of exotic traits and crop properties.
• Landraces are an interesting source of novel traits for the improvement of modern crops.
• Landraces are often not described for the economically valuable trait in question.
• Identification of crop traits is often the result of a larger field trial screening project (thousands of individual plants).
• Large scale field trials are very costly (land area and human working hours).

9.
The underlying assumption is that the climate at the original source location, where the landrace was developed during long-term traditional cultivation, is correlated with the trait.
The aim is to build a computer model explaining the crop trait score (dependent variables) from the climate data (independent variables).
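A minimal sketch of that aim, assuming an ordinary least-squares fit stands in for the multivariate model (the study itself refers to multivariate and multi-way methods, not to this simple regression). The data are synthetic and the names `climate`, `trait` and `fit_linear` are hypothetical.

```python
import numpy as np

def fit_linear(climate, trait):
    """Least-squares fit of trait = b0 + climate @ b. Returns (b0, b)."""
    design = np.column_stack([np.ones(len(climate)), climate])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(0)
climate = rng.normal(size=(14, 3))      # 14 landraces x 3 climate variables (synthetic)
true_b = np.array([1.5, -0.5, 0.25])    # made-up coefficients
trait = 2.0 + climate @ true_b          # noise-free synthetic trait score
b0, b = fit_linear(climate, trait)
predicted = b0 + climate @ b            # trait scores predicted from climate alone
```

Because the synthetic trait is noise-free, the fit recovers the made-up coefficients; real trait scores would leave residuals.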

14.
The climate data is extracted from the WorldClim dataset.
http://www.worldclim.org/
Data from weather stations worldwide are combined into a continuous surface layer. Climate data for each landrace is extracted from this surface layer.
Precipitation: 20 590 stations
Temperature: 7 280 stations
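Extracting a value from a gridded surface amounts to converting longitude and latitude into a row and column index. The sketch below assumes a regular, north-up grid described by its upper-left corner and cell size. The grid, its 1-degree resolution and the function name `sample_grid` are illustrative and do not describe WorldClim's actual file layout.

```python
import numpy as np

def sample_grid(grid, lon, lat, west, north, cell):
    """Return the value of the grid cell containing (lon, lat).

    grid  : 2-D array, row 0 is the northernmost row
    west  : longitude of the left edge of the grid
    north : latitude of the top edge of the grid
    cell  : cell size in degrees
    """
    col = int((lon - west) // cell)
    row = int((north - lat) // cell)
    if not (0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]):
        raise ValueError("point lies outside the grid")
    return grid[row, col]

# Synthetic 1-degree global grid whose values encode row * 1000 + col
rows, cols = 180, 360
grid = np.arange(rows)[:, None] * 1000 + np.arange(cols)[None, :]
value = sample_grid(grid, lon=12.8, lat=55.9, west=-180.0, north=90.0, cell=1.0)
```

Applying the same lookup to each georeferenced collecting site would give one climate value per landrace and layer.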

15.
This study is part of a new method to predict crop traits of primitive cultivated material from climate variables by using multivariate statistical methods.

16.
FIGS
The FIGS technology takes much of the guesswork out of choosing which accessions are most likely to contain the specific characteristics being sought by plant breeders to improve plant productivity across numerous challenging environments.
http://www.figstraitmine.org/

20.
• The fundamental ecological niche of an organism was formalized by Hutchinson [1] in 1957 as a multidimensional hypervolume defining the ecological conditions that allow a species to exist.
• Full understanding of all the environmental conditions for any organism is a monumental task [2].
• A computer model of the occurrence localities together with associated environmental conditions such as rainfall, temperature, day length etc., provides an approximation of the fundamental niche.
• Popular software implementations for modeling the ecological niche include openModeller, MaxEnt, BioCLIM, DesktopGARP, etc.
George Evelyn Hutchinson (1903-1991)
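As a minimal sketch of the idea, a rectangular climate envelope can be fitted to occurrence points: a site counts as suitable when every environmental variable falls within the range observed at the occurrence localities. This box envelope is a deliberate simplification and not a reimplementation of any package named above. The data values are made up.

```python
import numpy as np

def fit_envelope(env_at_occurrences):
    """Per-variable (min, max) of the environment at known occurrence sites."""
    env = np.asarray(env_at_occurrences, dtype=float)
    return env.min(axis=0), env.max(axis=0)

def inside_envelope(env, lower, upper):
    """True for each site whose variables all lie within [lower, upper]."""
    env = np.atleast_2d(np.asarray(env, dtype=float))
    return np.all((env >= lower) & (env <= upper), axis=1)

# Columns: annual rainfall (mm), mean temperature (deg C); made-up values
occurrences = [[450, 6.0], [600, 7.5], [520, 5.0]]
lower, upper = fit_envelope(occurrences)

candidates = [[500, 6.5], [900, 6.5], [480, 12.0]]
suitable = inside_envelope(candidates, lower, upper)
```

Only the first candidate falls inside the box on both variables; the second is too wet and the third too warm.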

21.
openModeller: a flexible, user friendly, cross-platform environment where the entire process of a fundamental niche modeling experiment can be carried out.
Input: species occurrence and environmental data.
Output: a fundamental niche model and projection of the model into an environmental scenario.
http://openmodeller.sourceforge.net/

23.
– The initial model is developed from the training set
– Fine tuning of model parameters and settings
– No model can ever be absolutely correct!
– A simulation model can only be an approximation
– A model is always created for a specific purpose!
– The simulation model is applied to make predictions based on new fresh data
– Be aware of extrapolation

24.
– For the initial calibration or training step.
– Further calibration, tuning step
– Often cross-validation on the training set is used to reduce the consumption of raw data.
– For the model validation or goodness of fit testing.
– External data, not used in the model calibration.
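Reusing the training set through cross-validation can be sketched as follows: every sample is held out exactly once while the model is calibrated on the rest. The fold count, the seed and the function name `kfold_indices` are assumptions chosen for illustration.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, held_out_idx) pairs; each sample is held out once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)          # shuffle before splitting
    folds = np.array_split(order, n_folds)
    for i, held_out in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, held_out

# 14 samples split into 7 folds of 2 samples each
splits = list(kfold_indices(n_samples=14, n_folds=7))
```

A separate external test set, never touched during calibration or tuning, would still be needed for the final validation step.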

26.
Name of the statistic            Symbol   Range
Correlation coefficient          r        -1 to 1
Coefficient of determination     r²       0 to 1
• A number of different coefficients have been developed to measure correlation in different situations.
• The best known is the Pearson product-moment correlation coefficient.
• The correlation coefficient (r) indicates the strength and direction of a linear relationship between two random variables.
• The coefficient of determination (r²) indicates how well future outcomes are likely to be predicted by a statistical model.
Pearson's r: the covariance of the two variables is divided by the product of their standard deviations.
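The definition of Pearson's r, covariance divided by the product of the standard deviations, can be written out directly. This is a small sketch with made-up numbers; the function name `pearson_r` is illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation: cov(x, y) / (sd(x) * sd(y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    cov = (dx * dy).sum() / (len(x) - 1)        # sample covariance
    return cov / (x.std(ddof=1) * y.std(ddof=1))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r2 = r ** 2        # coefficient of determination for a simple linear fit
```

For these five points r is about 0.77 and r² is 0.6.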

27.
The distances between the model (predictions) and the reference values (validation) are the residuals.
Example of a bad model calibration
Cross-validation indicates the appropriate model complexity.
Be aware of over-fitting!
NB! Model validation!
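A small sketch of why validation matters, using synthetic data: the calibration error can only go down as model complexity grows, whereas the leave-one-out cross-validation error is never smaller than the calibration error and exposes needless complexity. The polynomial model, the noise level and the helper names are assumptions for illustration.

```python
import numpy as np

def rmse(residuals):
    """Root-mean-square of a sequence of residuals."""
    return float(np.sqrt(np.mean(np.square(residuals))))

def calibration_rmse(x, y, degree):
    """Error of a polynomial evaluated on the same data it was fitted to."""
    coef = np.polyfit(x, y, degree)
    return rmse(y - np.polyval(coef, x))

def loo_cv_rmse(x, y, degree):
    """Leave-one-out error: each point is predicted by a model that never saw it."""
    residuals = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        coef = np.polyfit(x[keep], y[keep], degree)
        residuals.append(y[i] - np.polyval(coef, x[i]))
    return rmse(residuals)

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 14)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)   # straight line plus noise

degrees = [1, 2, 3, 4, 5, 6]
cal = [calibration_rmse(x, y, d) for d in degrees]
cv = [loo_cv_rmse(x, y, d) for d in degrees]
best_degree = degrees[int(np.argmin(cv))]
```

Since the data come from a straight line, one would expect the cross-validated error to favour a low degree, though the exact choice depends on the noise draw.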

32.
From a total of 19 landrace accessions included in the dataset, only 4 included geo-referenced coordinates in the NordGen SESTO database.
10 accessions were geo-referenced from the reported place name and descriptions of the original gathering site included in SESTO and other sources.
For 5 accessions there was not enough information available to locate the original gathering location.
Right side illustration: Example of georeferencing for NGB9529, a landrace reported as originating from Lyderupgaard, using KRAK.dk and maps.google.com

34.
Score plots
The observations made at Priekuli (Latvia) are separated from the observations made at Bjørke (Norway) and Landskrona (Sweden) in PC1 and PC2.
The combined observations from each year (2002 and 2003) are less separated.
The two replicate series are NOT separated.

35.
The bi-plot shows heading days and ripening days as the most influential trait variables for the separation of the observations from the different observation locations. Length of plant participates in spreading out the scores (in PC1 and PC2), but is less active in the separation of the groups.
The influence plot (residuals against leverage) shows the sample observed at Priekuli in 2003 (replicate 2) with a very high leverage, well separated from the “data cloud”. After looking into the raw data (see next slide), this data point was removed as an outlier (set to NaN).
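The two axes of an influence plot can be computed from a PCA via the singular value decomposition. The sketch below uses synthetic data with one planted extreme observation. The leverage formula (1/n plus the squared normalised scores) is one common convention and is an assumption here, as are the function and variable names.

```python
import numpy as np

def pca_influence(X, n_components=2):
    """PCA by SVD. Returns scores, loadings, leverage and residual sum of squares."""
    Xc = X - X.mean(axis=0)                                  # column-centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    # Leverage: how strongly each sample pulls on the retained components
    leverage = 1.0 / X.shape[0] + np.sum(U[:, :n_components] ** 2, axis=1)
    # Residual: what the retained components fail to describe for each sample
    residual_ss = np.sum((Xc - scores @ loadings.T) ** 2, axis=1)
    return scores, loadings, leverage, residual_ss

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
X[0] = [8.0, 8.0, 8.0, 8.0]            # planted extreme observation
scores, loadings, leverage, residual_ss = pca_influence(X)
suspect = int(np.argmax(leverage))     # sample furthest from the "data cloud"
```

A sample flagged this way is only a candidate. As on the slide, the raw data should be inspected before the point is removed.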

36.
Sample (FRO) observed at Priekuli in 2003 (replicate 2) has the lowest score for harvest index in the entire dataset.
After looking into the raw data (see the table above), this observation point was removed as an outlier (set to NaN).

37.
The initial PCA analysis of the climate data showed a nice spread of the scores. No surprises.
The influence plot identified sample (NOR) as a mild outlier. I decided to keep this sample, but to keep an eye out for it in the multi-way analysis.

41.
[Box-plot of tmin, tmax and prec; panel title: Scaling across mode 3]
Mode 3 (climate variables: tmin, tmax and prec) has very different ranges of numerical values. Scaling across mode 3 is thus applied to the multi-way models.
The box-plot on the left shows the 3-way data unfolded so as to keep the dimensions of mode 3.
The 3-way climate data was reasonably well described by a PARAFAC model of two components.
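One reading of this scaling step is that each climate variable's slab of the 3-way array is divided by its own root-mean-square value, so that tmin, tmax and prec enter the model on comparable scales. The sketch below follows that reading. The array layout (landraces x months x variables), the synthetic magnitudes and the function name are assumptions and may differ from the scaling actually used in the study.

```python
import numpy as np

def scale_within_mode3(X):
    """Divide each mode-3 slab X[:, :, k] by its root-mean-square value."""
    rms = np.sqrt(np.mean(X ** 2, axis=(0, 1), keepdims=True))
    return X / rms, rms.ravel()

rng = np.random.default_rng(3)
# Assumed layout: 14 landraces x 12 months x 3 variables (tmin, tmax, prec)
X = np.stack(
    [
        rng.normal(2, 5, size=(14, 12)),     # tmin-like values (synthetic)
        rng.normal(10, 5, size=(14, 12)),    # tmax-like values (synthetic)
        rng.normal(60, 20, size=(14, 12)),   # prec-like values (synthetic)
    ],
    axis=2,
)
X_scaled, rms = scale_within_mode3(X)
```

After scaling, every variable slab has a root-mean-square of one, so precipitation no longer dominates the fit simply because of its larger numbers.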

45.
• The initial PARAFAC models calibrated from the 4-way trait dataset failed to converge to any good models. The core consistency remained very low.
• The problem proved to be a lack of systematic independent variation between instances of mode 3 (observation years) and mode 4 (observation locations).
• A two-component PARAFAC model was chosen for the new 3-way trait dataset.
(NOR) was identified as a mild outlier from the influence plot. Notice that both replications are located in the same part of the plot, and that they (together) are not isolated from the “data cloud”.

46.
PARAFAC split-half (mode 1) analysis:
The two PARAFAC models, each calibrated from one of two independent split-half subsets, both converge to a solution very similar to the model calibrated from the complete dataset.
The PARAFAC model is thus a general and stable model within the scope of Scandinavia.

47.
Further search for any good PARAFAC split-half for the climate dataset:
A systematic recording of results from 10 different split-half alternatives resulted in two good split-halves.
The PARAFAC model for the climate data is thus reasonably general (for Scandinavia), but less stable than the model for the 3-way trait data.

50.
• Often the critical levels (α) for the p-value are set as 0.05, 0.01 and 0.001.
• Modeling 14 samples (landraces) gives:
– 12 degrees of freedom for the correlation tests
– One-tailed test (looking only at positive correlation of predictions versus the reference values).
– A coefficient of determination (r²) larger than 0.56 is significant at the 0.001 (0.1%) level for 14 values/samples.
Many introductory textbooks on statistics include a table of Critical Values for Pearson’s r.
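The threshold of 0.56 can be checked from a standard t table, since a critical t value converts to a critical Pearson r through r = t / sqrt(t² + df). The sketch below hard-codes one-tailed critical t values for 12 degrees of freedom as found in common textbook tables. The dictionary and function names are illustrative.

```python
from math import sqrt

# One-tailed critical t values for 12 degrees of freedom (standard t table)
T_CRIT_DF12 = {0.05: 1.782, 0.01: 2.681, 0.001: 3.930}

def critical_r(t_crit, df):
    """Convert a critical t value to the critical Pearson r for df = n - 2."""
    return t_crit / sqrt(t_crit ** 2 + df)

df = 14 - 2   # 14 landraces, two parameters estimated by the correlation test
r_crit = {alpha: critical_r(t, df) for alpha, t in T_CRIT_DF12.items()}
r2_crit = {alpha: r ** 2 for alpha, r in r_crit.items()}
```

At α = 0.001 this gives r of about 0.75 and r² of about 0.56, in line with the threshold quoted on the slide.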

52.
• Latvia 2002 (LY11)
– May 2002 was extremely dry in Priekuli.
– June 2002 was extremely wet in Priekuli.
– The wet June caused germination on the spikes for many of the early varieties.
• Landskrona 2003 (LY32)
– June 2003 was extremely dry in Landskrona.
– June was the time for grain filling here.
• Too extreme for the genotype to be “normally” expressed?
• Too large an effect from “G by E” interaction?

62.
• The first dataset I started to work with is a “FIGS” dataset with genebank accessions of barley (Hordeum vulgare ssp. vulgare) collected from different countries worldwide and tested for susceptibility to net blotch infection. Net blotch is a common disease of barley caused by the fungus Pyrenophora teres.
• The barley plants were inoculated with the fungus and the percentage of the leaves infected with the disease was normalized to an interval scale (1 to 9).
• 1-3 are basically resistant → group 1
• 4-6 are intermediate → group 2
• 7-9 are susceptible → group 3
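The grouping of the 1 to 9 scores into three classes is a simple lookup. The sketch below encodes the three bands from the slide; the function name `resistance_group` is hypothetical.

```python
def resistance_group(score):
    """Map a net blotch score (1 to 9) to group 1, 2 or 3."""
    if not 1 <= score <= 9:
        raise ValueError("score must lie between 1 and 9")
    if score <= 3:
        return 1      # basically resistant
    if score <= 6:
        return 2      # intermediate
    return 3          # susceptible

groups = [resistance_group(s) for s in range(1, 10)]
```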