Distribution Plots

Overview

Distribution plots visually assess the
distribution of sample data by comparing the empirical distribution
of the data with the theoretical values expected from a specified
distribution. Use distribution plots in addition to more formal hypothesis
tests to determine whether the sample data comes from a specified
distribution.

Quantile-quantile (q-q) plots assess
whether two sets of sample data come from the same distribution family,
and are robust with respect to differences in location and scale. For
syntax options, see qqplot.

Cumulative distribution plots display
the empirical cumulative distribution function (cdf) of the sample
data for visual comparison to the theoretical cdf of a specified distribution.
For syntax options, see cdfplot, ecdf, and stairs.

You can create distribution plots for distributions other than
normal, or explore the distribution of censored data, using probplot.

Normal Probability Plots

Normal probability plots are used to assess whether data comes
from a normal distribution. Many statistical procedures make the assumption
that an underlying distribution is normal, so normal probability plots
can provide some assurance that the assumption is justified, or else
provide a warning of problems with the assumption. An analysis of
normality typically combines normal probability plots with hypothesis
tests for normality.

This example generates a data sample of 25 random numbers from
a normal distribution with mu = 10 and sigma
= 1, and creates a normal probability plot of the data.
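A minimal sketch of such an example, assuming normrnd to draw the sample and normplot to create the plot. The call to rng is an addition that only makes the random sample repeatable.

```matlab
% Sketch only: rng('default') is an added assumption for repeatability.
rng('default')

% 25 random numbers from a normal distribution with mu = 10, sigma = 1
x = normrnd(10,1,25,1);

% Normal probability plot of the sample
normplot(x)
```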

The plus signs plot the empirical probability versus the data
value for each point in the data. A solid line connects the 25th and
75th percentiles in the data, and a dashed line extends it to the
ends of the data. The y-axis values are probabilities
from zero to one, but the scale is not linear. The distance between
tick marks on the y-axis matches the distance
between the quantiles of a normal distribution. The quantiles are
close together near the median (probability = 0.5) and stretch out symmetrically as you move away from
the median.
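The nonlinear spacing can be illustrated by computing normal quantiles at equally spaced probabilities. This is a sketch for illustration, assuming norminv with its default standard normal parameters; the probability grid is an arbitrary choice and is not how normplot selects its tick marks.

```matlab
% Equally spaced probabilities (illustrative grid, not normplot's ticks)
p = (1:9)/10;

% Corresponding quantiles of the standard normal distribution
z = norminv(p);

% Gaps between neighboring quantiles: smallest around p = 0.5,
% growing symmetrically toward both tails
gaps = diff(z);

disp([p' z'])
disp(gaps)
```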

In a normal probability plot, if all the data points fall near
the line, an assumption of normality is reasonable. Otherwise, the
points will curve away from the line, and an assumption of normality
is not justified. For example, the following code generates a data sample
of 100 random numbers from an exponential distribution with mu
= 10, and creates a normal probability plot of the data.

x = exprnd(10,100,1);
normplot(x)

The plot is strong evidence that the underlying distribution
is not normal.
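Because an analysis of normality typically pairs the plot with a hypothesis test, the visual evidence can be complemented by a formal test. The following is a minimal sketch that assumes the Lilliefors test (lillietest) as the test of choice; the text above does not name a specific test, and rng is added only for repeatability.

```matlab
% Sketch only: the choice of lillietest and the seed are assumptions.
rng('default')

% Same kind of sample as above: exponential with mu = 10
x = exprnd(10,100,1);

% h = 1 means normality is rejected at the default 5% significance level
[h,p] = lillietest(x);

fprintf('h = %d, p = %.4g\n', h, p)
```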

Quantile-Quantile Plots

Quantile-quantile plots are used to determine whether two samples
come from the same distribution family. They are scatter plots of
quantiles computed from each sample, with a line drawn between the
first and third quartiles. If the data falls near the line, it is
reasonable to assume that the two samples come from the same distribution
family. The method is robust with respect to changes in the location and
scale of either distribution.

The following example generates two data samples containing
random numbers from Poisson distributions with different parameter
values, and creates a quantile-quantile plot. The data in x is
from a Poisson distribution with lambda = 10, and
the data in y is from a Poisson distribution with lambda
= 5.

x = poissrnd(10,50,1);
y = poissrnd(5,100,1);
qqplot(x,y);

Even though the parameters and sample sizes are different, the
approximate linear relationship suggests that the two samples may
come from the same distribution family. As with normal probability
plots, hypothesis tests can provide additional justification for such
an assumption. For statistical procedures that depend on the two samples
coming from the same distribution, however, a linear quantile-quantile
plot is often sufficient.

The following example shows what happens when the underlying
distributions are not the same. Here, x contains
100 random numbers generated from a normal distribution with mu
= 5 and sigma = 1, while y contains
100 random numbers generated from a Weibull distribution with A
= 2 and B = 0.5.

x = normrnd(5,1,100,1);
y = wblrnd(2,0.5,100,1);
qqplot(x,y);

These samples clearly are not from the same distribution family.

Cumulative Distribution Plots

An empirical cumulative distribution function (cdf) plot shows
the proportion of data less than or equal to each x value,
as a function of x. The scale on the y-axis
is linear; in particular, it is not scaled to any particular distribution.
Empirical cdf plots are used to compare data cdfs to cdfs for particular
distributions.
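The empirical cdf can also be computed explicitly and drawn as a step function. A minimal sketch, assuming ecdf to compute the values and stairs to plot them; the normal sample is an arbitrary illustrative choice.

```matlab
% Sketch only: the sample and the seed are illustrative assumptions.
rng('default')
y = normrnd(0,1,50,1);

% f holds the empirical cdf values evaluated at the points in xs
[f,xs] = ecdf(y);

% Step plot on a linear y-axis
stairs(xs,f)
xlabel('x')
ylabel('F(x)')
```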

The following example compares the empirical cdf for a sample
from an extreme value distribution with a plot of the cdf for the
sampling distribution. In practice, the sampling distribution would
be unknown, and a candidate distribution would be chosen to match the empirical cdf.
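A minimal sketch of such a comparison, assuming evrnd to draw the sample, cdfplot for the empirical cdf, and evcdf for the theoretical cdf. The parameter values (mu = 0, sigma = 3), the sample size, and the x grid are illustrative assumptions.

```matlab
% Sketch only: parameters, sample size, and grid are assumptions.
rng('default')

% Sample from an extreme value distribution with mu = 0, sigma = 3
y = evrnd(0,3,100,1);

% Empirical cdf of the sample
cdfplot(y)
hold on

% Theoretical cdf of the sampling distribution on a grid of x values
x = -20:0.1:10;
f = evcdf(x,0,3);
plot(x,f,'m')

legend('Empirical','Theoretical','Location','NW')
hold off
```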

Other Probability Plots

A probability plot, like the normal probability plot, is just
an empirical cdf plot scaled to a particular distribution. The y-axis
values are probabilities from zero to one, but the scale is not linear.
The distance between tick marks is the distance between quantiles
of the distribution. In the plot, a line is drawn between the first
and third quartiles in the data. If the data falls near the line,
it is reasonable to choose the distribution as a model for the data.

To create probability plots for different distributions, use
the probplot function.

The following example assesses two samples, one from a Weibull
distribution with A = 3 and B = 3,
and one from a Rayleigh distribution with B = 3,
to see if either sample may have come from a Weibull population.
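A minimal sketch of such an assessment, assuming wblrnd and raylrnd to draw the samples and probplot with the 'weibull' option to plot both columns against a Weibull scale. The sample size of 100 and the legend labels are illustrative assumptions.

```matlab
% Sketch only: sample size and legend labels are assumptions.
rng('default')

% Weibull sample with A = 3, B = 3
x1 = wblrnd(3,3,100,1);

% Rayleigh sample with B = 3
x2 = raylrnd(3,100,1);

% One Weibull probability plot with a separate series per column
probplot('weibull',[x1 x2])
legend('Weibull Sample','Rayleigh Sample','Location','NW')
```

Points from a sample that follows a Weibull distribution are expected to fall near its reference line.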