Transcription

Aust. N. Z. J. Statist. 42(3), 2000

PRACTICAL MAXIMUM PSEUDOLIKELIHOOD FOR SPATIAL POINT PATTERNS (with Discussion)

ADRIAN BADDELEY 1 AND ROLF TURNER 2
University of Western Australia

Summary

This paper describes a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman & Turner's (1992) device for maximizing the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood is known explicitly, except for the computation of an integral over the sampling region. Approximation of this integral by a finite sum in a special way yields an approximate pseudolikelihood which is formally equivalent to the (weighted) likelihood of a loglinear model with Poisson responses. This can be maximized using standard statistical software for generalized linear or additive models, provided the conditional intensity of the process takes an exponential family form. Using this approach a wide variety of spatial point process models of Gibbs type can be fitted rapidly, incorporating spatial trends, interaction between points, dependence on spatial covariates, and mark information.

Key words: area-interaction process; Berman–Turner device; Dirichlet tessellation; edge effects; generalized additive models; generalized linear models; Gibbs point processes; GLIM; hard core process; inhomogeneous point process; marked point processes; Markov spatial point processes; Ord's process; pairwise interaction; profile pseudolikelihood; spatial clustering; soft core process; spatial trend; S-PLUS; Strauss process; Widom–Rowlinson model.

1. Introduction

This paper describes a computational device for rapidly fitting statistical models to spatial point patterns. Applications are shown in Section 10.
Datasets may consist of points in two or three dimensions or in space-time; the points may be classified into different types or carry auxiliary observations ('marks'). Additionally there may be spatial covariates, such as topography or another spatial pattern observed in the same region. Realistic models for such data should incorporate both spatial inhomogeneity ('trend') and dependence between points ('interaction', such as clustering or regularity).

[Received August 1998; revised May 1999; accepted June 1999. Author to whom correspondence should be addressed. 1 Dept of Mathematics and Statistics, University of Western Australia, Nedlands, WA. 2 Dept of Mathematics and Statistics, University of New Brunswick, Fredericton, NB, Canada E3B 5A3. Acknowledgments. The authors warmly thank an anonymous referee and Y.C. Chin, M. Tanemura, H. Wilson, A. Särkkä, A. Penttinen and M. Van Lieshout for helpful comments, P.J. Diggle, V. Isham, Y. Ogata and A. Särkkä for supplying data files, and R.D. Harkness, M. Numata and M. Tanemura for their kind permission to use the data. Data for Figure 12 were supplied by Professor R.D. Harkness, Professor V. Isham and Dr A. Särkkä. The authors' research was supported by the Australian Research Council and the Natural Sciences and Engineering Research Council of Canada.]

Ogata & Tanemura (1981, 1984, 1985, 1986)

and Penttinen (1984) developed methods for maximum likelihood estimation for such models, and applied them to real data. Recent advances have been made by Geyer & Møller (1994), Geyer (1998) and others. However, maximum likelihood is computationally intensive, and employs simulation algorithms that are specific to the chosen model. It is even more costly for inhomogeneous spatial patterns because of increased parameter dimensionality and complexity of simulation. This militates against the modern statistical practice of fitting several alternative models to the same dataset and introducing smooth functions as model terms. Few writers apart from Ogata & Tanemura (1986) have fitted inhomogeneous point process models, other than the inhomogeneous Poisson process, to real spatial data. Berman & Turner (1992) introduced a technique for maximizing the likelihoods of (a) general point processes in time, and (b) inhomogeneous Poisson processes in d-dimensional space. The intensity or conditional intensity of the process was assumed to be loglinear in the parameters. They approximated the log-likelihood by a finite sum that had the same analytical form as the (weighted) log-likelihood of a generalized linear model with Poisson responses. The approximate likelihood could then be maximized using existing software for generalized linear models. Related ideas have been explored by Lindsey (1992, 1995, 1996) and Lindsey & Mersch (1992). In this paper we extend the Berman–Turner device to a much larger class of spatial point process models, namely Gibbs point processes with exponential family likelihoods. We obtain an approximation to the pseudolikelihood (Besag, 1975, 1977; Jensen & Møller, 1991) rather than to the likelihood.
The maximum pseudolikelihood estimator is a practical alternative to the maximum likelihood estimator (MLE), satisfies unbiased estimating equations, and is consistent and asymptotically normal under suitable conditions. The MLE is not necessarily optimal here because the usual asymptotic theory is not applicable. Under reasonable assumptions (Diggle et al., 1994) the maximum pseudolikelihood normal equations are a special case of the Takacs–Fiksel estimating equations, an application of the method of moments (Fiksel, 1984; Takacs, 1986; Fiksel, 1988). Using the extended Berman–Turner device and standard statistical software, we can rapidly fit quite complex spatial stochastic models involving spatial trends and spatial covariates as well as interactions between points. The plan of the paper is as follows. Sections 2 and 3 give definitions and background. Section 4 presents our extension of the Berman–Turner computational device. Section 5 treats a simple example. Section 6 develops applications of the method to specific models of spatial interaction; Section 7 develops applications for spatial inhomogeneity, and Section 8 develops applications for marked point patterns. Section 9 treats some issues in estimation and inference. The method is applied to real datasets in Section 10. Section 11 reports a simulation study of the accuracy of the technique.

2. Background and definitions

2.1. Likelihoods

The data consist of a spatial point pattern x observed in a bounded region W of space. Thus x = {x_1, ..., x_n}, where the number of points n ≥ 0 is not fixed, and each x_i is a point in W. The region W is a known, bounded subset of d-dimensional space R^d, where d ≥ 1. Sections 7 and 8

discuss extensions of this basic setup to incorporate spatial covariates and marked points, respectively. The data x are assumed to be a realization of a random point process X in W. Typically the null model (or the null hypothesis) is the homogeneous Poisson point process (Cox & Isham, 1980; Kingman, 1993). Other models are specified by their likelihood with respect to the Poisson process. Thus we assume X has a probability density f(x; θ) with respect to the distribution of the Poisson process with intensity 1 on W. Additionally we assume f(x; θ) > 0 implies f(y; θ) > 0 for all subsets y ⊆ x. This is the class of Gibbs processes on W (see Preston, 1976; Ripley, 1989; Stoyan, Kendall & Mecke, 1995). The distribution is governed by a vector parameter θ ranging over a set Θ ⊆ R^p; see Cox & Isham (1980), Geyer (1998).

2.2. Basic models

Specific models are detailed in Sections 6-8, but it is instructive to list three important examples. First, the homogeneous Poisson process with intensity λ > 0 has density

f(x; λ) = e^{-(λ-1)|W|} λ^{n(x)},

where n(x) denotes the number of points in x and |W| is the volume of W. This yields the maximum likelihood estimate λ̂ = n(x)/|W|. Second, consider the inhomogeneous Poisson process on W with rate or intensity function λ : W → R; see Cox & Isham (1980), Kingman (1993). In statistical models, the intensity λ_θ(u) will depend on θ to reflect spatial trend (a change in intensity across the region of observation) or dependence on a covariate. The density is

f(x; θ) = [ ∏_{i=1}^{n(x)} λ_θ(x_i) ] exp( -∫_W [λ_θ(u) - 1] du ).  (1)

Maximization of (1) generally requires iterative optimization methods. Third, the pairwise interaction process on W with trend or activity function b_θ : W → R and interaction function h_θ : W × W → R has density

f(x; θ) = α(θ) ∏_{i=1}^{n(x)} b_θ(x_i) ∏_{i<j} h_θ(x_i, x_j),  (2)

where α(θ) > 0 is the normalizing constant.
Conditions must be imposed on b_θ and h_θ to ensure the density is well defined and integrable: in particular h_θ(u, v) = h_θ(v, u). Examples are given in Section 6. See the excellent surveys by Ripley (1988, 1989). Pairwise interaction models are suitable for the data in Figures 6 and 12, as shown by Ogata & Tanemura (1981, 1984, 1985, 1986), Särkkä (1993), and Takacs & Fiksel (1986). The terms b_θ(x_i) in (2) influence the intensity of points and introduce a spatial trend if b_θ(·) is not constant. The terms h_θ(x_i, x_j) introduce dependence ('interaction') between different points of the process X. If h_θ ≡ 1 the model reduces to an inhomogeneous Poisson process with intensity function b_θ(u). The normalizing constant α(θ) in (2) is generally an intractable function of θ. Methods for approximating α(θ) and maximizing the likelihood include functional expansions of α(θ),

Monte Carlo integration, and analogues of EM and stochastic approximation (Ogata & Tanemura, 1981, 1984, 1985, 1986; Penttinen, 1984; Moyeed & Baddeley, 1991; Geyer, 1998). Most models considered in this paper are pairwise interaction processes, but we also discuss the Widom–Rowlinson ('area-interaction') model (Section 6.2) and Ord's model (Section 6.3).

3. Pseudolikelihood

It is generally difficult to evaluate and maximize the likelihoods of point processes other than the inhomogeneous Poisson process (1). Even simple exponential family models such as the pairwise interaction processes (2) include a normalizing constant which is an intractable function of θ. An alternative to the likelihood function is the pseudolikelihood (Besag, 1975, 1977; Besag, Milne & Zachary, 1982; Jensen & Møller, 1991) which we describe here; see Fiksel (1984, 1988), Takacs (1986), Ripley (1988, 1989), Särkkä (1993) and Diggle et al. (1994) for other applications. Originally Besag (1975, 1977) defined the pseudolikelihood of a finite set of random variables X_1, ..., X_n as the product of the conditional likelihoods of each X_i given the other variables {X_j, j ≠ i}. This was extended (Besag, 1977; Besag et al., 1982) to point processes, for which it can be viewed as an infinite product of infinitesimal conditional probabilities.

3.1. Conditional intensity

To construct the pseudolikelihood we need the (Papangelou) conditional intensity λ(u; x) of X at a location u ∈ W. This may be loosely interpreted as giving the conditional probability that X has a point at u given that the rest of the process coincides with x. See Kallenberg (1984) for an informal introduction, or Glötzl (1980a,b), Kallenberg (1983), Kozlov (1976) for details. For any Gibbs process on W (see Section 2) with density f, the conditional intensity at a point u ∈ W is

λ(u; x) = f(x ∪ {u}) / f(x)   (u ∉ x),  (3)

λ(x_i; x) = f(x) / f(x \ {x_i})   (x_i ∈ x).  (4)

For example, the inhomogeneous Poisson process with intensity function λ(·) has conditional intensity λ(u; x) = λ(u) at all points u. The fact that this does not depend on x is a consequence of the independence properties of the Poisson process. For a general Gibbs point process, λ(u; x) depends on x. The general pairwise interaction process (2) has conditional intensity

λ_θ(u; x) = b_θ(u) ∏_{i=1, x_i ≠ u}^{n(x)} h_θ(u, x_i).  (5)

Note that λ_θ(·; x) is discontinuous at the data points x_i, and that the intractable normalizing constant in (2) has been eliminated in the conditional intensity.
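For concreteness, the product in (5) is cheap to evaluate directly. The following Python sketch is our own illustration, not the authors' code (the helper names `pairwise_cond_intensity` and `strauss_h` are hypothetical), shown here with a Strauss-type interaction function:

```python
import math

def pairwise_cond_intensity(u, x, b, h):
    """Eq. (5): lambda_theta(u; x) = b(u) * prod over x_i != u of h(u, x_i),
    for user-supplied trend function b and interaction function h."""
    lam = b(u)
    for xi in x:
        if xi != u:          # the product excludes u itself
            lam *= h(u, xi)
    return lam

def strauss_h(gamma, r):
    # Strauss-type interaction: gamma within distance r, 1 otherwise
    return lambda u, v: gamma if math.dist(u, v) <= r else 1.0

x = [(0.2, 0.2), (0.25, 0.2), (0.8, 0.9)]
lam = pairwise_cond_intensity((0.22, 0.21), x,
                              b=lambda u: 2.0, h=strauss_h(0.5, 0.1))
```

Here u has two data points within distance 0.1, so the value is 2.0 × 0.5 × 0.5 = 0.5; with h ≡ 1 the conditional intensity reduces to b(u), the Poisson case noted above.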

3.2. Definition of pseudolikelihood

Besag (1977) defined the pseudolikelihood of a point process with conditional intensity λ_θ(u; x) over a subset A ⊆ W to be

PL_A(θ; x) = [ ∏_{x_i ∈ A} λ_θ(x_i; x) ] exp( -∫_A λ_θ(u; x) du )  (6)

and gave examples of the utility of maximum pseudolikelihood estimates. Further theory was developed in Besag et al. (1982), Jensen & Møller (1991) and Jensen & Künsch (1994). If the process is Poisson, the pseudolikelihood coincides with the likelihood (1) up to the constant factor exp(-|W|). For a pairwise interaction process (2), the pseudolikelihood is

PL(θ; x) = [ ∏_{i=1}^{n(x)} b_θ(x_i) ] [ ∏_{i≠j} h_θ(x_i, x_j) ] exp( -∫_W b_θ(u) ∏_{i=1}^{n(x)} h_θ(u, x_i) du );  (7)

the intractable normalizing constant α(θ) appearing in the likelihood (2) has been replaced by an exponentiated integral in (7), as if the process were Poisson. We give other examples in Sections 6 and 7 below. For processes with weak interaction, in the sense that λ_θ(u; x) can be approximated well by a function of u only, the process is approximately Poisson and the pseudolikelihood is an approximation to the likelihood. Hence the maximum pseudolikelihood estimator (MPLE) should be efficient if interaction is weak. Folklore holds that it is inefficient for strong interactions.

3.3. Loglinear case

In this paper we focus on Gibbs point process models for which the conditional intensity is loglinear:

λ_θ(u; x) = exp( θ^T S(u; x) ),  (8)

where S(u; x) is a vector of spatial covariates defined at each point u in W. This includes exponential family likelihoods with canonical parameter θ. Assume S(u; x) exp(θ^T S(u; x)) is uniformly bounded in u ∈ W and θ, for each fixed x. Then the maximum pseudolikelihood normal equations (∂/∂θ) log PL_A(θ; x) = 0 become

∑_{x_i ∈ A} S(x_i; x) = ∫_A S(u; x) exp( θ^T S(u; x) ) du.  (9)

Numerical solution of (9) usually requires iterative algorithms. Equation (9) is an unbiased estimating equation, i.e. the expectations of the left and right sides of (9) under θ are equal. The proof is an application of a non-stationary form of the Nguyen–Zessin formula (Nguyen & Zessin, 1976), namely

E( ∑_{x_i ∈ X ∩ A} h(x_i, X) ) = E( ∫_A λ(u; X) h(u, X) du ),  (10)

holding for all non-negative bounded measurable functions h(u, x). This extends a result of Diggle et al. (1994) that, under reasonable conditions, the normal equations in the stationary case are a special case of the Takacs–Fiksel estimating equations, themselves an application of the method of moments (Fiksel, 1984, 1988; Takacs, 1986). The MPLE is known to be consistent and asymptotically normal (Jensen & Møller, 1991; Jensen & Künsch, 1994), at least for stationary pairwise interaction processes whose interaction functions satisfy suitable regularity conditions. If (8) holds, the pseudolikelihood is log-convex. If, moreover, the parameter space Θ ⊆ R^p is a convex set, it follows that the maximum exists and occurs either at an interior point of Θ where the normal equations are satisfied, or on the boundary of Θ.

4. Berman–Turner device for maximum pseudolikelihood

This section describes the computational device that we propose for computing approximate maximum pseudolikelihood estimates. The method is an adaptation of an earlier technique of Berman & Turner (1992) for approximate maximum likelihood estimation for the inhomogeneous Poisson point process. Related ideas have been explored by Lindsey (1992, 1995, 1996) and Lindsey & Mersch (1992).

4.1. Derivation

Let X be a Gibbs point process with conditional intensity λ_θ(u; x) and consider the pseudolikelihood (6) for X, taking A = W for simplicity. Approximate the integral in (6) by a finite sum using any quadrature rule,

∫_W λ_θ(u; x) du ≈ ∑_{j=1}^{m} λ_θ(u_j; x) w_j,  (11)

where u_j, j = 1, ..., m, are points in W and w_j > 0 are quadrature weights summing to |W|. This yields an approximation to the log-pseudolikelihood,

log PL(θ; x) ≈ ∑_{i=1}^{n(x)} log λ_θ(x_i; x) - ∑_{j=1}^{m} λ_θ(u_j; x) w_j.  (12)

Extending an observation of Berman and Turner, we note that if the list of points {u_j, j = 1, ..., m} includes all the data points {x_i, i = 1, ..., n}, we can rewrite (12) as

log PL(θ; x) ≈ ∑_{j=1}^{m} (y_j log λ_j - λ_j) w_j,  (13)

where λ_j = λ_θ(u_j; x) and y_j = z_j/w_j, and

z_j = 1 if u_j is a data point (u_j ∈ x), z_j = 0 if u_j is a dummy point (u_j ∉ x).  (14)

The right side of (13), for fixed x, is formally equivalent to the log-likelihood of independent Poisson variables Y_j with means λ_j, taken with weights w_j.
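The equality of (12) and (13) can be verified numerically for any positive intensity function. A self-contained Python sketch (our own check, not the authors' software; the point pattern, weights and intensity below are made up):

```python
import math

def log_pl_direct(data, dummy, w, lam):
    # eq. (12): sum of log-intensities at data points, minus the weighted sum
    # of intensities over all quadrature points (data followed by dummy)
    quad = data + dummy
    return (sum(math.log(lam(u)) for u in data)
            - sum(wj * lam(u) for wj, u in zip(w, quad)))

def log_pl_poisson_form(data, dummy, w, lam):
    # eq. (13): the same quantity written as a weighted Poisson log-likelihood
    quad = data + dummy
    z = [1] * len(data) + [0] * len(dummy)    # indicators (14)
    total = 0.0
    for uj, zj, wj in zip(quad, z, w):
        yj = zj / wj
        lj = lam(uj)
        total += (yj * math.log(lj) - lj) * wj
    return total

data  = [(0.3, 0.4), (0.7, 0.2)]
dummy = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
w = [1.0 / 6] * 6                  # weights summing to |W| = 1
lam = lambda u: 2.0 + u[0]         # any positive intensity will do
lpl_12 = log_pl_direct(data, dummy, w, lam)
lpl_13 = log_pl_poisson_form(data, dummy, w, lam)
```

For a data point, y_j (log λ_j) w_j = log λ_j since y_j = 1/w_j, so the two forms agree term by term.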

The expression (13) can therefore be maximized using standard software for fitting generalized linear models (McCullagh & Nelder, 1989) provided that
(a) the software handles weighted likelihoods (with weights not necessarily summing to 1);
(b) the software accepts non-integer values of the responses y_j in Poisson loglinear regression and correctly maximizes the log-likelihood expression;
(c) the conditional intensity function λ_θ(·; x), for fixed x, is related to any explanatory variables by

g( λ_θ(u; x) ) = θ^T S(u, x),  (15)

where g is a link function implemented in the software, and S(u, x) is a vector of spatial covariates (possibly depending on x) defined at each point u in W.

Software packages satisfying these criteria include GLIM (Aitkin et al., 1989) and S-PLUS (Becker, Chambers & Wilks, 1988; Chambers & Hastie, 1992; Venables & Ripley, 1994). The only choice of g in (15) we consider is the log link, giving rise to the loglinear model (8). The key reason for adopting this approach is that the use of standard statistical packages rather than ad hoc software confers great advantages in applications. Modern statistical packages have a convenient notation for statistical models (Aitkin et al., 1989; Chambers & Hastie, 1992; Venables & Ripley, 1994) which makes it very easy to specify and fit a wide variety of models of the type (8). Algorithms in the package may allow one to fit very flexible model terms such as the smooth functions in a generalized additive model (Hastie & Tibshirani, 1990). Interactive software allows great freedom to re-analyse the data. The fitting algorithms are typically more reliable and stable than in home-grown software.

4.2. Procedure

In summary, the procedure is as follows:
1. generate a set of dummy points, and combine it with the data points x_i to form the set of quadrature points u_j;
2. compute the quadrature weights w_j;
3. form the indicators z_j as in (14) and calculate y_j = z_j/w_j;
4. compute the (possibly vector) values v_j = S(u_j, x) of the sufficient statistic at each quadrature point;
5. invoke the model-fitting software, specifying that the model is a loglinear Poisson regression log λ_j = θ^T v_j to be fitted to the responses y_j and covariate values v_j, with weights w_j.

The coefficient estimates returned by the software give the (approximate) MPLE θ̂ of θ. The estimates of standard errors are not applicable, since they assume independent and identically distributed (iid) Poisson observations. The software also typically returns the deviance D of the fitted model; this is related to the log-pseudolikelihood of the fitted model by

log PL(θ̂; x) = -D/2 - ∑_{i=1}^{n(x)} log w_i - n(x).  (16)

Note that the sum is over data points only. Conveniently, the null model λ_j ≡ λ in the loglinear Poisson regression corresponds to the uniform Poisson point process with intensity λ. The MPLE is λ̂ = n(x)/∑_j w_j = n(x)/|W|, with corresponding log-pseudolikelihood log PL(λ̂; x) = n(x)(log n(x) - log |W| - 1).
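The relation (16) is an algebraic identity between the weighted Poisson log-likelihood (13) and the deviance, and holds for any positive fitted values, not only the MPLE. A Python check (our own sketch, with made-up quadrature data; we use the usual weighted Poisson deviance, reading the y log y term as 0 when y = 0):

```python
import math

def weighted_poisson_deviance(y, lam, w):
    # D = 2 * sum_j w_j * [ y_j log(y_j / lam_j) - (y_j - lam_j) ]
    d = 0.0
    for yj, lj, wj in zip(y, lam, w):
        d += 2 * wj * ((yj * math.log(yj / lj) if yj > 0 else 0.0) - (yj - lj))
    return d

# toy quadrature: 2 data points, 3 dummy points
w   = [0.1, 0.2, 0.3, 0.2, 0.2]
z   = [1, 1, 0, 0, 0]
y   = [zj / wj for zj, wj in zip(z, w)]
lam = [5.0, 4.0, 1.5, 1.2, 0.8]            # arbitrary positive fitted values

# left side: weighted Poisson log-likelihood (13); right side: eq. (16)
log_pl = sum(wj * (yj * math.log(lj) - lj) for yj, lj, wj in zip(y, lam, w))
rhs = (-0.5 * weighted_poisson_deviance(y, lam, w)
       - sum(math.log(wj) for zj, wj in zip(z, w) if zj == 1)  # data points only
       - sum(z))
```

The identity follows because the saturated log-likelihood contributes -log w_i - 1 for each data point and 0 for each dummy point.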

Figure 1. Quadrature using the Dirichlet tessellation (Berman & Turner, 1992). Left: illustrative example of a point pattern dataset in the unit square W. Right: the Dirichlet tessellation of W based on the data points and a 5 × 5 grid of dummy points. Data points are marked by filled dots. The quadrature weight w_j is the area of the corresponding Dirichlet tile.

This formulation assumes λ(u; x) is positive everywhere. Zero values are also permissible, provided the set of zeroes does not depend on θ. Thus we formally allow negatively infinite values for S(u; x). In the approximation (13) all points u_j with λ(u_j; x) = 0 are dummy points. Their contribution is zero and so they should be omitted in all contexts.

4.3. Quadrature schemes and their accuracy

Berman & Turner (1992) used the Dirichlet tessellation or Voronoi diagram (Okabe, Boots & Sugihara, 1992) to generate quadrature weights for the analogue of (11). The data points are augmented by a list of dummy points, then the Dirichlet tessellation of the combined set of points is computed, as sketched in Figure 1. The quadrature weight w_j associated with a (data or dummy) point u_j is the area of the corresponding Dirichlet tile. A computationally cheaper scheme is to partition W into tiles T_k of equal area, and in each tile place exactly one dummy point, either systematically or randomly. Ascribe to each dummy or data point u_j a weight w_j = a/n_j, where a is the area of each tile and n_j is the number of (dummy or data) points in the same tile as u_j. We call these the counting weights. Note that for non-Poisson processes the conditional intensity λ_θ(u; x) is typically a discontinuous function of u at the data points x_i, while generically the limit as u → x_i exists. Thus the approximation (11) involves a discontinuity error of size

∑_{i=1}^{n(x)} ( λ_θ(x_i; x) - lim_{u→x_i} λ_θ(u; x) ) w_i  (17)

(a sum of contributions from data points only) in addition to the quadrature error associated with the finite approximation to the integral. The discontinuity error is controlled by reducing ∑_i w_i, the total quadrature weight of the contributions from the data points, usually by increasing the number m - n of dummy points. See further comments at the end of Section 5.
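The counting weights are easy to implement. A Python sketch for a square window (our own illustration; `counting_weights` is a hypothetical helper name, not from the paper):

```python
def counting_weights(data, dummy, side, ntile):
    """Counting weights on the square [0, side]^2: partition it into
    ntile x ntile equal tiles and give each (data or dummy) point the weight
    a / n_j, where a is the tile area and n_j the number of points in its tile."""
    a = (side / ntile) ** 2
    pts = data + dummy
    def tile(p):
        ix = min(int(p[0] / side * ntile), ntile - 1)
        iy = min(int(p[1] / side * ntile), ntile - 1)
        return ix, iy
    counts = {}
    for p in pts:
        counts[tile(p)] = counts.get(tile(p), 0) + 1
    return [a / counts[tile(p)] for p in pts]

# one dummy point at the centre of each of the 2 x 2 tiles, plus one data point
data  = [(0.1, 0.1)]
dummy = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
w = counting_weights(data, dummy, side=1.0, ntile=2)
```

With one dummy point per tile the weights always sum to |W| (here 1); the data point shares its tile with one dummy point, so both receive weight 0.25/2.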

5. Example: Strauss process

Next we illustrate the method as it applies to the simple Strauss process model (Strauss, 1975; Kelly & Ripley, 1976). This is a pairwise interaction process (2) in which b_θ(u) ≡ β is constant and h_θ(u, v) = γ if ‖u - v‖ ≤ r, and h_θ(u, v) = 1 otherwise. Here β > 0 and 0 ≤ γ ≤ 1 are parameters and r > 0 is a fixed interaction distance. Each pair of points closer than r units apart contributes a factor of γ to the likelihood,

lik(β, γ; x) = α β^{n(x)} γ^{s(x)}  (18)

(taking 0^0 = 1), where α = α(β, γ) is the normalizing constant, and s(x) = #{(i, j) : i < j, ‖x_i - x_j‖ ≤ r} is the number of unordered pairs of points which lie closer than r units apart. The Strauss process is well-defined for all γ ∈ [0, 1]. If γ = 1, it reduces to the homogeneous Poisson process with intensity β. For γ = 0 it is a hard core process in which no two points ever lie closer than r units apart. For 0 < γ < 1 there is inhibition between close pairs of points. The conditional intensity is λ_{β,γ}(u; x) = β γ^{t(u,x)}, where

t(u, x) = #{x_i ∈ x : 0 < ‖x_i - u‖ ≤ r}  (19)

is the number of points x_i ∈ x which are close to u, other than u itself. The pseudolikelihood is

PL(β, γ; x) = β^{n(x)} γ^{2s(x)} exp( -β ∫_W γ^{t(u,x)} du ),  (20)

which is in the required loglinear form (8) with θ = (log β, log γ)^T and S(u; x) = (1, t(u, x))^T. The MPLE normal equations (9) are

n(x) = β ∫_W γ^{t(u,x)} du,  (21)

2s(x) ∫_W γ^{t(u,x)} du = n(x) ∫_W t(u, x) γ^{t(u,x)} du.  (22)

The maximum of the pseudolikelihood may occur either at a solution of these equations or at γ = 0 or γ = 1. If r is less than the minimum interpoint distance, then s(x) = 0 and the pseudolikelihood is maximized when γ = 0. Otherwise γ̂ > 0. If the solution of (21)-(22) occurs where γ > 1 then, since the log-pseudolikelihood is concave, γ̂ = 1. The maximized log-pseudolikelihood is

log PL(β̂, γ̂; x) = n(x) log β̂ + 2s(x) log γ̂ - β̂ p(γ̂),  (23)

where p(γ) = ∫_W γ^{t(u,x)} du as in (24) below. Consistency and asymptotic normality of the MPLE follow from Jensen & Künsch (1994) and Jensen & Møller (1991).
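The statistics s(x) and t(u, x) of the Strauss model take only a few lines. A Python sketch (ours, not the authors' code), which also exercises the identity ∑_i t(x_i, x) = 2 s(x) underlying the exponent in (20):

```python
import math

def strauss_stats(x, r):
    """Return s(x), the number of unordered pairs at distance <= r as in (18),
    and a function t(u) computing t(u, x) as in eq. (19)."""
    s = sum(1 for i in range(len(x)) for j in range(i + 1, len(x))
            if math.dist(x[i], x[j]) <= r)
    def t(u):
        # points of x within distance r of u, excluding u itself
        return sum(1 for xi in x if xi != u and 0 < math.dist(xi, u) <= r)
    return s, t

x = [(0.1, 0.1), (0.15, 0.1), (0.5, 0.5), (0.52, 0.5)]
s, t = strauss_stats(x, r=0.1)
# the conditional intensity at u is then beta * gamma ** t(u)
```

Each member of a close pair sees the other, so the t-values at the data points sum to twice the pair count.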

To compute the approximate MPLE using the Berman–Turner device we would follow the procedure in Section 4.2, fitting the loglinear model log λ_j = θ_1 + θ_2 v_j, where v_j = t(u_j, x), with t(u, x) as defined in (19). A suitable S-PLUS invocation would be

glm(y ~ v, family = poisson(link = log), weights = w)

where y, v, w are S-PLUS vectors of equal length containing the responses y_j, the explanatory variable values v_j, and the weights w_j, respectively, for each quadrature point u_j. If glm() yields a solution with θ̂_2 > 0, i.e. γ̂ > 1, then the MPLE is γ̂ = 1, β̂ = n(x)/|W|. Note that the integrals in (20)-(22) are polynomials in γ:

p(γ) = ∫_W γ^{t(u,x)} du = a_0 + a_1 γ + ... + a_K γ^K,  (24)

say, where a_k = |A_k| is the area of the region A_k = {u ∈ W : t(u, x) = k}. Thus (21)-(22) can be rewritten

β p(γ) = n(x),  (25)

γ p'(γ) = (2s(x)/n(x)) p(γ).  (26)

In this simple case, the MPLE can be computed by solving (25)-(26) directly, although this still requires evaluation of the coefficients a_k, which calls for numerical integration or computational geometry. We use this polynomial approach to check the accuracy of our method in Section 10. The quadrature approximation (11) consists of replacing p(γ) by

q(γ) = ∑_{j=1}^{m} γ^{t(u_j,x)} w_j = ∑_{k=0}^{K} b_k γ^k,

where b_k = ∑_{u_j ∈ A_k} w_j are approximations to the areas of the sets A_k. The approximation includes discontinuity error (17), arising because the weight w_i for a data point x_i, with t(x_i; x) = k but lim_{u→x_i} t(u; x) = k + 1, is ascribed to b_k rather than to b_{k+1}. The total error in approximating p(γ) by q(γ) is bounded by E_1 = ∑_{k=0}^{K} |a_k - b_k|, the sum of the errors in approximating the area a_k by b_k. The error in approximating γ p'(γ) by γ q'(γ) is bounded by E_2 = ∑_{k=1}^{K} k |a_k - b_k|. To control both E_1 and E_2, dummy points must be sufficiently dense throughout W and sufficiently dense where t(u; x) is high, that is, near the data points.

6. Spatial interaction terms

Sections 6-8 present further examples of point processes, and examine the computational requirements for applying our method. The present section concerns point processes with various kinds of interpoint interaction (pairwise interaction and other). Inhomogeneous models are discussed in Section 7 and marked point processes in Section 8.
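In the polynomial form, the one-dimensional equation (26) can be solved by bisection, since γ q'(γ)/q(γ) is nondecreasing on (0, 1] when the coefficients are non-negative. A Python sketch of this route (our own, with a hypothetical helper name; it uses the quadrature coefficients b_k in place of the exact a_k, assumes b[0] > 0 so that part of W is out of range of all data points, and applies the boundary rules of Section 5):

```python
def solve_strauss_mple(b, n, s, tol=1e-10):
    """Solve eqs. (25)-(26) for (beta_hat, gamma_hat), given polynomial
    coefficients b[k] approximating the areas a_k, n = n(x) and s = s(x)."""
    q  = lambda g: sum(bk * g ** k for k, bk in enumerate(b))
    qp = lambda g: sum(k * bk * g ** (k - 1) for k, bk in enumerate(b) if k > 0)
    f  = lambda g: g * qp(g) - (2 * s / n) * q(g)       # eq. (26) rearranged
    if s == 0:
        gamma = 0.0                  # no close pairs: hard core fit
    elif f(1.0) <= 0:
        gamma = 1.0                  # interior solution would exceed 1
    else:
        lo, hi = 1e-12, 1.0          # f(lo) < 0 <= f(hi): bisect
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
        gamma = 0.5 * (lo + hi)
    return n / q(gamma), gamma       # beta from eq. (25)

beta, gamma = solve_strauss_mple([0.3, 0.6], n=4, s=1)
```

For q(γ) = 0.3 + 0.6γ with n = 4 and s = 1, equation (26) gives γ̂ = 0.5 exactly, and then β̂ = 4/q(0.5) = 20/3.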

6.1. Pairwise interaction models

General loglinear form. Consider first the general pairwise interaction process (2) and assume

b_θ(u) = exp( θ^T B(u) ),  (27)

h_θ(u, v) = exp( θ^T H(u, v) ),  (28)

where B(u) and H(u, v) are vectors defined for every u, v ∈ W. Note H(u, v) should be a symmetric function of u and v. The conditional intensity (5) becomes

λ_θ(u; x) = exp( θ^T B(u) + ∑_{i=1, x_i ≠ u}^{n(x)} θ^T H(u, x_i) ).  (29)

This is of the loglinear form (8) required for our approximation, with

S(u, x) = B(u) + ∑_{i=1, x_i ≠ u}^{n(x)} H(u, x_i),  (30)

and the procedure of Section 4.2 may be applied. The log-pseudolikelihood is concave in θ, so the MPLE values form a non-empty convex set. Consistency of the MPLE is not guaranteed in this generality. In the rest of this section we assume B(u) is constant; Section 7 discusses models for spatial inhomogeneity. Here it is important to note that the general form (27) assumed for b_θ embraces not only parametric models but also generalized additive models (GAM; Hastie & Tibshirani, 1990), in which B(u) would be a vector of spline basis functions. However, this apparently does not extend to GAM-type models for h_θ, since the sufficient statistic (30) is a sum of a variable number of terms, which is beyond the scope of current GAM fitting algorithms. Hence we are currently forced to consider only parametric models for interpoint interaction, such as the Strauss process.

Soft core process. The soft core model discussed by Ogata & Tanemura (1981) is a pairwise interaction process (2) with b(u) ≡ β and

h(u, v) = exp( -(σ/‖u - v‖)^{2/κ} )   (u ≠ v),

where β > 0 and 0 ≤ σ < ∞ are parameters and 0 < κ < 1 is an irregular parameter which we assume known for the moment. The limit as κ → 0 is the hard core process (Strauss with γ = 0, r = σ); the density is not integrable for κ ≥ 1. Thus λ_θ(u; x) = exp(θ^T S(u, x)) where S(u, x) = (1, V(u, x))^T and θ = (log β, -σ^{2/κ})^T, with

V(u, x) = ∑_{i=1, x_i ≠ u}^{n} ‖u - x_i‖^{-2/κ}.  (31)
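The sufficient statistic (31) is a one-liner. A Python sketch (our own illustration; `soft_core_V` is a hypothetical name, not from the paper):

```python
import math

def soft_core_V(u, x, kappa):
    # eq. (31): sum over data points other than u of ||u - x_i||^(-2/kappa);
    # the fitted second coefficient then estimates -sigma^(2/kappa)
    return sum(math.dist(u, xi) ** (-2.0 / kappa) for xi in x if xi != u)

v = soft_core_V((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0)], kappa=1.0)
```

With κ = 1 the two contributions are 1^{-2} + 2^{-2} = 1.25.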

γ = 1, exhibits ordered patterns for 0 < γ < 1 and produces clustering when γ > 1. Other properties and maximum likelihood estimation are being investigated. The conditional intensity is of the desired form log λ_θ(u; x) = θ^T S(u; x), putting θ = (log β, log γ)^T and S(u; x) = (1, A(x ∪ {u}) - A(x))^T. Results in Jensen & Møller (1991) imply that if r is known, the MPLE of (β, γ) is consistent. We do not know whether a central limit theorem is available; the results of Jensen & Künsch (1994) do not apply. Another advantage of the Berman–Turner approach here is the reduction in computational cost, because the values of D(u; x) = A(x ∪ {u}) - A(x) are required only for a relatively small number of points, i.e. the quadrature points u_j.

6.3. Ord's process

Ord (in the Discussion of Ripley, 1977) suggested a model for regular patterns of points representing entities which compete for resources, such as trees or towns. The Dirichlet tile associated with a point can be interpreted as the territory from which it draws resources. Ord suggested densities of the form

f(x; θ) ∝ ∏_{i=1}^{n} g_θ( A(x_i, x) ),  (33)

where A(x_i, x) denotes the area of the Dirichlet tile associated with x_i in the pattern x, and g_θ : R → [0, ∞) is a function combining the roles of the spatial interaction and intensity terms in other models. The special case g_θ(v) ≡ λ is the uniform Poisson process with intensity λ. Typically g_θ(·) would be an increasing function, so that small tiles are penalized. Ripley (1981, p.175) concludes his analysis of the Swedish pines data (Section 10.1) with a comment that fitting Ord's process would be an interesting alternative analysis. To our knowledge, this has not been attempted, and Ord's model has not been investigated or mentioned further, except in Baddeley & Møller (1989). The process (33) exists (i.e. f is integrable) under reasonable conditions: for example, whenever g_θ(·) is uniformly bounded.
The conditional intensity is

λ_θ(u, x) = g_θ( A(u, x ∪ {u}) ) ∏_{x_i ~ u} [ g_θ( A(x_i, x ∪ {u}) ) / g_θ( A(x_i, x) ) ],

where the product is over all points x_i that are Dirichlet neighbours of u in the pattern x ∪ {u}, and A(u, x ∪ {u}) is the area of the Dirichlet tile with centre u in this pattern. Explicit analytic expressions for A(u, x ∪ {u}), the pseudolikelihood, or the MPLE are not available. Geometric computation of A(u, x) is time-consuming, so a discrete approximation to the pseudolikelihood becomes a necessity. The Berman–Turner device (Section 4) can be applied if the kernel is modelled in loglinear form g_θ(v) = exp(θ^T G(v)). Then log λ(u, x) = θ^T V(u, x), where

V(u, x) = G( A(u, x ∪ {u}) ) + ∑_{x_i ~ u} [ G( A(x_i, x ∪ {u}) ) - G( A(x_i, x) ) ]

is the regression variable. Evaluating v_j = V(u_j, x) for all j requires computation of m + 1 different Dirichlet tessellations.

7. Inhomogeneous models

Few writers to date, apart from Ogata & Tanemura (1986), have fitted explicit models to point pattern data that incorporate both spatial inhomogeneity and interpoint interactions.

In the context of our method, it is easy to introduce a spatial trend or dependence on spatial covariates. This is simply a matter of adding more terms to the linear predictor S(u; x) in the associated Poisson loglinear regression model.

7.1. Spatial trend

A straightforward model of spatial trend in a pairwise interaction process (2) is the loglinear form b_θ(u) = exp(θ^T B(u)) and h_θ(u, v) = h_θ(u - v) = exp(θ^T H(u - v)) as in (27)-(28), but with the assumption that H(u, v) = H(u - v) depends only on u - v. The spatial trend is expressed by the dependence of b_θ(u) on location u, while the interpoint interaction does not exhibit trend. Typically H would be one of the pairwise interaction functions considered in Sections 5-6, while B(u) = (B_1(u), ..., B_k(u))^T would be a vector of convenient scalar functions of location, such as polynomials or orthonormal functions of the coordinates. It is also possible to use the GAM approach (Hastie & Tibshirani, 1990) to model each B_l(u) by a smooth function of one coordinate. By (29) we may fit these models using the method of Section 4.2 by adding the term θ^T B(u) to the linear predictor in one of the models discussed in previous subsections. Ogata & Tanemura (1981) developed maximum likelihood estimation techniques for models of this form, in particular combining a spatial trend with the soft core interaction of Section 6.1. The trend term θ^T B(u) was a polynomial in the Cartesian coordinates. Details are given in Section 10. More generally, the spatial interaction can also depend on location. Loglinearity is usually lost, however, and we cannot apply the method of Section 4 directly. An effective alternative way to fit models with spatially-varying interaction range is proposed by Nielsen & Jensen (1998).

7.2. Spatial covariates

The data may include spatial covariates such as topographic elevation, soil pH, or another observed spatial pattern.
Covariates may serve to eliminate spurious trend, explain variation in intensity, or make inferences conditional upon another spatial pattern. For our purposes the spatial covariate must be incorporated as a function $Z(u)$, $u \in W$, observed at each of the quadrature points $u_j$. We add terms in $Z(u_j)$ to the linear predictor. The covariate value $Z(u)$ may be simply an observation such as pH or elevation, but often the covariate data can be transformed to yield $Z(u)$. For example, in spatial epidemiology $Z(u)$ could be a kernel-smoothed estimate of the density of the population at risk (Cuzick & Edwards, 1990). Another observed spatial pattern can be included as a spatial covariate by computing a suitable function $Z(u)$ associated with the pattern. Berman (1986) proposed modelling the dependence of a point process $X$ on a line segment process $Y$ by conditioning on $Y$ and testing whether $X$ is an inhomogeneous Poisson process with intensity $\lambda(u)$ depending on the minimum distance $Z(u)$ from location $u$ to the nearest line segment.

8. Marked point patterns

8.1. General

The observed points may also carry marks, i.e. observations $m_i$ associated with each point $x_i$ of the pattern. The full dataset is a list $v = \{(x_1, m_1), \ldots, (x_n, m_n)\}$ with $x_i \in W$ and $m_i \in M$.

In the case of a multitype point process with $c$ different types, we have $M = \{1, 2, \ldots, c\}$ and the pseudolikelihood is usually defined by

$$
\mathrm{PL}(\theta; v) = \Big( \prod_{i=1}^{n(x)} \lambda_\theta\big((x_i, m_i); v\big) \Big) \exp\Big( -\sum_{m=1}^{c} \int_W \lambda_\theta\big((u, m); v\big) \, du \Big). \qquad (36)
$$

8.3. Berman–Turner device

To apply our approximation method to (35) we create a set of marked points $(u_j, k_j)$, $j = 1, \ldots, M$, which include the data $(x_i, m_i)$, $i = 1, \ldots, n$, and form a good quadrature rule for $W \times M$. It is usually convenient to take the Cartesian product of a set of quadrature points in $W$ and a set of elements of $M$. We assume this and write the marked points as $(u_j, k_l)$ for $j = 1, \ldots, J$ and $l = 1, \ldots, L$, where $u_j \in W$ and $k_l \in M$. Then we define the indicator $z_{jl}$ to equal 1 if $(u_j, k_l)$ is a data point and 0 if it is a dummy point. Let $w_{jl}$ be the corresponding weights for a linear quadrature rule in $W \times M$. Then the pseudolikelihood is approximated by

$$
\log \mathrm{PL}(\theta; v) \approx \sum_{l=1}^{L} \sum_{j=1}^{J} \big( y_{jl} \log \lambda_{jl} - \lambda_{jl} \big) \, w_{jl},
$$

where $\lambda_{jl} = \lambda_\theta((u_j, k_l); v)$ and $y_{jl} = z_{jl}/w_{jl}$. For discrete marks as in (36), the weights may simply be those for a quadrature rule in $W$, corresponding to the points $u_j$.

8.4. Example: 2-type Strauss process

This is the special case of the pairwise interaction marked point process (34) in which $M = \{1, 2\}$, i.e. points belong to one of two types, and

$$
b_\theta(u, m) = \beta_m, \qquad h_\theta\big((u, m), (u', m')\big) = \begin{cases} \gamma_{m,m'} & \text{if } 0 < \|u - u'\| < r_{m,m'}, \\ 1 & \text{otherwise,} \end{cases}
$$

where $\beta_1, \beta_2 > 0$ are intensity parameters, $\gamma_{11}, \gamma_{22}, \gamma_{12}, \gamma_{21} \in [0, 1]$ are interaction parameters, and $r_{11}, r_{22}, r_{12}, r_{21} > 0$ are interaction distances, with $\gamma_{12} = \gamma_{21}$ and $r_{12} = r_{21}$. The density may be expressed analogously to (18) as

$$
f(v; \theta) = \alpha\, \beta_1^{n_1(v)} \beta_2^{n_2(v)} \gamma_{11}^{s_{11}(v)} \gamma_{12}^{s_{12}(v)} \gamma_{22}^{s_{22}(v)},
$$

where $n_1(v)$, $n_2(v)$ are the numbers of points of type 1 and 2 respectively, and $s_{m,m'}(v)$ is the number of pairs of distinct marked points of types $m$ and $m'$ respectively within a distance $r_{m,m'}$ of each other.
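The Cartesian-product quadrature of Section 8.3 is easy to script. A sketch (function names are ours) that builds the marked quadrature scheme with equal counting-type weights and evaluates the approximate log pseudolikelihood for given intensities $\lambda_{jl}$:

```python
import numpy as np

def marked_quadrature(data_xy, data_marks, dummy_xy, n_marks, area):
    """Cartesian product of quadrature points in W with the mark levels 1..n_marks.
    Returns (U, K, z, w): locations, marks, data indicator and equal weights."""
    pts = np.vstack([data_xy, dummy_xy])
    J = len(pts)
    U = np.tile(pts, (n_marks, 1))
    K = np.repeat(np.arange(1, n_marks + 1), J)
    z = np.zeros(J * n_marks)
    for i, m in enumerate(data_marks):       # (x_i, m_i) is a data point
        z[(m - 1) * J + i] = 1.0
    w = np.full(J * n_marks, area / J)       # weights from a quadrature rule in W
    return U, K, z, w

def approx_log_pl(lam, z, w):
    """log PL ~= sum_jl w_jl (y_jl log lam_jl - lam_jl), with y_jl = z_jl / w_jl."""
    y = z / w
    return np.sum(w * (np.where(z == 1, y * np.log(lam), 0.0) - lam))

# Poisson sanity check: constant lam = 2, three data points, |W| = 1, two marks;
# the approximation should give n log(2) - 2 * |W| * (number of marks) exactly.
data_xy = np.array([[0.1, 0.1], [0.5, 0.5], [0.9, 0.2]])
dummy_xy = np.random.default_rng(1).random((7, 2))
U, K, z, w = marked_quadrature(data_xy, np.array([1, 2, 1]), dummy_xy, 2, 1.0)
lpl = approx_log_pl(np.full(len(z), 2.0), z, w)
```

For a constant intensity the sum collapses to $n \log \lambda - \lambda \, |W| \, c$, which provides a quick check of the weights.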
The conditional intensity is

$$
\lambda_\theta\big((u, m); v\big) = \beta_m\, \gamma_{m1}^{t_1((u,m),v)}\, \gamma_{m2}^{t_2((u,m),v)} \qquad (u \in W,\ m \in M),
$$

where

$$
t_{m'}\big((u, m), v\big) = \#\{\, i : 0 < \|u - x_i\| < r_{m,m'},\ m_i = m' \,\}
$$

is the number of type $m'$ points within the required distance $r_{m,m'}$ of a point $u$ of type $m$.
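A sketch of computing the counts $t_{m'}((u,m),v)$ for the 2-type Strauss process (names and the 1-based mark convention are ours):

```python
import numpy as np

def t_counts(u, m, X, marks, r):
    """t_{m'}((u, m), v) for m' = 1, 2: counts of type-m' points x_i with
    0 < ||u - x_i|| < r[m-1, m'-1]; a pattern point at u itself is excluded."""
    d = np.sqrt(((X - np.asarray(u, float)) ** 2).sum(axis=1))
    return np.array([np.sum((d > 0) & (d < r[m - 1, mp]) & (marks == mp + 1))
                     for mp in range(2)])

X = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0]])
marks = np.array([1, 2, 1])
r = np.array([[1.0, 1.0], [1.0, 1.0]])
print(t_counts((0.0, 0.0), 1, X, marks, r))  # [0 1]
```

Here the type-2 point at distance 0.5 is counted, the point at distance 2 is not, and the point coinciding with $u$ is excluded by the condition $d > 0$.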

This model may be cast in the loglinear form (8) with parameter vector $\theta = (\log \beta_1, \log \beta_2, \log \gamma_{11}, \log \gamma_{12}, \log \gamma_{22})$ and five explanatory variables, namely

$$
I_1(m), \quad I_2(m), \quad I_1(m)\, t_1((u,m),v), \quad I_1(m)\, t_2((u,m),v) + I_2(m)\, t_1((u,m),v), \quad I_2(m)\, t_2((u,m),v),
$$

respectively, where $I_k(m) = \mathbf{1}\{m = k\}$. Equivalently it may be described as a nested model with one factor and two covariates, one of which is nested within the factor. The pseudolikelihood estimate of $\theta$ is consistent (Jensen & Møller, 1991). The central limit theorem of Jensen & Künsch (1994) does not strictly apply here because there are more than two parameters, but Jensen and Künsch conjecture (1994 p. 477) that a generalization does hold.

9. Estimation and inference issues

9.1. Edge effects

For inferential purposes it matters whether we assume the data $x$ are a realization of a finite point process $X$ defined only inside $W$ ('bounded case') or a partially observed realization $y \cap W$ of a point process $Y$ extending throughout $\mathbb{R}^d$ but observed only through the window $W$ ('unbounded case'). In the unbounded case we have an edge effect problem: the conditional intensity $\lambda_\theta(u; y)$ of $Y$ may not be observable from the data $x = y \cap W$, since the required information may involve points outside the observation window $W$. Ripley (1988), Stoyan et al. (1995) and Baddeley (1998) survey remedies for edge effects. Following are some possible strategies.

9.1.1. Periodic boundary conditions

If the window $W$ is rectangular one may apply periodic boundary conditions (Ripley, 1977) by identifying opposite sides of $W$, so that points near the right edge (say) have neighbours near the left edge. This is also called the torus correction. It typically reduces bias but inflates variance, and is only applicable to certain shapes of $W$.
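Under periodic boundary conditions, interpoint distances are measured around the torus. For a rectangular window this is a one-liner per coordinate; a sketch (ours):

```python
def torus_distance(u, v, lx, ly):
    """Shortest distance between u and v when opposite sides of the lx-by-ly
    rectangular window are identified (periodic boundary conditions)."""
    dx = abs(u[0] - v[0])
    dx = min(dx, lx - dx)        # wrap around horizontally if shorter
    dy = abs(u[1] - v[1])
    dy = min(dy, ly - dy)        # wrap around vertically if shorter
    return (dx * dx + dy * dy) ** 0.5

# two points near opposite vertical edges of the unit square are close on the torus:
print(torus_distance((0.0, 0.0), (0.75, 0.0), 1.0, 1.0))  # 0.25
```

Replacing the Euclidean distance by `torus_distance` in the interaction terms implements the torus correction.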
It seems inadvisable for inhomogeneous patterns.

9.1.2. Border method

The border method (Ripley, 1988) applies to any process with finite interaction range $r$, in the sense that $\lambda(u; x)$ depends only on data points $x_i$ lying within a distance $r$ of $u$. An example is the Strauss process with fixed $r$. Form the pseudolikelihood over the subregion $W_r = \{\, u \in W : b(u, r) \subseteq W \,\}$ of all points of $W$ lying at least $r$ units from the boundary. For $u \in W_r$ the conditional intensity is observable, $\lambda_\theta(u; y) = \lambda_\theta(u; y \cap W)$, so the pseudolikelihood over $W_r$ can be calculated from the data. This applies both to stationary and non-stationary processes. The main drawback is that the method discards appreciable amounts of data. Also, if $r$ is unknown, one must be wary of comparing pseudolikelihoods based on different subsets $W_r$. One strategy is to compute all pseudolikelihoods over the same domain $W_R$, where $R$ is the maximum $r$ value contemplated.

9.1.3. Ripley's hybrid method

Edge correction weights (Ripley, 1988; Stoyan et al., 1995; Baddeley, 1998) are widely used for estimation in stationary models. Ripley (1988 p. 67) and Venables & Ripley (1994 p. 396) extended this to maximum pseudolikelihood estimation for the Strauss process. Ripley

proposed that the right side of (26), which cannot be observed due to edge effects, be estimated by $n(x)\,\hat K(r)/|W|$, a quantity which has approximately the same expectation. Here $\hat K(r)$ is the estimate of $K(r)$, the second moment function of the process (Ripley, 1988),

$$
\hat K(r) = \frac{|W|}{n(x)^2} \sum_{i \ne i'} \mathbf{1}\{ \|x_i - x_{i'}\| \le r \}\, e(x_i, x_{i'}, W),
$$

where $e(u, v, W)$ is an edge effect correction factor ensuring unbiased estimation of $\lambda^2 K(r)$ if the point process is stationary and isotropic. The left side of (26) is also subject to edge effects, and is modified by using the eroded domain $W_r$ instead of $W$ in (24). This is a hybrid of the border method and the edge correction weights strategies.

9.1.4. Edge corrected pseudolikelihood

An alternative, which may be new, is to introduce edge correction weights into the pseudolikelihood itself. Consider a stationary pairwise interaction process with $b_\theta(u) \equiv \beta$ and $h_\theta(u, v) = \exp(\theta^{\mathsf T} H(u - v))$. Modify the model, replacing the pairwise interaction function by

$$
h^E_\theta(u, v) = \exp\big( e(u, v, W)\, \theta^{\mathsf T} H(u - v) \big),
$$

where $e(u, v, W)$ is an edge correction weight as in Ripley (1988), Stoyan et al. (1995) and Baddeley (1998), which must be symmetric in $u$ and $v$. This modified model has conditional intensity $\lambda^E_\theta(u; x) = \beta \exp(\theta^{\mathsf T} S^E(u; x))$, where

$$
S^E(u; x) = \sum_{i=1}^{n(x)} e(u, x_i, W)\, H(u - x_i)
$$

is a plug-in estimator of the unobservable potential $S(u; y)$ for the original model. Forming the pseudolikelihood for the modified model and deriving the normal equations, we obtain (9) with $S(u; x)$ replaced by $S^E(u; x)$ throughout. By (10), this is an unbiased estimating equation for the modified model. It is approximately unbiased for the original model when $\theta$ is small. This model can be fitted by our method.

9.1.5. Data augmentation

The unobserved points of $y$ outside the window, which affect the value of $\lambda_\theta(u; y)$ for $u \in W$, can alternatively be regarded as missing data.
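For a rectangular window the translation correction, one choice of $e(u,v,W)$ used in Section 10, has the closed form $e(u,v,W) = |W| \,/\, |W \cap W_{v-u}|$. A sketch (ours) of this weight and of the plug-in potential $S^E$:

```python
import numpy as np

def translation_weight(u, v, xr, yr):
    """Translation correction e(u, v, W) = |W| / |W intersect (W + (v - u))|
    for the rectangle W = [xr[0], xr[1]] x [yr[0], yr[1]]; symmetric in u, v."""
    lx, ly = xr[1] - xr[0], yr[1] - yr[0]
    dx, dy = abs(v[0] - u[0]), abs(v[1] - u[1])
    return (lx * ly) / ((lx - dx) * (ly - dy))

def S_E(u, X, H, xr, yr):
    """Plug-in potential S^E(u; x) = sum_i e(u, x_i, W) H(u - x_i)."""
    return sum(translation_weight(u, xi, xr, yr) * H(np.asarray(u) - xi)
               for xi in X)

print(translation_weight((0.0, 0.0), (0.5, 0.0), (0, 1), (0, 1)))  # 2.0
```

The weight grows as the pair of points moves apart, compensating for unobserved neighbours outside $W$.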
One approach is data augmentation (Tanner, 1996 Chapter 5), which has been applied to maximum likelihood inference for point processes by Geyer (1998). This can also be applied in our context.

9.2. Irregular parameters and profile pseudolikelihood

The point process models considered above contain irregular parameters which do not enter in the loglinear form (8) required for our method. A possible approach to estimation is by analogy with profile likelihood. Write $\theta = (\varphi, \psi)$, where $\psi$ are the irregular parameters, so that we assume

$$
\lambda_\theta(u; x) = \exp\big( \varphi^{\mathsf T} S(u, x, \psi) \big)
$$

instead of (8). For each fixed value of $\psi$ the model is loglinear in $\varphi$, so we can apply our approximation method to maximize the pseudolikelihood over $\varphi$, yielding an MPLE $\hat\varphi(\psi)$

for fixed $\psi$. Computing the maximized pseudolikelihood as a function of $\psi$ yields the profile pseudolikelihood

$$
\mathrm{PL}\big( (\hat\varphi(\psi), \psi); x \big) = \max_{\varphi} \mathrm{PL}\big( (\varphi, \psi); x \big).
$$

The global MPLE of $\theta$ is then obtained by maximizing this profile pseudolikelihood over $\psi$. We examine this approach for the Strauss and soft core processes in Section 10.

9.3. Parametric inference and model choice

We use the parametric bootstrap for inference and model choice. We assume in practice that the MPLE $\hat\theta$ is approximately unbiased and approximately normal, although this has only been established in certain cases (Jensen & Møller, 1991; Jensen & Künsch, 1994). To obtain confidence intervals for $\theta$, we simulate from the fitted model $\theta = \hat\theta$, obtaining $M$ simulated values $\hat\theta^{(1)}, \ldots, \hat\theta^{(M)}$ from the distribution of the MPLE under the fitted model. We estimate the mean vector and covariance matrix of this distribution from the simulated values, then construct confidence intervals using location models based on the multivariate normal or the bootstrap distribution. Similarly, for model choice we use the bootstrap distribution of the deviance between two (nested) models.

10. Examples of applications

Our analyses were performed in S-PLUS (Becker et al., 1988; Chambers & Hastie, 1992; Venables & Ripley, 1994) using the generalized linear model fitting function glm() and occasionally the generalized additive model function gam(). Some analyses were cross-checked using GLIM (Aitkin et al., 1989).

10.1. Swedish pines data

Figure 2 depicts the Swedish pines data of Strand (1972), which give the locations of 71 pine saplings in a 10 m × 10 m square. Ripley's pioneering analysis (Ripley, 1981 Section 8.6) plotted $L(t) = \sqrt{K(t)/\pi}$ and rejected the hypothesis of a homogeneous Poisson process at the 1% level by a Monte Carlo test based on $D = \sup_t |\hat L(t) - t|$.
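The profile pseudolikelihood maximization of Section 9.2 reduces, in practice, to a grid search over the irregular parameter. A generic skeleton (ours; `fit_for_psi` stands in for any routine returning the MPLE $\hat\varphi(\psi)$ and the maximized log pseudolikelihood):

```python
def profile_mple(psi_grid, fit_for_psi):
    """Profile pseudolikelihood search: for each fixed psi, fit_for_psi(psi)
    returns (phi_hat(psi), max-over-phi log PL); return the best psi."""
    best_psi, best_phi, best_lpl = None, None, -float('inf')
    for psi in psi_grid:
        phi, lpl = fit_for_psi(psi)
        if lpl > best_lpl:
            best_psi, best_phi, best_lpl = psi, phi, lpl
    return best_psi, best_phi, best_lpl

# toy stand-in whose profile log pseudolikelihood peaks at psi = 0.7:
psi_hat, phi_hat, lpl_max = profile_mple(
    [0.5, 0.6, 0.7, 0.8, 0.9],
    lambda psi: (2 * psi, -(psi - 0.7) ** 2))
```

For the Strauss process, $\psi$ would be the interaction distance $r$ and each inner fit is one run of the loglinear approximation.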
Ripley then fitted a Strauss process manually, obtaining $r = 0.7$ m and an estimate of $\gamma$. In the latest analysis, by Venables & Ripley (1994 p. 396), $\gamma$ was estimated to be 0.15 using maximum pseudolikelihood (a procedure essentially the same as our polynomial approach (25)–(26)) with Ripley's hybrid edge correction (Section 9.1.3). We fitted a Strauss process to these data by maximum pseudolikelihood, using both the Berman–Turner device and the polynomial approach via (25)–(26). We estimated $\beta$ and $\gamma$, but initially held $r$ fixed at 0.7. For the Berman–Turner method, we tried varying densities of dummy points, with various edge corrections, and computed the quadrature weights using the Dirichlet and counting methods (Section 4.3). Estimates obtained for $\gamma$ ranged from 0.29 to 0.20, and for $\beta$ from 1.49 upward. A finer quadrature scheme always led to a smaller value of $\gamma$ and a larger value of $\beta$. Both the Berman–Turner and polynomial methods gave $\gamma = 0.21$ using a grid of dummy points. This is close to the value obtained by Ripley (1981). The corresponding value of $\beta$ was 1.98 by the Berman–Turner method and 2.01 by the polynomial method. Various edge corrections (Section 9.1) were tried, all using a grid of dummy points. Using the border method, eroding the window by a distance $r = 0.7$ m, we obtained

Figure 2. The Swedish pines data: locations of 71 pine saplings in a 10 m × 10 m square. Extracted by Ripley (1981) from Strand (1972). Data obtained from the MASS library accompanying Venables & Ripley (1994).

$\hat\gamma = 0.13$ and $\beta$ values of 3.24 (Berman–Turner) and 3.29 (polynomial). Periodic edge correction yielded $\hat\beta = 2.09$, $\hat\gamma = 0.24$ (Berman–Turner) and $\hat\beta = 2.24$, $\hat\gamma = 0.22$ (polynomial). Our proposed edge corrected pseudolikelihood method was also applied, using the translation correction (Ripley, 1988; Baddeley, 1998) as the edge correction factor $e(u, v, W)$. The parameter estimates were $\hat\beta = 1.97$, $\hat\gamma = \ldots$. The latter two edge corrections inflated the estimate of $\gamma$, while the border correction deflated it. A graph of the profile log-pseudolikelihood of the interaction distance $r$ (Figure 3) yields $\hat r = 0.7$, which agrees with Venables & Ripley (1994 p. 396). The jaggedness of the plot is due to the discontinuity of the interpoint interaction: $\mathbf{1}\{\|u - x_i\| \le r\}$ and hence $s(x)$ are discontinuous functions of $r$, while the left sides of (25)–(26) are differentiable with respect to $r$. There seems little prospect of a convenient limit theory for $\hat r$.

Next we estimated the covariance matrix of the parameter estimates using the parametric bootstrap (Section 9.3). To reduce the amount of computation we did not apply edge correction and looked at only one set of quadrature weights (based on a regular grid). However, $r$ was estimated by profile pseudolikelihood. This version of the estimation algorithm was first applied to the data, yielding $(\hat\beta, \hat\gamma, \hat r) = (1.9781, \ldots)$. A Metropolis–Hastings birth-death-shift algorithm (Geyer & Møller, 1994) generated 500 simulated realizations from the Strauss process with the same parameter values. The bootstrap covariance matrix $\hat C$, based on 500 parametric bootstrap replicates, yielded corresponding (normal-based) 95% confidence intervals of $[1.1153, \ldots]$, $[0.0575, \ldots]$ and $[0.6267, \ldots]$ for $\beta$, $\gamma$ and $r$ respectively.
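Normal-based intervals of this kind are computed directly from the replicate standard deviations. A sketch (ours) using numpy and the Python standard library:

```python
import numpy as np
from statistics import NormalDist

def bootstrap_normal_ci(theta_hat, reps, level=0.95):
    """Normal-approximation confidence intervals from parametric-bootstrap
    replicates; reps has one row per replicate, one column per parameter."""
    z = NormalDist().inv_cdf(0.5 + level / 2)      # e.g. 1.96 for level = 0.95
    se = reps.std(axis=0, ddof=1)                   # bootstrap standard errors
    return np.column_stack([theta_hat - z * se, theta_hat + z * se])

# e.g. two replicates of a single parameter, with sample sd sqrt(2):
ci = bootstrap_normal_ci(np.array([2.0]), np.array([[1.0], [3.0]]))
```

The intervals are centred at the original estimate, with half-width equal to the normal quantile times the replicate standard deviation.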
Normality of the estimates is suspect. Chi-squared tests for normality on the sequences of bootstrap replicates of estimates gave P-values of 0.02, 0.22, and 0.00, respectively, for the normality of ˆβ, ˆγ, and ˆr; so ˆγ is the only estimate which may legitimately be assumed normal. However, rough 95% confidence intervals based on the empirical quantiles
