Abstract

The definition and general methods of construction of non-statistical association measures on different domains are discussed. An association measure is a function of two variables defined on a set X with an involutive operation and satisfying properties similar to those of Pearson’s correlation coefficient. Such a measure can be used to analyze possible positive and negative relationships between variables. Methods of constructing association measures from similarity measures and pseudo-difference operations associated with t-conorms are discussed. Examples of association measures on different domains are considered.

1 Introduction

Association measures are widely used in data analysis. Different association and correlation measures have been introduced in statistics, data mining, fuzzy set theory, etc. [1, 7, 12, 13, 17] for different types of data. Pearson’s correlation coefficient [12]

(1)

$$\mathrm{corr}(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}\cdot\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$

is the most popular association measure used to analyze possible relationships between variables. Many association measures similar to the correlation coefficient have been proposed, but the interesting problem is not only to introduce a new association measure for some type of data but also to analyze the class of functions similar to the correlation coefficient and to propose methods for generating them. In [1], a measure of correlation between fuzzy membership functions was proposed that satisfies a set of properties similar to those of Pearson’s correlation coefficient. In [6], another set of properties similar to those of Pearson’s correlation coefficient was considered, defining time series shape association measures. In [7], general methods of constructing such association measures were proposed, and the sample Pearson’s correlation coefficient was obtained as a particular case of the general approach. In [8], the methods proposed in [7] were extended to the general case of functions A : X × X → [-1, 1] defined on a set X with an involutive operation N (called a reflection) and satisfying properties similar to those of Pearson’s correlation coefficient. The methods of constructing such measures [8, 9] use similarity measures and pseudo-difference operations associated with t-conorms [2, 15]. In [9], the problems arising in the definition of the general class of functions similar to Pearson’s correlation coefficient were discussed. These problems have different causes. First, the properties corr(x,x) = 1 and corr(x,–x) = –1 of the function (1) contradict each other for the n-tuple x = (0, … ,0), for which x = –x. A similar problem arises, in general, for the fixed points of the reflection operation N used in the definition of the association measure A.
Second, the function (1) is not defined for constant n-tuples x = (x1, …, xn) = (s, …, s), where s is some real value, because the denominator of (1) equals 0. Similarly, it is possible that an association measure cannot be defined on the whole set X. Such elements of X can be excluded from the domain of the association measure, or the function should be additionally defined there. Third, depending on the domain X, it is possible to consider, in addition to the general properties of association measures, other properties specific to this domain. See, for example, the definition of the time series shape association measure [6, 7].
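The first two problems can be observed numerically. The following minimal Python sketch evaluates formula (1) directly; the function name `pearson` and the convention of returning `None` when the denominator is zero are illustrative assumptions, not part of the definition of (1).

```python
import math

def pearson(x, y):
    """Sample Pearson correlation, formula (1).

    Returns None when the denominator is zero (an assumed convention
    used here only to make the undefined case visible).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return None if den == 0 else num / den

x = [1.0, 2.0, 4.0]
neg_x = [-v for v in x]

print(pearson(x, x))                  # close to 1
print(pearson(x, neg_x))              # close to -1
print(pearson([3.0, 3.0, 3.0], x))    # None: constant n-tuple, zero denominator
print(pearson([0.0] * 3, [0.0] * 3))  # None: x = -x, so neither 1 nor -1 can be assigned
```

The zero n-tuple is both constant and a fixed point of the reflection x ↦ –x, so it illustrates both difficulties at once.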

The current paper tries to avoid these problems in two ways. The first is to consider explicitly association measures defined on some subset V of X where these problems disappear. The second is to define the association measure on the set X and to correct some of the properties required of the association measure so as to avoid possible contradictions between them.

The current paper also gives proofs of some general results that were stated without proof in the author’s previous papers. Some related details can also be found in [10].

The paper is structured as follows. Section 2 discusses the definitions and properties of association measures defined on sets with an involutive operation. As an example, a simple association measure on the set of real values is introduced. Section 3 considers the basic definitions and properties of the fuzzy logic operations used in the following sections. Section 4 considers general methods of constructing association measures and gives the proofs of the related theoretical results. Section 5 considers an example of an association measure constructed by the proposed methods. Conclusions are given in the last section.

2 Association measures

(2)

N is called a reflection on X if it is not the identity function, i.e. N (x) ≠ x for some x ∈ X. An element x ∈ X such that

(3)

N(x)=x,

is called a fixed point of N in X.

The fixed points will be denoted by xFP, hence:

(4)

N(xFP)=xFP.

Denote by FP (N, X) the set of all fixed points of N in X. This set can be empty.

Definition 2. Let X be a set with a reflection operation N on X, and let V be a subset of X with |V|>1 such that x ∈ V implies N (x) ∈ V and the restriction of N to V is a reflection on V. A function A : V × V → [-1, 1] satisfying for all x, y ∈ V the properties:

(5)

$$A(x,y)=A(y,x),\quad\text{(symmetry)}$$

(6)

$$A(x,x)=1,\quad\text{(reflexivity)}$$

(7)

Proof. Suppose Proposition 1 is not true, i.e. A is a function satisfying on V the properties (5)-(7) and V contains some fixed point xFP of the reflection N. Then from Equations (4) and (6) we obtain A (xFP, N (xFP)) = A (xFP, xFP) = 1, but from Equations (7) and (6) we have A (xFP, N (xFP)) = - A (xFP, xFP) = -1. The obtained contradiction proves the proposition. ■

(11)

$$A(x,N(x))=-1\quad\text{if } x\notin FP(N,X),$$

(12)

$$A(x,x_{FP})=A(x_{FP},x)=0,\quad\text{for all } x_{FP}\in FP(N,X),$$

Proof. For the proof of Equations (8), (9) and (11), see the proof of Proposition 2. Let us prove Equation (12). From Equations (4), (7) and (5), for all x ∈ X and all xFP ∈ FP (N, X) it follows that A (x, xFP) = A (x, N (xFP)) = - A (x, xFP), hence A (x, xFP) = 0 and A (xFP, x) = 0. ■

Note that from Proposition 3 it follows that A (xFP, xFP) = 0. Although some papers require (6) to hold for all x ∈ X, association measures of type 2 will not be considered in this paper. The property (12) of association measures of type 1 seems more reasonable; see [10].

2.1 Association measures on [0,1]

In [10], an association measure of type 1 on [0,1] related to a strong negation N was considered.

(13)

N(N(x))=x,

(14)

$$N(0)=1,\quad N(1)=0.$$

A strong negation is a reflection operation on [0,1] with a unique fixed point, denoted by c. In [10], the class of c-separable association measures of type 1 was considered; these satisfy for all x, y ∈ [0, 1] the properties:

(15)

$$A(x,y)>0\quad\text{if } x,y>c \text{ or } x,y<c,$$

(16)

$$A(x,y)=0\quad\text{if } x=c \text{ or } y=c,$$

(17)

$$A(x,y)<0\quad\text{if } x<c<y \text{ or } y<c<x.$$

Such association measures can be used to analyze associations between the truth or probability values of some plausible statements P and Q. For example, the association between them is negative when one statement has a high plausibility value and the other has a low plausibility value.

2.2 Association measures on the set of real values

Let X be the set of real values, X = R, and N (x) = - x for all x in R. We have xFP = 0 and FP (N, X) = {0}. Association measures of type 1 satisfy on R the properties:

(18)

A(x,y)=A(y,x),

(19)

$$A(x,x)=1\quad\text{if } x\neq 0,$$

(20)

A(x,-y)=-A(x,y).

From Proposition 3 we obtain the following properties of the association measures on R:

(21)

A(-x,-y)=A(x,y),

(22)

A(x,-y)=A(-x,y),

(23)

$$A(x,-x)=-1\quad\text{if } x\neq 0,$$

(24)

A(x,0)=A(0,x)=0.
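As a brief check, properties (21)-(24) can also be obtained directly from (18)-(20). The following derivation is a sketch added for illustration; it uses only (18)-(20) with N (x) = - x and the fact that -0 = 0.

```latex
\begin{align*}
A(-x,-y) &= -A(-x,y) = -A(y,-x) = A(y,x) = A(x,y)
  && \text{by (20), (18), (20), (18)},\\
A(-x,y) &= A(y,-x) = -A(y,x) = -A(x,y) = A(x,-y)
  && \text{by (18), (20), (18), (20)},\\
A(x,-x) &= -A(x,x) = -1 \quad (x \neq 0)
  && \text{by (20), (19)},\\
A(x,0) &= A(x,-0) = -A(x,0) \;\Longrightarrow\; A(x,0) = 0 = A(0,x)
  && \text{by (20), (18)}.
\end{align*}
```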

By analogy with c-separable association measures on [0,1], we introduce the following definition.

Definition 5. An association measure A on the set of real values R is called 0-separable (or simply “separable”) if the following properties are fulfilled for all x, y ∈ R:

(25)

$$A(x,y)>0\quad\text{if } x\cdot y>0,$$

(26)

$$A(x,y)=0\quad\text{if } x\cdot y=0,$$

(27)

$$A(x,y)<0\quad\text{if } x\cdot y<0.$$

0-separable association measures have a simple interpretation: x and y are positively associated if they have the same sign and negatively associated if they have opposite signs. Based on these considerations, the following simplest association measure on the set of real values can be proposed.

Proposition 4. The function

(28)

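$$A(x,y)=\operatorname{sign}(x\cdot y)=\begin{cases}1, & \text{if } x\cdot y>0,\\ 0, & \text{if } x\cdot y=0,\\ -1, & \text{if } x\cdot y<0,\end{cases}$$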

is a 0-separable association measure of type 1 on the set of real values.

The proof is straightforward.
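The properties claimed in Proposition 4 can be spot-checked on a finite sample. The following Python sketch is illustrative only: the names `sign`, `assoc` and `check_properties` are hypothetical, and A(x, y) is computed as sign(x)·sign(y), which equals sign(x·y) for real numbers and avoids forming the product itself.

```python
def sign(v):
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def assoc(x, y):
    """A(x, y) = sign(x * y), computed as sign(x) * sign(y)."""
    return sign(x) * sign(y)

def check_properties(samples):
    """Check (18)-(20) and (25)-(27) on every pair of sample values."""
    for x in samples:
        # (19): reflexivity holds except at the fixed point 0, where A(0, 0) = 0
        assert assoc(x, x) == (1 if x != 0 else 0)
        for y in samples:
            assert assoc(x, y) == assoc(y, x)      # (18) symmetry
            assert assoc(x, -y) == -assoc(x, y)    # (20) inverse relationship
            if x * y > 0:
                assert assoc(x, y) > 0             # (25)
            elif x * y == 0:
                assert assoc(x, y) == 0            # (26)
            else:
                assert assoc(x, y) < 0             # (27)
    return True

print(check_properties([-3, -1, 0, 2, 5]))  # True
```

A finite check of this kind does not replace the proof; it only confirms that the sample values behave as the proposition states.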

2.3 Association measures on the set of time series

Association measures on the set of time series are considered in [6, 7]. A time series of length n (n > 1) is a sequence (n-tuple) of real values x = (x1, …, xn). Consider the reflection operation N (x) = - x = (- x1, …, - xn) on the set X of all time series of length n. Suppose p, q are real values and p ≠ 0. Define x + y = (x1 + y1, …, xn + yn) and py + q = (py1 + q, …, pyn + q). Denote by q(n) the constant time series of length n with all elements equal to q. The n-tuple xFP = 0(n) is the unique fixed point of N. We write x = const if x = q(n) for some q, and x ≠ const if xi ≠ xj for some i ≠ j from {1, …, n}. Denote by XC the set of all constant time series from X.
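The notation of this paragraph can be mirrored in a few lines of Python. This is only a sketch for orientation: time series are represented as tuples, and the helper names `reflect`, `affine`, `constant` and `is_constant` are hypothetical.

```python
def reflect(x):
    """Reflection N(x) = -x applied elementwise."""
    return tuple(-v for v in x)

def affine(y, p, q):
    """The time series p*y + q = (p*y1 + q, ..., p*yn + q)."""
    return tuple(p * v + q for v in y)

def constant(q, n):
    """The constant time series q(n) of length n."""
    return (q,) * n

def is_constant(x):
    """True if x = const, i.e. all elements of x are equal."""
    return all(v == x[0] for v in x)

x = (1, 3, 2)
print(reflect(reflect(x)) == x)                   # True: N is involutive
print(reflect(constant(0, 3)) == constant(0, 3))  # True: 0(n) is a fixed point of N
print(reflect(x) == x)                            # False: x is not a fixed point
print(affine(x, 2, 1))                            # (3, 7, 5)
print(is_constant(constant(5, 3)), is_constant(x))  # True False
```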

Definition 6. Suppose V is a subset of X such that x ∈ V implies -x ∈ V and x + q ∈ V for all real q. A function A : V × V → [-1, 1] satisfying on V the properties (5)–(7) and the property:

(29)

$$A(x+q,y)=A(x,y),\quad\text{for all real } q,\quad\text{(translation invariance)}$$

is called a shape association measure on V. If x ∈ V implies px ∈ V for all p > 0 and A satisfies on V the property:

Theorem 2. Suppose X is a set with a reflection N, V ⊆ X ∖ FP (N, X), |V|>1, V is closed under N, which is a reflection on V, S is a t-conorm and SIM is a similarity measure on X satisfying the permutation of reflections property. Then the function $A_{SIM,S}$ : V × V → [-1, 1] defined for all x, y ∈ V by

Theorem 3. If, under the conditions of Theorem 2, SIM is a similarity measure on X satisfying the properties of permutation of reflections and weak similarity of reflections, then the function $A_{SIM,S}$ : V × V → [-1, 1] defined for all x, y ∈ V by:

(55)

$$A_{SIM,S}(x,y)=SIM(x,y)\ominus_{S}SIM(x,N(y)),$$

is an association measure on V if one of the following is fulfilled:

(56)

$$\text{1) } SIM(x,N(x))=0,\quad\text{for all } x\in V,\quad\text{(non-similarity of reflections)}$$

(57)

2) the t-conorm S has no nilpotent elements.
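Formula (55) can be tried out numerically under some explicit assumptions. In the sketch below, all concrete choices are illustrative and are not taken from the theorem itself: X = [0,1] with N (x) = 1 - x, V = [0,1] without the fixed point 1/2, SIM(x, y) = 1 - |x - y|, and S = max, which has no nilpotent elements. The pseudo-difference associated with max is assumed here to have the form a ⊖ b = a if a > b, -b if a < b, and 0 if a = b. Exact rational arithmetic is used so that the equalities can be tested without rounding error.

```python
from fractions import Fraction

def neg(x):
    """Assumed reflection on [0, 1]: N(x) = 1 - x."""
    return 1 - x

def sim(x, y):
    """Assumed similarity measure: SIM(x, y) = 1 - |x - y|."""
    return 1 - abs(x - y)

def pseudo_diff_max(a, b):
    """Assumed pseudo-difference associated with the t-conorm S = max."""
    if a > b:
        return a
    if a < b:
        return -b
    return 0

def assoc(x, y):
    """Formula (55): A(x, y) = SIM(x, y) (-)_S SIM(x, N(y))."""
    return pseudo_diff_max(sim(x, y), sim(x, neg(y)))

# V: a grid on [0, 1] that excludes the fixed point 1/2 and is closed under N
grid = [Fraction(k, 10) for k in range(11) if k != 5]

def check_on_grid(points):
    for x in points:
        assert assoc(x, x) == 1                          # reflexivity (6)
        for y in points:
            assert assoc(x, y) == assoc(y, x)            # symmetry (5)
            assert assoc(x, neg(y)) == -assoc(x, y)      # A(x, N(y)) = -A(x, y)
            assert -1 <= assoc(x, y) <= 1                # values lie in [-1, 1]
    return True

print(check_on_grid(grid))                     # True
print(assoc(Fraction(1, 5), Fraction(3, 10)))  # 9/10
print(assoc(Fraction(1, 5), Fraction(7, 10)))  # -9/10
```

With the same SIM and N but ordinary subtraction in place of the pseudo-difference, A(x, x) would equal |2x - 1| rather than 1, which suggests why the theorem imposes a condition on SIM or on S.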

Proof. The symmetry of $A_{SIM,S}$ follows from the symmetry and the permutation of reflections properties of SIM:

5 Examples of association measures

Examples of association measures on different domains constructed by the methods discussed in the previous section can be found in [7–10]. Similarity measures satisfying the conditions of Theorems 2 and 3 can be obtained from distance measures used together with some data transformation [7], from generators of strong negations [3, 5, 9, 10], etc. For example, suppose φ, ψ : [0, 1] → [0, 1] are automorphisms of [0,1] and φ defines by (41) a strong negation N on [0,1]. Then the function

(58)

SIM(x,y)=1-ψ(|φ(x)-φ(y)|),

is a similarity measure on [0,1] that can be used for constructing association measures on [0,1] related to the strong negations (42)-(44) (see [10] for details). Below is an example of the simplest association measure on [0,1] related to the standard negation (42) [10]:

6 Conclusion

The paper gives definitions of association measures generalizing Pearson’s correlation coefficient and proposes general methods of constructing such measures. Proofs of the main results are provided. A simple association measure on the set of real numbers is introduced. The considered methods of generating association measures can be used to construct association measures on different domains.

Acknowledgments

This work was partially supported by the project 20151589 of Instituto Politécnico Nacional, Mexico.