Thursday, April 23, 2009

Therefore, if a person scores low on one item, he/she should score low on the total score as well. Likewise, if I score higher on the item than you do, my ability would dominate yours.

In IRT terminology, DIF (Differential Item Functioning) refers to "a difference in the probability of endorsing an item for members of a reference group (e.g., US workers) and a focal group (e.g., Chinese workers) having the same standing on the latent attribute measured by a test." It is related to the dominance approach.
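As a rough illustration of that definition, here is a minimal sketch that assumes a two-parameter logistic (2PL) item response function. The 2PL form and all parameter values are my own assumptions for illustration; the quoted definition does not commit to any particular model.

```python
import math

def endorse_prob(theta, a, b):
    """2PL item response function: probability of endorsing the item
    for a person with latent standing theta (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

theta = 0.5  # the same latent standing in both groups

# Hypothetical item parameters: the item is "harder" for the focal group.
p_reference = endorse_prob(theta, a=1.2, b=0.0)
p_focal = endorse_prob(theta, a=1.2, b=0.8)

# A nonzero difference at equal theta is what DIF refers to.
dif = p_reference - p_focal
print(round(p_reference, 3), round(p_focal, 3), round(dif, 3))
```

If the two groups shared the same item parameters, the difference would be zero at every theta.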

Tuesday, April 21, 2009

(disparities) = b * (proximities in terms of dissimilarities; short for 'prox' below)

interval MDS:

(disparities) = a + b * (prox)

logarithmic MDS:

(disparities) = log(prox)

(disparities) = b * log(prox)

(disparities) = a + b * log(prox)

exponential MDS:

(disparities) = exp(prox)

(disparities) = b * exp(prox)

(disparities) = a + b * exp(prox)

power MDS (which includes square root with q = 0.5):

(disparities) = (prox)^q

(disparities) = b * (prox)^q

(disparities) = a + b * (prox)^q

polynomial MDS (i.e., spline MDS without interior knots):

(disparities) = a + b * (prox) + c * (prox)^2

(disparities) = a + b * (prox) + c * (prox)^2 + d * (prox)^3
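The transformation families listed above can be written as small functions. This is only a sketch to make the notation concrete: the function names and default parameter values are mine, and a real MDS program would estimate a, b, c, ... rather than take them as given.

```python
import math

def ratio(prox, b=1.0):
    return b * prox

def interval(prox, a=0.0, b=1.0):
    return a + b * prox

def logarithmic(prox, a=0.0, b=1.0):
    # Covers log(prox), b*log(prox), and a + b*log(prox).
    return a + b * math.log(prox)

def exponential(prox, a=0.0, b=1.0):
    return a + b * math.exp(prox)

def power(prox, a=0.0, b=1.0, q=0.5):
    # q = 0.5 gives the square-root transformation.
    return a + b * prox ** q

def polynomial(prox, coefs):
    # coefs = (a, b, c, ...) for a + b*prox + c*prox^2 + ...
    return sum(c * prox ** k for k, c in enumerate(coefs))

for p in [1.0, 2.0, 4.0]:
    print(p, ratio(p, b=2.0), interval(p, a=1.0, b=2.0),
          round(logarithmic(p), 3), power(p), polynomial(p, (1.0, 1.0, 1.0)))
```

Setting a = 0 (and b = 1) in each function recovers the simpler forms in the list.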

However, software packages are not always clear about the kinds of metric MDS they perform. Based on my own testing as of 04/21/09, here is a comparison table:

| Software Package | Program, version, date | Metric MDS supported |
|---|---|---|
| MATLAB 7.8.0.347 (R2009a) | mdscale() 1.1.6.9, 12/01/08; Criterion = 'metricstress' | Ratio only |
| smacof in R 0.9-0 (05/24/08) | smacofSym(), metric = TRUE | Ratio only |
| SPSS 17.0.0 (08/23/08) | Proxscal version 1.0 | Ratio, Interval, Spline |
| SYSTAT 12.02.00 | Multidimensional Scaling; Shape = Square (similarities model) | Interval (Linear), Log, Power |

To date, no program in any of these software packages provides combinations of two or more transformations, but such combinations could be very helpful. For example, log + polynomial may be of interest, because the log may be used to normalize residuals, while the polynomial may be able to pick up the trend of the data. That is,

Saturday, April 18, 2009

An eigenvalue and an eigenvector are a pair that satisfies the following eigenequation:

matrix(transformation) * eigenvector = eigenvalue * eigenvector

Thus, if we can find such an eigenvector and therefore an eigenvalue, their interpretation is as follows: after being linearly transformed by the matrix, the eigenvector still points along the same line; it is only scaled by the eigenvalue. The eigenvalue can thus be considered an essential part of the matrix, or the characteristic value of the matrix. The eigenvector can be considered a tool to extract that essential part of the matrix.

A nice explanation can be found here; see also Borg and Groenen (2005), Chapter 7.
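To make the eigenequation concrete, here is a small check on a 2x2 matrix whose eigenpairs are known in advance. The matrix and the helper name matvec are just my illustration.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

A = [[2.0, 1.0],
     [1.0, 2.0]]

# Known (eigenvalue, eigenvector) pairs of this symmetric matrix.
eigenpairs = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]

for lam, v in eigenpairs:
    lhs = matvec(A, v)            # matrix * eigenvector
    rhs = [lam * x for x in v]    # eigenvalue * eigenvector
    print(lhs, rhs, lhs == rhs)

# A vector that is not an eigenvector changes direction:
print(matvec(A, [1.0, 0.0]))      # not a multiple of [1, 0]
```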

P is a matrix of left singular vectors, Φ is a diagonal matrix of singular values, and Q is a matrix of right singular vectors. The name "singular" was probably chosen for reasons similar to "eigen": the expressions of the two decompositions are very similar, and both names probably refer to the essential and unique quality of the matrix.

Thursday, March 5, 2009

On the top of a folded handkerchief is the ideal point, representing the highest degree of preference for a particular individual, i.e., the optimal choice within a given set of items. The closer an item is to the ideal point, the more this individual prefers it; thus, if choice 1 lies closer to the ideal point than choice 2, the individual prefers choice 1 to choice 2.

While different individuals have different ideal points on the handkerchief, unfolding the handkerchief will give us a 2D diagram showing all ideal points and all the items on a common space.
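The ideal-point idea can be sketched in a few lines: once individuals and items sit in a common space, each individual's preference order is just the order of distances from his/her ideal point. The coordinates below are made up for illustration; a real unfolding program would estimate them from the preference data.

```python
import math

# Hypothetical 2D coordinates, as an unfolding solution might produce.
ideal_points = {"judge_A": (0.0, 0.0), "judge_B": (3.0, 1.0)}
items = {"choice_1": (1.0, 0.0), "choice_2": (2.0, 2.0), "choice_3": (4.0, 1.0)}

def preference_order(ideal, items):
    """Rank items from most to least preferred (nearest to farthest)."""
    return sorted(items, key=lambda name: math.dist(ideal, items[name]))

for judge, ideal in ideal_points.items():
    print(judge, preference_order(ideal, items))
```

The two judges end up with opposite rankings because their ideal points sit at opposite ends of the item cloud.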

Application 1: In American Idol, a set of judges rate a set of contestants. Unfolding would display the ideal point of each judge as a point, and each contestant as a point. Three pieces of information will be revealed: (a) Judges with similar ideal points would cluster; (b) Contestants rated similarly would cluster; (c) The closeness between the ideal point of a judge and a contestant indicates how high the judge would rate the contestant.

Application 2: A set of TV brands (e.g., Panasonic, Sony, ...) were rated on a set of attributes (e.g., price, quality, style, ...). In the matrix, the rows are the brands and the columns are the attributes. Unfolding would display (the ideal point of) each brand as a point and each attribute as a point. Three pieces of information: (a) Similar brands (in terms of ideal points) would cluster; (b) Similar attributes would cluster; (c) Brands rated highly on a particular attribute would appear close to that attribute.

Application 3: Unfolding can also be used to display relationships that may not be symmetric, such as desire between people, trade-flows between nations, and journal citation frequency. Each journal would appear as both a row and a column. The matrix would contain the citation frequency of the row-journal by the column-journal. Self-citing is excluded. Unfolding would produce a diagram in which each journal would appear as two points: citing others and being cited by others. Clusters would have the obvious interpretation, and the distance between a journal’s two points would reflect the imbalances in its citation.

Other variants of unfolding models:

External unfolding models. Besides the preference data, we also have a pre-existing coordinate matrix of the choice objects.

Vector model of unfolding. Representing individuals by preference vectors instead of ideal points. Because it is the direction of the vector that matters, the preference vectors are usually scaled to have equal length.
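For the vector model, a minimal sketch: preference is read off from the projection of each item onto the individual's preference vector, after scaling the vector to unit length. The item coordinates and function names are hypothetical.

```python
import math

def unit(v):
    """Scale a 2D vector to length 1; only its direction matters."""
    length = math.hypot(*v)
    return tuple(x / length for x in v)

def vector_preference_order(pref_vector, items):
    """Rank items by their projection onto the preference vector (largest first)."""
    u = unit(pref_vector)

    def score(name):
        return sum(ui * xi for ui, xi in zip(u, items[name]))

    return sorted(items, key=score, reverse=True)

items = {"choice_1": (1.0, 0.0), "choice_2": (2.0, 2.0), "choice_3": (4.0, 1.0)}

print(vector_preference_order((1.0, 0.0), items))  # prefers "more of dim 1"
print(vector_preference_order((0.0, 5.0), items))  # prefers "more of dim 2"
```

Unlike the ideal-point model, there is no "too much" here: the farther an item lies along the vector's direction, the more it is preferred.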

Tuesday, March 3, 2009

The purpose of Procrustes analysis is to fit one MDS solution (configuration, map), B, to another one, A, and eliminate superficial differences between B and A, by means of rotating, mirror-reflecting, dilating/magnifying, shrinking, or shifting/moving B, without changing either's shape.

Application 1. A is the physical location map, whereas B is the travel-time map produced by MDS. In Procrustes analysis, we fit B to A, which allows us to display B on the top of A and to spot differences.

Application 2. Y is easy to interpret, whereas the initial X is not. In Procrustes analysis, we fit X to Y in order to interpret X.

Application 3. F is the result from the female participants, whereas M is that from the male participants. In Procrustes analysis, we fit M to F (or F to M) so that we can compare the results from males and females on the same page (provided that the fitting is satisfactory).

Application 4. CH is the result from Chinese participants, whereas AM is that from American participants. In Procrustes analysis, we fit CH to AM (or AM to CH) so that we can compare the cross-cultural results on the same page (provided that the fitting is satisfactory).
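Here is a minimal sketch of fitting B to A for 2D configurations. It treats each point as a complex number, so that one complex multiplier encodes rotation plus dilation; this shortcut only works in 2D and is my own choice, not how any particular MDS package does it. The two configurations are made up: B is A rotated by 90 degrees, magnified by 3, and shifted.

```python
def center(points):
    """Shift a configuration so that its centroid is at the origin."""
    mean = sum(points) / len(points)
    return [p - mean for p in points], mean

def fit_similarity(a, b):
    """Least-squares complex c with c * b close to a (rotation + dilation)."""
    c = sum(ai * bi.conjugate() for ai, bi in zip(a, b)) / sum(abs(bi) ** 2 for bi in b)
    residual = sum(abs(ai - c * bi) ** 2 for ai, bi in zip(a, b))
    return c, residual

def procrustes_2d(A, B):
    """Fit B to A by shifting, rotating or reflecting, and rescaling B."""
    a = [complex(x, y) for x, y in A]
    b = [complex(x, y) for x, y in B]
    a0, a_mean = center(a)
    b0, _ = center(b)
    # Try B as it is and mirror-reflected; keep whichever fits better.
    candidates = [b0, [z.conjugate() for z in b0]]
    fits = [(fit_similarity(a0, cand), cand) for cand in candidates]
    (c, residual), best = min(fits, key=lambda f: f[0][1])
    fitted = [c * z + a_mean for z in best]
    return [(z.real, z.imag) for z in fitted], residual

A = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
B = [(5.0, -2.0), (5.0, 4.0), (2.0, 4.0), (2.0, -2.0)]

fitted, residual = procrustes_2d(A, B)
print(fitted, residual)
```

A residual near zero means the two maps differ only superficially; a large residual means the differences are real and worth interpreting.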

Thursday, January 29, 2009

Searching JPSP by scholar. The 12 results found are categorized as follows:

A. Structure of Emotion

1. Russell (1980) A circumplex model of affect: 28 emotion-denoting adjectives are reduced to a 2D space: pleasure-displeasure and arousal-sleepiness.

In the same year, Russell and Pratt (1980) also talked about the two dimensions on the meaning that persons attribute to environments.

Russell and Bullock (1985) followed up on Russell (1980) to show that the two dimensions reveal a basic property of the human conception of emotions, rather than represent an artifact that is due to semantic relations learned along with the emotion lexicon.

Russell, Weiss, and Mendelsohn (1989) followed up to develop a single-item scale, the Affect Grid, to quickly assess affect along the dimensions of pleasure-displeasure and arousal-sleepiness.

Feldman (1995) interpreted the 2D as valence-focus and arousal-focus and suggested their relation to Positive Affect and Negative Affect.

Barrett (2004) followed up on Feldman (1995) to talk about how valence-focus and arousal-focus are related to cognitive structure of emotion language vs. phenomenological experience.

Extending Russell's model, Larsen, McGraw, and Cacioppo (2001) argued that people can feel happy and sad at the same time; they do not have to experience positive-negative emotions in a bipolar way.

B. Structure of Self-Other Relationship:

2. Falbo (1977) Multidimensional scaling of power strategies: 16 strategies of "How I Get My Way." reduced to a 2D space: (a) rational/nonrational and (b) direct/indirect.

3. Bartholomew and Horowitz (1991) examined a model of individual differences in adult attachment in which two underlying dimensions, the person's internal model of the self (positive or negative) and the person's internal model of others (positive or negative), were used to define four attachment patterns. (as seen in General Discussion)

4. Wiggins, Phillips, and Trapnell (1989) interpersonal circumplex: dominant/submissive and agreeable/cold-hearted.

Tuesday, January 27, 2009

To facilitate the interpretation of the dimensions in the reduced space, we may do internal or external analyses.

In internal analysis, we use the same proximities data, run alternative analysis method (e.g., cluster analysis) with them, and embed the results within MDS. If different methods all converge to the same interpretation, then it is!

In external analysis ("property fitting"), we use supplementary data. Specifically, we may try to predict the property (collected on the objects) for object_i from the 2D coordinates for the objects through multiple regression.

For example, in a study, the objects are 14 stressful experiences relevant to early parenting, and the two dimensions are labeled as "major vs. minor child problems" and "child welfare vs. self-welfare". The external property is "infuriating", and we want to predict "infuriating" for each of the 14 objects from the 2D coordinates for the 14 objects, which results in a directed line. It is found that infuriating tends to be associated with the problems of self-welfare as opposed to the welfare of the child.

In external analysis, we regress a given external attribute of the objects (e.g., "infuriating") on the 2D coordinates of the objects (i.e., dim 1 and dim 2), and the resulting unstandardized multiple regression coefficients form a point in the 2D space. A directed line is then drawn from the origin to that point. Evidently, the projections of the objects on this line give a set of 2D coordinates, (dim1, dim2), which correspond best to the external attribute (Borg & Groenen, 2005, pp. 77-79).
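A minimal sketch of property fitting with plain least squares follows. The coordinates and attribute ratings are invented (the attribute is constructed as an exact linear function of the coordinates so that the answer is easy to check), and the function names are mine.

```python
def fit_property(coords, attribute):
    """Regress an external attribute on 2D MDS coordinates (with intercept).

    Returns the unstandardized slopes (b1, b2); the directed line runs
    from the origin to the point (b1, b2).
    """
    n = len(coords)
    m1 = sum(x for x, _ in coords) / n
    m2 = sum(y for _, y in coords) / n
    my = sum(attribute) / n
    x1 = [x - m1 for x, _ in coords]
    x2 = [y - m2 for _, y in coords]
    yc = [v - my for v in attribute]
    # Normal equations for two centered predictors.
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * v for u, v in zip(x1, yc))
    s2y = sum(u * v for u, v in zip(x2, yc))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def project(coords, direction):
    """Scalar projection of each object onto the directed line."""
    b1, b2 = direction
    norm = (b1 * b1 + b2 * b2) ** 0.5
    return [(x * b1 + y * b2) / norm for x, y in coords]

# Hypothetical MDS coordinates and attribute ratings for five objects.
coords = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0), (0.0, 2.0)]
attribute = [3.0, 4.0, 0.0, 1.0, -1.0]  # equals 2 + 0.5*dim1 - 1.5*dim2

b1, b2 = fit_property(coords, attribute)
print(b1, b2)
print(project(coords, (b1, b2)))
```

Objects with large projections on the line are the ones rated high on the attribute, which is how the direction of the line gets its interpretation.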

Monday, January 26, 2009

The goal of scaling is to minimize the dissimilarity of data between the original and the reduced space. Specifically,

p_ij is the proximity (typically, dissimilarity) between object_i and object_j in the original space, whereas d_ij is the Euclidean distance between object_i and object_j in the reduced space

We use a linear regression equation to predict d_ij from p_ij, and dhat_ij is the predicted value of d_ij. Then, we want to minimize the difference between d_ij and dhat_ij, using least squares. Here, we have the raw stress index (which we want to minimize):

(raw stress) = sum over i<j of (d_ij - dhat_ij)^2

Because the dimensions in the reduced space can be arbitrarily stretched or contracted, we normalize the raw stress index in order to achieve the following,

Also, a square root places the index in the same unit as d_ij, so we have the normalized stress index (which we want to minimize):

(stress-1) = sqrt( [sum over i<j of (d_ij - dhat_ij)^2] / [sum over i<j of d_ij^2] )

(Note: this is Kruskal's stress formula 1.)
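A small sketch of computing the raw stress and Kruskal's stress formula 1 from a list of distances d_ij and predicted values dhat_ij (one entry per pair i < j). The numbers are invented; the last lines show why the normalization matters when the configuration is stretched.

```python
import math

def raw_stress(d, dhat):
    """Sum of squared differences between distances and their predicted values."""
    return sum((dij - hij) ** 2 for dij, hij in zip(d, dhat))

def stress_1(d, dhat):
    """Kruskal's stress formula 1: sqrt(raw stress / sum of squared distances)."""
    return math.sqrt(raw_stress(d, dhat) / sum(dij ** 2 for dij in d))

# Hypothetical values for the six pairs (i < j) of four objects.
d = [1.0, 2.0, 2.0, 1.0, 3.0, 2.0]
dhat = [1.0, 2.0, 3.0, 1.0, 3.0, 2.0]

print(raw_stress(d, dhat), stress_1(d, dhat))

# Stretching the whole solution by 2 inflates raw stress but not stress-1.
d2 = [2 * x for x in d]
dhat2 = [2 * x for x in dhat]
print(raw_stress(d2, dhat2), stress_1(d2, dhat2))
```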

Typically, a monotone regression (a.k.a. isotonic regression) is used instead of a linear regression; it preserves only the rank order of the proximities and therefore leads to non-metric MDS. If a linear regression is used, it is metric MDS.

According to Kruskal and Wish (1978), with non-metric MDS, at least 9 objects are required for a 2D solution, while at least 13 objects are required for a 3D solution.

According to the Merriam-Webster dictionary, degenerate means "being mathematically simpler (as by having a factor or constant equal to zero) than the typical case".

In MDS, a degenerate solution is one with a zero (or very close to zero) stress value but retaining no (or minimal) structural information about the data. For example, the objects cluster into a few (e.g., 2) nodes and the dimensions are uninterpretable.

Friday, January 23, 2009

Initially, researchers want to interpret a set of objects in terms of their relationships. However, the proximities (typically, dissimilarities) among them are in a high-dimensional space, which is beyond human capacity for comprehension. Being troubled, the researchers think,

Heck! Why don't we try to project the objects into a 2D space and display them on an X-Y plane? As human beings, we are much more familiar with an X-Y plane and such an interpretation will be more exciting!

Thus, dimension reduction and therefore information loss is involved in MDS, and the general purpose of an MDS program is to preserve the proximities between objects in the high-dimensional space as much as possible. An example of MDS in social psychology is that the 11 factors of the Aspiration Index are visually represented in a 2D plane. (And don't you like it more when you are familiar with the way of interpreting the results?!)

Some notes:

1. MDS is a visualization tool. The goal is to reduce the observed complexity in the data matrix to lower dimensions (2 or 3) for humans to visualize.

2. MDS is a descriptive tool, rather than an inferential tool (de Leeuw, 2001). However, a representative sample should be recruited in order to generalize the description to the population.

3. MDS is more flexible than factor analysis: (a) it doesn't require that the underlying data are distributed as multivariate normal, and (b) it can be applied to any kind of distances or similarities, rather than just the computed correlation matrix.

4. MDS is different from cluster analysis. The goal of MDS is not to group/partition objects, but users can still visually cluster objects based on MDS.

7. The labeling of a dimension in MDS is arbitrary. The only requirement is that the two ends sum to zero at the center. It is similar to, but not the same as, bipolar, because it doesn't say anything about mutual exclusivity of the two ends in reality.

8. The number of dimensions is usually 2 (at most 3). On the one hand, the number should not be just 1; otherwise, gradient-based methods will typically get stuck in local optima. On the other hand, the number should not exceed 3; otherwise, visualization could be very difficult.

9. Another example of MDS would be to visualize the travel-times between cities. In the matrix, each row and each column would correspond to a city. MDS could then recreate a map containing the cities, solely from the matrix. This map would look similar to the actual map of city locations, but would differ in interesting ways. Cities connected by faster than average transportation passageways would appear closer together, while roadblocks would move cities apart.
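To give a feel for how a program could recreate such a map, here is a rough sketch of ratio MDS using a SMACOF-style iteration (the Guttman transform). The choice of algorithm is mine, the four "cities" and their travel times are invented, and real programs add many refinements (better starting configurations, convergence checks, transformations of the proximities).

```python
import math
import random

def pairwise(X):
    """Matrix of Euclidean distances between all points in X."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def config_stress(X, delta):
    """Raw stress of configuration X against dissimilarities delta."""
    d = pairwise(X)
    n = len(X)
    return sum((d[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def guttman_step(X, delta):
    """One SMACOF update (Guttman transform) with unit weights."""
    n = len(X)
    d = pairwise(X)
    new_X = []
    for i in range(n):
        x = y = 0.0
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue
            ratio = delta[i][j] / d[i][j]
            x += ratio * (X[i][0] - X[j][0])
            y += ratio * (X[i][1] - X[j][1])
        new_X.append((x / n, y / n))
    return new_X

# Hypothetical travel times (hours). A, B, C, D would form a 3-by-4
# rectangle, except that a fast road makes A-D take 4 instead of 5.
cities = ["A", "B", "C", "D"]
delta = [[0.0, 3.0, 4.0, 4.0],
         [3.0, 0.0, 5.0, 4.0],
         [4.0, 5.0, 0.0, 3.0],
         [4.0, 4.0, 3.0, 0.0]]

random.seed(0)
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in cities]
initial = config_stress(X, delta)
for _ in range(200):
    X = guttman_step(X, delta)
final = config_stress(X, delta)

print(round(initial, 3), round(final, 3))
for name, (x, y) in zip(cities, X):
    print(name, round(x, 2), round(y, 2))
```

Each update cannot increase the raw stress, so the map improves step by step, although it may settle in a local optimum rather than the best possible map.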