Feature Extraction by Multidimensional Scaling

Multidimensional Scaling

Multidimensional scaling (MDS) is widely used for data visualization and dimension reduction in many fields, including psychometrics. The goal of MDS is to place objects in a space according to their pairwise dissimilarities so that similar objects lie close together and dissimilar objects lie far apart. Mathematically, applying MDS to \(n\) objects with dissimilarity matrix \(\boldsymbol D = (d_{ij})\) minimizes the objective function \[ \sum_{i<j} \left(d_{ij} - \|\boldsymbol x_i - \boldsymbol x_j\|\right)^2 \] with respect to \(\boldsymbol X = (\boldsymbol x_1, \ldots, \boldsymbol x_n)^T\), where \(\boldsymbol x_i \in \mathbb{R}^K\) contains the coordinates of the \(i\)th object in \(K\)-dimensional Euclidean space and \(\| \boldsymbol x_i - \boldsymbol x_j\| = \sqrt{(\boldsymbol x_i - \boldsymbol x_j)^T (\boldsymbol x_i - \boldsymbol x_j)}\).
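To make the objective concrete, the following sketch evaluates it for a given configuration and reduces it with plain gradient descent. The function names `stress` and `mds`, the step size, the iteration count, and the random initialization are all assumptions made for illustration. Gradient descent is used here only because it is short to write; it is not necessarily the optimizer that any particular MDS implementation uses, and it may stop at a local minimum.

```python
import math
import random


def stress(D, X):
    """Objective: sum over i < j of (d_ij - ||x_i - x_j||)^2."""
    n = len(D)
    return sum(
        (D[i][j] - math.dist(X[i], X[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def mds(D, K=2, steps=2000, lr=0.01, seed=0):
    """Illustrative minimizer: fixed-step gradient descent on the objective."""
    rng = random.Random(seed)
    n = len(D)
    # Random starting configuration (an assumption; other starts are possible).
    X = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * K for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dist = math.dist(X[i], X[j])
                if dist == 0.0:
                    continue  # gradient is undefined for coincident points
                # d/dx_i of (d_ij - dist)^2 = -2 (d_ij - dist) (x_i - x_j) / dist
                coef = -2.0 * (D[i][j] - dist) / dist
                for k in range(K):
                    g = coef * (X[i][k] - X[j][k])
                    grad[i][k] += g
                    grad[j][k] -= g
        for i in range(n):
            for k in range(K):
                X[i][k] -= lr * grad[i][k]
    return X


# Dissimilarities taken from the corners of a unit square.
r2 = math.sqrt(2.0)
D = [[0, 1, r2, 1], [1, 0, 1, r2], [r2, 1, 0, 1], [1, r2, 1, 0]]
X = mds(D, K=2)
print(stress(D, X))
```

Because the objective depends on \(\boldsymbol X\) only through pairwise distances, any rotation, reflection, or translation of the returned configuration attains the same value.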

Each dimension of the coordinates is a feature describing the differences among the objects. To extract features from process data, we need to find a dissimilarity measure that appropriately summarizes the differences between two action sequences.

Dissimilarity between sequences

At least three aspects should be taken into account when choosing a dissimilarity measure for action sequences.

The dissimilarity measure should work for sequences whose elements are categorical.
The dissimilarity measure should work for sequences with unequal lengths.
The dissimilarity measure can account for the difference in the order of actions.

Examples of dissimilarity measures that meet these requirements are:

Order-based sequence similarity proposed by Gomez-Alonso and Valls (2008)
Optimal symbol alignment distance proposed by Herranz, Nin and Sole (2011)
Levenshtein distance
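Of these, Levenshtein distance is the simplest to state: it is the minimum number of insertions, deletions, and substitutions needed to turn one sequence into the other. The sketch below assumes each action sequence is a list of action labels; the function names `levenshtein` and `dissimilarity_matrix` and the example labels are hypothetical.

```python
def levenshtein(s, t):
    """Edit distance between two sequences of hashable items (e.g. action labels)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # delete a
                    curr[j - 1] + 1,         # insert b
                    prev[j - 1] + (a != b),  # substitute a by b (free if equal)
                )
            )
        prev = curr
    return prev[-1]


def dissimilarity_matrix(seqs, dist=levenshtein):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    n = len(seqs)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(seqs[i], seqs[j])
    return D


seqs = [
    ["start", "A", "B", "end"],
    ["start", "B", "A", "submit", "end"],
    ["start", "end"],
]
print(dissimilarity_matrix(seqs))
```

The measure compares labels only for equality, handles sequences of different lengths, and is sensitive to order: `["A", "B"]` and `["B", "A"]` are at distance 2 rather than 0.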

Procedure

Features are extracted from process data by multidimensional scaling in three steps:

Form the dissimilarity matrix \(\boldsymbol D\) of \(n\) action sequences \(\boldsymbol s_1, \boldsymbol s_2, \ldots, \boldsymbol s_n\) by calculating the pairwise dissimilarities \(d_{ij}, 1\leq i, j \leq n\) according to a chosen dissimilarity measure.
Obtain \(K\) raw features \(\tilde{\boldsymbol x}_1, \ldots, \tilde{\boldsymbol x}_K\), one for each coordinate dimension of the fitted configuration, by applying multidimensional scaling to \(\boldsymbol{D}\).
Obtain \(K\) principal features \(\boldsymbol x_1, \ldots, \boldsymbol x_K\) by performing principal component analysis (PCA) on the \(K\) raw features.
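The final PCA step amounts to centering the raw features and rotating them onto the eigenvectors of their covariance matrix, ordered by decreasing variance. The sketch below assumes the raw features are stored as an \(n \times K\) list of rows; the names `jacobi_eigh` and `principal_features` are hypothetical, and the Jacobi rotation routine is a standard-library stand-in for the symmetric eigensolver that a numerical library would normally provide.

```python
import math


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via Jacobi rotations."""
    k = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(k) for q in range(p + 1, k))
        if off < tol:
            break
        for p in range(k):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes the (p, q) entry after rotation.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(k):  # rotate columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for j in range(k):  # rotate rows p and q
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * apj - s * aqj, s * apj + c * aqj
                for i in range(k):  # accumulate eigenvectors
                    vip, viq = v[i][p], v[i][q]
                    v[i][p], v[i][q] = c * vip - s * viq, s * vip + c * viq
    return [a[i][i] for i in range(k)], v


def principal_features(raw):
    """Center the raw features and project them onto their principal axes."""
    n, k = len(raw), len(raw[0])
    means = [sum(row[j] for row in raw) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in raw]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    eigvals, vecs = jacobi_eigh(cov)
    order = sorted(range(k), key=lambda j: -eigvals[j])  # largest variance first
    return [[sum(r[i] * vecs[i][j] for i in range(k)) for j in order] for r in centered]


# Made-up raw features for five sequences with K = 3.
raw = [[0.0, 0.1, 1.0], [1.0, 0.9, 0.0], [2.0, 2.2, 1.0], [3.0, 2.9, 0.0], [4.0, 4.1, 1.0]]
for row in principal_features(raw):
    print(row)
```

Since all \(K\) components are retained, this step only rotates and centers the configuration: pairwise distances between objects are unchanged, while the resulting features are uncorrelated and sorted by the amount of variance they explain.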