A pre-scored verb-object co-occurrence matrix for 240 target nouns denoting goods and the 3 feature verbs own, buy and sell. This matrix is useful for illustrating the application and purpose of dimensionality reduction techniques.
DSM_GoodsMatrix
A numeric matrix with 240 rows corresponding to target nouns denoting goods and 4 columns:
own, buy, sell: association scores for co-occurrences of the nouns with the verbs own, buy and sell
fringe: an indicator of how close each point is to the “fringe” of the data set (ranging from 0 to 1)
Co-occurrence data are based on verb-object dependency relations in the British National Corpus, obtained from DSM_VerbNounTriples_BNC. Only nouns that co-occur with all three verbs are included in the data set.
The co-occurrence matrix is weighted with non-sparse log-likelihood (simple-ll) and an additional logarithmic transformation (log). Row vectors are not normalized.
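The weighting described above can be sketched in base R. This is only an illustration under stated assumptions: it uses the usual signed form of simple log-likelihood, 2 * (O * log(O / E) - (O - E)) with a negative sign when O < E, and a signed logarithmic transformation sign(x) * log(|x| + 1). The observed and expected frequencies O and E below are made-up toy values, and the helper names simple_ll and signed_log are hypothetical, not part of any package.

```r
# Toy observed (O) and expected (E) co-occurrence frequencies (hypothetical values).
# O must be positive here; the data set only contains nouns that co-occur
# with all three verbs, so zero counts do not arise.
O <- c(30, 5, 12)
E <- c(10, 5, 20)

# Signed (non-sparse) simple log-likelihood: negative when O < E
simple_ll <- function(O, E) {
  g2 <- 2 * (O * log(O / E) - (O - E))
  sign(O - E) * g2
}

# Signed logarithmic transformation: sign(x) * log(|x| + 1)
signed_log <- function(x) sign(x) * log(abs(x) + 1)

scores <- signed_log(simple_ll(O, E))
print(round(scores, 3))
```

Because the non-sparse variant keeps negative associations instead of setting them to zero, a noun that co-occurs with a verb less often than expected receives a negative score.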
The fringeness score in column fringe indicates how close a data point is to the fringe of the data set. Values are distance quantiles based on PCA-whitened Manhattan distance from the centroid. For example, fringe >= .8 selects the 20% of points that are closest to the fringe. Fringeness is mainly used to select points to be labelled in plots or to take stratified samples from the data set.
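One plausible way to compute such a score is sketched below in base R. It is not necessarily the exact procedure used to build the fringe column: the data are a synthetic stand-in for the three score columns, and the rank-based quantile formula is an assumption chosen so that values range from 0 to 1.

```r
set.seed(42)

# Synthetic stand-in for the three score columns (not the real data)
X <- matrix(rnorm(240 * 3), ncol = 3,
            dimnames = list(paste0("noun", 1:240), c("own", "buy", "sell")))

# PCA whitening: rotate onto the principal components,
# then scale each component to unit variance
pca <- prcomp(X, center = TRUE, scale. = FALSE)
Z <- sweep(pca$x, 2, pca$sdev, "/")

# Manhattan distance of each whitened point from the centroid
# (the centroid is the origin after centering)
d <- rowSums(abs(Z))

# Turn distances into quantiles between 0 (closest to the centroid)
# and 1 (farthest from the centroid)
fringe <- (rank(d) - 1) / (length(d) - 1)

# The 20% of points closest to the fringe, e.g. candidates for plot labels
outer <- names(fringe)[fringe >= 0.8]
print(head(outer))
```

Whitening before measuring distance puts all principal components on the same scale, so a point counts as "fringe" relative to the spread of the data in each direction rather than in raw score units.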
DSM_GoodsMatrix[c("time", "goods", "service"), ]