Learn R Programming

wordspace (version 0.2-0)

DSM_GoodsMatrix: A Scored Co-occurrence Matrix of Nouns Denoting Goods (wordspace)

Description

A pre-scored verb-object co-occurrence matrix for 240 target nouns denoting goods and the 3 feature verbs own, buy and sell. This matrix is useful for illustrating the application and purpose of dimensionality reduction techniques.

Usage

DSM_GoodsMatrix

Arguments

Format

A numeric matrix with 240 rows corresponding to target nouns denoting goods and 4 columns, corresponding to

own, buy, sell:

association scores for co-occurrences of the nouns with the verbs own, buy and sell

fringe:

an indicator of how close each point is to the “fringe” of the data set (ranging from 0 to 1)

Details

Co-occurrence data are based on verb-object dependency relations in the British National Corpus, obtained from DSM_VerbNounTriples_BNC. Only nouns that co-occur with all three verbs are included in the data set.

The co-occurrence matrix is weighted with non-sparse log-likelihood (simple-ll) and an additional logarithmic transformation (log). Row vectors are not normalized.

The fringeness score in column fringe indicates how close a data point is to the fringe of the data set. Values are distance quantiles based on PCA-whitened Manhattan distance from the centroid. For example, fringe >= .8 characterizes 20% of points that are closest to the fringe. Fringeness is mainly used to select points to be labelled in plots or to take stratified samples from the data set.

Examples

Run this code
# NOT RUN {
DSM_GoodsMatrix[c("time", "goods", "service"), ]

# }

Run the code above in your browser using DataLab