The coraAI data consist of a response, a journal indicator
matrix, and a co-citation network. The data are a subset of the Cora
text mining project (see the reference).
The observations are text documents that consist of 879 published
papers about either Artificial Intelligence (AI) or Machine
Learning (ML). The journal name for each document is available
(8 journals and an "other" category). The observed co-citation graph
is also available: each vertex is a document (observation), and
each edge weight is the number of citations that the two documents it
joins have in common. The goal is to incorporate both the text information and
the co-citation information to predict the paper subject (AI or ML).
Another interesting problem might be to predict the journal of the
paper given the text information and the categorization.
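To make the edge definition above concrete, here is a minimal sketch of how a matrix of "citations in common" could be computed. It assumes that the count means the number of shared cited references; the matrix `B` and its labels are hypothetical toy data, not part of coraAI, and this is not necessarily how the packaged graph was built.

```r
# Hypothetical toy data: 4 documents (rows) by 5 cited references (columns).
# B[i, k] = 1 means document i cites reference k.
B <- matrix(c(1, 1, 0, 0, 1,
              1, 0, 1, 0, 1,
              0, 0, 1, 1, 0,
              0, 0, 0, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), paste0("ref", 1:5)))

# Entry (i, j) counts the references cited by both document i and document j.
A <- tcrossprod(B)   # same as B %*% t(B)

# Remove self-counts so the matrix describes links between distinct documents.
diag(A) <- 0

A
```

In this toy example doc1 and doc2 both cite ref1 and ref5, so the edge between them has weight 2, while doc1 and doc3 share no references and are not connected.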
data(coraAI)
The coraAI data consist of three objects, each described next.
class: categorization of each document (observation) as either AI or ML. Typically the response.
journals: indicator of the journal in which each document was published (other, artificial-intelligence, machine-learning, nueral-computing, ieee-trans-Nnet, ieee-tpami, j-artificial-intelligence-research, ai-magazine, JASA).
cite: the adjacency matrix of the co-citation network for these 879 documents.
The spa is particularly appealing for this data since it fits a function directly to the graph and a coefficient vector to the journals. Other approaches require conversion of the journal information into a graph for processing, and it is unclear how to do that when the data is a binary design matrix.
M. Culp (2011). spa: A Semi-Supervised R Package for Semi-Parametric Graph-Based Estimation. Journal of Statistical Software, 40(10), 1-29. URL http://www.jstatsoft.org/v40/i10/.