This dataset contains scored cognitive item response data from the 2009 administration of the Programme for International Student Assessment (PISA), an international study of education systems. The data, and the license under which they are released, are available online at https://www.oecd.org/pisa/.
PISA
PISA is a list containing four elements. The first, PISA$students, is a data.frame containing 233 variables across 5233 individuals, with one row per individual. All but one variable come from the USA PISA data file "INT_COG09_S_DEC11.txt". The remaining variable, language spoken at home, has been merged in from the student questionnaire file "INT_STQ09_DEC11.txt". Variable names match those found in the original files:
Unique student ID (one for each of the 5233 cases);
School ID (there are 165 different schools);
ID for the test booklet given to a particular student, of which there were 13;
Student-reported language spoken at home, with 4466 students reporting English (indicated by code 313), 484 students reporting Spanish (with code 156) and 185 students reporting "another language" (code 859);
Scored item-response data across the 189 items included in the general cognitive assessment, described below; and
PISA scale scores, referred to in the PISA technical documentation as "plausible values".
Next, PISA$booklets is a data.frame containing 4 columns and 756 rows and describes the 13 general cognitive assessment booklets. Variables include:
The test booklet ID, as in PISA$students;
ID for the cluster or item subset in which an item was placed; items were fully nested within clusters; however, each item cluster appeared in four different test booklets;
Item ID, matching the columns of PISA$students; each item appears in PISA$booklets four times, once for each booklet in which its cluster appears; and
The order in which the cluster was presented within a given booklet.
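The row count follows from the design: each of the 189 items sits in one cluster, and each cluster appears in four booklets, giving 189 x 4 = 756 rows. A minimal sketch of that long layout on a made-up toy design is shown below. The column names bookid, clusterid, itemid, and order, along with all IDs, are assumptions for illustration and have not been checked against the dataset.

```r
# Toy design (hypothetical): 3 booklets, 3 clusters, each cluster in 2 booklets.
design <- data.frame(
  bookid = c(1, 1, 2, 2, 3, 3),
  clusterid = c("m1", "r1", "r1", "s1", "s1", "m1"),
  order = c(1, 2, 1, 2, 1, 2)
)

# Items are fully nested within clusters: each item belongs to one cluster.
items <- data.frame(
  itemid = c("m01", "m02", "r01", "r02", "s01", "s02"),
  clusterid = c("m1", "m1", "r1", "r1", "s1", "s1")
)

# One row per item per booklet in which its cluster appears.
booklets <- merge(design, items, by = "clusterid")

nrow(booklets)          # 6 items x 2 booklets each = 12
table(booklets$itemid)  # every item appears twice

# The same arithmetic with the counts given in the description above.
189 * 4                 # 756
```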
PISA$items is a data.frame containing 6 columns and 189 rows, with one row per item. Variables include:
Item ID, as in PISA$booklets;
Cluster ID, as in PISA$booklets;
Maximum possible score value, either 1 or 2 points, with dichotomous scoring (max of 1) used for the majority of items;
The subject of an item, equivalent to the first character in itemid and clusterid;
Item format, abbreviated as mc for multiple choice, cmc for complex multiple choice, ocr for open constructed response, and ccr for closed constructed response; and
Number of options, zero except for some multiple choice items.
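The link between the subject variable and the two ID variables can be sketched in a few lines of R. The item and cluster IDs below are made-up illustrations written in the lowercase style this dataset uses for names. They are not values checked against PISA$items.

```r
# Hypothetical item and cluster IDs, for illustration only.
itemid <- c("m033q01", "r055q02", "s131q02")
clusterid <- c("m1", "r1", "s1")

# The subject is the first character of either ID.
subject <- substr(itemid, 1, 1)
subject

# Both IDs give the same subject code.
identical(subject, substr(clusterid, 1, 1))
```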
Finally, PISA$totals is a list of 13 data.frames, one per booklet, where the columns correspond to total scores for all students on each cluster for the corresponding booklet. These total scores were calculated using PISA$students and PISA$booklets. Elements within the PISA$totals list are named by booklet, and the columns in each data.frame are named by cluster. For example, PISA$totals$b1$m1 contains the total scores on cluster M1 for students taking booklet 1.
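One way such cluster totals could be derived from student and booklet tables is sketched below on made-up toy data. This is an illustration under stated assumptions, not necessarily the procedure used to build PISA$totals. The column names bookid, clusterid, and itemid, and all values, are hypothetical.

```r
# Toy stand-ins for PISA$students and PISA$booklets (made-up data).
# Items not administered in a student's booklet are NA.
students <- data.frame(
  bookid = c(1, 1, 2),
  m01 = c(1, 0, NA), m02 = c(1, 1, NA),
  r01 = c(0, 1, 1), r02 = c(2, 0, 1)
)
booklets <- data.frame(
  bookid = c(1, 1, 1, 1, 2, 2),
  clusterid = c("m1", "m1", "r1", "r1", "r1", "r1"),
  itemid = c("m01", "m02", "r01", "r02", "r01", "r02")
)

# For each booklet, keep the students who took it and sum their
# item scores within each cluster contained in that booklet.
totals <- lapply(split(booklets, booklets$bookid), function(b) {
  rows <- students[students$bookid == b$bookid[1], , drop = FALSE]
  by_cluster <- split(b$itemid, b$clusterid)
  as.data.frame(lapply(by_cluster, function(ids) {
    rowSums(rows[, ids, drop = FALSE])
  }))
})

# Name list elements by booklet; columns are already named by cluster.
names(totals) <- paste0("b", names(totals))

totals$b1$m1  # totals on cluster m1 for the two students with booklet 1
```

Selecting only the items listed for a booklet keeps the items that were not administered out of each sum.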