PISA: Programme for International Student Assessment 2009 USA Data

Description

This dataset contains scored cognitive item response data for the USA from the 2009 administration of the Programme for International Student Assessment (PISA), an international study of education systems. The data, and the license under which they are released, are available online at https://www.oecd.org/pisa/.

Usage

PISA

Format

PISA is a list containing four elements. The first, PISA$students, is a data.frame containing 233 variables across 5233 individuals, with one row per individual. All but one variable come from the USA PISA data file "INT_COG09_S_DEC11.txt". The remaining variable, language spoken at home, has been merged in from the student questionnaire file "INT_STQ09_DEC11.txt". Variable names match those found in the original files:

stidstd: Unique student ID (one for each of the 5233 cases);
schoolid: School ID (there are 165 different schools);
bookid: ID for the test booklet given to a particular student, of which there were 13;
langn: Student-reported language spoken at home, with 4466 students reporting English (indicated by code 313), 484 students reporting Spanish (with code 156) and 185 students reporting "another language" (code 859);
m033q01 to s527q04t: Scored item-response data across the 189 items included in the general cognitive assessment, described below; and
pv1math to pv5read5: PISA scale scores, referred to in the PISA technical documentation as "plausible values".
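
As a minimal sketch of working with the langn codes, the following recodes them into labels and tabulates them. It runs on a small invented stand-in for PISA$students rather than the real data, and the names lang_labels, language, and lang_counts are hypothetical. The three counts listed above sum to 5135 rather than 5233, so the sketch keeps any missing or unlisted codes visible instead of silently dropping them.

```r
# Toy stand-in for PISA$students; all values are invented for illustration.
students <- data.frame(
  stidstd = 1:6,
  langn = c(313, 313, 156, 859, 313, NA)
)

# Map the documented langn codes to readable labels (hypothetical helper).
lang_labels <- c("313" = "English", "156" = "Spanish",
                 "859" = "Another language")
students$language <- factor(
  lang_labels[as.character(students$langn)],
  levels = lang_labels
)

# Count students per language; codes not in lang_labels show up as NA.
lang_counts <- table(students$language, useNA = "ifany")
print(lang_counts)
```

With the real data, replacing the toy data frame with PISA$students should be the only change needed, assuming langn is stored as numeric or character codes.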

Next, PISA$booklets is a data.frame with 4 columns and 756 rows that describes the 13 general cognitive assessment booklets. Variables include:

bookid: The test booklet ID, as in PISA$students;
clusterid: ID for the cluster or item subset in which an item was placed; items were fully nested within clusters; however, each item cluster appeared in four different test booklets;
itemid: Item ID, matching the columns of PISA$students; each item appears in PISA$booklets four times, once for each booklet; and
order: The order in which the cluster was presented within a given booklet.
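
The nesting described above can be checked with a few base R calls. This is a sketch on an invented stand-in for PISA$booklets in which each cluster appears in two booklets rather than four; the object names item_counts, clusters_per_item, and booklets_per_cluster are hypothetical.

```r
# Toy stand-in for PISA$booklets: 3 booklets, 3 clusters, 2 items per
# cluster, and each cluster placed in 2 booklets (the real data use 4).
booklets <- data.frame(
  bookid = rep(1:3, each = 4),
  clusterid = c("m1", "m1", "r1", "r1",
                "r1", "r1", "s1", "s1",
                "s1", "s1", "m1", "m1"),
  itemid = c("m01", "m02", "r01", "r02",
             "r01", "r02", "s01", "s02",
             "s01", "s02", "m01", "m02"),
  order = rep(c(1, 1, 2, 2), times = 3),
  stringsAsFactors = FALSE
)

# Number of rows (that is, booklets) in which each item appears.
item_counts <- table(booklets$itemid)

# Number of distinct clusters each item belongs to; 1 means fully nested.
clusters_per_item <- tapply(booklets$clusterid, booklets$itemid,
                            function(x) length(unique(x)))

# Number of distinct booklets in which each cluster appears.
booklets_per_cluster <- tapply(booklets$bookid, booklets$clusterid,
                               function(x) length(unique(x)))

print(item_counts)
print(clusters_per_item)
print(booklets_per_cluster)
```

Applied to the real PISA$booklets, the description above implies item counts of 4 for every item, which also matches 189 items times 4 booklets giving 756 rows.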

PISA$items is a data.frame containing 189 rows, one per item, and the 6 variables listed below:

itemid: Item ID, as in PISA$booklets;
clusterid: Cluster ID, as in PISA$booklets;
max: Maximum possible score value, either 1 or 2 points, with dichotomous scoring (max of 1) used for the majority of items;
subject: The subject of an item, equivalent to the first character in itemid and clusterid;
format: Item format, abbreviated as mc for multiple choice, cmc for complex multiple choice, ocr for open constructed response, and ccr for closed constructed response; and
noptions: Number of options, zero except for some multiple choice items.
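
As an illustration of how these variables relate, the sketch below checks that subject matches the first character of itemid and clusterid, then sums max within clusters to get the highest total score available on each cluster. The data frame is an invented stand-in for PISA$items, and max_by_cluster and two_point_items are hypothetical names.

```r
# Toy stand-in for PISA$items; all values are invented for illustration.
items <- data.frame(
  itemid = c("m01", "m02", "r01", "r02", "s01", "s02"),
  clusterid = c("m1", "m1", "r1", "r1", "s1", "s1"),
  max = c(1, 1, 2, 1, 1, 2),
  subject = c("m", "m", "r", "r", "s", "s"),
  format = c("mc", "ocr", "ocr", "cmc", "mc", "ccr"),
  noptions = c(4, 0, 0, 0, 4, 0),
  stringsAsFactors = FALSE
)

# subject should equal the first character of both itemid and clusterid.
subject_ok <- all(substr(items$itemid, 1, 1) == items$subject) &&
  all(substr(items$clusterid, 1, 1) == items$subject)

# Highest total score available on each cluster.
max_by_cluster <- tapply(items$max, items$clusterid, sum)

# Items scored on a 0 to 2 scale rather than dichotomously.
two_point_items <- items$itemid[items$max == 2]

print(subject_ok)
print(max_by_cluster)
print(two_point_items)
```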

Finally, PISA$totals is a list of 13 data.frames, one per booklet. The columns of each data.frame contain the total scores, for all students who took that booklet, on each cluster in the booklet. These total scores were calculated using PISA$students and PISA$booklets. The list elements are named by booklet, and the columns of each data.frame are named by cluster. For example, PISA$totals$b1$m1 contains the total scores on cluster M1 for students taking booklet 1.
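
One way totals of this shape could be derived is sketched below, using invented stand-ins for PISA$students and PISA$booklets. This is not necessarily how PISA$totals was actually computed: the helper cluster_totals is hypothetical, and treating missing item scores as zero via na.rm = TRUE is an assumption made only for the example.

```r
# Toy stand-ins; all values are invented for illustration.
students <- data.frame(
  stidstd = 1:4,
  bookid = c(1, 1, 2, 2),
  m01 = c(1, 0, NA, NA),
  m02 = c(1, 1, NA, NA),
  r01 = c(0, 2, 1, 0),
  r02 = c(1, 1, 1, 0),
  s01 = c(NA, NA, 1, 1),
  s02 = c(NA, NA, 0, 2)
)
booklets <- data.frame(
  bookid = c(1, 1, 1, 1, 2, 2, 2, 2),
  clusterid = c("m1", "m1", "r1", "r1", "r1", "r1", "s1", "s1"),
  itemid = c("m01", "m02", "r01", "r02", "r01", "r02", "s01", "s02"),
  order = c(1, 1, 2, 2, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for booklet b, sum item scores within each cluster
# over the students who took that booklet.
cluster_totals <- function(b) {
  bk <- booklets[booklets$bookid == b, ]
  st <- students[students$bookid == b, ]
  items_by_cluster <- split(bk$itemid, bk$clusterid)
  as.data.frame(lapply(items_by_cluster, function(ids) {
    # Assumption: missing item scores contribute zero to the total.
    rowSums(st[, ids, drop = FALSE], na.rm = TRUE)
  }))
}

# Build a list of data.frames named b1, b2, ..., with columns named by cluster.
book_ids <- sort(unique(booklets$bookid))
totals <- lapply(book_ids, cluster_totals)
names(totals) <- paste0("b", book_ids)

print(totals$b1$m1)
```

The result mirrors the naming described above, so totals$b1$m1 in the sketch plays the role of PISA$totals$b1$m1.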