Dataset Info

The Dataset Info report provides both an overview of and context for
the currently selected dataset. It may answer questions such as: How and
when were these data collected? What is the scope of the dataset? If the
owner of the dataset chose to provide other information about the
dataset, that information is displayed here as well.

Dataset Overview and Statistics

The dataset overview table is presented at the top of the Dataset
Info report. This report loads by default when you select a dataset to
browse. If you don't see the overview, click the main tab titled
Dataset Info, and then click the Overview link below it.

In the Overview table, a number of fields describe the dataset's
characteristics. If you are a project admin for this project, you can edit some of
the fields in the Overview table. Click a field to edit it.

The overview fields are:

Category

Description

Project

A collection of datasets with a principal investigator and a data provider (who
is often the same as the principal investigator). We consider a project to be a
title for your research (e.g., Perceptual Fluency in Geometry Achievement).
It might be similar to the title of a grant proposal, or some other phrase that
identifies your work. To change the project name, contact
us.

Principal Investigator

Defined at the project level, this is the person who, along with the data provider,
determines who has access to the project. To change the principal investigator,
contact us.

Data Provider

Defined at the project level, the data provider is a person responsible for providing
a dataset to DataShop. He or she, with the agreement of CMU legal, may specify whether
project-specific terms of use should apply to a project. Most datasets in DataShop use the
same person for both the data provider and principal investigator fields; in this case,
data provider is not shown. The data provider, along with the principal investigator,
determines who has access to the project. To change the data provider,
contact us.

Curriculum

Used to describe the curriculum in which these data were collected (e.g., Algebra
I).

Dates

The date range(s) during which these data were collected. This can be determined from
the log data by pressing the auto-set button.

Domain/LearnLab

The Domain/LearnLab group to which this dataset belongs (e.g., Language/Chinese or
Math/Algebra).

Tutor

The title of the tutor software used to collect data (e.g., Algebra 1 2005
or CTAT 2.7).

Description

A description of the dataset. This can include links to outside resources. It can
be helpful to enter as much contextual information here as possible so that other
researchers can attempt to make sense of the dataset. This is especially true if the
dataset is part of a public project.

Has Study Data

Whether or not the dataset contains data that are the result of a research study
or experiment.

Hypothesis

The hypothesis that was tested. Only displayed if "Has Study Data" is "yes".

Status

The status of the dataset (one of on-going, complete, files-only,
or other).

School(s)

The school(s) where these data were collected.

Acknowledgment for Secondary Analysis

The acknowledgment that a researcher should include in a publication if they use this
dataset for their research. The acknowledgment, if entered, is shown on the Citation page
and in a text file included with each export.

Preferred Citation for Secondary Analysis

Citation that a researcher should include in a publication if they use this
dataset for their research. The citation, if entered, is shown on the Citation page
and in a text file included with each export. A citation must be for a paper attached
to the dataset.

Additional Notes

Any additional information about the dataset.

The statistics table, described below, is generated from the data
and is therefore not editable.

Category

Description

Number of Students

The total number of students for whom there are data.

Number of Unique Steps

The number of unique steps in
the dataset, where uniqueness is defined as a step within a specific problem hierarchy (the curriculum
location where the problem appears). The same step attempted by two students equals
only one unique step.

Total Number of Steps

The number of steps in the dataset, where each student-step counts as one step.
The same step attempted by two students equals two steps in the total number of steps.
For example, if problem A has steps S1, S2, and S3, and student A does S1 and S2 while
student B does S2 and S3, and there is just that problem in the dataset, then there are
3 unique steps and 4 total steps.
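The difference between the two step counts can be checked with a short Python sketch. It reproduces the example above; the record layout (student, problem, step) and the function name count_steps are assumptions made for illustration and are not DataShop's actual data format or code.

```python
# Hypothetical student-step records: (student, problem, step).
# This layout is an illustration only, not DataShop's export format.
student_steps = [
    ("Student A", "Problem A", "S1"),
    ("Student A", "Problem A", "S2"),
    ("Student B", "Problem A", "S2"),
    ("Student B", "Problem A", "S3"),
]


def count_steps(records):
    """Return (unique_steps, total_steps) for a list of student-step records."""
    # Total steps: every distinct (student, problem, step) counts once.
    total = len(set(records))
    # Unique steps: ignore the student, so a step shared by two students counts once.
    unique = len({(problem, step) for _student, problem, step in records})
    return unique, total


unique_steps, total_steps = count_steps(student_steps)
print(unique_steps, total_steps)  # 3 4
```

Step S2 is attempted by both students, so it contributes two to the total but only one to the unique count.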

Sample Selector

Sample Selector is a tool for creating and editing
samples, or groups of data you compare across. They are
not "samples" in the statistical sense, but more like filters.

By default, a single sample exists: "All Data". With the Sample
Selector, you can create new samples to organize your data.

You can use samples to:

Compare across conditions

Narrow the scope of data analysis to, for example, a specific time range,
set of students, problem category, or unit of a curriculum

A sample is composed of one or more filters, specific
conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

Add a filter from the categories at the left to the composition
area at the right

Modify the filter to select the subset of data you're interested
in, saving it when done

View the sample preview table to see the effect of adding your filter,
making sure you don't have an empty set (i.e., a filter or combination
of filters that excludes all transactions).

Name and describe the sample

Decide whether to share the sample with others who can view the
dataset

Save the sample

The effect of multiple filters

DataShop interprets each filter after the first as an additional
restriction on the data that are included in the sample. This is also known
as a logical "AND". You can see the results of multiple filters in the
sample preview as soon as all filters are "saved".
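
The "AND" behavior can be pictured with a minimal Python sketch. It is not how DataShop is implemented; the transaction fields (student, condition, date), the filter functions, and the name apply_filters are all hypothetical and chosen only to show that a row stays in the sample when every filter accepts it.

```python
from datetime import date

# Hypothetical transaction rows; the field names are illustrative only.
transactions = [
    {"student": "s1", "condition": "control", "date": date(2005, 9, 12)},
    {"student": "s2", "condition": "treatment", "date": date(2005, 9, 14)},
    {"student": "s3", "condition": "treatment", "date": date(2005, 11, 2)},
]


def apply_filters(rows, filters):
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


# Two hypothetical filters: one on condition, one on time range.
filters = [
    lambda row: row["condition"] == "treatment",
    lambda row: row["date"] < date(2005, 10, 1),
]

sample = apply_filters(transactions, filters)
if not sample:
    # Mirrors the empty-set check suggested when previewing a sample.
    print("Empty sample: the filters exclude every transaction.")
print([row["student"] for row in sample])  # ['s2']
```

Each added filter can only keep the sample the same size or shrink it, which is why a combination of filters may end up excluding all transactions.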