
Sample Selector

Sample Selector is a tool for creating and editing
samples, which are groups of data you can compare across. Despite the name,
these are not "samples" in the statistical sense; they work more like filters.

By default, a single sample exists: "All Data". With the Sample
Selector, you can create new samples to organize your data.

You can use samples to:

Compare across conditions

Narrow the scope of data analysis to a specific time range,
set of students, problem category, or unit of a curriculum (for example)

A sample is composed of one or more filters: specific
conditions that narrow down the data in your sample.

Creating a sample

The general process for creating a sample is to:

Add a filter from the categories at the left to the composition
area at the right

Modify the filter to select the subset of data you're interested
in, saving it when done

View the sample preview table to see the effect of adding your filter,
making sure you don't have an empty set (i.e., a filter or combination
of filters that excludes all transactions).

Name and describe the sample

Decide whether to share the sample with others who can view the
dataset

Save the sample

The effect of multiple filters

DataShop interprets each filter after the first as an additional
restriction on the data that is included in the sample. This is also known
as a logical "AND". You can see the results of multiple filters in the
sample preview as soon as all filters are "saved".
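The AND semantics described above can be sketched with a toy transaction table. This is an illustrative sketch only; the column names here are made up and are not DataShop's actual export schema.

```python
import pandas as pd

# Toy transaction table; column names are illustrative, not DataShop's schema.
tx = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s3"],
    "condition": ["A", "B", "A", "A"],
    "unit": ["U1", "U1", "U2", "U1"],
})

# Each filter is a boolean mask; a sample keeps only the rows that
# match ALL filters (a logical AND of the masks).
filter_condition = tx["condition"] == "A"
filter_unit = tx["unit"] == "U1"
sample = tx[filter_condition & filter_unit]

print(len(sample))  # 2 transactions satisfy both filters
```

Adding a third filter would only ever shrink (never grow) the resulting sample.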

Dataset Info / Overview

This page provides both an overview and context for
the current dataset. It may answer questions such as:

How, when, and where were these data collected?

What's the scope of the dataset?

If this was an experiment, what were the research goals?

How should I cite this dataset for secondary analysis?

If you are a project admin for this project, you can edit some of
the fields in the Overview table—click a field to edit
it. You can help other researchers by describing the dataset and
the context in which it was created.

Have you or someone you know published about these data? Attach a
paper to this dataset on the Files tab.

Dataset statistics at a glance

You can gauge the size of the dataset by looking at the numbers
in the Statistics table,
particularly the Total Number of
Students, Transactions, and
Student Hours.

The Knowledge Component Models,
or step-to-knowledge-component mappings, are listed at the bottom of
the table. If you see a few Knowledge Component Models listed,
researchers have likely thought about different ways of attributing
skills to steps, and potentially new ways of categorizing knowledge in
this domain.
You can learn more about these models and create new ones by clicking the
KC Models subtab.

Dataset Info / Samples

A sample is a subset of a dataset and is composed of one or more filters: specific
conditions that narrow down the data included. This page lists samples shared by others, as well as those you own.

You can use samples to:

Compare across conditions

Narrow the scope of data analysis to a specific time range,
set of students, problem category, or unit of a curriculum (for example)

Creating a new dataset from an existing sample

A new dataset can be created from an existing sample by clicking on the
Save as Dataset icon next to a sample.
Creating a dataset from an existing sample places the new dataset
in the same project as the source dataset, so it inherits the same permissions, IRB attributes,
Principal Investigator, and Data Provider as the parent project.

The general process for creating a new dataset from an existing sample is to:

Choose a unique name for the new dataset

Decide whether or not to include user-created KC models in your new dataset. If you choose to include them,
they will be copied to the new dataset. If you choose to exclude them, your new dataset will still contain
the 'default' KC model, if one was included in the original data.

Save the Dataset

Your new dataset will be added to the Import Queue. The system will send an email
once the new dataset has been loaded.

Creating a new sample

The general process for creating a sample is to:

Click the edit sample icon next to the
All Data sample.

Choose a unique sample name.

Add or modify an existing filter to select the subset of data you're interested
in, saving the filter when done.

View the sample preview table to see the effect of adding your filter,
making sure you don't have an empty set (i.e., a filter or combination
of filters that excludes all transactions).

Decide whether to share the sample with others who can view the
dataset

Save as New

Modifying an existing sample

The general process for modifying a sample is to:

Click the edit sample icon next to the
desired sample.

Choose a unique sample name.

Add or modify an existing filter to select the subset of data you're interested
in, saving the filter when done.

View the sample preview table to see the effect of adding your filter,
making sure you don't have an empty set (i.e., a filter or combination
of filters that excludes all transactions).

Decide whether to share the sample with others who can view the
dataset

Save the sample

Deleting a sample

Once a sample has been deleted, it cannot be recovered.

The effect of multiple filters on samples

DataShop interprets each filter after the first as an additional
restriction on the data that is included in the sample. This is also known
as a logical "AND". You can see the results of multiple filters in the
sample preview as soon as all filters are "saved".

Dataset Info / KC Models

A KC (Knowledge Component) model is a mapping between steps and
knowledge components. In DataShop, each unique step can map to zero or more
knowledge components.

From the KC Models page, you can compare existing KC models, export an
existing model or template for creating a new KC model, or import a new
model that you've created.

Comparing KC models

On the KC Models page, each model is described by:

a number of KCs

a number of observations labeled with KCs

five statistical measures of goodness of fit for the model: AIC, BIC, and three
Cross Validation RMSE values. These model fit values are described in more detail on the Model Values help page.

The models are sorted by AIC (lowest to highest, that is, from best fit with the fewest
parameters to worst fit or more parameters) and then by model name.

One general goal of KC modeling is to determine the "best" model for representing knowledge
by fitting the model to the data. The "best" model would not only account for most of the
data (it would have the highest number of observations labeled with KCs) and fit the data
well, but would do so with the fewest parameters (KCs). Both AIC and BIC tell you how well
the model fits the data (lower values are better) while penalizing models for overfitting
(having additional parameters); BIC's penalty for additional parameters is stronger than
AIC's. DataShop sorts models by AIC.
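The trade-off between fit and parameter count can be made concrete with the standard AIC and BIC definitions. The log-likelihoods and parameter counts below are invented numbers for illustration, not values from any real model.

```python
import math

def aic(log_likelihood, k):
    # Standard definition: AIC = 2k - 2*ln(L); k = number of parameters (KCs)
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # Standard definition: BIC = k*ln(n) - 2*ln(L); n = number of observations
    return k * math.log(n) - 2 * log_likelihood

# Two hypothetical models over n = 1000 observations:
# model B fits slightly better but uses many more parameters (KCs).
n = 1000
aic_a, bic_a = aic(-500.0, 10), bic(-500.0, 10, n)
aic_b, bic_b = aic(-465.0, 40), bic(-465.0, 40, n)

# AIC charges 2 per parameter; BIC charges ln(1000) ~ 6.9 per parameter,
# so AIC prefers the bigger model B while BIC prefers the smaller model A.
print(aic_a, aic_b)  # 1020.0 1010.0
print(bic_a, bic_b)
```

This is why a model with many KCs can look best by one measure and worst by another: the two criteria penalize extra parameters at different rates.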

Why create additional KC models and import them to DataShop?

A primary reason for creating a new KC model is that an existing model is
insufficient in some way—it may model some knowledge components too
coarsely, producing learning curves that spike or dip, or it may be too
fine-grained (too many knowledge components), producing curves that end
after one or two opportunities. Or perhaps the model fails to model the
domain sufficiently or with the right terminology. In any case, you may find
value in creating a new KC model.

By importing the resulting KC model that you created back into DataShop,
you can use DataShop tools to assess your new model. Most reports in
DataShop support analysis by knowledge component model, while some
currently support comparing values from two KC models
simultaneously—see the predicted values on the error rate Learning
Curve, for example. We plan to create new features in DataShop that
support more direct knowledge component model comparison.

Auto-generated KC models

DataShop creates two knowledge component models in addition to the model
that was logged or imported when the dataset was created:

single-KC model: the same knowledge component is
applied to every transaction in the dataset, producing a very general
model

unique-step model: a unique knowledge
component is applied to each unique step in the dataset, producing a
very precise (likely too much so) model.
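The two auto-generated mappings can be sketched as functions of the step list. The step names below are made up; DataShop builds these models internally from the actual steps in the dataset.

```python
# Illustrative sketch of the two auto-generated step-to-KC mappings.
steps = ["p1-step1", "p1-step2", "p2-step1"]  # hypothetical step names

# single-KC model: every step (and so every transaction) shares one KC,
# giving the most general possible model.
single_kc = {step: "Single-KC" for step in steps}

# unique-step model: each distinct step becomes its own KC,
# giving the most fine-grained possible model.
unique_step = {step: step for step in steps}

print(len(set(single_kc.values())), len(set(unique_step.values())))  # 1 3
```

Any hand-built KC model falls somewhere between these two extremes of granularity.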

Creating a new KC model

Step 1: Export an existing model or blank template

To get started, click Export at the top of the KC Models page.

Select one or more existing KC models to use as a template for the new one, or
choose "(new only)" to download a blank template.

Click the Export button to download your file.

Step 2: Edit the KC model file in Excel or another spreadsheet or text editor

Define the KC model by filling in the cells in the column KC (model_name),
replacing "model_name" with a name for your new model.

Assign multiple KCs to a step by adding additional KC (model_name) columns, placing one
KC in each column. Replace "model_name" with the same model name you used for your new model;
you will have multiple columns with the same header.

Add additional KC models by creating a new KC (model_name) column
for each KC model, replacing "model_name" with a name for each additional model.

Delete any KC model columns that duplicate existing KC models already in the dataset (unless you want
to overwrite these).

Do not change the values or headers of any other columns.
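The column-editing step above can also be scripted. This is a sketch under assumptions: the export is tab-delimited, has a "Step Name" column among others, and "MyNewModel" and the KC-naming rule are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of a step export; a real export has more columns,
# all of which should be left untouched.
df = pd.DataFrame({
    "Problem Name": ["P1", "P1", "P2"],
    "Step Name": ["enter-x", "enter-y", "enter-x"],
})

# Hypothetical rule for illustration: derive the KC name from the step name.
df["KC (MyNewModel)"] = df["Step Name"].str.replace("enter-", "skill-", regex=False)

# A step needing two KCs would instead get a second column with the
# same "KC (MyNewModel)" header, one KC per column.
tsv = df.to_csv(sep="\t", index=False)  # tab-delimited, ready to import
print(df["KC (MyNewModel)"].tolist())
```

Writing the result with `to_csv(sep="\t")` keeps the file in the tab-delimited form the import step expects.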

Step 3: Import a KC model file

Start the import process by clicking Import at the top of the
KC Models page.

Click Choose File to browse for the KC model file you edited.

Click Verify to start file verification. If errors are found
in your file, fix them and re-verify the file. When DataShop
successfully verifies the file, you can then import it by clicking
the Import button.

Dataset Info / Custom Fields

A custom field is a new column you define for annotating transaction data.
DataShop currently supports adding and modifying custom fields at the transaction level.

You can add or modify a custom field's metadata from this page, but to set the
data in that custom field, you need to use web
services, which let you interact with DataShop through a program you write.
You can also add custom fields when logging or importing new data.
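A web-services call for setting custom field data might be constructed as below. This is only a sketch: the endpoint path, payload format, ids, and authorization scheme shown are placeholders, not DataShop's documented API; consult the DataShop web services documentation for the real details.

```python
from urllib import request

# Placeholder ids for illustration only.
DATASET_ID = 123
CUSTOM_FIELD_ID = 456

# Hypothetical endpoint path; the real path is defined in the
# DataShop web services documentation.
url = (f"https://pslcdatashop.web.cmu.edu/services"
       f"/datasets/{DATASET_ID}/customfields/{CUSTOM_FIELD_ID}/set")

body = b"<custom_field_data>...</custom_field_data>"  # placeholder payload

req = request.Request(url, data=body, method="POST")
# Placeholder credentials; DataShop web services require authenticated requests.
req.add_header("Authorization", "DataShop <credentials>")

print(req.get_method(), req.full_url)
```

The request is only built here, not sent; a real program would pass `req` to `urllib.request.urlopen` (or use an HTTP client of its choice) once the endpoint and credentials are correct.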

Permissions

A custom field has an owner, the user who created it. Users who have edit or
admin permission for a project can create custom fields for a dataset in it.
Only the owner or a DataShop administrator can delete or modify the custom field.
Only DataShop administrators can delete custom fields that were logged with the data.

Custom Field Metadata

The following fields describe a custom field:

name—descriptive name for the new custom field. Must be unique across all custom fields
for the dataset. Must be no more than 255 characters.

description—description for the new custom field. Must be no more than 500 characters.

type—the data type of the custom field (see below). Cannot be modified later.

level—the level of aggregation that the custom field describes. Currently, the only accepted value
is transaction. Future versions may support other levels such as step or student.
Cannot be modified later.

Dataset Info / Problem List

The problem list page lists all problems in the dataset, grouped by problem hierarchy,
which is a unique hierarchy of curriculum levels containing the problem (e.g., a problem might be
contained in a Unit A, Section B hierarchy).

This page is most useful for seeing which particular problems have
problem content stored: any problem
name shown as a hyperlink will link to the content that students saw when they interacted
with that problem. You can also filter on problems with or without problem content, and search
those lists.

Download all of the problem content associated with the dataset by clicking the
Download Problem Content button. The format of the download is
a single .zip file containing a hierarchy of .html and web content files (e.g., images, videos, audio).
The exact hierarchy of this file differs depending on the source of the problem content.
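Once downloaded, the zip can be inspected with standard tools. The sketch below builds a tiny archive in memory so it is self-contained; the file names are hypothetical, and with a real download you would open the saved file instead (e.g., `zipfile.ZipFile("problem_content.zip")`).

```python
import io
import zipfile

# Build a miniature stand-in for a downloaded problem-content zip:
# a hierarchy of .html pages plus supporting web content.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("UnitA/SectionB/problem1.html", "<html>...</html>")
    zf.writestr("UnitA/SectionB/images/fig1.png", b"\x89PNG")

# List just the problem pages (the .html entries).
with zipfile.ZipFile(buf) as zf:
    html_pages = [n for n in zf.namelist() if n.endswith(".html")]

print(html_pages)  # ['UnitA/SectionB/problem1.html']
```

The directory components of each entry name reflect the hierarchy described above, which varies with the source of the problem content.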

Dataset Info / Step List

The Step List table lists and decomposes all of the problems in the
dataset. It details the problem hierarchy (the unit, section, or other divisions that
contain the problem) and composition (the steps that make up a problem).

Dataset Info / Citation

This page displays dataset-specific citation guidance. This information is taken from the Dataset Info
fields "Acknowledgement for Secondary Analysis" and "Preferred Citation for Secondary Analysis",
which are settable by researchers who have edit access to the dataset.

Dataset Info / Problem Content

Problem content refers to a representation (text, images, html, etc.) of the content that students
interacted with in the system that generated the dataset's data. Note that the word "problem" is
used in the sense of any activity the user did that was named in the problem
column of the data.

When problem content is mapped to a dataset and its problems, users can jump from DataShop reports
to the problem content by clicking one of the "View Problem" buttons throughout the interface (often
in tooltips on problem or step name), allowing them to better understand the activities that
correspond with the data.

With problem content, you can:

Learn more about the system that students used

Inspect the interface and problem to explain student difficulties suggested by data

Use machine learning on an export of problem content from the Problem List page

Datasets with problem content are noted on the list of datasets with a problem content icon.

Adding problem content to your dataset

Please contact us, and we will consult with you on the format DataShop expects for problem content.
For a faster solution, consider attaching files documenting your system on the Files tab of your dataset.

If you are a project admin for a dataset with problem content that has already been uploaded to
the DataShop server, you can use the Problem Content page to map problem content to
problems within the dataset. Select the Conversion Tool and Content Version
to see a list of content items that can be mapped to the dataset, then click add to perform the mapping.

To see a list of all problems in the dataset and which have problem content, or to download all problem
content for a dataset, visit the Problem List page.