Importing New Data

In addition to logging data directly to the DataShop logging database, you can import data to
create a new dataset in DataShop. To begin the import process, upload a new dataset from the
Upload a dataset page. If you've never uploaded a dataset before, DataShop
will prompt you to request permission to do so.

Transaction data

On the upload page, you will be asked to specify whether you want to upload transaction data. Transaction data is data in either
of the above two formats. If you want to create a dataset that will hold file attachments (of
any format), or if you want to create the dataset as a placeholder and add transaction data
later, choose No transaction data now.

De-identification requirements

Data uploaded to DataShop must be de-identified. That is, the identity of human subjects
referenced in the data must not be discoverable.

If your file is entirely de-identified, choose the first option, I certify that all data in
this file including the content of the "Anon Student Id" column is de-identified.

If your file is de-identified except for the identifiers present in the Anon Student Id
column, select the second option, I certify that all data in this file except the content of
the "Anon Student Id" column is de-identified. DataShop will de-identify that column for you,
substituting the identifiers in that column with anonymous ones. (You can later obtain a mapping
from DataShop identifiers to the original identifiers by emailing us.)
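As an illustration of the kind of substitution described above, here is a minimal Python sketch that replaces each distinct value in an "Anon Student Id" column with a generated identifier and keeps the mapping. It is not DataShop's actual implementation. The tab-delimited layout, the "Outcome" column, the sample rows, the function name anonymize_column, and the "Stu_" numbering scheme are all assumptions made for the example.

```python
import csv
import io


def anonymize_column(rows, column="Anon Student Id", prefix="Stu_"):
    """Replace each distinct value in `column` with a generated identifier.

    Returns the rewritten rows and the mapping from original to generated IDs.
    The prefix-plus-counter scheme is hypothetical, not DataShop's.
    """
    mapping = {}
    rewritten = []
    for row in rows:
        original = row[column]
        if original not in mapping:
            # First time this student appears: assign the next identifier.
            mapping[original] = f"{prefix}{len(mapping) + 1:04d}"
        # Copy the row so the caller's data is left untouched.
        rewritten.append({**row, column: mapping[original]})
    return rewritten, mapping


# Made-up, tab-delimited sample data (illustrative columns only).
sample = (
    "Anon Student Id\tOutcome\n"
    "jdoe@example.edu\tCORRECT\n"
    "asmith@example.edu\tINCORRECT\n"
    "jdoe@example.edu\tCORRECT\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
anonymized, mapping = anonymize_column(rows)
```

The same student always receives the same generated identifier, so per-student analyses still work after substitution. The mapping itself re-identifies students and would need to be stored separately from the data.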

Creating a new project

A project is primarily a container for a group of related datasets. In addition, access to
datasets is granted by project. You can create a new project from the upload page or the Create a project page. When you create a new project, you will be
asked to choose a data collection type. The available options are described on our IRB page.

Import process

The import process is as follows:

Upload one or more files (as a .ZIP file) to be imported as a dataset.

DataShop will perform a quick verification of the file's first 100 lines and display the results.
You will need to correct any errors that are found. If any potential issues are found, you will be asked
to decide if you want to continue.

After the initial verification completes, the dataset will appear in your Import Queue as Queued for Verification,
where a separate process will verify the dataset in its entirety.

When verification is complete, you will receive an email with the verification results. The status
for your dataset will update in your Import Queue.
When your dataset is loaded, you will be notified via email.

After your dataset is loaded, we ask that you examine the dataset and then release it.
When you release a dataset, it inherits the permissions of its project (those who can access the project can then access
this dataset) and becomes visible in the main index of datasets.
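Before uploading, it can help to run a rough local check of your own and then package the file as a .ZIP. The Python sketch below is only an approximation under stated assumptions: it assumes a tab-delimited file with a header row and checks only that each of the first 100 lines has as many fields as the header. DataShop's real verification rules are not reproduced here, and the names quick_check and zip_bytes, the file name data.txt, and the sample rows are hypothetical.

```python
import io
import zipfile


def quick_check(text, limit=100, delimiter="\t"):
    """Return (line_number, message) problems found in the first `limit` lines.

    Only the field count is compared against the header row; this is a
    stand-in for a real format check, not DataShop's verification.
    """
    lines = text.splitlines()[:limit]
    if not lines:
        return [(0, "file is empty")]
    expected = len(lines[0].split(delimiter))
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        found = len(line.split(delimiter))
        if found != expected:
            problems.append((number, f"expected {expected} fields, found {found}"))
    return problems


def zip_bytes(files):
    """Pack a {file name: text} mapping into an in-memory .ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Made-up sample content (illustrative columns only).
good = "Anon Student Id\tOutcome\nStu_0001\tCORRECT\n"
bad = "Anon Student Id\tOutcome\nStu_0001\n"

good_problems = quick_check(good)
bad_problems = quick_check(bad)

# Package the file only when the local check finds nothing.
archive_data = zip_bytes({"data.txt": good}) if not good_problems else b""
```

To write the archive to disk, save archive_data to a file opened in binary mode, for example with open("upload.zip", "wb").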

Sample Selector

Sample Selector is a tool for creating and editing
samples, which are groups of data you can compare across. They're
not "samples" in the statistical sense, but more like filters.

By default, a single sample exists: "All Data". With the Sample
Selector, you can create new samples to organize your data.

You can use samples to:

Compare across conditions

Narrow the scope of data analysis to a specific time range,
set of students, problem category, or unit of a curriculum (for example)

A sample is composed of one or more filters, specific
conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

Add a filter from the categories at the left to the composition
area at the right

Modify the filter to select the subset of data you're interested
in, saving it when done

View the sample preview table to see the effect of adding your filter,
making sure you don't have an empty set (i.e., a filter or combination
of filters that excludes all transactions).

Name and describe the sample

Decide whether to share the sample with others who can view the
dataset

Save the sample

The effect of multiple filters

DataShop interprets each filter after the first as an additional
restriction on the data that is included in the sample. This is also known
as a logical "AND". You can see the results of multiple filters in the
sample preview as soon as all filters are "saved".
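The logical "AND" behavior can be sketched in a few lines of Python. This is only a conceptual model, not DataShop code: the transaction fields (student, condition, unit), the sample rows, and the function name apply_filters are assumptions made for the example. Each filter is modeled as a function that returns True for transactions it keeps.

```python
def apply_filters(transactions, filters):
    """Keep only the transactions that satisfy every filter (logical AND)."""
    return [t for t in transactions if all(f(t) for f in filters)]


# Made-up transactions with hypothetical field names.
transactions = [
    {"student": "Stu_0001", "condition": "treatment", "unit": "Unit 1"},
    {"student": "Stu_0002", "condition": "control", "unit": "Unit 1"},
    {"student": "Stu_0003", "condition": "treatment", "unit": "Unit 2"},
]


def in_treatment(t):
    return t["condition"] == "treatment"


def in_unit_1(t):
    return t["unit"] == "Unit 1"


def in_unit_3(t):
    return t["unit"] == "Unit 3"


all_data = apply_filters(transactions, [])                  # no filters
one_filter = apply_filters(transactions, [in_treatment])    # first filter only
two_filters = apply_filters(transactions, [in_treatment, in_unit_1])

# A combination that excludes every transaction yields an empty sample.
empty_sample = apply_filters(transactions, [in_treatment, in_unit_3])
is_empty = len(empty_sample) == 0
```

With no filters, every transaction is kept, which mirrors the default "All Data" sample. Each additional filter can only shrink the result or leave it unchanged, so a combination that matches no transaction produces the empty set that the sample preview helps you catch.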