Data science courses and tutorials have grown popular in recent years, yet they are still taught using production-grade programming tools (e.g., R, MATLAB, and Python IDEs) within desktop computing environments. Although powerful, these tools present high barriers to entry for novices, forcing them to grapple with the extrinsic complexities of software installation and configuration, data file management, data parsing, and Unix-like command-line interfaces. To help novices get started with data science, we created DS.js, a bookmarklet that embeds a data science programming environment directly into any existing webpage. By transforming any webpage into an example-centric IDE, DS.js eliminates these complexities of desktop-based environments and turns the entire web into a rich substrate for learning data science. DS.js automatically parses HTML tables and CSV/TSV data sets on the target webpage, attaches code editors to each data set, provides a data table manipulation and visualization API designed for novices, and gives instructional scaffolding in the form of bidirectional previews of how the user's code and data relate.

People often teach data science using production-grade IDEs such as
RStudio for R, MATLAB, and Jupyter notebooks for Python and other
languages. These IDEs are usually embedded within Unix-like command-line
environments to handle data file management, script execution, and
version control. We chatted with a few prominent data science
instructors and found that this status quo has several major drawbacks:

Novices must deal with the complexities of installing and configuring
IDEs meant for professionals. On top of that, they must also grapple
with arcane Unix-like command-line concepts to manage data files and
scripts locally on their machines (e.g., What are absolute vs.
relative paths? Where did all my files go? Why can't I run this script
from that directory?!?).

Since these IDEs are designed for professionals, they don't provide
any instructional scaffolding to help novices build mental models of
how data science APIs operate.

Code, data, and exposition are stored in separate places, which makes
it harder for instructors to produce and distribute self-contained
instructional materials.

These limitations all stem from the fact that people currently need to
bring their data into monolithic data science environments. What if
they could instead bring a lightweight data science environment
directly to their data? To explore this possibility, we built a
browser bookmarklet called DS.js, which turns any existing webpage into
a live programming environment for learning data science. The target
webpage contains all of the required data, and DS.js brings the user's
code directly to it.

The beauty of a bookmarklet is that it works in any modern web browser
and doesn't require users to install or configure anything. (To
“install” DS.js, simply drag its bookmarklet into your
bookmarks bar like a regular webpage bookmark.)
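For readers unfamiliar with bookmarklets, the general mechanism can be sketched as follows. This is only an illustration of how bookmarklets work in general, not the actual DS.js loader: the function name `makeBookmarklet` and the script URL are hypothetical placeholders.

```javascript
// Build a bookmarklet URL that injects a <script> tag into the current page.
// The script URL passed in is a placeholder, not the real DS.js location.
function makeBookmarklet(scriptUrl) {
  const body =
    "(function(){" +
    "var s=document.createElement('script');" +
    "s.src=" + JSON.stringify(scriptUrl) + ";" +
    "document.body.appendChild(s);" +
    "})();";
  // Browsers run the code after "javascript:" when the bookmark is clicked.
  return "javascript:" + encodeURIComponent(body);
}

// Hypothetical loader URL, used purely for illustration.
const bookmarklet = makeBookmarklet("https://example.com/dsjs-loader.js");
console.log(bookmarklet);
```

Saving the resulting string as a bookmark's address is all the "installation" a bookmarklet of this kind needs.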

DS.js automatically detects structured data sources on the webpage
(such as the HTML population table on a Wikipedia page) and parses
them into special JavaScript data structures. It also automatically
parses CSV and TSV files linked from the current page. More advanced users
can use a GUI-based selector (powered by SelectorGadget) to visually choose
groups of webpage elements to parse, or manually write jQuery selectors
to parse any other data on the page.
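To give a rough idea of what parsing a linked CSV file into JavaScript data structures involves, here is a minimal sketch. It is not DS.js's parser: the function names `splitCsvLine` and `parseCsv`, the row-object representation, and the sample data are all assumptions made for illustration, and the sketch does not handle line breaks inside quoted fields.

```javascript
// Split one CSV line into fields, honoring double-quoted fields and "" escapes.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Parse CSV text into an array of row objects keyed by the header row.
// Cells that look numeric are converted to numbers; others stay strings.
function parseCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const header = splitCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const cells = splitCsvLine(line);
    const row = {};
    header.forEach((name, i) => {
      const cell = cells[i] === undefined ? "" : cells[i];
      const trimmed = cell.trim();
      const isNumeric = trimmed !== "" && !Number.isNaN(Number(trimmed));
      row[name] = isNumeric ? Number(trimmed) : cell;
    });
    return row;
  });
}

// Made-up sample data with fictional place names.
const rows = parseCsv('city,population\n"Avalon, North",1200\nBrindle,54000\n');
console.log(rows);
```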

An “Append DS.js editor” button appears next to each
parsed data source. Click that button to embed a JavaScript code editor
into the webpage right underneath that data source. Within that editor,
you can write arbitrary JavaScript code to transform, analyze, and
visualize that data. The outputs of your analyses (such as statistics,
derived tables, and graphs) get updated live in a pane to the right of
your code. DS.js comes with a JavaScript library for introductory data
science, modeled after datascience.py; think
of it as a super-simplified form of Pandas for Python or the tidyverse
for R.
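As a rough sketch of what a simplified, chainable table library in this spirit might look like, consider the following. The `Table` class and its method names (`column`, `select`, `where`, `sort`) are hypothetical and loosely inspired by datascience.py; they should not be read as the actual DS.js API.

```javascript
// Hypothetical, minimal table type; method names are illustrative only.
class Table {
  constructor(rows) {
    // Copy each row so the caller's objects are never mutated.
    this.rows = rows.map((row) => ({ ...row }));
  }

  // Return the values of one column as an array.
  column(name) {
    return this.rows.map((row) => row[name]);
  }

  // Return a new table containing only the named columns.
  select(...names) {
    return new Table(
      this.rows.map((row) => Object.fromEntries(names.map((n) => [n, row[n]])))
    );
  }

  // Return a new table with only the rows for which predicate(row) is truthy.
  where(predicate) {
    return new Table(this.rows.filter(predicate));
  }

  // Return a new table sorted by one column (ascending unless descending is true).
  sort(name, descending = false) {
    const sorted = [...this.rows].sort((a, b) =>
      a[name] < b[name] ? -1 : a[name] > b[name] ? 1 : 0
    );
    return new Table(descending ? sorted.reverse() : sorted);
  }
}

// Made-up sample data with fictional place names.
const cities = new Table([
  { city: "Avalon", population: 1200 },
  { city: "Brindle", population: 54000 },
  { city: "Corvin", population: 8900 },
]);

const bigCities = cities
  .where((row) => row.population > 5000)
  .sort("population", true);
console.log(bigCities.column("city"));
```

Because every method returns a new table, each step in a chain can be inspected on its own, which suits a live-updating output pane.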

To help build proper mental models, DS.js includes instructional
scaffolding in the form of bidirectional previews of code and
data. You can click on any code expression to visually preview its
effects on the corresponding data tables. You can also click on parts of
data tables to preview suggestions for what code to write to transform
those parts.
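One direction of such a preview (from code to data) could be approximated as below. This is a speculative sketch rather than how DS.js implements its previews: the function `previewFilter` and the shape of its result are assumptions, and the data is made up.

```javascript
// For each row, report whether a filter predicate would keep or drop it.
// A UI could use the "kept" flag to highlight or gray out table rows
// before the filter is actually applied.
function previewFilter(rows, predicate) {
  return rows.map((row, index) => ({
    index,
    row,
    kept: Boolean(predicate(row)),
  }));
}

// Made-up sample data with fictional place names.
const sampleRows = [
  { city: "Avalon", population: 1200 },
  { city: "Brindle", population: 54000 },
  { city: "Corvin", population: 8900 },
];

const preview = previewFilter(sampleRows, (row) => row.population > 5000);
for (const entry of preview) {
  console.log(entry.index, entry.row.city, entry.kept ? "kept" : "dropped");
}
```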

All of your code can be encapsulated in a single URL. This lets you
easily share your data science explorations or questions with others,
and everyone can safely modify their own copies, again without
installing or configuring any software.
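One plausible way to pack code into a shareable URL is to store it in the URL fragment, as sketched here. The text above does not say how DS.js actually encodes its URLs, so the fragment key `dsjs-code` and the function names are hypothetical.

```javascript
// Store the user's code in the URL fragment of the page being analyzed.
// The "dsjs-code" key is a made-up name for this sketch.
function encodeShareUrl(pageUrl, code) {
  const url = new URL(pageUrl);
  url.hash = "dsjs-code=" + encodeURIComponent(code);
  return url.toString();
}

// Recover the code from a share URL, or return null if none is present.
function decodeShareUrl(shareUrl) {
  const hash = new URL(shareUrl).hash; // includes the leading "#"
  const prefix = "#dsjs-code=";
  return hash.startsWith(prefix)
    ? decodeURIComponent(hash.slice(prefix.length))
    : null;
}

const userCode = 'table.where((row) => row.population > 5000)\n  .sort("population");';
const shareUrl = encodeShareUrl("https://example.com/some-page", userCode);
console.log(shareUrl);
console.log(decodeShareUrl(shareUrl) === userCode);
```

A fragment is never sent to the server, so a scheme like this would leave the target webpage's own URL handling untouched.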

In sum, we're really excited about using the web as a substrate for
learning data science because it already contains enormous amounts of
data in all sorts of domains that could engage students. Educational
materials made using DS.js benefit from the authenticity of being
situated directly within real-world webpages so that students can see
the original context behind their data while writing analysis code.
Finally, beyond students and instructors in educational settings,
another potential audience for DS.js is citizen data scientists who
want to analyze and visualize data that they find on the web and
easily share their findings for others to remix and build upon.
