Data that you put (or “commit”) into Pachyderm ultimately lives in an object
store of your choice (S3, Minio, GCS, etc.). Pachyderm content-addresses this
data to implement its version control semantics for data, so the data is not
directly “human-readable” in the object store. That being said, Pachyderm
allows you and your pipeline stages to interact with versioned data as you
would in a normal file system.
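
To make “content-addressed” a bit more concrete, here is a minimal sketch of the general idea, not Pachyderm's actual storage code. The `ContentAddressedStore` class, the in-memory dictionary standing in for the object store, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory stand-in for an object store (hypothetical, not Pachyderm code)."""

    def __init__(self):
        self.objects = {}  # maps content hash -> raw bytes

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, not from a file name.
        # SHA-256 is an assumption chosen for this sketch.
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


store = ContentAddressedStore()
key_a = store.put(b"id,amount\n1,9.99\n")
key_b = store.put(b"id,amount\n1,9.99\n")  # identical content yields the same key
```

Because the keys are hashes rather than file names, identical content is stored only once, and browsing the raw store shows opaque keys instead of a readable directory tree.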

Versioned data in Pachyderm lives in repositories (again, think of something
similar to “git for data”). Each data repository can contain one file,
multiple files, multiple files arranged in directories, etc. Regardless of the
structure, Pachyderm will version the state of each data repository as it
changes over time.

Regardless of the method you use to get data into Pachyderm (CLI, language
client, etc.), the underlying mechanism is a “commit” of data into a data
repository. To put data into Pachyderm, a commit must first be “started”
(making it an “open commit”). Data can then be put into Pachyderm as part of
that open commit, and it becomes available once the commit is “finished”
(making it a “closed commit”).
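
The start, put, finish lifecycle described above can be sketched as a small toy model. This is an illustration of the semantics only, not the Pachyderm client API: the `Repo` class and its method names are hypothetical, and allowing a single open commit at a time is a simplification made for this sketch.

```python
class Repo:
    """Toy model of a versioned data repository (hypothetical names throughout)."""

    def __init__(self, name):
        self.name = name
        self.finished = []  # finished commits, oldest first; each maps path -> bytes
        self.open = None    # files staged in the currently open commit, if any

    def start_commit(self):
        if self.open is not None:
            raise RuntimeError("a commit is already open")
        # A new commit starts from the state of the previous finished commit.
        self.open = dict(self.finished[-1]) if self.finished else {}

    def put_file(self, path, data):
        if self.open is None:
            raise RuntimeError("no open commit; call start_commit() first")
        self.open[path] = data

    def finish_commit(self):
        if self.open is None:
            raise RuntimeError("no open commit to finish")
        self.finished.append(self.open)
        self.open = None
        return len(self.finished) - 1  # index of the newly finished commit

    def get_file(self, path, commit=-1):
        # Readers only ever see finished commits, never the open one.
        if not self.finished or path not in self.finished[commit]:
            raise FileNotFoundError(path)
        return self.finished[commit][path]


repo = Repo("images")
repo.start_commit()
repo.put_file("/a.txt", b"hello")

try:
    repo.get_file("/a.txt")
    visible_before_finish = True
except FileNotFoundError:
    visible_before_finish = False  # data in an open commit is not readable yet

commit_id = repo.finish_commit()
contents = repo.get_file("/a.txt")  # readable now that the commit is finished
```

In this model, each finished commit is a full snapshot of the repository, which is what lets earlier states remain readable after later commits change the data.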

There are a few options for actually getting data into Pachyderm via
“commits”:

Via the pachctl CLI tool: This is a great option for testing, development,
integration with CI/CD, and for users who prefer scripting.

Via one of the Pachyderm language clients: This option is ideal for Go, Python,
or Scala users who want to push data to Pachyderm from services or
applications written in those languages. Even if you don’t use Go,
Python, or Scala, Pachyderm exposes a protobuf API that supports many other
languages; we just haven’t built full clients for them yet.

Via the Pachyderm dashboard: The Pachyderm Enterprise dashboard provides a
very convenient way to upload data right from the GUI. You can find out more
about Pachyderm Enterprise Edition here.

When you deployed Pachyderm, the Pachyderm Enterprise dashboard was also
deployed automatically (if you followed one of our deploy guides here). You can
get a FREE trial token to experiment with this dashboard, which will let you create
data repositories and add data to those repositories via a GUI. More information
about getting your FREE trial token and activating the dashboard can be found
here.

In the dashboard, you can create a data repository by clicking on the + sign icon
in the lower right-hand corner of the screen:

When you click “Create Repo,” a box will pop up prompting you for a name and
optional description for the repo:

Once you fill in the name and click save, the new data repository will show up
in the main dashboard screen:

To add data to this repository, you can click on the blue icon representing
the repo. This will present you with some details about the repo along with an
“ingest data” icon:

You can add data from an object store or other URL by clicking this “ingest data”
icon: