Using the Spark DataFrame API

A DataFrame is a distributed collection of data organized into named columns. It is
conceptually equivalent to a table in a relational database or a data frame in R or the
Python pandas library. You can construct DataFrames from a wide array of
sources, including structured data files, Apache Hive tables, and existing Spark resilient
distributed datasets (RDDs). The Spark DataFrame API is available in Scala, Java, Python, and
R.

This section provides examples of DataFrame API use.

To list JSON file contents as a DataFrame:

As user spark, upload the people.txt and
people.json sample files to the Hadoop Distributed File System
(HDFS):
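The upload step above can be sketched as follows. The file contents are modeled on the people.txt and people.json samples bundled with Spark (under examples/src/main/resources), and the /user/spark target directory is an assumption; substitute the actual sample files and home directory used on your cluster.

```shell
# Create local sample files. The records below mirror the sample
# people.txt and people.json files shipped with Spark; adjust if your
# distribution provides different copies.
cat > people.txt <<'EOF'
Michael, 29
Andy, 30
Justin, 19
EOF

cat > people.json <<'EOF'
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
EOF

# Upload both files to HDFS. Run this as the spark user (or wrap the
# commands in "sudo -u spark ..."). The guard lets the script pass on a
# machine without a Hadoop client installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /user/spark
  hdfs dfs -put -f people.txt people.json /user/spark/
fi
```

After the upload, `hdfs dfs -ls /user/spark` should show both files, and they can then be referenced by HDFS path from the Spark shell.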