This guide presents basic configuration tasks for the Pentaho Server, data connections, the Pentaho design tools, and Hadoop cluster connections so you can get started creating ETL solutions and data analytics. This guide assumes you have installed the Pentaho software.

Tools: You can perform these configuration tasks through the Pentaho User Console (PUC), the Pentaho Data Integration (PDI) client, or by editing shell scripts and property files.

Login Credentials: A Pentaho administrator user name and password are required to perform configuration tasks through the user console.

These tasks are for IT and Pentaho administrators as described in the following definitions:

An IT administrator installs, configures, and upgrades the Pentaho Server. An IT administrator knows where the data is stored, how to connect to it, details about the computing environment, and how to use the command line on Microsoft Windows or Linux.

A Pentaho administrator creates and manages users and roles, and manages workstations so that ETL specialists and business analysts can create, publish, and share content.

IT Administrator Tasks

As an IT administrator, you need to configure the Pentaho Server and define what security to use. If your team is working with Big Data, you will also need to set up a connection to a Hadoop cluster.

Configure the Pentaho Server and Security

Basic server tasks include starting and stopping the Pentaho Server, increasing the server's memory limit, and specifying data connections. These IT administrator tasks prepare the system for more specific Pentaho administrator configuration tasks, like defining connections and managing users and roles.
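As a rough illustration of the memory-limit task, the fragment below sketches how a JVM heap limit is commonly raised in a server startup script. It assumes the Pentaho Server runs on a Tomcat-style application server whose startup script passes JVM options through the CATALINA_OPTS variable. The variable placement and the values shown are assumptions for illustration, not settings taken from this guide, so check your own startup script before changing anything.

```shell
# Hypothetical excerpt from a server startup script.
# The values below are illustrative assumptions, not recommendations.
# -Xms sets the initial JVM heap size; -Xmx sets the maximum heap size.
CATALINA_OPTS="-Xms2048m -Xmx6144m"
export CATALINA_OPTS
```

Raising the -Xmx value raises the memory limit. The change takes effect only after the server is stopped and started again, and the maximum heap should stay below the physical memory available on the host.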

You also need to establish a security plan for your Pentaho system. Pentaho supports two security options: Pentaho Security, and advanced security providers such as LDAP, Single Sign-On, or Microsoft Active Directory.

The Pentaho Server can be configured to connect to a Hadoop cluster through an adaptive big-data layer called a shim. You must modify the shim's properties and configuration files before you can connect to a Hadoop cluster. Pentaho develops and releases shims regularly, including between product releases, so that you can keep up with new Hadoop distributions and versions. To see which shims are supported for this version of Pentaho, see the Component Reference.

Pentaho Administrator Tasks

As a Pentaho administrator, you need to configure data connections, manage the Pentaho Server, and set up the Business Analytics (BA) or PDI design tools.

Configure Data Connections

Data connection tasks include establishing data connections for the Pentaho Server and the Pentaho Repository, and managing permissions for the users who access those data connections.

Manage the Pentaho Server

The Pentaho administrator is responsible for creating and managing users and workstations in the organization so that ETL specialists and business analysts can create, publish, and share content. Depending on the size and needs of your organization, these duties can include updating licenses, managing users and roles, creating and modifying data sources, and scheduling reports.

If you are using basic Pentaho Security, the Pentaho administrator may be tasked with creating and managing users and roles, including assigning the permissions that let users access the content they need.

Set Up the Design Tools and Utilities

Before using the design tools and utilities, you need to perform configuration tasks on each workstation that runs them. Depending on how these tools and utilities were installed, they might be located on machines other than the one hosting the Pentaho Server.