Splunk Admin Manual

Version: 4.1.5

Generated: 10/06/2010 09:52 am

Copyright Splunk, Inc. All Rights Reserved

Table of Contents

Welcome to Splunk administration
    What's in this manual
    What is Splunk?

    What's Splunk Web?
    What are apps and add-ons?
    Where to get more apps and add-ons
    App architecture and object ownership
    Manage app and add-on objects

How to configure Splunk

Add data and configure inputs
    Find more things to monitor with crawl
    Send SNMP events to Splunk
    Set up custom (scripted) inputs
    Whitelist or blacklist specific incoming data
    How log file rotation is handled

Set up forwarding and receiving
    About forwarding and receiving
    Enable forwarding and receiving
    Configure forwarders with outputs.conf
    Consolidate data from multiple machines
    Set up load balancing
    Route and filter data
    Clone data
    Forward data to third-party systems
    Encrypt and authenticate data with SSL
    More about forwarders

    About users and roles
    Add users and assign roles
    Set up user authentication with Splunk
    Set up user authentication with LDAP
    Set up user authentication with external systems
    Use single sign-on (SSO) with Splunk
    Delete user accounts using the CLI
    User language and locale
    Configure user session timeouts

Define alerts
    How alerting works
    Set up alerts in savedsearches.conf
    Configure scripted alerts
    Send SNMP traps to other systems

Set up backups and retention policies
    What you can back up
    How much space you will need
    Back up indexed data
    Back up configuration information
    Set a retirement and archiving policy
    Archive indexed data

    About jobs and job management
    Manage jobs in Splunk Web
    Manage jobs in the OS

Use Splunk's command line interface (CLI)
    About the CLI
    Get help with the CLI
    Use the CLI to administer a remote Splunk server
    CLI admin commands

Troubleshooting
    Splunk log files
    Work with metrics.log
    Contact Support
    Anonymize data samples to send to support
    Not finding the events you're looking for?
    SuSE Linux: unable to get a properly formatted response from the server
    Command line tools for use with Support's direction
    Troubleshooting configurations

Welcome to Splunk administration

What's in this manual

This manual contains information and procedures for the Splunk administrator. If you're responsible for configuring, running, and maintaining Splunk as a service for yourself or other users, this manual is for you.

• add data inputs to Splunk

and much more.

Where is all the information about event types and source types etc?

For this all-new version of Splunk, we are trying something different: we've broken out all the information about Splunk knowledge into a separate manual just for the person who handles that information. If that person is you, check the Knowledge Manager Manual. Let us know what you think!

Looking for help with searching in Splunk?

Check out the User Manual and the Search Reference Manual for all things search. In particular, you might want to check out the Search Cheatsheet if you're looking for a quick list of common examples.

Make a PDF

If you'd like a PDF of any version of this manual, click the pdf version link above the table of contents bar on the left side of this page. A PDF version of the manual is generated on the fly for you, and you can save it or print it out to read later.

What is Splunk?

Splunk is an IT search engine.

• You can use Splunk to search and navigate IT data from applications, servers, and network devices in real-time.

• Data sources include logs, configurations, messages, alerts, scripts, code, metrics, etc.

• Splunk lets you search, navigate, alert, and report on all your IT data in real-time using Splunk Web.

Learn more about what Splunk is, what it does, and how it's different.

What to do first

Start Splunk

This topic provides brief instructions for starting Splunk. If you are new to Splunk, we recommend reviewing the User Manual first.

Start Splunk on Windows

On Windows, Splunk is installed by default into C:\Program Files\Splunk. Many examples in the Splunk documentation use $SPLUNK_HOME to indicate the Splunk installation, or home, directory. You can replace the string $SPLUNK_HOME (and the Windows variant %SPLUNK_HOME%) with C:\Program Files\Splunk if you installed Splunk into the default directory.

You can start and stop Splunk on Windows in one of the following ways:

You should see this output:

splunkd is running (PID: 3162).

Note: On Unix systems, you must be logged in as the user who runs Splunk to run the splunk status command. Other users cannot read the necessary files to report status correctly.

You can also use ps to check for running Splunk processes:

# ps aux | grep splunk | grep -v grep

Solaris users, type -ef instead of aux:

# ps -ef | grep splunk | grep -v grep

Configure Splunk to start at boot time

On Windows, Splunk starts by default at machine startup. On other platforms, you must configure this manually. To disable this, see the end of this topic.

Splunk provides a utility that updates your system boot configuration so that Splunk starts when the system boots up. This utility creates a suitable init script (or makes a similar configuration change, depending on your OS).

As root, run:

$SPLUNK_HOME/bin/splunk enable boot-start

If you don't start Splunk as root, you can pass in the -user parameter to specify which user to start Splunk as. For example, if Splunk runs as the user bob, then as root you would run:

$SPLUNK_HOME/bin/splunk enable boot-start -user bob

If you want to stop Splunk from running at system startup time, run:

$SPLUNK_HOME/bin/splunk disable boot-start

More information is available in $SPLUNK_HOME/etc/init.d/README and if you type help boot-start from the command line.

Note for Mac users

Splunk automatically creates a script and configuration file in the directory /System/Library/StartupItems. This script is run at system start, and automatically stops Splunk at system shutdown.

Note: If you are using Mac OS, you must have root level permissions (or use sudo). You need administrator access to use sudo.

Example:

Enable Splunk to start at system start up on Mac OS using:

just the CLI:

./splunk enable boot-start

the CLI with sudo:

sudo ./splunk enable boot-start

Disabling boot-start on Windows

By default, Splunk starts automatically when you start your Windows machine. You can configure the Splunk processes (SplunkWeb and Splunkd) to start manually from the Windows Services control panel.

Find Splunk Manager in Splunk Web


Launch Splunk Web

Navigate to:

http://mysplunkhost:8000

Use whatever host and port you chose during installation.

The first time you log in to Splunk with an Enterprise license, the default login details are:

Username - admin

Password - changeme

Note: Splunk with a free license does not have access controls, so you will not be prompted for login information.

Find Splunk Manager

Splunk Web provides a convenient interface for managing most aspects of Splunk operations: Manager. To access Manager, look for the link in the upper right corner of Splunk Web:

Learn more about using Manager to configure and maintain your Splunk installation.

Install your license


The first time you download Splunk, you are asked to register.

Your registration authorizes you to receive a temporary (60 day) Enterprise trial license, which allows a maximum indexing volume of 500 MB/day. This license is included with your download.

The Enterprise license enables the following features:

• Multiple user accounts and access controls.

• Distributed search and data routing.

• Deployment management.

Important: You cannot use the same Enterprise license on multiple servers. Each instance of Splunk (including forwarders) must have its own unique license, whether a Free license or an Enterprise license. The only exception to this is the 1 MB/day forward-only license that can be installed on multiple forwarding instances. For more information, read About Splunk licenses.

Access your license

All Splunk servers have a license located in $SPLUNK_HOME/etc/, whether it is a Free license (splunk-free.license) or an Enterprise license (splunk.license).

Where is your new license?

When you request a new license, you should receive the license in an email from Splunk. You can also access that new license in your splunk.com My Orders page. To install a new license (or change and update your existing license), replace your existing license with the new license.

You can install and update your licenses from Splunk Web's Manager > License page or with the CLI.

1. Create a new file named splunk.license.

2. Copy your new license key and paste it into splunk.license.

Note: If a splunk.license file already exists in this directory, mv will overwrite it without prompting for confirmation of the action. This does not overwrite the Free license, splunk-free.license. However, by default Splunk ignores the Free license file if splunk.license exists.

4. Restart your Splunk server to apply your new license:

$SPLUNK_HOME/bin/splunk restart

First login after applying new trial or Enterprise license

To log in for the first time after applying an Enterprise license (converting from free), use the default username "admin" with the password "changeme". If you later clean (reset) your user data, your username/password is reset to this default.

License violations

Violations occur when you exceed the maximum indexing volume allowed for your license. If you exceed your licensed daily volume on any one calendar day, you will get a violation warning. The message persists for 14 days. If you have 5 or more violations on an Enterprise license (3 on a Free license) in a rolling 30-day period, search will be disabled. Search capabilities return when you have fewer than 5 (3 for Free) violations in the previous 30 days or when you apply a new license with a larger volume limit.

Note: During a license violation period, Splunk does not stop indexing your data. Splunk only blocks access while you exceed your license.
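The threshold rule described above can be sketched as a small shell function. This only illustrates the arithmetic and is not how Splunk itself tracks violations; the function name search_disabled and the license labels enterprise and free are assumptions made for the example.

```shell
# search_disabled COUNT LICENSE (hypothetical helper)
# COUNT is the number of violations in the rolling 30-day period.
# Succeeds (exit 0) when that count would disable search, fails (exit 1)
# otherwise, and returns 2 for an unrecognized license label.
search_disabled() {
    violations=$1
    license=$2

    case $license in
        enterprise) limit=5 ;;   # 5 or more violations disable search
        free)       limit=3 ;;   # 3 or more violations disable search
        *)
            echo "unknown license type: $license" >&2
            return 2
            ;;
    esac

    [ "$violations" -ge "$limit" ]
}

# Example: four violations on an Enterprise license leave search enabled.
if search_disabled 4 enterprise; then
    echo "search disabled"
else
    echo "search still available"
fi
```

With four violations the example prints "search still available"; a fifth violation in the same 30-day period would flip the result.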

index=_internal todaysBytesIndexed LicenseManager-Audit NOT

Before you begin configuring Splunk for your environment, check through the following default settings to see if there's anything you'd like to change.

Changing the admin default password

Splunk with an Enterprise license has a default administration account and password, admin/changeme. Splunk strongly recommends that you change the default. You can do this via Splunk's CLI or Splunk Web.

This command changes the admin password from changeme to foo.

NB: Passwords with special characters that would be interpreted by the shell (for example '$' or '!') must be either escaped or single-quoted:

./splunk edit user admin -password 'fflanda$' -role admin -auth admin:changeme

or

./splunk edit user admin -password fflanda\$ -role admin -auth admin:changeme
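As a quick illustration of why the quoting matters, the sketch below compares how a POSIX shell treats a value in double quotes, in single quotes, and with a backslash escape. It runs no Splunk command; the variable names and the sample value are made up for the illustration.

```shell
# Hypothetical variable that the shell would substitute into an unprotected value.
word=oops

# Double quotes still allow expansion, so $word is replaced.
expanded="fflanda$word"

# Single quotes keep every character literal.
single_quoted='fflanda$word'

# A backslash protects only the character that follows it.
escaped=fflanda\$word

printf '%s\n' "$expanded" "$single_quoted" "$escaped"
```

The first line printed is fflandaoops; the other two are the literal string fflanda$word, which is what you would want to reach the -password argument.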

Change network ports

Splunk uses two ports. They default to:

• 8000 - HTTP or HTTPS socket for Splunk Web.

• 8089 - Splunkd management port. Used to communicate with the splunkd daemon. Splunk Web talks to splunkd on this port, as does the command line interface and any distributed connections from other servers.

Note: You may have changed these ports at install time.

via Splunk Web

• Log into Splunk Web as the admin user.

• Click Manager in the top-right of the interface.

• Click the System Configuration tab.

• Click System Settings.

• Change the value for Web port and click Save.

via Splunk CLI

To change the port settings via the Splunk CLI, use the CLI command set.

# splunk set web-port 9000

This command sets the Splunk Web port to 9000.

# splunk set splunkd-port 9089

This command sets the splunkd port to 9089.

Change the default Splunk server name

The Splunk server name setting controls both the name displayed within Splunk Web and the name sent to other Splunk Servers in a distributed setting.

The default name is taken from either the DNS or IP address of the Splunk Server host.

via Splunk Web

• Log into Splunk Web as the admin user.

• Click Manager in the top-right of the interface.

• Click the System Configuration tab.

• Click System Settings.

• Change the value for Splunk server name and click Save.

via Splunk CLI

To change the server name via the CLI, type the following:

# splunk set servername foo

This command sets the servername to foo.

Changing the datastore location

The datastore is the top-level directory where the Splunk Server stores all indexed data.

Note: If you change this directory, the server does not migrate old datastore files. Instead, it starts over again at the new location.

To migrate your data to another directory follow the instructions in Move an index.

via Splunk Web

• Log into Splunk Web as the admin user.

• Click Manager in the top-right of the interface.

• Click the System Configuration tab.

• Click System Settings.

• Change the path in Path to indexes and click Save.

• TCP port 8089 (by default)

• any port that has been configured for:

  ♦ SplunkTCP inputs

  ♦ TCP or UDP inputs

To bind the Splunk Web process (splunkweb) to a specific IP, use the server.socket_host setting in web.conf.

Temporarily

To make this a temporary change, set the environment variable SPLUNK_BINDIP=<ipaddress> before starting Splunk.

Permanently

If you want this to be a permanent change in your working environment, modify $SPLUNK_HOME/etc/splunk-launch.conf to include the SPLUNK_BINDIP attribute and <ipaddress> value. For example, to bind Splunk ports to 127.0.0.1, splunk-launch.conf should read:

# Modify the following line to suit the location of your Splunk install.
# If unset, Splunk will use the parent of the directory this configuration
# file was found in
#
# SPLUNK_HOME=/opt/splunk
SPLUNK_BINDIP=127.0.0.1
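A minimal sketch of making that edit from a script, assuming a POSIX shell. The function name set_bindip is hypothetical; it replaces any existing uncommented SPLUNK_BINDIP line or appends one, and leaves comment lines untouched. Keep a backup of the file before trying something like this, and expect the change to apply only once Splunk is started again.

```shell
# set_bindip FILE IP (hypothetical helper)
# Sets SPLUNK_BINDIP in FILE: an existing uncommented setting is replaced,
# otherwise the setting is appended. Comment lines are left untouched.
set_bindip() {
    conf_file=$1
    ip=$2
    tmp_file="$conf_file.tmp.$$"

    # Make sure the file exists so grep has something to read.
    [ -f "$conf_file" ] || : > "$conf_file"

    # Copy every line except an existing uncommented SPLUNK_BINDIP setting.
    # grep exits non-zero when it selects no lines, which is not an error here.
    grep -v '^SPLUNK_BINDIP=' "$conf_file" > "$tmp_file" || true

    # Append the new setting and swap the file into place.
    printf 'SPLUNK_BINDIP=%s\n' "$ip" >> "$tmp_file"
    mv "$tmp_file" "$conf_file"
}

# Example (path taken from the text above):
# set_bindip "$SPLUNK_HOME/etc/splunk-launch.conf" 127.0.0.1
```

Writing to a temporary file and moving it into place means a failed run does not leave a half-written splunk-launch.conf behind.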

Meet Splunk Web and Splunk apps

What's Splunk Web?

Splunk Web is Splunk's dynamic and interactive browser-based interface. Splunk Web is the primary interface for investigating problems, reporting on results, and managing Splunk deployments. Refer to the system requirements for a list of supported operating systems and browsers.

To launch Splunk Web, navigate to:

http://<mysplunkhost>:<port>

Use whatever host and port you chose during installation. The default port is 8000, but if that port was already in use, the installer asked you to pick a different one.

The first time you log into Splunk with an Enterprise license, use username "admin" and password "changeme". Splunk with a free license does not support access controls or multiple user accounts.

Note: Starting in Splunk version 4.1.4, you cannot access Splunk Free from a remote browser until you have edited $SPLUNK_HOME/etc/local/server.conf and set allowRemoteLogin to Always. If you are running Splunk Enterprise, remote login is disabled by default (set to requireSetPassword) for the admin user until you have changed the default password.

The Launcher

The first time you launch Splunk Web, you'll see the Launcher. This interface lets you choose an app from the list of apps currently available to you. In particular, you might want to check out the "Getting Started" app, but depending on what OS you're running, you'll also see an app that's specific to Windows or UNIX. You can also visit the Splunk App Store to browse and download more apps.

Read on for more information about apps.

What are apps and add-ons?


Apps give you insight into your IT systems with dashboards, reports, data inputs, and saved searches that work in your environment from the moment they install. Apps can include new views and dashboards that completely reconfigure the way Splunk looks. Or, they can be as complex as an entirely new program using Splunk's REST API.

Add-ons let you tackle specific data problems directly. They are smaller, reusable components that can change the look and feel of Splunk, add data sources, or share information between users. Add-ons can be as simple as a collection of one or more event type definitions or saved searches.

When you're using Splunk, you're almost always using an app; we typically refer to that as being "in" an app. The default app is the Search app.

What are apps and add-ons good for?

Apps and add-ons allow you to build different environments that sit on top of a single Splunk instance. You can create separate interfaces for the different communities of Splunk users within your organization: one app for troubleshooting email servers, another for Web analysis, an add-on that connects a lookup table for the frontline support team to use, and so on. This way, everyone can use the same Splunk instance, but see only data and tools that are relevant to their interests.

What apps and add-ons are there?

The first time you install and log into Splunk, you'll see the app Launcher. This interface shows you the list of apps that have been preinstalled for you. By default, one of these apps is the Getting Started app. This app has been developed to introduce new users to Splunk's features. If you're new to Splunk, we recommend you check it out and give us your feedback!

Bypass the Launcher for a single user

If you do not want the Launcher displayed every time you log into Splunk, you can configure a default app to land in. This can be done on a per-user basis. For example, to make the Search app the default landing app for a user:

1. Create a file called user-prefs.conf in the user's local directory:

etc/users/<user>/user-prefs/local/user-prefs.conf

• For the admin user the file would be in:

etc/users/admin/user-prefs/local/user-prefs.conf

• For the test user, it would be in:

etc/users/test/user-prefs/local/user-prefs.conf

2. Put the following line in the user-prefs.conf file:

default_namespace = search

Bypass the Launcher for all users

You can specify a default app for all users to land in when they log in. For example, if you want the Search app to be the global default, edit $SPLUNK_HOME/etc/apps/user-prefs/local/user-prefs.conf and specify:

[general_default]
default_namespace = search
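One way to script the global default, offered as a sketch rather than a supported procedure. The function name set_global_default_app is hypothetical; the path and the stanza are the ones given above. It refuses to touch a file that already contains a [general_default] stanza so that an existing setting is not duplicated.

```shell
# set_global_default_app SPLUNK_HOME APP (hypothetical helper)
# Appends a [general_default] stanza naming APP as the landing app.
set_global_default_app() {
    splunk_home=$1
    app=$2
    prefs_dir="$splunk_home/etc/apps/user-prefs/local"
    prefs_file="$prefs_dir/user-prefs.conf"

    mkdir -p "$prefs_dir"

    # Leave the file alone if the stanza is already present.
    if [ -f "$prefs_file" ] && grep -q '^\[general_default\]' "$prefs_file"; then
        echo "[general_default] already present in $prefs_file" >&2
        return 1
    fi

    printf '[general_default]\ndefault_namespace = %s\n' "$app" >> "$prefs_file"
}

# Example:
# set_global_default_app "$SPLUNK_HOME" search
```

If the stanza already exists, edit the file by hand instead; the helper deliberately does not try to rewrite settings in place.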

Note: Users who do not have permission to access the Search app will see an error.

What you get by default

Besides the Getting Started app, Splunk comes with the Search app and another app to support your OS:

• The Search app interface provides the core functionality of Splunk and is designed for general-purpose use. If you've used Splunk before, the Search app replaces the main Splunk Web functionality from earlier versions. In the Search app you see a search bar and a dashboard full of graphs. When you are in the Search app, you change the dashboard or view by selecting new ones from the Dashboards and Views drop-down menus in the upper left of the window.

• The OS-specific app (Splunk for Windows or Splunk for *NIX) provides dashboards and pre-built searches to help you get the most out of Splunk on your particular platform. They are disabled by default, but you can turn them on from the apps section of Splunk Manager.

If you want to change the app you're in, select a new one from the App drop-down menu at the top right:

You can return to the Launcher later and select another app from there.

Get more apps

You can download a variety of other apps. For example, if the bulk of your data operations work involves tasks related to things like change management or PCI (Payment Card Industry) compliance, you'll be happy to know that Splunk has apps designed specifically for those application areas.

To find more apps, click the Browse More Apps tab in the Launcher. For more information, see "Where to get more apps and add-ons".

How saving and sharing Splunk knowledge relates to apps

Splunk knowledge includes objects like saved searches, event types, and tags: items that enrich your Splunk data and make it easier to find what you need. In Splunk, these are known as knowledge objects.

Any user logged into Splunk Web can create and save knowledge objects to the user's directory under the app the user is "in" (assuming sufficient permissions). This is the default behavior: whenever a user saves an object, it goes into the user's directory in the currently running app.

Once the user has saved the object in a particular app, it is available to the user only in that app, unless they do one of the following things (and have the correct permissions to do so):

• Share the object with other specific roles or users in that same app

• Promote the object so that it is available to all users who have access to that app

• Promote the object so that it is available globally to all apps (and users)

Read more about App architecture and object ownership in this manual.

Where to get more apps and add-ons


You can find new apps and add-ons on http://www.splunkbase.com.

All the apps and add-ons that are available on Splunkbase also show up in Launcher, so you can download and install them directly within Splunk.

When you log into Splunk Web, you see the Launcher by default. You can always get back to the Launcher from the App menu in the upper right-hand corner of the main page of any Splunk-provided app.

If you are connected to the internet

If your Splunk server or your client machine is connected to the internet, you can download apps directly from the Launcher:

1. From the Launcher, click on the Browse Apps tab. This will connect you to Splunkbase, where you can download apps and add-ons available for this version of Splunk.

2. Pick the app or add-on you want and select Install.

3. You will be prompted to log in with your splunk.com username and password (note that this is not your Splunk username and password).

4. Your app or add-on will be installed. If it has a Web GUI component (most add-ons contain only knowledge objects like event type definitions and don't have any GUI context), you can navigate to it from the Launcher.

If you are not connected to the internet

If your Splunk server and client do not have internet connectivity, you must download apps from Splunkbase and copy them over to your server:

1. From a computer connected to the internet, browse Splunkbase for the app or add-on you want.

2. Download the app or add-on.

3. Copy this app over to your Splunk server.

4. Put the app in your $SPLUNK_HOME/etc/apps directory.

5. Untar and ungzip your app or add-on, using a tool like tar -xvf. Note that Splunk apps are packaged with a .SPL extension although they are just tarred and gzipped. You may need to force your tool to recognize this extension.

6. You may need to restart Splunk, depending on the contents of the app or add-on.

7. Your app or add-on is now installed and will be available from the Launcher (if it has a Web GUI component).
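The unpacking steps above can be sketched as a short shell function, assuming a tar that supports the -z and -C options (GNU and BSD tar both do). The function name install_app_archive is hypothetical. It extracts the archive straight into the apps directory instead of copying it there first, and it does not restart Splunk.

```shell
# install_app_archive ARCHIVE SPLUNK_HOME (hypothetical helper)
# Unpacks a downloaded app or add-on archive into SPLUNK_HOME/etc/apps.
install_app_archive() {
    archive=$1
    splunk_home=$2
    apps_dir="$splunk_home/etc/apps"

    if [ ! -f "$archive" ]; then
        echo "no such archive: $archive" >&2
        return 1
    fi

    mkdir -p "$apps_dir"

    # -z tells tar to gunzip explicitly, so the .spl extension does not matter.
    tar -xzf "$archive" -C "$apps_dir"
}

# Example (the archive name is a placeholder):
# install_app_archive /tmp/myapp.spl "$SPLUNK_HOME"
```

Passing -z explicitly is the "force your tool to recognize this extension" part: tar is told the compression format rather than left to infer it from the file name.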

Any user logged into Splunk Web can create and save knowledge objects to the user's directory under the app the user is "in" (assuming sufficient permissions). This is the default behavior: whenever a user saves an object, it goes into the user's directory in the currently running app. The user directory is located at $SPLUNK_HOME/etc/users/<user_name>/<app_name>/local. Once the user has saved the object in that app, it is available only to that user when they are in that app, unless they do one of the following:

• Promote the object, so that it is available to all users who have access to that app

• Restrict the object to specific roles or users (still within the app's context)

• Mark the object as globally available to all apps and users (unless you've explicitly restricted it by role/user)

Note: Users must have write permissions for an app before they can promote objects to the app level.

Promote and share Splunk knowledge

Users can share their Splunk knowledge objects with other users through the Permissions dialog. This means users who have read permissions in an app can see the shared objects and use them. For example, if a user shares a saved search, other users can see that saved search, but only within the app in which the search was created. So if you create a saved search in the app "Fflanda" and share it, other users of Fflanda can see your saved search if they have read permission for Fflanda.

Users with write permission can promote their objects to the app level. This means the objects are copied from their user directory to the app's directory, from:

$SPLUNK_HOME/etc/users/<user_name>/<app_name>/local/

to:

$SPLUNK_HOME/etc/apps/<app_name>/local/

Users can do this only if they have write permission in the app.

Make Splunk knowledge objects globally available

Finally, upon promotion, users can decide if they want their object to be available globally, meaning all apps are able to see it. Again, the user must have permission to write to the original app. It's easiest to do this from within Manager, but you can also do it later by moving the relevant object into the desired directory.

To make globally available an object "A" (defined in "B.conf") that belongs to user "C" in app "D":

1. Move the stanza defining the object A from $SPLUNK_HOME/etc/users/C/D/B.conf into $SPLUNK_HOME/etc/apps/D/local/B.conf.

2. Add a setting, export = system, to the object A's stanza in the app's local.meta file. If the stanza for that object doesn't already exist, you can just add one.

For example, to promote an event type called "rhallen" created by a user named "fflanda" in the *Nix app so that it is globally available:

2. Add the following stanza:

Note: Adding the export = system setting to local.meta isn't necessary when you're sharing event types from the Search app, because it exports all of its events globally by default.

What objects does this apply to?

The knowledge objects discussed here are limited to those that are subject to access control. These objects are also known as app-level objects and can be set in the App Configuration tab of Splunk Manager. This page is available to all users to manage any objects they have created and shared. These objects include:

• Saved searches and Reports
• Event types
• Views and dashboards
• Field extractions

There are also system-level objects available only to users with admin privileges (or read/write permissions on the specific objects). These are also managed through Splunk Manager. These objects include:

Important: If you add an input, Splunk adds that input to the copy of inputs.conf that belongs to the app you're currently in. This means that if you navigated to Splunk Manager directly from the Launcher, your input will be added to $SPLUNK_HOME/etc/apps/launcher/local/inputs.conf, which might not be the behavior you desire.

App configuration and knowledge precedence

When you add knowledge to Splunk, it's added in the context of the app you're in when you add it. When Splunk is evaluating configurations and knowledge, it evaluates them in a specific order of precedence, so that you can control what knowledge definitions and configurations are used in what context. Refer to About configuration files for more information about Splunk configuration files and the order of precedence.

Manage app and add-on objects

When an app or add-on is created by a Splunk user, a collection of objects is created that make up the app or add-on. These objects can include views, commands, navigation items, event types, saved searches, reports, and more. Each of these objects has permissions associated with it to determine who can view or alter it. By default, the admin user has permissions to alter all the objects in the Splunk system.

Refer to these topics for more information:

• For an overview of apps, refer to "What are apps and add-ons?" in this manual.
• For more information about app and add-on permissions, refer to "App architecture and object ownership" in this manual.
• To learn more about how to create your own apps and add-ons, refer to the Developer Manual.

View and manage app/add-on objects in Manager

To see and control the objects for all the apps on your system, use Splunk Manager in Splunk Web. You can use Manager to view the objects in your Splunk deployment in the following ways:

• To see all the objects for all the apps/add-ons on your system at once: Manager > All configurations.
• To see all the saved searches and report objects: Manager > Saved searches and reports.
• To see all the event types: Manager > Event types.
• To see all the field extractions: Manager > Field extractions.
• To see all the Python search command scripts: Manager > Search commands.

You can:

• View and manipulate the objects on any page with the sorting arrows
• Filter the view to see only the objects from a given app or add-on, owned by a particular user, or those that contain a certain string, with the App context bar:

Use the Search field on the App context bar to search for strings in fields. By default, Splunk searches for the string in all available fields. To search within a particular field, specify that field. Wildcards are supported.

• Editing configuration files directly.

All of these methods change the contents of the underlying configuration files.

To configure and manage distributed environments, you can use Splunk's deployment server.

Configuration files

Most of Splunk's configuration information is stored in .conf files. These files are located under your Splunk installation directory (usually referred to in the documentation as $SPLUNK_HOME) under /etc/system. You can make changes to these files using a standard text editor. Before you begin editing configuration files, read the material in the topic called About configuration files.

Splunk Manager

You can perform most common configuration tasks with Splunk Manager in Splunk Web. Splunk Web runs by default on port 8000 of the host on which it is installed:

• If you're running Splunk on your local machine, the URL to access Splunk Web is http://localhost:8000.
• If you're running Splunk on a remote machine, the URL to access Splunk Web is http://<hostname>:8000, where <hostname> is the name of the machine Splunk is running on.

To access Splunk Manager, log into Splunk Web and click Manager in the upper right hand corner.

Splunk CLI

Many configuration options are available via the CLI. These options are documented in their respective topics, or you can get a complete CLI help reference with the command help while Splunk is running:

./splunk help

For more information about the CLI, refer to "About the CLI" in this manual.

Managing a distributed environment

The Splunk deployment server provides centralized management and configuration for distributed environments. You can use it to deploy sets of configuration files or other content to groups of Splunk instances across the enterprise.

For information about managing deployments, refer to "Deploy to other Splunk instances" in this manual.

Restarting after configuration changes

Many changes to configuration files require you to restart Splunk. Check the configuration file or its reference topic to see whether a particular change requires a restart.

When you make changes in the Manager, the Manager will let you know if you have to restart.

These changes require additional or different actions before they will take effect:

• Enable configuration changes made to transforms.conf by typing the following search in Splunk Web:

| extract reload=T

• Reload authentication.conf via the Manager > Authentication section of Splunk Web. This refreshes the authentication caches, but does not disconnect current users.

About Splunk Manager

To configure Splunk from within Splunk Web, use Splunk Manager. To access Splunk Manager, log into Splunk Web and click Manager in the upper right:

Users with admin privileges can access all the areas of the Manager. Other users have limited access to the Manager.

Apps and knowledge

From the Apps and knowledge area, you can manage:

• Apps: Edit permissions for installed apps, create new apps, or browse Splunkbase for apps created by the community.
• Searches and reports: View, edit, and set permissions on searches and reports. Set up alerts and summary indexing.
• Event types: View, edit, and set permissions on event types.
• Tags: Manage tags on field values.
• Fields: View, edit, and set permissions on field extractions. Define event workflow actions and field aliases. Rename sourcetypes.
• Lookups: Configure lookup tables and lookups.
• User interface: Create and edit views, dashboards, and navigation menus.
• Advanced search: Create and edit search macros. Set permissions on search commands.
• All configurations: See all configurations across all apps.

About configuration files

Splunk's configuration information is stored in configuration files, identified by their .conf extension. These files are located under $SPLUNK_HOME/etc.

When you make a change to a configuration setting in Splunk Manager in Splunk Web, the change gets written to the relevant configuration file. This change is written to a copy of the configuration file in a directory under $SPLUNK_HOME/etc (the actual directory depends on a number of factors, discussed later), and the default value of the attribute is left alone in $SPLUNK_HOME/etc/system/default.

You can do a lot of configuration from the Manager, but for some more advanced customizations, you must edit the configuration files directly.

The configuration directory structure

The following is the configuration directory structure that exists under $SPLUNK_HOME/etc:

• $SPLUNK_HOME/etc/system/default
  ♦ This contains the pre-configured configuration files. Do not modify the files in this directory.
• $SPLUNK_HOME/etc/system/local
  ♦ Local changes on a site-wide basis go here; for example, settings you want to make available to all apps.
• $SPLUNK_HOME/etc/apps/<app_name>/local
  ♦ If you're in an app when a configuration change is made, the setting goes into a configuration file in the app's /local directory.
  ♦ For example, edits for search-time settings in the default Splunk search app go here: $SPLUNK_HOME/etc/apps/search/local/.
  ♦ If you want to edit a configuration file such that the change only applies to a certain app, copy the file to the app's /local directory and make your changes there.
• $SPLUNK_HOME/etc/users
  ♦ User-specific configuration changes go here.
• $SPLUNK_HOME/etc/system/README
  ♦ This directory contains supporting reference documentation. For most configuration files, there are two reference files: .spec and .example; for example, inputs.conf.spec and inputs.conf.example. The .spec file specifies the syntax, including a list of available attributes and variables. The .example files contain examples of real-world usage.

A single Splunk instance typically has multiple versions of some configuration files, across several of these directories. For example, you can have configuration files with the same names in your default, local, and app directories. This provides a layering effect that allows Splunk to determine configuration priorities based on factors such as the current user and the current app. Be sure to review the topic "Configuration file precedence" to understand the precedence rules governing Splunk configuration files. That topic explains how Splunk determines which files have priority.

Note: The most accurate list of settings available for a given configuration file is in the .spec file for that configuration file. You can find the latest version of the .spec and .example files in the "Configuration file reference", or in $SPLUNK_HOME/etc/system/README.

The default directory

When you edit a configuration file, you should not edit the version in $SPLUNK_HOME/etc/system/default. Instead, make a copy of the file and put it in another configuration directory. Since Splunk always looks at the default directory last, the edited version can go into any of the other available directories, according to whether the edit applies at the system, app, or user level. You can layer several versions of a configuration file on top of one another, with different attribute values filtering through and being used by Splunk as described in "Configuration file precedence", but for most deployments, you can just use the $SPLUNK_HOME/etc/system/local directory to make configuration changes.

Another reason not to edit the copies of the configuration files in $SPLUNK_HOME/etc/system/default is that when you upgrade Splunk, all your changes will be overwritten. Changes you make to files in other directories are not overwritten and will continue to take effect post-upgrade.

Important: Some configuration files are not created by default -- if you want to enable the features they manage, you must create the configuration files from scratch. These configuration files still have .spec and .example files for you to review.

Splunk expects configuration files to be in ASCII/UTF-8. If you are editing or creating a configuration file on an operating system that is non-UTF-8, you must ensure that the editor you are using is configured to save in ASCII/UTF-8.

The structure of configuration files

Configuration files consist of one or more stanzas, or sections. Each stanza begins with a stanza header, designated by square brackets. Following the header is a series of attribute/value pairs that specify configuration settings. Depending on the stanza type, some of the attributes might be required, while others could be optional.

Here's the basic pattern:

[stanza1_header]
<attribute1> = <val1>
<attribute2> = <val2>
...

[stanza2_header]
<attribute1> = <val1>
<attribute2> = <val2>
...

Important: Attributes are case-sensitive. sourcetype = my_app is not the same as SOURCETYPE = my_app. One will work; the other won't.
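The stanza pattern and the case-sensitivity rule can be sketched with a minimal reader. This is an illustrative parser written for this example, not Splunk's own; the function name parse_conf and the sample source path are assumptions, and the sketch simply skips any attribute that appears before the first stanza header.

```python
def parse_conf(text):
    """Parse stanza headers and attribute = value pairs into nested dicts."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # text between the square brackets
            stanzas.setdefault(current, {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            # Keys are stored exactly as written, so case is preserved.
            stanzas[current][key.strip()] = value.strip()
    return stanzas


# Hypothetical stanza, for illustration only.
sample = """
[source::/var/log/app.log]
sourcetype = my_app
SOURCETYPE = other
"""
parsed = parse_conf(sample)
print(parsed)
```

Because keys are kept as written, sourcetype and SOURCETYPE end up as two distinct attributes in the same stanza.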

Configuration files frequently have stanzas with varying scopes, with the more specific stanzas taking precedence. For example, consider this outputs.conf configuration file, used to configure forwarders:

This example file has three levels of stanzas:

• The global [tcpout], with settings that affect all tcp forwarding.
• The more specific [tcpout:my_indexers], whose settings affect only the target group of indexers named "my_indexers" (whose members are defined within the stanza).
• The most specific [tcpout-server://mysplunk_indexer1:9997], whose settings affect only one specific indexer in the target group.

The setting for compressed in [tcpout-server://mysplunk_indexer1:9997] overrides that attribute's setting in [tcpout:my_indexers], for the indexer "mysplunk_indexer1" only.
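The three levels of scope can be pictured as a lookup that tries the most specific stanza first. The sketch below is a simplified model of that idea, not Splunk's implementation; the attribute values, the second indexer name mysplunk_indexer2, and the function name effective_setting are hypothetical.

```python
# Hypothetical settings keyed by stanza header.
settings = {
    "tcpout": {},
    "tcpout:my_indexers": {"compressed": "true"},
    "tcpout-server://mysplunk_indexer1:9997": {"compressed": "false"},
}


def effective_setting(settings, attr, group, server):
    """Return attr for one indexer, checking the most specific stanza first."""
    candidates = (
        f"tcpout-server://{server}",  # one specific indexer
        f"tcpout:{group}",            # the target group
        "tcpout",                     # global tcp forwarding settings
    )
    for stanza in candidates:
        if attr in settings.get(stanza, {}):
            return settings[stanza][attr]
    return None  # not set at any level


one = effective_setting(settings, "compressed", "my_indexers", "mysplunk_indexer1:9997")
two = effective_setting(settings, "compressed", "my_indexers", "mysplunk_indexer2:9997")
print(one, two)
```

Only mysplunk_indexer1 picks up the server-level value; any other member of the group falls back to the group-level setting.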

For more information on forwarders and outputs.conf, see Configure forwarders with outputs.conf.

List of configuration files, and what's in them

The following is an up-to-date list of the available spec and example files associated with each conf file. Some conf files do not have spec or example files; contact Support before editing a conf file that does not have an accompanying spec or example file.

Important: Do not edit the default copy of any conf file in $SPLUNK_HOME/etc/system/default/. Make a copy of the file in $SPLUNK_HOME/etc/system/local/ or $SPLUNK_HOME/etc/apps/<app_name>/local and edit that copy.

This topic describes how Splunk sifts through its layers of configuration files to determine which settings to use. For general information about configuration files, read "About configuration files".

Order of precedence

Splunk uses configuration files to determine nearly every aspect of its behavior. Besides having many different types of configuration files, a single Splunk instance can also have many copies of the same configuration file, layered in directories categorized by user, app, and system. When consuming a configuration file, Splunk merges the settings from all copies of the file, using a location-based prioritization scheme. When different copies have conflicting attribute values (that is, when they set the same attribute to different values), Splunk uses the value from the file with the highest priority.

Splunk determines the priority of configuration files by their location in its directory structure. At the most general level, it prioritizes according to whether the file is located in a system, app, or user directory.

You need to understand how this works so that you can manage your Splunk configurations intelligently. It's all pretty straightforward when you focus on context. Once you get a feel for the context in which Splunk is consuming a particular file, the way precedence works makes quite a bit of sense.

Note: Besides resolving configuration settings amongst multiple copies of a file, Splunk sometimes needs to resolve settings within a single file. For information on how Splunk determines precedence within a single props.conf file, see "Attribute precedence within a single props.conf file".

The app/user context

In determining priority among copies of a configuration file, Splunk uses two different schemes of directory precedence, according to whether that particular configuration relates to an activity with an app/user context -- that is, where the current app and user matter. Some activities, like searching, take place in an app/user context; others, like indexing, take place in a global context, independent of any app or user.

For instance, configuration files that determine indexing or monitoring behavior operate outside of the app/user context; they are global in nature. At the time of data input or event indexing, it does not matter which user or app might later access the data or event. The app/user context, on the other hand, is vital to search-time processing, where certain knowledge objects or actions might be valid only for specific users in specific apps.

How context affects precedence order

When the context is global, where there's no app/user context, directory priority descends from system/local to app to system/default:

• System local directory -- highest priority.

When consuming a global configuration, such as inputs.conf, Splunk first gets the attributes from any copy of the file in system/local. Then it looks for any copies of the file located in the app directories, adding any attributes found in them, but ignoring attributes already discovered in system/local. As a last resort, for any attributes not explicitly assigned at either the system or app level, it assigns default values from the file in the system/default directory.
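The merge just described can be sketched as a first-value-wins pass over the layers, from highest to lowest priority. This is a simplified model for a single stanza, not Splunk's implementation; the function name merge_global and all attribute names and values are hypothetical.

```python
def merge_global(system_local, app_layers, system_default):
    """Merge attribute dicts in global context; earlier layers win."""
    merged = {}
    for layer in [system_local, *app_layers, system_default]:
        for key, value in layer.items():
            # Keep the first value seen; lower-priority layers only fill gaps.
            merged.setdefault(key, value)
    return merged


# Hypothetical attribute values for one stanza.
system_local = {"host": "indexer01"}
app_layers = [{"host": "from_app", "index": "web"}]
system_default = {"host": "default_host", "index": "main", "disabled": "false"}

merged = merge_global(system_local, app_layers, system_default)
print(merged)
```

Here host comes from system/local, index from the app copy, and disabled from system/default, because each layer only supplies attributes that no higher-priority layer has already set.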

When there's an app/user context, directory priority descends from user to app to system:

• User directories -- highest priority.
• App directories for currently running app (local, followed by default).
• App directories for all other apps (local, followed by default) -- for exported settings only.
• System directories (local, followed by default) -- lowest priority.

An attribute in savedsearches.conf, for example, might be set at all three levels: the user, the app, and the system. Splunk will always use the value of the user-level attribute, if any, in preference to a value for that same attribute set at the app or system level.
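The app/user ordering above can be written out as a list of directories, highest priority first. This sketch is only one reading of the list: it assumes each other app contributes its local directory and then its default directory in turn, and the function name, user name, and app names are hypothetical. Paths are relative to $SPLUNK_HOME.

```python
def app_user_search_order(user, current_app, other_apps):
    """List configuration directories from highest to lowest priority."""
    order = [f"etc/users/{user}/{current_app}/local"]
    order += [f"etc/apps/{current_app}/local", f"etc/apps/{current_app}/default"]
    for app in sorted(other_apps):
        # Other apps are consulted for exported settings only.
        order += [f"etc/apps/{app}/local", f"etc/apps/{app}/default"]
    order += ["etc/system/local", "etc/system/default"]
    return order


order = app_user_search_order("fflanda", "search", ["unix"])
for directory in order:
    print(directory)
```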

How app directory names affect precedence

Note: For most practical purposes, the information in this subsection probably won't matter, but it might prove useful if you need to force a certain order of evaluation or for troubleshooting.

To determine priority among the collection of apps directories, Splunk uses ASCII sort order. Files in an apps directory named "A" have a higher priority than files in an apps directory named "B", and so on. In addition, numbered directories have a higher priority than alphabetical directories and are evaluated in lexicographic, not numerical, order. For example, in descending order of precedence:

Note: When determining precedence in the app/user context, directories for the currently running app take priority over those for all other apps, independent of how they're named. Furthermore, other apps are only examined for exported settings.
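ASCII sort order is easy to check by sorting a few names. In the sketch below the app directory names are hypothetical; Python's sorted() compares strings by code point, which matches ASCII order for ASCII names.

```python
# Hypothetical app directory names.
app_dirs = ["myapp2", "myappApple", "myapp10", "myappapple", "myapp1"]

# Digits sort before uppercase letters, which sort before lowercase letters.
# Names are compared character by character, so "myapp10" precedes "myapp2".
by_priority = sorted(app_dirs)
print(by_priority)
```

The result lists the directories in descending order of precedence under the rule described above.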

Summary of directory precedence

Putting this all together, the order of directory priority, from highest to lowest, goes like this:

Important: In the app/user context, all configuration files for the currently running app take priority over files from all other apps. This is true for the app's local and default directories. So, if the current context is app C, Splunk evaluates both $SPLUNK_HOME/etc/apps/C/local/* and $SPLUNK_HOME/etc/apps/C/default/* before evaluating the local or default directories for any other apps. Furthermore, Splunk only looks at configuration data for other apps if that data has been exported globally through the app's default.meta file, as described in this topic on setting app permissions.

Also, note that /etc/users/ is evaluated only when the particular user logs in or performs a search.

Example of how attribute precedence works

This example of attribute precedence uses props.conf. The props.conf file is unusual, because its context can be either global or app/user, depending on when Splunk is evaluating it. Splunk evaluates props.conf at both index time (global) and search time (app/user).

Assume $SPLUNK_HOME/etc/system/local/props.conf contains this stanza:

[source::/opt/Locke/Logs/error*]
sourcetype = fatal-error

and $SPLUNK_HOME/etc/apps/t2rss/local/props.conf contains another version of the

The line merging attribute assignments in t2rss always apply, as they only occur in that version of the file. However, there's a conflict with the sourcetype attribute. In the /system/local version, the sourcetype has a value of "fatal-error". In the /apps/t2rss/local version, it has a value of "t2rss-error".

Since this is a sourcetype assignment, which gets applied at index time, Splunk uses the global context for determining directory precedence. In the global context, Splunk gives highest priority to attribute assignments in system/local. Thus, the sourcetype attribute gets assigned a value of "fatal-error".

List of configuration files and their context

As mentioned, Splunk decides how to evaluate a configuration file based on the context that the file operates within, global or app/user. Generally speaking, files that affect data input, indexing, or deployment activities are global; files that affect search activities usually have an app/user context.

The props.conf and transforms.conf files can be evaluated in either an app/user or a global context, depending on whether Splunk is using them at index or search time.

Global configuration files

admon.conf
authentication.conf
authorize.conf
crawl.conf
deploymentclient.conf
distsearch.conf
indexes.conf
inputs.conf
outputs.conf
pdf_server.conf
procmonfilters.conf
props.conf -- global and app/user context
pubsub.conf
regmonfilters.conf
report_server.conf
restmap.conf
searchbnf.conf
segmenters.conf
server.conf
serverclass.conf
serverclass.seed.xml.conf
source-classifier.conf
sourcetypes.conf
sysmon.conf
tenants.conf
transforms.conf -- global and app/user context
user_seed.conf -- special case: Must be located in /system/default
web.conf
wmi.conf

Splunk's configuration file system supports many overlapping configuration files in many different locations. The price of this level of flexibility is that figuring out which value for which configuration option is being used in your Splunk installation can sometimes be quite complex. If you're looking for some tips on figuring out what configuration setting is being used in a given situation, check out "Troubleshooting configurations" in this manual.

Attribute precedence within a single props.conf file

In addition to understanding how attribute precedence works across files, you also sometimes need to consider attribute priority within a single props.conf file.

Precedence within sets of stanzas affecting the same target

When two or more stanzas specify a behavior that affects the same item, items are evaluated by the stanzas' ASCII order. For example, assume you specify in props.conf the following stanzas:

[source::.../bar/baz]
attr = val1

[source::.../bar/*]
attr = val2

The second stanza's value for attr will be used, because its path is higher in the ASCII order and takes precedence.

Overriding default attribute priority in props.conf

There's a way to override the default ASCII priority in props.conf. Use the priority key to specify a higher or lower priority for a given stanza.

For example, suppose we have a source:

source::az

and the following patterns:

[source::...a...]
sourcetype = a

[source::...z...]
sourcetype = z

In this case, the default behavior is that the settings provided by the pattern "source::...a..." take precedence over those provided by "source::...z...". Thus, sourcetype will have the value "a".

To override this default ASCII ordering, use the priority key:

[source::...a...]
sourcetype = a
priority = 5

[source::...z...]
sourcetype = z
priority = 10

Assigning a higher priority to the second stanza causes sourcetype to have the value "z".
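The rule described in this section can be sketched as a comparison on two keys: the priority value first, then the ASCII order of the stanza header. This is a simplified model of the behavior described here, not Splunk's implementation; the function name winning_stanza is hypothetical, and the sketch assumes every candidate stanza already matches the source and defaults to a priority of 0.

```python
def winning_stanza(stanzas):
    """Pick the stanza header that takes precedence among matching stanzas."""
    # Higher priority wins; ties go to the header that sorts first in ASCII order.
    return min(stanzas, key=lambda header: (-stanzas[header].get("priority", 0), header))


default_order = {
    "source::...a...": {"sourcetype": "a"},
    "source::...z...": {"sourcetype": "z"},
}
with_priority = {
    "source::...a...": {"sourcetype": "a", "priority": 5},
    "source::...z...": {"sourcetype": "z", "priority": 10},
}

first = winning_stanza(default_order)
second = winning_stanza(with_priority)
print(default_order[first]["sourcetype"], with_priority[second]["sourcetype"])
```

Without priority keys the "a" pattern wins on ASCII order alone; with them, the stanza carrying priority = 10 wins.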

There's another attribute precedence issue to consider. By default, stanzas that match a string literally ("literal-matching stanzas") take precedence over regex pattern-matching stanzas. This is due to the default values of their priority keys:

• 0 is the default for pattern-matching stanzas

• 100 is the default for literal-matching stanzas

So, literal-matching stanzas will always take precedence over pattern-matching stanzas, unless you change that behavior by explicitly setting their priority keys.

You can use the priority key to resolve collisions between patterns of the same type, such as sourcetype patterns or host patterns. The priority key does not, however, affect precedence across spec types. For example, source patterns take priority over host and sourcetype patterns, regardless of priority key values.

Precedence for events with multiple attribute assignments

The props.conf file sets attributes for processing individual events by host, source, or sourcetype (and sometimes event type). So it's possible for one event to have the same attribute set differently for the default fields: host, source or sourcetype. The precedence order is:

• source
• host
• sourcetype

You might want to override the default props.conf settings. For example, assume you are tailing mylogfile.xml, which by default is labeled sourcetype = xml_file. This configuration will re-index the entire file whenever it changes, even if you manually specify another sourcetype, because the property is set by source. To override this, add the explicit configuration by source:

[source::/var/log/mylogfile.xml]
CHECK_METHOD = endpoint_md5

Indexing with Splunk

What's a Splunk index?

The index is the repository for Splunk data. While processing incoming data, Splunk transforms the raw data into events, which it stores in indexes.

Indexes reside in flat files in a datastore on your file system. Splunk manages its index files to facilitate flexible searching and fast data retrieval, eventually archiving them according to a user-configurable schedule. Splunk handles everything with flat files; it doesn't require any third-party database software running in the background.

During indexing, Splunk processes incoming raw data to enable fast search and analysis, storing the result in an index. As part of the indexing process, Splunk adds knowledge to the data in various ways, including by:

To start the indexing process, simply specify the data inputs, using Splunk Web, the CLI, or the inputs.conf file. You can add more inputs at any time, and Splunk will begin indexing them as well. See Add data and configure inputs in this manual.

Splunk, by default, puts all user data into a single, preconfigured index. It also employs several other indexes for internal purposes. You can add new indexes and manage existing ones to meet your data requirements. See Manage indexes in this manual.

How indexing works

Splunk can index any type of time-series data (data with timestamps). When Splunk indexes data, it breaks it into events, based on its timestamps.

Event processing

Event processing occurs in two stages, parsing and indexing. All data that comes into Splunk enters through the parsing pipeline as large (10,000 bytes) chunks. During parsing, Splunk breaks these chunks into events, which it hands off to the indexing pipeline, where final processing occurs.

While parsing, Splunk performs a number of actions, including:

• Extracting sets of default fields for each event, including host, source, and sourcetype.
• Configuring character set encoding.
• Identifying line termination using linebreaking rules. While many events are short and only take up a line or two, others can be long.
• Identifying timestamps or creating them if they don't exist. At the same time that it processes timestamps, Splunk identifies event boundaries.
• Splunk can be set up to mask sensitive event data (such as credit card or social security numbers) during the indexing process. It can also be configured to apply custom metadata to incoming events.

• Breaking all events into segments that can then be searched upon. You can determine the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of disk compression.
• Building the index data structures.
• Writing the raw data and index files to disk, where post-indexing compression occurs.

The breakdown between parsing and indexing pipelines is mainly of relevance for forwarders, which can parse, but not index, data.

For more information about events and what happens to them during the indexing process, see Overview of event processing in this manual.

Note: Indexing is an I/O-intensive process.

This diagram shows the main processes inherent in indexing:

What's in an index?

Splunk stores all of the data it processes in indexes. An index is a collection of databases, which are directories located in $SPLUNK_HOME/var/lib/splunk. A database directory is named db_<starttime>_<endtime>_<seq_num>.
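The naming pattern for database directories can be unpacked with a short helper. This sketch assumes the three placeholders are decimal integers, which the pattern above does not state explicitly; the function name parse_db_dir and the sample directory name are hypothetical.

```python
import re

# Assumption: <starttime>, <endtime>, and <seq_num> are decimal integers.
DB_DIR = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def parse_db_dir(name):
    """Split a database directory name into its three parts, or return None."""
    match = DB_DIR.match(name)
    if match is None:
        return None
    starttime, endtime, seq_num = (int(group) for group in match.groups())
    return {"starttime": starttime, "endtime": endtime, "seq_num": seq_num}


# Hypothetical directory name, for illustration only.
info = parse_db_dir("db_1286268720_1286355120_7")
print(info)
```

Names that do not follow the pattern return None, so the helper can be used to filter a directory listing.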

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around indexing.

Index time versus search time

Splunk documentation includes many references to the terms "index time" and "search time." These terms distinguish between the sorts of event data that are processed by Splunk during indexing, and other kinds of event data that are processed when a search is run.

It is important to consider this distinction when administering Splunk. For example, if you haven't yet started indexing data and you think you're going to have a lot of custom source types and hosts, you might want to get those in place before you start indexing. You can do this by defining custom source types and hosts (through rule-based source type assignation, source type overriding, input-based host assignment, and host overrides), so that these things are handled during the indexing process.

On the other hand, if you have already begun to index your data, you might want to handle the issue at search time. Otherwise, you will need to re-index your data, in order to apply the custom source types and hosts to your existing data as well as new data. After indexing, you can't change the host or source type assignments, but you can tag them with alternate values and manage the issue that way.

As a general rule, it is better to perform most knowledge-building activities, such as field extraction, at search time. Additional, custom field extraction, performed at index time, can degrade performance at both index time and search time. When you add to the number of fields extracted during indexing, the indexing process slows. Later, searches on the index are also slower, because the index has been enlarged by the additional fields, and a search on a larger index takes longer. You can avoid such performance issues by instead relying on search-time field extraction. For details on search-time field extraction, see "About fields" and "Create search-time field extractions" in the Knowledge Manager manual.

At index time

Index-time processes take place just before event data is actually indexed.

Advanced indexing strategy

In a single-machine deployment consisting of just one Splunk instance, the Splunk indexer also handles data input and search requests. However, for mid-to-enterprise scale needs, indexing is typically split out from the data input function and sometimes from the search function as well. In these larger, distributed deployments, the Splunk indexer might reside on its own machine and handle only indexing.

For instance, you can have a set of Windows and Linux machines generating interesting events, which need to go to a central Splunk indexer for consolidation. Usually the best way to do this is to install a lightweight instance of Splunk, known as a forwarder, on each of the event-generating machines. These forwarders handle data input and send the data across the network to the Splunk indexer residing on its own machine.

Similarly, in cases where you have a large amount of indexed data and numerous concurrent users searching on it, it can make sense to split off the search function from indexing. In this type of scenario, known as distributed search, one or more search heads distribute search requests across multiple indexers.

To manage a distributed deployment, Splunk's deployment server lets you push out configurations and content to sets of Splunk instances, grouped according to any arbitrary criteria, such as OS, machine type, application area, location, and so on.

While the fundamental issues of indexing and event processing remain the same for distributed deployments, it is important to take into account deployment needs when planning your indexing strategy.

Forward data to an indexer

This type of deployment involves the use of forwarders, which are Splunk instances that receive data inputs and then consolidate and send the data to a Splunk indexer. Forwarders come in two flavors:

• Regular forwarders. These retain most of the functionality of a full Splunk instance. They can parse data before forwarding it to the receiving indexer, also known as the receiver. (See How indexing works for the distinction between parsing and indexing.) They can also retain indexed data locally, while forwarding the parsed data to the receiver for final indexing on that machine as well.
• Light forwarders. These forwarders maintain a small footprint on their host machine. They perform minimal processing on the incoming data streams before forwarding them on to the receiving indexer.

Both types of forwarders tag data with metadata such as host, source, and source type, beforeforwarding it on to the indexer.

Forwarders allow you to use resources efficiently while processing large quantities or disparate typesof data. They also feature in a number of interesting use case scenarios, to handle key needs suchas load balancing, data cloning for enhanced availability, and data filtering and routing.

For an extended discussion of forwarders, including configuration and detailed use cases, see Aboutforwarding and receiving.

Search across multiple indexers

In distributed search, Splunk servers send search requests to other Splunk servers and merge the results back to the user. This is useful for a number of purposes, including horizontal scaling, access control, and managing geo-dispersed data.

The Splunk instance that manages search requests is called the search head. The instances that maintain the indexes and perform the actual searching are called search peers or indexer nodes.

For an extended discussion of distributed search, including configuration and detailed use cases, see What is distributed search?.

Manage distributed deployments

When dealing with distributed deployments consisting potentially of multiple forwarders, indexers, and search heads, the Splunk deployment server greatly eases the process of configuring and updating all Splunk instances. With the deployment server, you can group the distributed Splunk instances (referred to as deployment clients in this context) into server classes.

A server class is a set of Splunk instances that share configurations. Server classes are typically grouped by OS, machine type, application area, location, or other useful criteria. A single deployment client can belong to multiple server classes, so a Linux forwarder residing in the UK, for example, might belong to a Linux server class and a UK server class, and receive configuration settings appropriate to each.
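To make the idea of overlapping server classes concrete, here is a minimal Python sketch. It is only an illustration of the grouping logic, not how the deployment server is implemented or configured; the class names, the client attributes (os, location), and the classes_for helper are all hypothetical.

```python
from typing import Callable, Dict, List

Client = Dict[str, str]

# Hypothetical server classes: each name maps to a membership rule.
SERVER_CLASSES: Dict[str, Callable[[Client], bool]] = {
    "linux": lambda c: c.get("os") == "linux",
    "windows": lambda c: c.get("os") == "windows",
    "uk": lambda c: c.get("location") == "uk",
}


def classes_for(client: Client) -> List[str]:
    """Return every server class whose rule matches this deployment client."""
    return sorted(name for name, rule in SERVER_CLASSES.items() if rule(client))


# A Linux forwarder in the UK belongs to two server classes at once.
uk_linux_forwarder = {"os": "linux", "location": "uk"}
print(classes_for(uk_linux_forwarder))  # ['linux', 'uk']
```

The client would then receive the configuration settings associated with each class it belongs to.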

For an extended discussion of deployment management, see About deployment server.

Add data and configure inputs

How to get your data into Splunk

No special plugins are required for Splunk to index data from any network or local source. Splunk can index any IT data from any source in real time. We call this universal indexing.

Splunk consumes any data you point it at. Before indexing data, you must add your data source as an input. The source is then listed as one of Splunk's default fields (whether it's a file, directory or network port).

Important: If you add an input, Splunk adds that input to a copy of inputs.conf that belongs to the app you're currently in. This means that if you navigated to Splunk Manager directly from the Launcher and then added an input there, your input will be added to $SPLUNK_HOME/etc/apps/launcher/local/inputs.conf. Make sure you're in the desired app when you add your inputs.

Ways to get data into Splunk

Specify data inputs via the following methods:

• The Manager in Splunk Web.

You can add most data sources using Splunk Web. For more extensive configuration options, use inputs.conf. Changes you make using Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/system/local/inputs.conf.

Sources

Splunk accepts data inputs in a variety of ways. Here's a basic overview of your options.

Files and directories

A lot of the data you may be interested in comes directly from files and directories. For the most part, you can use Splunk's files and directories monitor input processor to index data in files and directories.

You can also configure Splunk's file system change monitor to watch for changes in your file system. However, you shouldn't use both the files and directories monitor and the file system change monitor to follow the same directory or file. If you want to see changes in a directory, use the file system change monitor. If you want to index new events in a directory, use the files and directories monitor.

To monitor files and directories, see "Monitor files and directories".

To enable and configure the file system change monitor, see "Monitor changes to your file system".

TCP network ports

TCP is a reliable, connection-oriented protocol that should be used instead of UDP to transmit and receive data whenever possible. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and any other application that transmits via TCP.

To monitor data via TCP, see "Monitor network ports".

• It doesn't enforce delivery.
• It's not encrypted.
• There's no accounting for lost datagrams.

Refer to "Working with UDP connections" on the Splunk Community Wiki for recommendations if you must use UDP.

Windows sources

Splunk on Windows ships with the Windows inputs, as well as pages in Splunk Manager for defining the Windows-specific input types listed below. Because of compatibility issues, you will not see Windows-specific inputs or Splunk Manager pages on non-Windows Splunk instances.

Splunk on Windows can add data from these Windows-specific sources:

• Windows Event Log data
• Windows Registry data
• WMI data
• Active Directory data

Important: You can index and search your Windows data on a non-Windows instance of Splunk, but you must first use a Windows instance of Splunk to gather the Windows data. You can easily do this by means of a Splunk forwarder running on Windows, configured to gather Windows inputs and then forward the data to the non-Windows instance of Splunk where searching and indexing will take place. See "Considerations for deciding how to monitor remote Windows data" for the details.

Need some help deciding how to get your Windows data into Splunk? Check out "Considerations for deciding how to monitor remote Windows data" in this manual.

Got custom data? It might need some extra TLC

Splunk can index any time-series data, usually without the need for additional configuration. If you've got logs from a custom application or device, you should try Splunk's defaults first. But if you're not getting the results you want, you can tweak a bunch of different things to make sure your events are indexed correctly.

We recommend you learn a bit about event processing and how Splunk indexes data before proceeding so you can make informed decisions about what TLC your data needs. Some options include:

• Are your events multi-line?
• Is your data in an unusual character set?
• Is Splunk not figuring out the timestamps correctly?

Not finding the events you're looking for?

When you add an input to Splunk, that input gets added relative to the app you're in. Some apps, like the *nix and Windows apps that ship with Splunk, write input data to a specific index (in the case of *nix and Windows, that is the 'os' index). If you're not finding data that you're certain is in Splunk, be sure that you're looking at the right index. You may want to add the 'os' index to the list of default indexes for the role you're using. For more information about roles, refer to the topic about roles in this manual.

Note: Splunk looks for the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents. To ensure that your input is immediately recognized and indexed, add the input using Splunk Web or by using the add command in the CLI.

Specify input paths with wildcards

A wildcard is a character that you can substitute for one or more unspecified characters when searching text or selecting multiple files or directories. In Splunk, you can use wildcards to specify your input path for monitored input; use ... for paths and * for files.

Wildcard: ...
Regex equivalent: .*
Description: The ellipsis wildcard recurses through directories and subdirectories to match.
Example(s): /foo/.../bar matches the files /foo/bar, /foo/1/bar, /foo/1/2/bar, etc. Note: This only works if bar is a file.

Wildcard: *
Regex equivalent: [^/]*
Description: The asterisk wildcard matches anything in that specific directory path segment. Note: It cannot be used inside a directory path; must be used in the last segment of the path.
Example(s): /foo/*.log matches all files with the .log extension, such as /foo/bar.log. It does not match /foo/bar.txt or /foo/bar/test.log.

Note: A single dot (.) is not a wildcard, and is the regex equivalent of \..

For more specific matches, combine the ... and * wildcards. For example, /foo/.../bar/* matches any file in the /bar directory within the specified path.
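If you want to sanity-check a wildcard path before putting it in a stanza, the regex equivalents listed above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not Splunk's own translation code: the wildcard_to_regex helper is hypothetical, and it simply substitutes .* for ..., [^/]* for *, escapes everything else, and anchors the result at the right end of the path.

```python
import re


def wildcard_to_regex(path: str) -> str:
    """Approximate a monitor path containing ... and * as a regex string."""
    parts = []
    i = 0
    while i < len(path):
        if path.startswith("...", i):
            parts.append(".*")      # ellipsis: recurse through subdirectories
            i += 3
        elif path[i] == "*":
            parts.append("[^/]*")   # asterisk: anything within one path segment
            i += 1
        else:
            parts.append(re.escape(path[i]))  # literal; a single dot becomes \.
            i += 1
    return "".join(parts) + "$"     # anchored to the right end of the path


log_pattern = re.compile(wildcard_to_regex("/foo/*.log"))
print(bool(log_pattern.search("/foo/bar.log")))       # True
print(bool(log_pattern.search("/foo/bar.txt")))       # False
print(bool(log_pattern.search("/foo/bar/test.log")))  # False
```

Because this is a literal character-by-character substitution, it does not reproduce every documented case: for example, the sketch's pattern for /foo/.../bar requires at least one intermediate directory, so it would not match /foo/bar even though Splunk does.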

Input examples

To load anything in /apache/foo/logs or /apache/bar/logs, etc.

[monitor:///apache/.../logs]

To load anything in /apache/ that ends in .log.

[monitor:///apache/*.log]

Wildcards and whitelisting

Important: In Splunk, whitelists and blacklists are defined with standard PCRE REGEX syntax (unlike the file input path syntax described in the previous sections).

Specifying wildcards results in an implicit whitelist created for that stanza. The longest fully qualified path is used as the monitor stanza, and the wildcards are translated into regular expressions, as described in the table above.

Note: In Windows, whitelist and blacklist rules do not support regexes that include backslashes; you must use two backslashes \\ to escape wildcards.

Additionally, the converted expression is anchored to the right end of the file path, so that the entire path must be matched.

For example, if you specify

[monitor:///foo/bar*.log]

Splunk translates this into

[monitor:///foo/]
whitelist = bar[^/]*\.log$

As a consequence, you can't have multiple stanzas with wildcards for files in the same directory. If you have multiple inputs that only disambiguate after a wildcard, they will collide.

Also, you cannot use a whitelist declaration in conjunction with wildcards.

For example:

[monitor:///foo/bar_baz*]
[monitor:///foo/bar_qux*]

This results in overlapping stanzas indexing the directory /foo/. Splunk takes the first one, so only files starting with /foo/bar_baz will be indexed. To include both sources, manually specify a whitelist using regular expression syntax for "or":

[monitor:///foo]
whitelist = (bar_baz[^/]*|bar_qux[^/]*)$
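Before you rely on a combined whitelist like this one, it can help to try the expression against a few sample paths. The following Python sketch does that with the re module; the sample file names and the is_whitelisted helper are made up for illustration, and Python's regex engine is standing in for PCRE here (the two agree for this simple pattern).

```python
import re

# The "or" whitelist from the stanza above, anchored to the right end of the path.
WHITELIST = re.compile(r"(bar_baz[^/]*|bar_qux[^/]*)$")


def is_whitelisted(path: str) -> bool:
    """Return True if the whitelist regex matches somewhere in the path."""
    return WHITELIST.search(path) is not None


for sample in ("/foo/bar_baz1.log", "/foo/bar_qux.txt", "/foo/other.log"):
    print(sample, is_whitelisted(sample))
```

Only the first two sample paths match, which is the behavior the two original wildcard stanzas were meant to produce together.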

Note: To set any additional attributes (such as sourcetype) for multiple whitelisted/blacklisted inputs that may have different attributes, use props.conf.

Monitor files and directories


Splunk has two file input processors: monitor and upload. For the most part, you can use monitor to add all your data sources from files and directories. However, you may want to use upload when you want to add one-time inputs, such as an archive of historical data.

This topic discusses how to add monitor and upload inputs using Splunk Web and the configuration files. You can also add, edit, and list monitor inputs using the CLI; for more information, read this topic.

How monitor works in Splunk

Specify a path to a file or directory and Splunk's monitor processor consumes any new input. This is how you'd monitor live application logs such as those coming from J2EE or .Net applications, Web access logs, and so on. Splunk will continue to index the data in this file or directory as it comes in. You can also specify a mounted or shared directory, including network filesystems, as long as the Splunk server can read from the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.

Splunk checks for the file or directory specified in a monitor configuration on Splunk server start and restart. If the file or directory specified is not present on start, Splunk checks for it again in 24 hour intervals from the time of the last restart. Subdirectories of monitored directories are scanned continuously. To add new inputs without restarting Splunk, use Splunk Web or the command line interface. If you want Splunk to find potential new inputs automatically, use crawl.

When using monitor, note the following:

• On most file systems, files can be read even as they are being written to. However, Windows file systems have the ability to prevent files from being read while they are being written, and some Windows programs may use these modes, though most do not.
• Files or directories can be included or excluded via whitelists and blacklists.
• Upon restart, Splunk continues processing files where it left off.
• Splunk decompresses archive files before it indexes them. It can handle these common archive file types: .tar, .gz, .bz2, .tar.bz2, and .zip.
• Splunk detects log file rotation and does not process renamed files it has already indexed (with the exception of .tar and .gz archives; for more information see "Log file rotation" in this manual).
• The entire dir/filename path must not exceed 1024 characters.
• Set the source type for directories to Automatic. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
• Removing an input does not stop the input's files from being indexed. Rather, it stops files from being checked again, but all the initial content will be indexed. To stop all in-process data, you must restart the Splunk server.

Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.

Note: Monitor input stanzas may not overlap. That is, monitoring /a/path while also monitoring /a/path/subdir will produce unreliable results. Similarly, monitor input stanzas that watch the same directory with different whitelists, blacklists, and wildcard components are not supported.
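If you maintain many monitor stanzas, a quick script can flag the overlapping case described in the note above before you deploy. The Python sketch below is one possible check, not a Splunk tool: the overlaps helper is hypothetical, it assumes plain UNIX-style paths without wildcards, and it treats two paths as overlapping when they are equal or one is an ancestor directory of the other.

```python
from pathlib import PurePosixPath


def overlaps(path_a: str, path_b: str) -> bool:
    """Return True if one monitored path is the same as, or contains, the other."""
    a, b = PurePosixPath(path_a), PurePosixPath(path_b)
    return a == b or a in b.parents or b in a.parents


print(overlaps("/a/path", "/a/path/subdir"))  # True
print(overlaps("/a/path", "/a/other"))        # False
print(overlaps("/a/path", "/a/pathology"))    # False
```

Comparing path components rather than raw string prefixes avoids false alarms such as /a/path versus /a/pathology.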

Why use upload or batch

Use the Upload a local file or Index a file on the Splunk server options to index a static file one time. The file will not be monitored on an ongoing basis.

Use the batch input type in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it.

Note: For best practices on loading file archives, see "How to index different sized archives" on the Community Wiki.

Configure with Splunk Web

Add inputs from files and directories via Splunk Web.

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations, click Data Inputs.

3. Click Files and directories.

4. Click Add new to add an input.

5. Select a Source radio button:

• Monitor a file or directory. Sets up an ongoing input. Whenever data is added to this file or directory, Splunk will index it. See the next section for advanced options specific to this choice.
• Upload a local file. Uploads a file from your local machine into Splunk.
• Index a file on the Splunk server. Copies a file on the server into Splunk via the batch directory.

6. Specify the Full path to the file or directory.

\\<myhost>\<mypath> on Windows). Make sure Splunk has read access to the mounted drive, as well as to the files you wish to monitor.

7. Under the Host section, set the host name value. You have several choices for this setting. Learn more about setting the host value in "About default fields".

Note: Host only sets the host field. It does not direct Splunk to look on a specific host on your network.

8. Set the Source type. Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries.

9. Set the Index. Leave the value as "default", unless you have defined multiple indexes to handle different types of events. In addition to indexes for user data, Splunk has a number of utility indexes, which show up in the dropdown box.

10. Click Save.

Advanced options for file/directory monitoring

If your choice for source is Monitor a file or directory, the page includes an Advanced Options section, which allows you to configure some additional settings:

• Follow tail. If checked, monitoring begins at the end of the file (like tail -f).
• Whitelist. If a path is specified, files from that path are monitored only if they match the specified regex.
• Blacklist. If a path is specified, files from that path are not monitored if they match the specified regex.

For detailed information on whitelists and blacklists, see Whitelist or blacklist specific incoming data in this manual.

Configure with inputs.conf

To add an input, add a stanza to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read "About configuration files" before you begin.

You can set multiple attributes in an input stanza. If you do not specify a value for an attribute, Splunk uses the default that's preset in $SPLUNK_HOME/etc/system/default/.

Note: To ensure that new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes it when it changes. Be aware that the entire file will be re-indexed, which can result in duplicate events.

Configuration settings

The following are options that you can use in both monitor and batch input stanzas. See the sections that follow for attributes that are specific to each type of input.

host = <string>

• Set the host value of your input to a static value.
• "host=" is automatically prepended to <string>.
• Defaults to the IP address or fully qualified domain name of the host where the data originated.

index = <string>

• Set the index where events from this input will be stored.
• "index=" is automatically prepended to <string>.
• Defaults to main, or whatever you have set as your default index.
• For more information about the index field, see "How indexing works" in this manual.

sourcetype = <string>

• Set the sourcetype name of events from this input.
• "sourcetype=" is automatically prepended to <string>.
• Splunk picks a sourcetype based on various aspects of your data. There is no hard-coded default.
• For more information about the sourcetype field, see "About default fields (host, source, sourcetype, and more)", in this manual.

source = <string>

• Set the source name of events from this input.
• Defaults to the file path.
• "source=" is automatically prepended to <string>.

queue = parsingQueue | indexQueue

• Specifies where the input processor should deposit the events that it reads.
• Set to "parsingQueue" to apply props.conf and other parsing rules to your data.
• Set to "indexQueue" to send your data directly into the index.
• Defaults to parsingQueue.

_TCP_ROUTING = <tcpout_group_name>,<tcpout_group_name>,...

• Specifies a comma-separated list of tcpout group names.
• Using this attribute, you can selectively forward your data to specific indexer(s) by specifying the tcpout group(s) that the forwarder should use when forwarding your data.
• The tcpout group names are defined in outputs.conf in [tcpout:<tcpout_group_name>] stanzas.
• This setting defaults to the groups present in 'defaultGroup' in [tcpout] stanza in outputs.conf.

host_regex = <regular expression>

• If specified, the regex extracts host from the filename of each input.
• Specifically, the first group of the regex is used as the host.
• Defaults to the default "host =" attribute, if the regex fails to match.

host_segment = <integer>

• If specified, a segment of the path is set as host, using <integer> to determine which segment. For example, if host_segment = 2, host is set to the second segment of the path. Path segments are separated by the '/' character.
• Defaults to the default "host =" attribute, if the value is not an integer, or is less than 1.
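To see what host_segment and host_regex would pick out of a given path, you can mimic the rules described above in Python. This is a sketch of the documented behavior only, not Splunk's implementation; the helper names, the sample path, and the sample regex are hypothetical, and falling back to the default host when the segment number is larger than the path is an assumption the text above does not spell out.

```python
import re


def host_from_segment(path: str, host_segment: int, default_host: str) -> str:
    """Pick the Nth '/'-separated segment of the path (1-based) as the host."""
    segments = [s for s in path.split("/") if s]
    # Values below 1 fall back to the default; so do out-of-range values (assumption).
    if host_segment < 1 or host_segment > len(segments):
        return default_host
    return segments[host_segment - 1]


def host_from_regex(path: str, host_regex: str, default_host: str) -> str:
    """Use the first capture group of the regex as the host, else the default."""
    match = re.search(host_regex, path)
    if match is None or match.re.groups < 1 or match.group(1) is None:
        return default_host
    return match.group(1)


sample = "/var/log/webserver01/access.log"
print(host_from_segment(sample, 3, "fallback"))                  # webserver01
print(host_from_regex(sample, r"/var/log/([^/]+)/", "fallback"))  # webserver01
```

Both approaches yield the same host here; host_regex is the more flexible choice when the host name is embedded in a file name rather than being a whole path segment.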

Monitor syntax and examples

Monitor input stanzas direct Splunk to watch all files in the <path> (or just <path> itself if it represents a single file). You must specify the input type and then the path, so put three slashes in your path if you're starting at root. You can use wildcards for the path. For more information, read how to "Specify input paths with wildcards".

[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...

The following are additional attributes you can use when defining monitor input stanzas:

crcSalt = <string>

• If set, this string is added to the CRC.
• Use this setting to force Splunk to consume files that have matching CRCs.
• If set to crcSalt = <SOURCE> (note: This setting is case sensitive), then the full source path is added to the CRC.
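The effect of a salt on a checksum can be illustrated with a short Python sketch. Note the assumptions: this manual does not describe how Splunk computes its CRC, so zlib.crc32, the idea of hashing the leading bytes of a file, and prepending the salt are all stand-ins chosen for illustration, and the file_signature helper is hypothetical.

```python
import zlib


def file_signature(head: bytes, crc_salt: str = "", source: str = "") -> int:
    """Checksum the start of a file, optionally mixed with a salt string.

    crc_salt == "<SOURCE>" means "use the full source path as the salt".
    """
    salt = source if crc_salt == "<SOURCE>" else crc_salt
    return zlib.crc32(salt.encode("utf-8") + head)


# Two different files that happen to begin with identical content.
head = b"#Software: Microsoft Internet Information Services\n"

# Without a salt the signatures match, so the second file looks already indexed.
print(file_signature(head) == file_signature(head))  # True

# Salting with the source path gives each file its own signature.
sig_a = file_signature(head, "<SOURCE>", "/logs/a/ex1006.log")
sig_b = file_signature(head, "<SOURCE>", "/logs/b/ex1006.log")
print(sig_a != sig_b)
```

This is why crcSalt = <SOURCE> helps when many files share the same header: the path makes otherwise identical signatures distinct.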

followTail = 0|1

• If set to 1, monitoring begins at the end of the file (like tail -f).
• This only applies to files the first time they are picked up.
• After that, Splunk's internal file position records keep track of the file.

whitelist = <regular expression>

• If set, files from this path are monitored only if they match the specified regex.

blacklist = <regular expression>

• If set, files from this path are NOT monitored if they match the specified regex.

alwaysOpenFile = 0 | 1

• If set to 1, Splunk opens a file to check if it has already been indexed.
• Only useful for files that don't update modtime.
• Should only be used for monitoring files on Windows, and mostly for IIS logs.
• Note: This flag should only be used as a last resort, as it increases load and slows down indexing.

time_before_close = <integer>

• Modtime delta required before Splunk can close a file on EOF.
• Tells the system not to close files that have been updated in past <integer> seconds.
• Defaults to 3.

recursive = true|false

• If set to false, Splunk will not go into subdirectories found within a monitored directory.
• Defaults to true.

followSymlink = true|false

• If false, Splunk will ignore symbolic links found within a monitored directory.
• Defaults to true.

Example 1. To load anything in /apache/foo/logs or /apache/bar/logs, etc.

[monitor:///apache/.../logs]

Example 2. To load anything in /apache/ that ends in .log.

[monitor:///apache/*.log]

Batch syntax and examples

Use batch to set up a one time, destructive input of data from a source. For continuous, non-destructive inputs, use monitor. Remember, after the batch input is indexed, Splunk deletes the file.

Important: When defining batch inputs, you must include the setting, move_policy = sinkhole. This loads the file destructively. Do not use this input type for files you do not want to consume destructively.

Note: source = <string> and <KEY> = <string> are not used by batch.

Example: This example batch loads all files from the directory /system/flight815/.

[batch:///system/flight815/*]
move_policy = sinkhole

Monitor files and directories using the CLI


Monitor files and directories via Splunk's Command Line Interface (CLI). To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command from the UNIX or Windows command prompt.

If you get stuck, Splunk's CLI has built-in help. Access the main CLI help by typing splunk help. Individual commands have their own help pages as well -- type splunk help <command>.

CLI commands for input configuration

The following commands are available for input configuration via the CLI:

Note: You can only set one -hostname, -hostregex or -hostsegmentnum per command.

Parameter        Required?   Description
source           Required    Path to the file or directory to monitor for new input.
sourcetype       Optional    Specify a sourcetype field value for events from the input source.
index            Optional    Specify the destination index for events from the input source.
hostname         Optional    Specify a host name to set as the host field value for events from the input source.
hostregex        Optional    Specify a regular expression on the source file path to set as the host field value for events from the input source.
hostsegmentnum   Optional    Set the number of segments of the source file path to set as the host field value for events from the input source.
follow-only      Optional    (T/F) True or False. Default False. When set to True, Splunk will read from the end of the source (like the "tail -f" Unix command).

Example 1. monitor files in a directory

The following example shows how to monitor files in /var/log/:

Add /var/log/ as a data input:

./splunk add monitor /var/log/

Example 2. monitor windowsupdate.log

The following example shows how to monitor the Windows Update log (where Windows logs automatic updates):

Add C:\Windows\windowsupdate.log as a data input:

.\splunk add monitor C:\Windows\windowsupdate.log

Example 3. monitor IIS logging

This example shows how to monitor the default location for Windows IIS logging:

Add C:\windows\system32\LogFiles\W3SVC as a data input:

.\splunk add monitor c:\windows\system32\LogFiles\W3SVC

Monitor network ports


You can enable Splunk to accept an input on any TCP or UDP port. Splunk consumes any data sent on these ports. Use this method for syslog (default port is UDP 514) or set up netcat and bind to a port.

TCP is the protocol underlying Splunk's data distribution, which is the recommended method for sending data from any remote machine to your Splunk server. Note that the user you run Splunk as must have access to the port. On a Unix system you must run as root to access a port under 1024.

Add a network input using Splunk Web

Add inputs from network ports via Splunk Web.

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations click Data inputs.

3. Pick TCP or UDP.

4. Click Add new to add an input.

5. Enter a port number.

6. If this is a TCP input, you can specify whether this port should accept connections from all hosts or one host. If you specify one host, enter the IP address of the host.

7. Enter a new Source name to override the default source value, if necessary.

Important: Consult Splunk support before changing this value.

8. Set the Host by selecting a radio button:

• IP. Sets the input processor to rewrite the host with the IP address of the remote server.

• DNS. Sets the host to the DNS entry of the remote server.

• Custom. Sets the host to a user-defined label.

9. Now set the Source type.

Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries. Choose:

• From List. Select one of the predefined source types from the drop-down list.
• Manual. Label your own source type in the text box.

10. Set the Index. Leave the value as "default" unless you have defined multiple indexes to handle different types of events. In addition to indexes meant for user data, Splunk has a number of utility indexes, which show up in the dropdown box.

11. Click Save.

Add a network input using the CLI

Add network inputs via Splunk's Command Line Interface (CLI). To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command.

If you get stuck, Splunk's CLI has built-in help. Access the main CLI help by typing splunk help. Individual commands have their own help pages as well -- type splunk help <command>.

The following commands are available for input configuration via the CLI:

Parameter     Required?   Description
$SOURCE       Required    Port number to listen for data to index.
sourcetype    Optional    Specify a sourcetype field value for events from the input source.
index         Optional    Specify the destination index for events from the input source.
hostname      Optional    Specify a host name to set as the host field value for events from the input source.
remotehost    Optional    Specify an IP address to exclusively accept data from.
resolvehost   Optional    Set True or False (T | F). Default is False. Set True to use DNS to set the host field value for events from the input source.

Example

Configure a network input, then set the source type:

• Configure a UDP input to watch port 514 and set the source type to "syslog".

Check the Best Practices Wiki for information about the best practices for using UDP when configuring Syslog input.

./splunk add udp 514 -sourcetype syslog

• Set the UDP input's host value via DNS. Use auth with your username and password.

./splunk edit udp 514 -resolvehost true -auth admin:changeme

Note: Splunk must be running as root to watch ports under 1024.

Add a network input using inputs.conf

To add an input, add a stanza for it to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read "About configuration files" in this manual before you begin.

You can set any number of attributes and values following an input type. If you do not specify a value for one or more attributes, Splunk uses the defaults that are preset in $SPLUNK_HOME/etc/system/default/ (noted below).

This type of input stanza tells Splunk to listen to <remote server> on <port>. If <remote server> is blank, Splunk listens to all connections on the specified port.

host = <string>

• Set the host value of your input to a static value.
• host:: is automatically prepended to the value when this shortcut is used.
• Defaults to the IP address or fully qualified domain name of the host where the data originated.

index = <string>

• Set the index where events from this input will be stored.
• index:: is automatically prepended to the value when this shortcut is used.
• Defaults to main (or whatever you have set as your default index).
• For more information about the index field, see "How indexing works" in this manual.

sourcetype = <string>

• Set the sourcetype name of events from this input.
• sourcetype:: is automatically prepended to the value when this shortcut is used.
• Splunk automatically picks a source type based on various aspects of your data. There is no hard-coded default.
• For more information about the sourcetype field, read about source types in this manual.

source = <string>

• Set the source name of events from this input.
• Defaults to the file path.
• source:: is automatically prepended to the value when this shortcut is used.

queue = <string> (parsingQueue, indexQueue, etc)

• Specify where the input processor should deposit the events that it reads.
• Can be any valid, existing queue in the pipeline.
• Defaults to parsingQueue.

connection_host = [ip | dns]

• If set to ip: the TCP input processor rewrites the host with the IP address of the remote server.
• If set to dns: the host is rewritten with the DNS entry of the remote server.
• Defaults to ip.

UDP

[udp://<port>]
<attribute1> = <val1>
<attribute2> = <val2>
...

This type of input stanza is similar to the TCP type, except that it listens on a UDP port.

host = <string>

• Set the host value of your input to a static value.

• host= is automatically prepended to the value when this shortcut is used. • Defaults to the IP address of fully qualified domain name of the host where the data originated.

index = <string>

• Set the index where events from this input will be stored. • index= is automatically prepended to the value when this shortcut is used. • Defaults to main (or whatever you have set as your default index). • For more information about the index field, read about how indexing works in this manual.

sourcetype = <string>

• Set the sourcetype name of events from this input.

• sourcetype= is automatically prepended to the value when this shortcut is used. • Splunk automatically picks a source type based on various aspects of your data. There is no hard-coded default. • For more information about the sourcetype field, read about source types in this manual.

source = <string>

• Set the source name of events from this input.

• Defaults to the file path. • source= is automatically prepended to the value when this shortcut is used.

queue = <string> (parsingQueue, indexQueue, etc)

• Specify where the input processor should deposit the events that it reads.
• Can be any valid, existing queue in the pipeline.
• Defaults to parsingQueue.

_rcvbuf = <int>

• Specify the receive buffer for the UDP port (in bytes).
• If the value is 0 or negative, it is ignored.
• Defaults to 1,572,864.
• Note: The default in the OS varies.

no_priority_stripping = true | false

• If this attribute is set to true, then Splunk does NOT strip the <priority> syslog field from received events.
• Otherwise, Splunk strips syslog priority from events.

no_appending_timestamp = true

• If this attribute is set to true, then Splunk does NOT append a timestamp and host to received events.
• Note: Do NOT include this key at all if you want to append timestamp and host to received events.
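As a rough sketch, a UDP input for syslog traffic might combine several of the attributes above. The port number and source type name are illustrative assumptions, not values taken from this manual; the index, queue, and buffer size simply restate the documented defaults.

```ini
# Hypothetical example: listen for syslog datagrams on UDP port 514.
[udp://514]
# Assumed source type name, chosen for illustration.
sourcetype = syslog
# main is the documented default index; shown here for clarity.
index = main
# parsingQueue is the documented default queue.
queue = parsingQueue
# Receive buffer in bytes (the documented default value).
_rcvbuf = 1572864
# Keep the <priority> syslog field on received events.
no_priority_stripping = true
```

Because no_appending_timestamp is omitted here, Splunk would still append a timestamp and host to received events.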

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about UDP inputs, TCP inputs, and inputs in general.

Considerations for deciding how to monitor remote Windows data

Use forwarders or remote collection?

The best way to get data off of a Windows host is with a local Splunk forwarder. Using a local forwarder offers the most types of data sources, minimizes network overhead, and reduces operational risk and complexity.

However, there are circumstances, from organizational boundaries to local performance considerations, where remote collection is preferred. For these situations, Splunk supports using the native WMI interface on Windows to collect event logs and performance data.

This table offers a list of data sources and their respective trade-offs for you to consider.

Data Source Considerations

Data Source    Local Forwarder    Remote Polling
Event logs     Yes                Yes*
Performance    Yes                Yes
Registry       Yes                No
Log files      Yes                Yes**
Crawl          Yes                No

* For remote event log collection, you must know the name of the Event Log you wish to collect. On local forwarders, you have the option to collect all logs, regardless of name.

** Remote log file collection using \\servername\share\ syntax is supported; however, you must use CIFS as your application layer file access protocol, and Splunk must have at least read access to both the share and the underlying file system.

Tradeoffs

Performance

When collecting local Event Logs and flat log files, a local forwarder requires less CPU and performs basic pre-compression of the data; it is more memory intensive, mostly owing to the additional data source input options. WMI remote polling is more CPU intensive on the target for the same set of data (either remote Event Logs or remote performance data) and more network intensive.

Note that for highly audited hosts, such as domain controllers, remote polling may not be able to keep up with the volume of data or Event Log events. Remote polling is best-effort by design of the WMI API, and is throttled to prevent unintentional denial of service attacks.

Deployment

Local forwarders are easier where you have control of the base OS build, and/or if you have many data sources, especially if transformation of data is required. Remote polling works well when you want a limited set of data from a large number of hosts (for example, just process CPU data for usage billing). Remote polling may be your only option where you don't have either build control or local administrator privileges on the target host.

A common usage scenario is to first test using remote polling, then add successful or useful polls to your local forwarding configuration later, or at mass deployment time.

Management

Both mechanisms offer logging and, potentially, alerting to let you know if a host is coming online or going offline, or is no longer connected. However, to prevent an unintentional denial of service attack, the WMI polling service in Splunk will start to poll less frequently over time if it is unable to contact a host for a period of time. Therefore, remote polling is not advised for machines that are frequently offline, such as laptops or dynamically provisioned virtual machines.

Search Windows data on a non-Windows Splunk

You can index and search your Windows data on a non-Windows instance of Splunk, but you must first use a Windows instance of Splunk to gather the Windows data. You can easily do this by means of a Splunk forwarder running on Windows, configured to gather Windows inputs and then forward the data to the non-Windows instance of Splunk where searching and indexing will take place.

There are two main ways to proceed:

• Set up light forwarders locally on each Windows machine that's generating data. These forwarders can send the Windows data to the non-Windows receiving instance of Splunk.

• Set up a regular forwarder on a separate Windows machine. The forwarder can perform remote polling on all the Windows machines in the environment and then forward the combined data to a non-Windows receiving instance of Splunk.

You must specially configure the non-Windows Splunk to handle the Windows data. For details, see "Searching data received from a forwarder running on a different operating system".

For information on setting up forwarders, see "Set up forwarding and receiving".

You can monitor two types of event log collections: local and remote.

Note: To add another log channel to monitor on localhost, edit the existing input. To monitor a remote machine, add a new input.

Use Splunk Web to configure event log monitoring

Configure local event log monitoring

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations, click Data Inputs.

3. Click Local event log collections.

4. Click Add new to add an input.

5. Select one or more logs from the list of Available Logs and click to add to the list of Selected Logs.

Note 1: Select up to 63 logs from the list of Available Logs. Selecting more than 63 can cause Splunk to become unstable.

Note 2: Certain Windows Event Log channels (known as direct channels) do not allow users to access or subscribe to them in order to monitor them. This is because events sent via these log channels are not actually processed by the Windows Event Log framework, and thus can't be forwarded or collected remotely. Attempts to monitor these log channels will generate the error: "The caller is trying to subscribe to a direct channel which is not allowed."

6. Click Save.

The input is added and enabled.

Configure remote event log monitoring

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations, click Data Inputs.

3. Click Remote event log collections.

4. Click Add new to add an input.

5. Enter a unique name for this collection.

6. Specify a hostname or IP address for the host from which to pull logs, and click Find logs... to get a list of logs from which to choose.

Note: Windows Vista offers many channels; depending on the CPU available to Splunk, selecting all or a large number of them can result in high load.

7. Optionally, provide a comma-separated list of additional servers from which to pull data.

# Windows platform specific input processor.

You can configure Splunk to read non-default Windows event logs as well, but you must first import them to the Windows Event Viewer, and then add them to your local copy of inputs.conf (usually in %SPLUNK_HOME%\etc\system\local\inputs.conf) as follows:

To disable indexing for an event log, add disabled = 1 below its listing in the stanza in %SPLUNK_HOME%\etc\system\local\inputs.conf.

If you've added some non-standard event log channels and you want to specify whether Active Directory objects like GUIDs and SIDs are resolved for a given Windows event log channel, you can turn on the evt_resolve_ad_obj setting (1=enabled, 0=disabled) for that channel's stanza in your local copy of inputs.conf. evt_resolve_ad_obj is on by default for the Security channel.

To specify the Domain Controller name and/or DNS name of the domain to bind to for Splunk to use to resolve the AD objects, use the evt_dc_name and/or evt_dns_name settings along with evt_resolve_ad_obj. This name can be the name of the domain controller or the fully-qualified DNS name of the domain controller. Either name type can, optionally, be preceded by two backslash characters. The following examples are correctly formatted domain controller names:
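As an illustration of the name formats described above, a channel stanza might look like the sketch below. The stanza naming scheme [WinEventLog:<channel>], the channel chosen, and the controller and domain names are all assumptions made for this example; substitute the values from your own environment.

```ini
# Hypothetical stanza for a non-default channel (stanza naming is assumed).
[WinEventLog:DNS Server]
# 1 = resolve Active Directory objects such as GUIDs and SIDs, 0 = do not.
evt_resolve_ad_obj = 1
# Domain controller to bind to. Any of these shapes follows the rules above:
#   dc01
#   \\dc01
#   dc01.example.com
#   \\dc01.example.com
evt_dc_name = \\dc01.example.com
# DNS name of the domain to bind to (hypothetical).
evt_dns_name = example.com
```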

Specify whether to index starting at earliest or most recent event

Use these settings to specify in which chronological order you want to index the events, from oldest to newest or newest to oldest, and whether you want to index all pre-existing events or just new events.

start_from = oldest
current_only = 1

• start_from: By default, Splunk starts with the oldest data and indexes forward. You can set it to newest, telling Splunk to start with the newest data and index backward. We don't recommend changing this setting, as it results in a highly inefficient indexing process.
• current_only: This option allows you to only index new events that appear from the moment Splunk was started. When set to 1, it is enabled. When set to 0, it is disabled and all events are indexed.
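To show where these settings live, here is a minimal sketch of a complete event log stanza in a local copy of inputs.conf. The stanza naming scheme [WinEventLog:<channel>] is an assumption for this example; the attribute names are the ones described in this topic.

```ini
# Hypothetical stanza; the [WinEventLog:<channel>] naming is assumed.
[WinEventLog:System]
# 0 = index this log, 1 = disable indexing for it.
disabled = 0
# Read in chronological order (the recommended setting).
start_from = oldest
# 1 = only index events that appear after Splunk starts;
# 0 = also index all pre-existing events.
current_only = 1
```

With current_only = 0 instead, Splunk would work forward from the oldest stored event, which can mean a large initial backlog on a long-lived host.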

Index exported event log (.evt or .evtx) files

To index exported Windows event log files, use the instructions for monitoring files and directories to monitor the directory into which you place these exported files.

Constraints

• As a result of API and log channel processing constraints on Windows XP and 2003 systems, imported .evt files will not contain the message field. This means that the message field will not appear in your Splunk index.
• Splunk running on Windows 2000/2003/XP cannot index Vista/2008/Windows 7 .evtx files.
• Splunk running on Vista/2008/Windows 7 can index both .evt and .evtx files.
• If your .evt/.evtx file is not from a standard event log channel, you must make sure that any DLL files required by that channel are present on the computer on which you are indexing.
• A .evt/.evtx file is indexed in the primary locale/language of the Splunk computer that collects the file.

Caution: Do not attempt to monitor a .evt or .evtx file that is currently being written to; Windows will not allow read access to these files. Use the event log monitoring feature instead.

Note: When you produce .evt/.evtx files on one system and monitor them on another, some fields might not be expanded as they would be on the producing system. This is caused by variations in DLL availability and APIs. Differences in OS version, language, patch level, installed third-party DLLs, etc. can have this effect.

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around Windows event logs.

Monitor Windows Registry data

Splunk supports the capture of Windows registry settings and lets you monitor changes to the registry. You can know when registry entries are added, updated, and deleted. When a registry entry is changed, Splunk captures the name of the process that made the change and the key path from the hive to the entry being changed.

The Windows registry input monitor application runs as a process called splunk-regmon.exe.

Warning: Do not stop or kill the splunk-regmon.exe process manually; this could result in system instability. To stop the process, stop the Splunkd server process from the Services control panel.

Enable Registry monitoring in Splunk Web

Splunk on Windows comes with Registry monitoring configured but disabled by default. You can perform a one-time baseline index and then separately enable ongoing monitoring for machine and/or user keys. To do this:

1. In Splunk Web, click Manager in the upper right corner.

2. Click Data inputs > Registry Monitoring

3. Choose Machine keys or User keys and enable the baseline and ongoing monitoring as desired.

4. Click Save.

How it works: the details

Windows registries can be extremely dynamic (thereby generating a great many events). Splunk provides a two-tiered configuration for fine-tuning the filters that are applied to the registry event data coming into Splunk.

Splunk Windows registry monitoring uses two configuration files to determine what to monitor on your system: sysmon.conf and the filter rules file referenced by it. By default, the filter rules file is named regmon-filters.conf, but you can define its name within sysmon.conf by using the filter_file_name attribute. Both of these files need to reside in $SPLUNK_HOME\etc\system\local\.

The two configuration files work as a hierarchy:

• sysmon.conf contains global settings for which event types (adds, deletes, renames, and so on) to monitor, which regular expression filters from the filter rules file to use, and whether or not Windows registry events are monitored at all.
• The filter rules file (by default named regmon-filters.conf) contains the specific regular expressions you create to refine and filter the Registry hive key paths you want Splunk to monitor.

sysmon.conf contains only one stanza, where you specify:

• event_types: the superset of registry event types you want to monitor. Can be any of delete, set, create, rename, open, close, query.
• filter_file_name: the file that Splunk should access for filter rules for this monitor. For example, if the attribute is set to regmon-filters, then Splunk looks in regmon-filters.conf for filter rule information.
• inclusive: whether the filter rules listed in the file specified by filter_file_name are inclusive (meaning Splunk should only monitor what is listed there) or exclusive (meaning Splunk should monitor everything except what is listed there). Set this value to 1 to make the filter rules inclusive, and 0 to make them exclusive.
• disabled: whether to monitor registry settings changes or not. Set this to 1 to disable Windows registry monitoring altogether.
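A minimal sketch of such a sysmon.conf follows. The stanza name [RegistryMonitor] and the pipe-separated form of the event_types value are assumptions made for this example; check the default sysmon.conf shipped with your installation for the exact syntax.

```ini
# sysmon.conf (sketch). Stanza name and value separator are assumed.
[RegistryMonitor]
# Superset of registry event types to monitor.
event_types = set|create|delete|rename
# Splunk looks in regmon-filters.conf for the filter rules.
filter_file_name = regmon-filters
# 1 = monitor only what the filter rules list; 0 = monitor everything else.
inclusive = 1
# 0 = registry monitoring on; 1 = off altogether.
disabled = 0
```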

Each filter in regmon-filters.conf specifies:

• proc: a regular expression containing the path to the process or processes you want to monitor.
• hive: a regular expression containing the hive path to the entry or entries you want to monitor. Splunk supports the root key value mappings predefined in Windows:
  ♦ \\REGISTRY\\USER\\ maps to HKEY_USERS or HKU
  ♦ \\REGISTRY\\USER\\_Classes maps to HKEY_CLASSES_ROOT or HKCR
  ♦ \\REGISTRY\\MACHINE maps to HKEY_LOCAL_MACHINE or HKLM
  ♦ \\REGISTRY\\MACHINE\\SOFTWARE\\Classes maps to HKEY_CLASSES_ROOT or HKCR
  ♦ \\REGISTRY\\MACHINE\\SYSTEM\\CurrentControlSet\\Hardware Profiles\\Current maps to HKEY_CURRENT_CONFIG or HKCC
  ♦ Note: There is no direct mapping for HKEY_CURRENT_USER or HKCU, as the Splunk Registry monitor runs in kernel mode. However, using \\REGISTRY\\USER\\.* (note the period and asterisk at the end) will generate events that contain the logged-in user's SID.
  ♦ Alternatively, you can specify the user whose registry keys you wish to monitor by using \\REGISTRY\\USER\\<SID>, where SID is the SID of the desired user.
• type: the subset of event types to monitor. Can be delete, set, create, rename, open, close, query. The values here must be a subset of the values for event_types that you set in sysmon.conf.
• baseline: whether or not to capture a baseline snapshot for that particular hive path. Set to 0 for no, and 1 for yes.
• baseline_interval: how long Splunk has to have been down before re-taking the snapshot, in seconds. The default value is 24 hours (86,400 seconds).
• disabled: whether or not a filter is enabled. 0 means it is enabled, 1 means it is not.
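Putting these attributes together, a single filter might look like the sketch below. The stanza name, the proc expression, and the pipe-separated type value are illustrative assumptions; only the attribute names and the hive mapping come from the descriptions above.

```ini
# regmon-filters.conf (sketch). Stanza name and values are hypothetical.
[User keys]
# Match any process path (assumed form for kernel-mode device paths).
proc = \\Device\\.*
# All user hives; events will carry the logged-in user's SID.
hive = \\REGISTRY\\USER\\.*
# Must be a subset of event_types in sysmon.conf.
type = set|create|delete|rename
# 1 = capture a baseline snapshot for this hive path.
baseline = 1
# Retake the snapshot if Splunk was down longer than this (24 hours, in seconds).
baseline_interval = 86400
# 0 = filter enabled, 1 = filter disabled.
disabled = 0
```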

Get a baseline snapshot

When you enable Registry monitoring, you're given the option of recording a baseline snapshot of your registry hives the next time Splunk starts. By default, the snapshot covers the entirety of the user keys and machine keys hives. It also establishes a timeline for when to retake the snapshot; by default, if Splunk has been down for more than 24 hours since the last checkpoint, it will retake the baseline snapshot. You can customize this value for each of the filters in regmon-filters.conf by setting the value of baseline_interval.

Note: The baseline_interval attribute is expressed in seconds.

Note: Executing a splunk clean all -f from the CLI deletes the current baseline snapshot.

What to consider

When you install Splunk on a Windows machine and enable registry monitoring, you specify which major hive paths to monitor: user keys (HKEY_USERS) and/or machine keys (HKEY_LOCAL_MACHINE). Depending on how dynamic you expect the registry to be on this machine, checking both could result in a great deal of data for Splunk to monitor. If you're expecting a lot of registry events, you may want to specify some filters in regmon-filters.conf to narrow the scope of your monitoring immediately after you install Splunk and enable registry event monitoring, but before you start Splunk.

Similarly, you have the option of capturing a baseline snapshot of the current state of your Windows registry when you first start Splunk, and again every time a specified amount of time has passed. The baselining process can be somewhat processor-intensive, and may take several minutes. You can postpone taking a baseline snapshot until you've edited regmon-filters.conf and narrowed the scope of the registry entries to those you specifically want Splunk to monitor.

Change the default Windows registry input values

Look at inputs.conf to see the default values for Windows registry input. They are also shown below.

To make changes to the default values, edit a copy of inputs.conf in $SPLUNK_HOME\etc\system\local\. Provide new values for only the parameters you want to change within the [script://$SPLUNK_HOME\bin\scripts\splunk-regmon.path] stanza. There's no need to edit the other values. For more information about how to work with Splunk configuration files, refer to "About configuration files".

• source: labels these events as coming from the registry.
• sourcetype: assigns these events as registry events.
• interval: specifies how frequently to poll the registry for changes, in seconds.
• disabled: indicates whether the feature is enabled. Set this to 1 to disable this feature.
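For instance, a local override that changes only the polling frequency could be sketched as follows. The interval value is an arbitrary illustration, not the shipped default.

```ini
# $SPLUNK_HOME\etc\system\local\inputs.conf (sketch)
[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.path]
# Poll the registry for changes every 30 seconds (hypothetical value).
interval = 30
# 0 = feature enabled, 1 = feature disabled.
disabled = 0
```

The source and sourcetype attributes are left out on purpose, so they keep whatever values the default file assigns.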

Note: The Splunk registry input monitoring script (splunk-regmon.path) is configured as a scripted input. Do not change this value.

Note: You must use two backslashes \\ to escape wildcards in stanza names in inputs.conf. Regexes with backslashes in them are not currently supported when specifying paths to files.

Monitor WMI data

Splunk supports WMI (Windows Management Instrumentation) data input for agentless access to Windows performance data and event logs. This means you can pull event logs from all the Windows servers and desktops in your environment without having to install anything on those machines.

The Splunk WMI data input can connect to multiple WMI providers and pull data from them. The WMI data input runs as a separate process (splunk-wmi.exe) on the Splunk server. It is configured as a scripted input in %SPLUNK_HOME%\etc\system\default\inputs.conf. Do not make changes to this file.

Note: This feature is only available on the Windows versions of Splunk.

Security and remote access considerations

Splunk requires privileged access to index many Windows data sources, including WMI, Event Logs, and the registry. This includes both the ability to connect to the computer you wish to poll, as well as permissions to read the appropriate data once connected. To access WMI data, Splunk must run as a user with permissions to perform remote WMI connections. This user name must be a member of an Active Directory domain, and must have appropriate privileges to query WMI. Both the Splunk server making the query and the target systems being queried must be part of this Active Directory domain.

Note: If you installed Splunk as the LOCAL SYSTEM user, WMI remote authentication will not work; this user has null credentials and Windows servers normally disallow such connections.

There are several things to consider:

• For remote data collection via WMI, the Splunk service must run as a user who has sufficient OS privileges to access the WMI resources you wish to poll. At a minimum, Splunk requires access to the following privileges on every machine you poll:
  ♦ Profile System Performance
  ♦ Access this Computer from the Network
  ♦ The simplest way to ensure Splunk has access to these resources is to add Splunk's user to the Performance Log Users and Distributed COM Users Domain groups. If these additions fail to provide sufficient permissions, add Splunk's user to the remote machine's Administrators group.
• You must enable DCOM for remote machine access, and it must be accessible to Splunk's user. See the Microsoft topic about Securing a Remote WMI Connection for more information. Adding Splunk's user to the Distributed COM Users local group is the fastest way to enable this permission. If this fails to provide sufficient permissions, add Splunk's user to the remote machine's Administrators group.
• The WMI namespace that Splunk accesses (most commonly root\cimv2) must have proper permissions set. Enable the following permissions on the WMI tree at root for the Splunk user:
  ♦ Execute Methods, Enable Account, Remote Enable, and Read Security.
  ♦ See the Microsoft HOW TO: Set WMI Namespace Security in Windows Server 2003 for more information.
• If you have a firewall enabled, you must allow access for WMI. If you are using the Windows Firewall, the exceptions list explicitly lists WMI. You must set this exception for both the originating and the remote machine. See the Microsoft topic about Connecting to WMI Remotely Starting with Vista for more details.

Test access to WMI

Follow these steps to test the configuration of the Splunk server and the remote machine:

5. If you see data streaming back and no error message, you were able to connect and query successfully. If there was an error, a message gives the reason it failed (look for the error="<msg>" string).

Configure WMI input

You can configure WMI input either in Splunk Web or by editing configuration files. More options are available when using the configuration file option.

Configure WMI with Splunk Web

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations, click Data Inputs.

3. Click WMI collections.

4. Click Add new to add an input.

5. Enter a unique name for this collection.

6. Enter a target host and click Query... to get a list of the available classes of properties to choose from.

Note: Only classes that are prefixed with Win32_PerfFormattedData_* are displayed in the list. If the class you wish to index does not start with the Win32_PerfFormattedData_* prefix, you must add it by editing wmi.conf.

7. Optionally, provide a comma-separated list of additional servers from which to pull data.

8. Specify an interval in seconds between polls.

9. Ensure the Enabled? radio button is set to Yes and click Save.

The input is added and enabled.

Disabling or deleting WMI inputs

To disable a WMI input, navigate to it and select the No radio button under Enabled?.

It's not possible to delete a WMI input from within Splunk Web. To delete a WMI input, use the information in the section below titled "Configure WMI with configuration files" to edit the wmi.conf file and remove the [WMI:<input_name>] stanza you'd like to delete.

Configure WMI with configuration files

Look at wmi.conf to see the default values for the WMI input. If you want to make changes to the default values, edit a copy of wmi.conf in %SPLUNK_HOME%\etc\system\local\. Only set values for the attributes you want to change for a given type of data input. Refer to "About configuration files" for more information about how Splunk uses configuration files.

The [settings] stanza specifies runtime parameters. The entire stanza and every parameter within it are optional. If the stanza is missing, Splunk assumes system defaults.

• The following attributes control how the agent reconnects to a given WMI provider when an error occurs. All times are in seconds:
  ♦ initial_backoff: how much time to wait the first time after an error occurs before trying to reconnect. Thereafter, if errors keep occurring, the wait time doubles, until it reaches max_backoff.
  ♦ max_backoff: the maximum amount of time to wait before invoking max_retries_at_max_backoff.
  ♦ max_retries_at_max_backoff: if the wait time reaches max_backoff, try this many times at this wait time. If the error continues to occur, Splunk will not reconnect to the WMI provider in question until the Splunk services are restarted.
• checkpoint_sync_interval: minimum wait time, in seconds, for state data (event log checkpoint) to be written to disk.
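To make the backoff behavior concrete, here is a minimal sketch of a [settings] stanza. All four values are arbitrary illustrations chosen to keep the arithmetic simple; they are not the shipped defaults.

```ini
# wmi.conf (sketch). Values are hypothetical.
[settings]
# Wait 5 seconds before the first reconnect attempt after an error.
initial_backoff = 5
# Stop doubling the wait once it reaches 20 seconds.
max_backoff = 20
# Number of attempts to make at the 20-second wait before giving up.
max_retries_at_max_backoff = 2
# Minimum wait, in seconds, before checkpoint state is written to disk.
checkpoint_sync_interval = 2
```

Under these assumed values, and reading the doubling rule literally, the waits between attempts would be 5, 10, and then 20 seconds. Splunk would then try up to 2 times at the 20-second wait and, if the error persisted, leave that provider alone until the Splunk services are restarted.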

The common parameters for both types are:

• server: a comma-separated list of servers from which to pull data. If this parameter is missing, Splunk assumes the local machine.
• interval: how often to poll for new data, in seconds. Required.
• disabled: indicates whether this feature is enabled or disabled. Set this parameter to 1 to disable WMI input into Splunk.

WQL-specific parameters:

• namespace: specifies the path to the WMI provider. The local machine must be able to connect to the remote machine using delegated authentication. This attribute is optional. If you don't specify a path to a remote machine, Splunk will connect to the default local namespace (\root\cimv2), which is where most of the providers you are likely to query reside. Microsoft provides a list of namespaces for Windows XP and later versions of Windows.
• wql: provides the WQL query. The example above polls data about a running process named splunkd every 5 seconds.

Event log-specific parameters:

• event_log_file: specify a comma-separated list of log files to poll in the event_log_file parameter. File names that include spaces are supported, as shown in the example.
• current_only: If enabled (or set to 1), this option allows you to collect events that occur only while Splunk is running. It behaves like a tail to a file.

You can also use the current_only parameter in raw WQL stanzas to collect WMI event notification data. Check the wmi.conf configuration file reference for examples.
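A minimal sketch of one stanza of each type follows. The stanza name [WMI:AppAndSys], the server names, and the interval of the second stanza are hypothetical; the WQL query assumes the standard Win32_PerfFormattedData_PerfProc_Process performance class and is meant only to illustrate the shape of a query.

```ini
# WQL stanza: poll data about the splunkd process every 5 seconds.
[WMI:LocalSplunkdProcess]
interval = 5
wql = select * from Win32_PerfFormattedData_PerfProc_Process where Name = "splunkd"
disabled = 0

# Event log stanza (hypothetical name and servers).
[WMI:AppAndSys]
# Omit server to poll the local machine instead.
server = host1, host2
interval = 10
# Log names that contain spaces are listed as-is.
event_log_file = Application, System, Directory Service
# 1 = collect only events that occur while Splunk is running.
current_only = 1
disabled = 0
```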

Fields for WMI data

All events received from WMI have the source set to wmi.

• For event log data, the source type is set to WinEventLog:<name of log file> (for example, WinEventLog:Application).
• For WQL data, the source type is set to the name of the config stanza (for example, for a stanza named [WMI:LocalSplunkdProcess], the field is set to WMI:LocalSplunkdProcess).

The host is identified automatically from the data received.

Monitor Active Directory

Configure Active Directory monitoring as an input to monitor changes to portions of, or all of, your AD forest and collect user and machine metadata.

Once you've enabled this feature and restarted Splunk, it will take a baseline snapshot of your AD data and the AD schema. It uses this data as the starting point against which to monitor. This process might take a little time to complete.

Powerful lookups from your AD data

You can use this feature combined with dynamic list lookups to decorate or modify events with any information available in AD. Read an overview of how in this topic on the Splunk Community Wiki.

Things to know

• This feature is only available on Windows platforms.
• The admon.exe process can run under a full Splunk install or within a forwarder.
• The machine the admon.exe process is running on must belong to the domain you want to monitor.
• The user Splunk is running as must be part of the domain too; whatever rights that user has to query AD will filter the results Splunk can see.
• You can use the Windows permissions of the user admon.exe is running as to control the level of access Splunk should have and what it should be allowed to see. Note that the AD user rights policy set in Group Policy Manager can further restrict access.

For more details, see this topic about choosing the user Splunk should run as in the Installation Manual.

Configure monitoring

You can configure AD monitoring either in Splunk Web or by editing configuration files.

3. Click Active Directory monitoring.

4. Click Add new to add an input.

5. Enter a unique Name for the AD monitor.

7. Select a Starting node. If you do not select one, Splunk starts monitoring from the highest available part of the tree.

8. Select Monitor subtree, if you want Splunk to monitor all child nodes.

10. Click Save.

Configure AD monitoring in inputs.conf and admon.conf

Be sure to edit copies of these configuration files in a \local directory. If you edit them in the default directory, any changes you make will be overwritten when you upgrade Splunk. For more information about configuration file precedence, refer to "About configuration files" in this manual.

1. Make a copy of $SPLUNK_HOME\etc\system\default\inputs.conf and put it in $SPLUNK_HOME\etc\system\local\inputs.conf.

2. Edit the copy and enable the scripted input [script://$SPLUNK_HOME\bin\scripts\splunk-admon.path] by setting the value of disabled to 0.

3. Next, make a similar copy of $SPLUNK_HOME\etc\system\default\admon.conf and put it in $SPLUNK_HOME\etc\system\local\admon.conf.

4. Edit it using the information later in this topic. By default, when enabled, it will index the first domain controller that the admon.exe process can attach to. If that is acceptable, no further configuration is necessary.

Settings in admon.conf

monitorSubtree tells Splunk how much of the target container to index. A value of 0 tells Splunk to index only the target container. A value of 1 (the default) tells Splunk to enumerate all sub-containers and domains it has access to.

targetDC sets the unique name of the domain controller host you want to monitor. Specify a unique name if:

• you have a very large AD and you only want to monitor information in a particular branch (OU), subdomain, etc.
• you want to limit your scope to only a certain subdomain of your tree.
• you have a specific (read-only) domain controller that is offered for this purpose in a high security environment.
• you have multiple domain forests in a trusted configuration and want to target a different tree than the one where Splunk resides.
• you want to run multiple instances of admon.exe to target multiple domain controllers, for example, to monitor replication health across a distributed environment.

If you want to target multiple DCs, add another [<uniquename>TargetDC] stanza for a target in that tree.

startingNode is a fully qualified LDAP name (for example, "LDAP://OU=Computers,DC=ad,DC=splunk,DC=com") where Splunk will begin its indexing. Splunk starts there and enumerates down to sub-containers, depending on the configuration of monitorSubtree above. If you don't specify a starting node, Splunk starts at the highest root domain in the tree it can access.

The startingNode must be within the scope of the DC you are targeting to be successful.

Example AD monitoring configurations

You can monitor a target DC that is at a higher root level than an OU you want to target, for example:

The OU = computers in the eng.ad.splunk.com subdomain.

Target your DC to be one of the controllers in ad.splunk.com. You might do this if you want the schema for the entire tree, not just a sub-domain. Then set the starting node to be an OU in eng.ad.splunk.com to audit machines being added to and removed from that OU.
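The scenario above could be sketched in admon.conf roughly as follows. The stanza name follows the [<uniquename>TargetDC] pattern with an assumed unique name of "Default", and the controller host name dc01 is hypothetical.

```ini
# admon.conf (sketch). Stanza name and host name are hypothetical.
[DefaultTargetDC]
# A controller in the root domain, so the schema covers the entire tree.
targetDC = dc01.ad.splunk.com
# Start at the Computers OU in the eng subdomain.
startingNode = LDAP://OU=Computers,DC=eng,DC=ad,DC=splunk,DC=com
# 1 = also enumerate all sub-containers below the starting node.
monitorSubtree = 1
```

Keep in mind that the startingNode must be within the scope of the targeted DC, which is why the root-domain controller is paired with an OU from one of its subdomains here.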

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around monitoring AD with Splunk.

Monitor FIFO queues

This topic describes how to configure a FIFO input using inputs.conf. Defining FIFO inputs is not currently supported in Splunk Web/Manager.

Caution: Data sent via FIFO is not persisted in memory and can be an unreliable method for data sources. To ensure your data is not lost, use monitor instead.

Add a FIFO input to inputs.conf

To add a FIFO input, add a stanza for it to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read about configuration files before you begin.

You can set any number of attributes and values following an input type. If you do not specify a value for one or more attributes, Splunk uses the defaults that are preset in $SPLUNK_HOME/etc/system/default/ (noted below).

[fifo://<path>]

This input stanza type directs Splunk to read from a FIFO at the specified path.

host = <string>

• Set the host value of your input to a static value.
• host= is automatically prepended to the value when this shortcut is used.
• Defaults to the IP address or fully qualified domain name of the host where the data originated.
• For more information about the host field, read "About default fields" in this manual.

index = <string>

• Set the index where events from this input will be stored.
• index= is automatically prepended to the value when this shortcut is used.
• Defaults to main (or whatever you have set as your default index).
• For more information about the index field, read "About fields" in the Knowledge Manager Manual.

sourcetype = <string>

• Set the sourcetype name of events from this input.
• sourcetype= is automatically prepended to the value when this shortcut is used.
• Splunk automatically picks a source type based on various aspects of your data. There is no hard-coded default.
• For more information about the sourcetype field, read "About default fields" in this manual.

source = <string>

• Set the source name of events from this input.
• Defaults to the file path.
• source= is automatically prepended to the value when this shortcut is used.

queue = <string> (parsingQueue, indexQueue, etc)

• Specify where the input processor should deposit the events that it reads.
• Can be any valid, existing queue in the pipeline.
• Defaults to parsingQueue.

Monitor changes to your filesystem


Splunk's file system change monitor is useful for tracking changes in your file system. The file system change monitor watches any directory you specify and generates an event (in Splunk) when that directory undergoes any change. It is completely configurable and can detect when any file on the system is edited, deleted or added (not just Splunk-specific files). For example, you can tell the file system change monitor to watch /etc/sysconfig/ and alert you any time the system's configurations are changed.

Configure the file system change monitor in inputs.conf.

Note: If you're interested in auditing file reads on Windows, check out this topic on the Splunk Community best practices Wiki. Some users might find it more straightforward to use Windows native auditing tools.

Caution: Do not configure the file system change monitor to monitor your root filesystem. This can be dangerous and time-consuming if directory recursion is enabled.

Configure the file system change monitor

By default, the file system change monitor will generate audit events whenever the contents of $SPLUNK_HOME/etc/ are changed, deleted, or added to. When you start Splunk for the first time, an add audit event will be generated for each file in the $SPLUNK_HOME/etc/ directory and all sub-directories. Any time after that, any change in configuration (regardless of origin) will generate an audit event for the affected file(s). If you have signedaudit=true, the file system change audit event will be indexed into the audit index (index=_audit). If signedaudit is not turned on, by default, the events are written to the main index unless you specify another index.

Note: The file system change monitor does not track the user name of the account executing the change, only that a change has occurred. For user-level monitoring consider using native operating system audit tools, which have access to this information.

You can use the file system change monitor to watch any directory by adding a stanza to inputs.conf.

Create your own inputs.conf in $SPLUNK_HOME/etc/system/local/, or in your own custom application directory in $SPLUNK_HOME/etc/apps/, and edit it there. For more information on configuration files in general, see "About configuration files".

Edit the [fschange] stanza to configure the file system change monitor. Every setting is optional except the stanza name fschange:<directory or file to monitor>.

Note: You must restart Splunk any time you make changes to the [fschange] stanza.

Possible attribute/value pairs

[fschange:<directory or file to monitor>]

• The system will monitor all adds/updates/deletes to this directory and sub-directories.
• Any changes will generate an event that is indexed by Splunk.
• Defaults to $SPLUNK_HOME/etc/.

index=<indexname>

• The index to store all events generated.

• Defaults to main (unless you have turned on audit event signing).

recurse=<true | false>

• If true, recurse directories within the directory specified in [fschange].

• Defaults to true.

followLinks=<true | false>

• If true, the file system change monitor will follow symbolic links.
• Defaults to false.

Caution: If you are not careful with setting followLinks, file system loops may occur.

pollPeriod=N

• Check this directory for changes every N seconds.

• Defaults to 3600.
    ♦ If you make a change, the file system audit events could take anywhere between 1 and 3600 seconds to be generated and become available in audit search.

hashMaxSize=N

• Calculate a SHA1 hash for every file that is less than or equal to N size in bytes.
• This hash can be used as an additional method for detecting change in the file/directory.
• Defaults to -1 (no hashing used for change detection).

signedaudit=<true | false>

• Send cryptographically signed add/update/delete events.

• Defaults to false.
• Setting to true will generate events in the _audit index.
• This should be deliberately set to false if you wish to set the index.

Note: When setting signedaudit to true, make sure auditing is enabled in audit.conf.

fullEvent=<true | false>

• Send the full event if an add or update change is detected.

• Further qualified by the sendEventMaxSize attribute.
• Defaults to false.

sendEventMaxSize=N

• Only send the full event if the size of the event is less than or equal to N bytes.
• This limits the size of indexed file data.
• Defaults to -1, which is unlimited.

To define a filter, add a [filter...] stanza as follows:

Fschange white/blacklist logic is handled similarly to typical firewalls. The events run down through the list of filters until they reach their first match. If the first filter to match an event is a whitelist, the event will be indexed. If the first filter to match an event is a blacklist, the event will not be indexed. If an event reaches the end of the chain with no matches, it will be indexed. This means that there is an implicit "all pass" built in. To default to a situation where events are *not* indexed if they don't match a whitelist explicitly, end the chain with a blacklist that will match all remaining events.

For example:

...filters = <filter1>, <filter2>, ... terminal-blacklist

[filter:blacklist:terminal-blacklist]
regex1 = .?

Important: If a directory is ever blacklisted *including via a terminal blacklist at the end of a series of whitelists*, then *all* its subfolders and files are automatically blacklisted and will not pass any whitelist. To accommodate this, whitelist all desired folders and subfolders explicitly ahead of the blacklist items in your filters.

Examples
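The first-match logic described above can be sketched in a few lines of Python. This is an illustration only, not Splunk's implementation: the function name fschange_should_index, the (kind, patterns) tuple layout, and the use of an unanchored regex search are assumptions made for the example, and the sketch does not model the directory rule in the Important note.

```python
import re


def fschange_should_index(path, filters):
    """Return True if a change event for `path` would be indexed.

    `filters` is an ordered list of (kind, [regex, ...]) tuples, where kind
    is "whitelist" or "blacklist". The first filter that matches decides.
    (Hypothetical helper; the matching semantics are an assumption.)
    """
    for kind, patterns in filters:
        if any(re.search(pattern, path) for pattern in patterns):
            return kind == "whitelist"
    # No filter matched: the implicit "all pass" applies.
    return True


filters = [
    ("whitelist", [r"\.conf$", r"\.xml$"]),
    ("blacklist", [r".?"]),  # terminal blacklist: matches everything left over
]

print(fschange_should_index("/etc/app/server.conf", filters))    # whitelisted
print(fschange_should_index("/etc/app/notes.txt", filters))      # falls to terminal blacklist
print(fschange_should_index("/etc/app/notes.txt", filters[:1]))  # no terminal blacklist
```

With the terminal blacklist in place only whitelisted paths pass; without it, unmatched paths fall through to the implicit "all pass".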

Monitor files with specific extensions

This configuration monitors files in the specified directory with the extensions config, xml, properties, and log and ignores all others.

Note: In this example, a directory could be blacklisted. If this is the case, *all* its subfolders and files would automatically be blacklisted as well--only files in the specified directory would be monitored.

Find more things to monitor with crawl


Use crawl to search your filesystem for new data sources to add to your index. Configure one or more types of crawlers in crawl.conf to define the type of data sources to include in or exclude from your results.

Configuration

Edit $SPLUNK_HOME/etc/system/local/crawl.conf to configure one or more crawlers that browse your data sources when you run the crawl command. Define each crawler by specifying values for each of the crawl attributes. Enable the crawler by adding it to crawlers_list.

Crawl logging

The crawl command produces a log of crawl activity that's stored in $SPLUNK_HOME/var/log/splunk/crawl.log. Set the logging level with the logging key in the [default] stanza of crawl.conf:

[default]
logging = <warn | error | info | debug>

Enable crawlers

Enable a crawler by listing the crawler specification stanza name in the crawlers_list key of the [crawlers] stanza.

Use a comma-separated list to specify multiple crawlers.

Enable crawlers that are defined in the stanzas: [file_crawler], [port_crawler], and [db_crawler].

Example crawler stanzas in crawl.conf:

[Example_crawler_name]
....
[Another_crawler_name]
....

Add key/value pairs to crawler definition stanzas to set a crawler's behavior. The following keys are available for defining a file_crawler:

Argument: Description

bad_directories_list: Specify directories to exclude.

bad_extensions_list: Specify file extensions to exclude.

bad_file_matches_list: Specify a string, or a comma-separated list of strings that filenames must contain to be excluded. You can use wildcards (examples: foo*.*, foo*bar, *baz*).

packed_extensions_list: Specify extensions of common archive filetypes to include. Splunk unpacks compressed files before it reads them. It can handle tar, gz, bz2, tar.gz, tgz, tbz, tbz2, zip, and z files. Leave this empty if you don't want to add any archive filetypes.

collapse_threshold: Specify the minimum number of files a source must have to be considered a directory.

days_sizek_pairs_list: Specify a comma-separated list of age (days) and size (kb) pairs to constrain what files are crawled. For example: days_sizek_pairs_list = 7-0, 30-1000 tells Splunk to crawl only files last modified within 7 days and at least 0kb in size, or modified within the last 30 days and at least 1000kb in size.

big_dir_filecount: Set the maximum number of files a directory can have in order to be crawled. crawl excludes directories that contain more than the maximum number you specify.

index: Specify the name of the index to which you want to add crawled file and directory contents.

max_badfiles_per_dir: Specify how far to crawl into a directory for files. If Splunk crawls a directory and doesn't find valid files within the specified max_badfiles_per_dir, then Splunk excludes the directory.

root: Specify directories for a crawler to crawl through.

Example
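As a rough illustration of the days_sizek_pairs_list rule described above (7-0, 30-1000), the following Python sketch checks whether a file of a given age and size would qualify. It is not Splunk code: the helper names parse_pairs and would_crawl are hypothetical, and treating the day limit as inclusive is an assumption.

```python
def parse_pairs(value):
    """Parse a days-size list such as "7-0, 30-1000" into [(7, 0), (30, 1000)]."""
    pairs = []
    for item in value.split(","):
        days, size_kb = item.strip().split("-")
        pairs.append((int(days), int(size_kb)))
    return pairs


def would_crawl(age_days, size_kb, pairs):
    """A file qualifies if it satisfies at least one (max age, min size) pair."""
    return any(age_days <= days and size_kb >= min_kb for days, min_kb in pairs)


pairs = parse_pairs("7-0, 30-1000")
print(would_crawl(3, 10, pairs))     # recent file, any size
print(would_crawl(20, 10, pairs))    # older and small
print(would_crawl(20, 5000, pairs))  # older but large
print(would_crawl(45, 5000, pairs))  # too old for either pair
```

The pairs act as alternatives: a file needs to satisfy only one of them to be crawled.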

Send SNMP events to Splunk

This topic covers ways to receive and index SNMP traps at the Splunk indexer. SNMP traps are alerts fired off by remote devices; these devices need to be configured to send their traps to Splunk's IP address. The default port for SNMP traps is udp:162. This topic does not cover SNMP polling, which is a way to query remote devices.

On UNIX

The most effective way to index SNMP traps is to use snmptrapd to write them to a file. Then, configure the Splunk server to add the file as an input.

snmptrapd itself is part of the net-snmp project. If you're installing this on your system, refer first to any local documentation for your distribution's packaging of the tool, and after that, the documentation here: http://net-snmp.sourceforge.net/docs/man/snmptrapd.html

The simplest configuration is:

# snmptrapd -Lf /var/log/snmp-traps

Note: Previously, snmptrapd would accept all incoming notifications, and log them automatically (even if no explicit configuration was provided). Starting with snmptrapd release 5.3 (check with snmptrapd --version), access control checks will be applied to all incoming notifications. If snmptrapd is run without suitable access control settings, then such traps WILL NOT be processed. You can avoid this by specifying:

# snmptrapd -Lf /var/log/snmp-traps --disableAuthorization=yes

Troubleshooting:

• If you keep the default listening port of 162, which is a privileged port, you will have to run snmptrapd as root.
• Use the -f flag to keep snmptrapd in the foreground while testing. Use -Lo instead of -Lf to log to standard output.
• You can use the snmptrap command to generate an example trap, as in:

# snmptrap -v2c -c public localhost 1 1

On Windows

To log SNMP traps to a file on Windows:

1. Install NET-SNMP from http://www.net-snmp.org/

2. Register snmptrapd as service using the script included in the NET-SNMP install.

3. Edit C:\usr\etc\snmp\snmptrapd.conf

snmpTrapdAddr [System IP]:162

authCommunity log [community string]

4. The default log location is C:\usr\log\snmptrapd.log

MIBs

MIBs, or Management Information Bases, provide a map between numeric OIDs reported by the SNMP trap and a textual human readable form. Though snmptrapd will work quite happily without any MIB files at all, the results won't be displayed in quite the same way. The vendor of the device you are receiving traps from should provide a specific MIB. For example, all Cisco device MIBs can be located using the online Cisco SNMP Object Navigator.

There are two steps required to add a new MIB file:

1. Download and copy the MIB file into the MIB search directory. The default location is /usr/local/share/snmp/mibs, although this can be set using the -M flag to snmptrapd.

2. Instruct snmptrapd to load the MIB or MIBs by passing a colon separated list to the -m flag. There are two important details here:

• Adding a leading '+' character will load the MIB in addition to the default list, instead of overwriting the list; and
• The special keyword ALL is used to load all MIB modules in the MIB directory. The safest argument seems to be -m +ALL

Set up custom (scripted) inputs


Splunk can accept events from scripts that you provide. Scripted input is useful in conjunction with command-line tools, such as vmstat, iostat, netstat, top, etc. You can use scripted input to get data from APIs and other remote data interfaces and message queues. You can then use that data to generate metrics and status data through commands like vmstat, iostat, etc.

Lots of apps on Splunkbase provide scripted inputs for specific applications. You can find them on the Browse more apps tab in the Launcher.

You configure custom scripted inputs from Splunk Manager or by editing inputs.conf.

Note: On Windows platforms, you can enable text-based scripts, such as those in Perl and Python, with an intermediary Windows batch (.bat) file.

Caution: Scripts launched through scripted input inherit Splunk's environment, so be sure to clear environment variables that can affect your script's operation. The only environment variable that's likely to cause problems is the library path (most commonly known as LD_LIBRARY_PATH on linux/solaris/freebsd).

Add a scripted input in Splunk Web

To add a scripted input in Splunk Web:

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations, click Data Inputs.

3. Click Scripts.

4. Click Add new to add an input.

5. In the Command text box, specify the script command, including the path to the script.

6. In Interval, specify the interval in seconds between script runtimes. The default is 60 (seconds).

7. Enter a new Source name to override the default source value, if necessary.

Important: Consult Splunk support before changing this value.

8. Change the Host value, if necessary.

9. Set the Source type.

Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries. Choose:

• From List. Select one of the predefined source types from the drop-down list.
• Manual. Label your own source type in the text box.

10. Set the Index. Leave the value as "default" unless you have defined multiple indexes to handle different types of events. In addition to indexes meant for user data, Splunk has a number of utility indexes, which show up in the dropdown box.

Configure inputs.conf using the following attributes:

• script is the fully-qualified path to the location of the script.
    ♦ As a best practice, put your script in the bin/ directory nearest the inputs.conf where your script is specified. So if you are configuring $SPLUNK_HOME/etc/system/local/inputs.conf, place your script in $SPLUNK_HOME/etc/system/bin/. If you're working on an application in $SPLUNK_HOME/etc/apps/$APPLICATION/, put your script in $SPLUNK_HOME/etc/apps/$APPLICATION/bin/.
• interval indicates how often to execute the specified command. Specify either an integer value representing seconds or a valid cron schedule.
    ♦ Defaults to 60 seconds.
    ♦ When a cron schedule is specified, the script is not executed on start up.
    ♦ Splunk keeps one invocation of a script per instance. Intervals are based on when the script completes. So if you have a script configured to run every 10 minutes and the script takes 20 minutes to complete, the next run will occur 30 minutes after the first run.
    ♦ For constant data streams, enter 1 (or a value smaller than the script's interval).
    ♦ For one-shot data streams, enter -1. Setting interval to -1 will cause the script to run each time the splunk daemon restarts.
• index can be any index in your Splunk instance.
    ♦ Default is main.
• disabled is a boolean value that can be set to true if you want to disable the input.
    ♦ Defaults to false.
• sourcetype and source can be any value you'd like.
    ♦ The value you specify is appended to data coming from your script in the sourcetype= or source= fields.
    ♦ These are optional settings.

If you want the script to run continuously, write the script to never exit and set it on a short interval. This helps to ensure that if there is a problem the script gets restarted. Splunk keeps track of scripts it has spawned and will shut them down upon exit.
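The interval arithmetic described above (a numeric interval is counted from when the script completes) can be checked with a small Python sketch. The function next_start_times is hypothetical, and it assumes the first run starts at time 0 and that every run takes the same amount of time.

```python
def next_start_times(interval, runtime, runs):
    """Return the start time (in seconds) of each of the first `runs` runs.

    Assumes the interval is counted from the moment the previous run
    completes, as described for numeric interval values.
    """
    starts = []
    t = 0
    for _ in range(runs):
        starts.append(t)
        t = t + runtime + interval  # finish the run, then wait one interval
    return starts


MINUTE = 60
starts = next_start_times(interval=10 * MINUTE, runtime=20 * MINUTE, runs=3)
print([s // MINUTE for s in starts])  # start times in minutes
```

For a 10-minute interval and a 20-minute script, the second run starts at the 30-minute mark, which matches the example in the interval description.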

Example using inputs.conf

This example shows the use of the UNIX top command as a data input source.

• Start by creating a new application directory. This example uses scripts/:

$ mkdir $SPLUNK_HOME/etc/apps/scripts

• All scripts should be run out of a bin/ directory inside your application directory:

$ mkdir $SPLUNK_HOME/etc/apps/scripts/bin

• This example uses a small shell script top.sh:

#!/bin/sh
top -bn 1 # linux only - different OSes have different parameters

• Make sure the script is executable:

chmod +x $SPLUNK_HOME/etc/apps/scripts/bin/top.sh

• Test that the script works by running it via the shell:

$SPLUNK_HOME/etc/apps/scripts/bin/top.sh

• The script should have sent one top output.

• Add the script entry to inputs.conf in $SPLUNK_HOME/etc/apps/scripts/default/:

[script:///opt/splunk/etc/apps/scripts/bin/top.sh]
# run every 5 seconds
interval = 5
# set sourcetype to top
sourcetype = top
# set source to name of script
source = script://./bin/top.sh

props.conf

You may need to modify props.conf:

• By default Splunk breaks the single top entry into multiple events.
• The easiest way to fix this problem is to tell the Splunk server to break only before something that does not exist in the output.

For example, adding the following to $SPLUNK_HOME/etc/apps/scripts/default/props.conf forces all lines into a single event:

[top]
BREAK_ONLY_BEFORE = <stuff>

Since there is no timestamp in the top output we need to tell Splunk to use the current time. This is done in props.conf by setting:

DATETIME_CONFIG = CURRENT

Whitelist or blacklist specific incoming data


Use whitelist and blacklist rules to explicitly tell Splunk which files to consume when monitoring directories. You can also apply these settings to batch inputs. When you define a whitelist, Splunk indexes ONLY the files in that list. Alternately, when you define a blacklist, Splunk ignores the files in that list and consumes everything else. You don't have to define both a whitelist and a blacklist; they are independent settings. If you happen to have both, and a file matches both of them, that file WILL NOT be indexed; that is, blacklist overrides whitelist.

Whitelist and blacklist rules use regular expression syntax to define the match on the file name/path. Also, your rules must be contained within a configuration stanza, for example [monitor://<path>]; those outside a stanza (global entries) are ignored.

Instead of whitelisting or blacklisting your data inputs, you can filter specific events and send them to different queues or indexes. Read more about routing and filtering data. You can also use the crawl feature to predefine files you want Splunk to index or not index automatically when they are added to your filesystem.

To define the files you want Splunk to exclusively index, add the following line to your monitor stanza in the /local/inputs.conf file for the App this input was defined in:

whitelist = $YOUR_CUSTOM_REGEX

For example, if you want Splunk to monitor only files with the .log extension:

[monitor:///mnt/logs]
whitelist = \.log$

You can whitelist multiple files in one line, using the "|" (OR) operator. For example, to whitelist filenames that contain query.log OR my.log:

whitelist = query\.log$|my\.log$

Or, to whitelist exact matches:

whitelist = /query\.log$|/my\.log$

Note: The "$" anchors the regex to the end of the line. There is no space before or after the "|" operator.

Blacklist (ignore) files

To define the files you want Splunk to exclude from indexing, add the following line to your monitor stanza in the /local/inputs.conf file for the App this input was defined in:

blacklist = $YOUR_CUSTOM_REGEX

Important: If you create a blacklist line for each file you want to ignore, Splunk activates only the last filter.

If you want Splunk to ignore and not monitor only files with the .txt extension:

[monitor:///mnt/logs]
blacklist = \.(txt)$

If you want Splunk to ignore and not monitor all files with either the .txt extension OR the .gz extension (note that you use the "|" for this):

[monitor:///mnt/logs]
blacklist = \.(txt|gz)$

If you want Splunk to ignore entire directories beneath a monitor input refer to this example:

[monitor:///mnt/logs]
blacklist = (archive|historical|\.bak$)

The above example tells Splunk to ignore all files under /mnt/logs/ within the archive directory, within the historical directory, and to ignore all files ending in *.bak.

If you want Splunk to ignore files that contain a specific string you could do something like this:

[monitor:///mnt/logs]
blacklist = 2009022[89]file\.txt$

The above example will ignore the webserver20090228file.txt and webserver20090229file.txt files under /mnt/logs/.
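To try out a whitelist or blacklist regex before putting it in inputs.conf, a small Python sketch such as the following can help. It is only an approximation of the behavior described in this topic: the function monitor_would_index is hypothetical, it assumes the regex is searched (unanchored) against the full file path, and Python's regex dialect may differ in details from the one Splunk uses.

```python
import re


def monitor_would_index(path, whitelist=None, blacklist=None):
    """Approximate the whitelist/blacklist decision for one file path.

    Blacklist is checked first, so a file matching both is not indexed.
    With no whitelist defined, everything not blacklisted is indexed.
    """
    if blacklist is not None and re.search(blacklist, path):
        return False
    if whitelist is not None:
        return re.search(whitelist, path) is not None
    return True


# Whitelist only .log files.
print(monitor_would_index("/mnt/logs/app.log", whitelist=r"\.log$"))
print(monitor_would_index("/mnt/logs/app.txt", whitelist=r"\.log$"))

# "Contains" versus "exact" file name matches.
print(monitor_would_index("/mnt/logs/myquery.log", whitelist=r"query\.log$|my\.log$"))
print(monitor_would_index("/mnt/logs/myquery.log", whitelist=r"/query\.log$|/my\.log$"))

# Blacklist directories and .bak files; blacklist wins over whitelist.
skip = r"(archive|historical|\.bak$)"
print(monitor_would_index("/mnt/logs/archive/app.log", whitelist=r"\.log$", blacklist=skip))
```

The fourth call returns False because the leading "/" in the pattern requires the file name to be exactly query.log or my.log, while the third call accepts any name that ends with one of them.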

How log file rotation is handled


Splunk recognizes when a file that it is monitoring (such as /var/log/messages) has been rolled (/var/log/messages1) and will not read the rolled file in a second time.

Note: Splunk does not recognize compressed files produced by logrotate (such as bz2 or gz) as the same as the uncompressed originals. This can lead to a duplication of data if these files are then monitored by Splunk. You can configure logrotate to move these files into a directory you have not told Splunk to read, or you can explicitly set blacklist rules for archive filetypes to prevent Splunk from reading these files as new logfiles.

For more information on setting blacklist rules see "Whitelist and blacklist specific incoming data" in this manual.

How log rotation works

The monitoring processor picks up new files and reads the first and last 256 bytes of the file. This data is hashed into a begin and end cyclic redundancy check (CRC). Splunk checks new CRCs against a database that contains all the CRCs of files Splunk has seen before. The location Splunk last read in the file is also stored.

There are three possible outcomes of a CRC check:

1. There is no begin and end CRC matching this file in the database. This is a new file and will be picked up and consumed from the start. Splunk updates the database with new CRCs and seekPtrs as the file is being consumed.

2. The begin CRC and the end CRC are both present, but the size of the file is larger than the seekPtr Splunk stored. This means that, while Splunk has seen the file before, information has been added to it since it was last read. Splunk opens the file, seeks to the previous end of the file, and starts reading from there (so Splunk will only grab the new data and not anything it has read before).

3. The begin CRC is present but the end CRC does not match. This means the file has been changed since Splunk last read it and some of the portions it has already read are different. Because there is evidence that the previously read data has changed, Splunk has no choice but to read the whole file again.
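The three outcomes can be illustrated with a short Python sketch. This is not Splunk's code: zlib.crc32 stands in for whatever checksum Splunk computes, the dictionary db and the functions record and classify are hypothetical, and the sketch assumes the stored end CRC covers the 256 bytes ending at the stored seek pointer. It also reports an "unchanged" case (nothing new to read), which the list above does not name.

```python
import zlib

WINDOW = 256  # bytes hashed at each end of the file


def _end_crc(data, seek):
    # CRC of the WINDOW bytes that end at offset `seek` (assumed layout).
    return zlib.crc32(data[max(0, seek - WINDOW):seek])


def record(data, db):
    """Remember a fully consumed file: begin CRC -> (end CRC, seek pointer)."""
    seek = len(data)
    db[zlib.crc32(data[:WINDOW])] = (_end_crc(data, seek), seek)


def classify(data, db):
    """Return (outcome, offset to start reading from) for file contents `data`."""
    begin = zlib.crc32(data[:WINDOW])
    if begin not in db:
        return "new", 0                      # outcome 1: read from the start
    end_crc, seek = db[begin]
    if len(data) >= seek and _end_crc(data, seek) == end_crc:
        if len(data) > seek:
            return "appended", seek          # outcome 2: read only new data
        return "unchanged", seek             # nothing new to read
    return "changed", 0                      # outcome 3: re-read everything


db = {}
original = b"line one\n" * 100
print(classify(original, db))                 # not seen before
record(original, db)
print(classify(original + b"line two\n", db)) # same start, data appended
print(classify(original[:-9] + b"LINE ONE\n", db))  # tail rewritten in place
```

In the sketch, a rolled file that keeps its contents would classify as "unchanged" under its new name, because the lookup is keyed on content rather than on the file name.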

Set up forwarding and receiving

About forwarding and receiving

You can forward data from a Splunk instance to another Splunk server or even to a non-Splunk system. The Splunk instance that performs the forwarding is typically a smaller footprint version of Splunk, called a forwarder. The forwarder functions as a lightweight, all-purpose agent.

A Splunk server that receives data from a forwarder is called a receiver. The receiver is either a Splunk indexer or another forwarder, configured to receive data from one or more forwarders.

Forwarders vs. light forwarders

Splunk forwarders come in two flavors: regular and light. These differ according to their functionality and the corresponding size of their footprints.

A regular forwarder, also referred to as just a forwarder, has a smaller footprint than a Splunk server but retains most of the capability, except that it lacks the ability to do distributed searches. Much of its default functionality, such as Splunk Web, can be disabled, if necessary, to further reduce the size of its footprint. A forwarder parses data before forwarding it and can route data based on criteria such as source or type of event.

A light forwarder has a small footprint with limited functionality. Its size makes it ideal for forwarding data from workstations or non-Splunk production servers to a Splunk server for consolidation. It forwards only unparsed data and, therefore, cannot perform content-based routing. In addition, it does not include Splunk Web and its throughput is limited to 256kbs.

For detailed information on the capabilities of regular and light forwarders, see More about forwarders in this manual.

Both types of forwarders can perform automatic load balancing, with the regular forwarder also offering round-robin load balancing. Forwarders represent a much more robust solution for data forwarding than raw network feeds, with their capabilities for:

• Tagging of metadata (source, sourcetype, and host)

Forwarders can transmit three types of data:

• Raw
• Unparsed
• Parsed

A light forwarder can send raw or unparsed data. A regular forwarder can send raw or parsed data.

With raw data, the data stream is forwarded as raw TCP; it is not converted into Splunk's communications format. The forwarder just collects the data and forwards it on. This is particularly useful for sending data to a non-Splunk system.

With unparsed data, a light forwarder performs only minimal processing. It does not examine the data stream, but it does tag the entire stream with metadata to identify source, sourcetype, and host. It also divides the data stream into 32K blocks and performs some rudimentary timestamping on the stream, for use by the receiving indexer in case the events themselves have no discernible timestamps. The light forwarder does not identify, examine, or tag individual events.

With parsed data, a regular forwarder breaks the data into individual events, which it tags and then forwards to a Splunk server. It can also examine the events. Because the data has been parsed, the forwarder can perform conditional routing based on event data, such as field values.

The parsed and unparsed formats are both referred to as cooked data, to distinguish them from raw data. By default, forwarders send cooked data: in the light forwarder's case, unparsed data, and in the regular forwarder's case, parsed data. To send raw data instead, set the sendCookedData=false attribute/value pair in outputs.conf.

Deployment topologies

You can deploy Splunk forwarders in a wide variety of scenarios. These are some typical topologies.

Data consolidation

Data consolidation represents one of the most common topologies, with multiple forwarders sending data to a single Splunk server. The scenario typically involves light forwarders forwarding unparsed data from workstations or production non-Splunk servers to a central Splunk server for consolidation and indexing. With a lighter footprint, these forwarders have minimal impact on the performance of the systems they reside on. In other scenarios, regular forwarders send parsed data to a central Splunk server.

Here, three light forwarders are sending data to a single Splunk server:

Load balancing

Load balancing simplifies the process of distributing data across several Splunk servers to handle considerations such as high data volume, horizontal scaling for enhanced search performance, and fault tolerance. In load balancing, the forwarder routes data sequentially to different servers at specified intervals.

Splunk load balancing comes in two flavors:

• Automatic load balancing

• Round-robin load balancing

For most needs, automatic load balancing, in which the forwarder switches receivers at set time intervals, offers the better solution. For details on the relative advantages of automatic vs. round-robin load balancing, see Load balancing in this manual.

In this diagram, three light forwarders are each performing automatic load balancing between two receivers:

Routing and filtering

In routing, a forwarder routes events to specific Splunk or third-party servers, based on criteria such as source, sourcetype, or patterns in the events themselves. Routing at the event level requires a regular forwarder.

A forwarder can also filter and route events to specific queues, or discard them altogether by routing to the null queue.

Here, a regular forwarder routes data to three Splunk servers based on event patterns:

Cloning

With cloning, the forwarder sends duplicate copies of data to multiple Splunk servers. If cloning is combined with load balancing, the forwarder sends duplicate copies of data to multiple groups of servers. This second scenario is particularly useful for situations requiring data redundancy to promote data availability. If any of the servers in a load-balanced group goes down while receiving data, another server in the group automatically takes over the receiver function, ensuring that each group of servers still contains a clone of the data.

In this simple scenario, three forwarders are sending duplicate copies of data to two Splunk servers:

You can also clone data from one full Splunk instance to another. This scenario can be useful if you need to keep complete sets of indexed data at separate locations to ensure fast local access; for instance, the North American office in San Francisco and the European office in London.

In another type of cloning, a regular forwarder can retain indexed data locally while also forwarding parsed data to a Splunk server.

Forwarding to non-Splunk systems

You can send raw data to a third-party system such as a syslog aggregator. You can combine this with data routing, sending some data to a non-Splunk system and other data to one or more Splunk servers.

Here, three forwarders are routing data to two Splunk servers and a non-Splunk system:

Key set-up steps

Once you've determined your Splunk deployment topology and what sort of data forwarding is necessary to implement it, the steps for setting up forwarding are simple:

1. Install the Splunk instances that will serve as forwarders and receivers. See the Installation Manual for details.

2. Use Splunk Web or the CLI to enable receiving on the instances designated as receivers. See Set up receiving in this manual.

3. Use Splunk Web or the CLI to enable forwarding on the instances designated as forwarders. See Set up forwarding in this manual.

4. Specify data inputs for the Splunk forwarders in the usual manner. See Add data and configure inputs in this manual.

5. Perform any advanced configuration on each forwarder by editing the outputs.conf file. See Configure forwarders with outputs.conf in this manual.

6. Test the results to confirm that forwarding, along with any configured behaviors like load balancing or routing, is occurring as expected.

In large environments with multiple forwarders, you might find it helpful to use the deployment server to manage your forwarders. See Deploy to other Splunk instances in this manual.

Enable forwarding and receiving

To enable forwarding and receiving, you configure both a receiver and a forwarder. The receiver is the Splunk instance receiving the data; the forwarder is the Splunk instance forwarding the data. Depending on your needs, you might have multiple receivers or forwarders.

You must first set up the receiver. You can then set up forwarder(s) to send data to that receiver.

Important: The receiver must be running the same (or later) version of Splunk as its forwarder. A 4.0 receiver can receive data from a 3.4 forwarder, but a 3.4 receiver cannot receive from a 4.0 forwarder.

Set up receiving

You enable receiving in Splunk Web or through the Splunk CLI.

Set up receiving with Splunk Web

Use the Manager interface to set up a receiver:

1. Log into Splunk Web as admin on the server that will be receiving data from a forwarder.

2. Click the Manager link in the upper right corner.

3. Select Forwarding and receiving under System configurations.

4. Click Add new in the Receive data section.

5. Specify which TCP port you want the receiver to listen on. For example, if you enter "9997," the receiver will receive data on port 9997. By convention, receivers listen on port 9997, but you can specify any unused port. You can use a tool like netstat to determine what ports are available on your system. Make sure the port you select is not in use by splunkweb or splunkd.

6. Click Save. You must restart Splunk to complete the process.

Set up receiving with Splunk CLI

To access the CLI, first navigate to $SPLUNK_HOME/bin/. This is unnecessary if you have added Splunk to your path.

To enable receiving, enter:

./splunk enable listen <port> -auth <username>:<password>

For <port>, substitute the port you want the receiver to listen on.

To disable receiving, enter:

./splunk disable listen -port <port> -auth <username>:<password>

Searching data received from a forwarder running on a different operating system

In most cases, a Splunk instance receiving data from a forwarder on a different OS will need to install the app for that OS. However, there are numerous subtleties that affect this; read on for the details.

Forwarding and indexing are OS-independent operations. Splunk supports any combination of forwarders and receivers, as long as each is running on a certified OS. For example, a Linux receiver can index data from a Windows forwarder.

Once data has been forwarded and indexed, the next step is to search or perform other knowledge-based activities on the data. At this point, the Splunk instance performing such activities might need information about the OS whose data it is examining. You typically handle this by installing the app specific to that OS. For example, if you want a Linux instance to search OS-specific data forwarded from Windows, you will ordinarily want to install the Windows app on the Linux instance.

If the data you're interested in is not OS-specific, such as web logs, then you do not need to install the Splunk OS app.

In addition, if the receiver is only indexing the data, and an external search head is performing the actual searches, you do not need to install the OS app on the receiver, but you might need to install it on the search head. As an alternative, you can use a search head running the OS. For example, to search data forwarded from Windows to a Linux receiver, you can use a Windows search head pointing to the Linux indexer as a remote search peer. For more information on search heads, see "Set up distributed search".

Important: After you have downloaded the relevant OS app, remove its inputs.conf file before enabling it, to ensure that its default inputs are not added to your indexer. For the Windows app, the location is: %SPLUNK_HOME%\etc\apps\windows\default\inputs.conf.

In summary, you only need to install the app for the forwarder's OS on the receiver (or search head) if it will be performing searches on the forwarded OS data.

Set up forwarding

You can use Splunk Web or the Splunk CLI as a quick way to enable forwarding in a Splunk instance.

You can also enable, as well as configure, forwarding by creating an outputs.conf file for the Splunk instance. Although setting up forwarders with outputs.conf requires a bit more initial knowledge, there are obvious advantages to performing all forwarder configurations in a single location. Most advanced configuration options are available only through outputs.conf. In addition, if you will be enabling and configuring a number of forwarders, you can easily accomplish this by editing a single outputs.conf file and making a copy for each forwarder. See the topic Configure forwarders with outputs.conf for more information.

Note: By default, Splunk uses an Enterprise trial license when it is initially installed. When you enable a forwarder, you should also apply either the forwarder license or the free license to avoid any subsequent license issues. Instructions on how to do this can be found here.

Set up regular forwarding with Splunk Web

Use the Manager interface to set up a forwarder. To set up a regular forwarder:

1. Log into Splunk Web as admin on the server that will be forwarding data.

2. Click the Manager link in the upper right corner.

3. Select Forwarding and receiving under System configurations.

4. Click Add new in the Forward data section.

5. Enter the hostname or IP address for the receiving Splunk instance, along with the port specified when the receiver was configured. For example, you might enter: receivingserver.com:9997.

6. Click Save. You must restart Splunk to complete the process.

You can use Splunk Web to perform one other configuration (for regular forwarders only). To store a copy of indexed data local to the forwarder:

1. From Forwarding and receiving, select Forwarding defaults.

2. Select Yes to store and maintain a local copy of the indexed data on the forwarder.

All other configuration must be done in outputs.conf.

Set up light forwarding with Splunk Web

To enable light forwarding, you must first enable regular forwarding on the Splunk instance. Then you separately enable light forwarding. This procedure combines the two processes:

1. Log into Splunk Web as admin on the server that will be forwarding data.

2. Click the Manager link in the upper right corner.

3. Select Forwarding and receiving under System configurations.

4. Click Add new in the Forward data section.

5. Enter the hostname or IP address for the receiving Splunk instance, along with the port specified when the receiver was configured. For example, you might enter: receivingserver.com:9997.

6. Click Save.

7. Return to Manager>>Forwarding and receiving.

Important: When you enable a light forwarder, Splunk Web is immediately disabled. You will then need to use the Splunk CLI or outputs.conf to perform any further configuration on the forwarder. Therefore, if you want to use Splunk Web to configure your forwarder, do so before you enable light forwarding.

Set up forwarding with the Splunk CLI

With the CLI, setting up forwarding is a two-step process. First you enable forwarding on the Splunk instance. Then you start forwarding to a specified receiver.

To access the CLI, first navigate to $SPLUNK_HOME/bin/. This is unnecessary if you have added Splunk to your path.

Important: Make sure you restart your Splunk instance as indicated by the CLI to take these changes into account.

Troubleshoot forwarding and receiving

Confusing the receiver's listening and management ports

As part of setting up a forwarder, you specify the receiver (hostname/IP_address and port) that the forwarder will send data to. When you do so, be sure to specify the port that was designated as the receiver's listening port at the time the receiver was configured. See "Set up receiving with Splunk". Do not specify the receiver's management port. If you do mistakenly specify the receiver's management port, the receiver will generate an error similar to this:

Closed receiving socket

If a receiving indexer's queues become full, it will close the receiving socket, to prevent additional forwarders from connecting to it. If a forwarder with load-balancing enabled can no longer forward to that receiver, it will send its data to another indexer on its list. If the forwarder does not employ load-balancing, it will hold the data until the problem is resolved.

The receiving socket will reopen automatically when the queue gets unclogged.

Typically, a receiver gets behind on the dataflow because it can no longer write data due to a full disk or because it is itself attempting to forward data to another forwarder that is not accepting data.

The following warning message will appear in splunkd.log if the socket gets blocked:

Stopping all listening ports. Queues blocked for more than N seconds.

This message will appear when the socket reopens:

Started listening on tcp ports. Queues unblocked.

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around configuring forwarding.

Configure forwarders with outputs.conf

The outputs.conf file is unique to forwarders. It defines the forwarder configuration. Except for a few basic configurations available through Splunk Web or the CLI, all forwarder configuration takes place through outputs.conf. The topics describing various topologies, such as load balancing and data routing, provide detailed examples on configuring outputs.conf.

Note: Although outputs.conf is the critical file for configuring forwarders, it specifically addresses the outputs from the forwarder. To specify the inputs to a forwarder, you must configure the inputs separately, as you would for any other Splunk instance. For details on configuring inputs, see Add data and configure inputs in this manual.

Create and modify outputs.conf

There is no default outputs.conf file. When you enable a forwarder through Splunk Web or the CLI, Splunk creates an outputs.conf file in the directory of the currently running app. For example, if you're working in the search app, Splunk places the file in $SPLUNK_HOME/etc/apps/search/local/. You can then edit it there.

To enable and configure a forwarder without using Splunk Web or the CLI, create an outputs.conf file and place it in this directory: $SPLUNK_HOME/etc/system/local/.

A single forwarder can have multiple outputs.conf files (for instance, one located in an apps directory and another in /system/local). To understand how to manage multiple outputs.conf files, see Configuration file precedence in this manual. No matter where the outputs.conf file resides, it acts globally on the forwarder (bearing in mind the issue of location precedence, as described in Configuration file precedence). For purposes of distribution and management simplicity, you might prefer to maintain just a single outputs.conf file, keeping it resident in the /system/local directory.

After making changes to outputs.conf, you must restart the forwarder for the changes to take effect.

See outputs.conf.spec and outputs.conf.example in $SPLUNK_HOME/etc/system/README/ for guidance and a template to use when creating or modifying outputs.conf.

Configuration levels

You can configure output processors at three levels of stanzas:

• Global. Here, you specify default target groups, as well as certain settings only configurable at the system-wide level for the output processor.

• Target group. A target group defines settings for one or more receivers. There can be one or more target groups per output processor. Most configuration settings can be specified at the target group level.

• Single server. You can specify configuration values for single servers (receivers) within a target group. This stanza type is optional.

Configurations at the more specific level take precedence. For example, if you specify compressed=true for a single receiver, the forwarder will send that receiver compressed data, even if compressed is set to "false" for the receiver's target group.

Target groups

A target group allows you to configure where and how Splunk will send data. Target groups do not control which events will be forwarded. For tcpout routing, events will be sent to all defined tcpout target groups by default, unless defaultGroup is set.

Here's the basic pattern for the target group stanza:
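As a minimal sketch of that pattern, assuming only what this topic states about stanza naming and the server format (the angle-bracket names are placeholders, not literal values):

```ini
# <output_processor> is one of the output processors named in this topic, such as tcpout.
# <target_group> is a name you choose for the group of receivers.
[<output_processor>:<target_group>]
server = <ipaddress_or_servername>:<port>, <ipaddress_or_servername>:<port>, ...
<attribute1> = <value1>
<attribute2> = <value2>
```

Check outputs.conf.spec for the attributes each output processor accepts.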

Available output processors are tcpout, syslog, and httpout.

To specify a server in a target group, use the format <ipaddress_or_servername>:<port>. For example, myhost.Splunk.com:9997.

To perform load balancing, you specify a target group with multiple receivers.

To perform cloning, you specify multiple target groups.
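The two statements above can be combined in one file. This is a sketch only: the group names (site_a, site_b) and addresses are hypothetical, and autoLB is the load-balancing attribute listed later in this topic.

```ini
[tcpout]
# Two target groups are listed, so each group receives its own copy of the data (cloning).
defaultGroup = site_a, site_b

# site_a has multiple receivers, so data sent to this group is load balanced across them.
[tcpout:site_a]
server = 10.1.1.1:9997, 10.1.1.2:9997
autoLB = true

# site_b has a single receiver, which gets a full clone of the data.
[tcpout:site_b]
server = 10.2.1.1:9997
```

Note that comments sit on their own lines; a trailing comment on a setting line would likely be read as part of the value.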

Note: For syslog and other output types, you must explicitly specify routing as described here: Route and filter data.

Set defaultGroup

You must include the defaultGroup attribute in your [tcpout] stanza:

[tcpout]
defaultGroup= <group1>, <group2>, ...

The defaultGroup specifies one or more target groups, defined later in tcpout:<target_group> stanzas. The forwarder will send all events to the specified defaultGroups. You can use an asterisk (defaultGroup=*) to send events to all defined target groups.

If you do not want to forward data automatically, you can set "defaultGroup" to a non-existent target group name (for example, "nothing").
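A short sketch of that last case, assuming a hypothetical target group (errorGroup) that should receive only events routed to it explicitly, as described in Route and filter data:

```ini
[tcpout]
# "nothing" is deliberately not defined as a target group anywhere in this file,
# so no events are forwarded by default.
defaultGroup = nothing

# Receives only events that a routing transform sends here by name.
[tcpout:errorGroup]
server = 10.1.1.200:9999
```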

Example

The following outputs.conf example contains three stanzas for sending tcpout to other Splunk receivers:

• Global settings. In this example, there are two settings: one to specify a defaultGroup, and another to enable local indexing as well as forwarding.

• Settings for a single target group consisting of two receivers. Here, we are specifying automatic load balancing between the two servers. See Set up load balancing in this manual for a detailed description of load balancing. We are also stipulating that the forwarder send the data in compressed form to the targeted receivers.

• Settings for one receiver within the target group. This stanza turns off compression for this particular receiver. The server-specific value for "compressed" takes precedence over the value set at the target group level.
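A sketch of a file matching the three stanzas described above. The group name and server names are hypothetical, and the single-server stanza form [tcpout-server://<host>:<port>] is an assumption to confirm against outputs.conf.spec:

```ini
# Global settings: default target group, plus local indexing as well as forwarding.
[tcpout]
defaultGroup = my_indexers
indexAndForward = true

# Target group of two receivers, with automatic load balancing and compression.
[tcpout:my_indexers]
server = mysplunk_indexer1:9997, mysplunk_indexer2:9996
autoLB = true
compressed = true

# Single-server override: this receiver gets uncompressed data.
# (Stanza name form assumed; the server-level value overrides the group-level one.)
[tcpout-server://mysplunk_indexer1:9997]
compressed = false
```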

The outputs.conf file provides a large number of configuration options that offer considerable control and flexibility in forwarding. Of the attributes available, several are of particular interest:

Attribute         Default   Value
server            n/a       Required. Specifies the server(s) that will function as receivers for the forwarder. Configured at the target group level. This must be in the format <ipaddress_or_servername>:<port>.
defaultGroup      n/a       Required for [tcpout]. A comma-separated list of one or more target groups. Sends all events to all specified target groups. Set this to a non-existent group name, if you don't want events automatically forwarded to a target group. Configurable only at the global level.
disabled          false     Specifies whether the stanza is disabled. If set to "true", it is equivalent to the stanza not being there.
indexAndForward   false     Specifies whether data should be indexed and stored locally, as well as forwarded. It can be specified only at the global level. This setting is not available for light forwarders.
sendCookedData    true      Specifies whether data is cooked before forwarding.
compressed        false     Specifies whether the forwarder sends compressed data.
maxQueueSize      1000      Specifies the maximum number of events queued on the forwarder.
autoLB            false     Specifies load balancing.
ssl....           n/a       Set of attributes for configuring SSL.

The outputs.conf.spec file provides details, including the default settings, for these and all other configuration options. In addition, most of these settings are discussed in topics dealing with specific forwarding scenarios.

HTTP forwarding

HTTP provides an alternative to TCP for forwarding data to a Splunk receiver. In certain situations, when dealing with firewalls, HTTP forwarding can ease network administrative issues. HTTP is available only for forwarding to a Splunk receiver, not for forwarding to third-party systems.

In outputs.conf, specify the httpoutput target group:
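A sketch of such a stanza, assuming it follows the same <output_processor>:<target_group> naming as tcpout; the exact stanza name should be confirmed against outputs.conf.spec, and the angle-bracket names are placeholders:

```ini
# Stanza name form is an assumption based on the target group pattern in this topic.
[httpoutput:<target_group>]
server = <ipaddress_or_servername>:<port>
# Optional; "true" means HTTP output uses SSL.
ssl = true
```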

The target group stanza has these attributes:

Attribute   Default   Value
server      n/a       Required. This must be in the format <ipaddress_or_servername>:<port>.
ssl         true      Optional. Set to "true" or "false". If "true", HTTP output uses SSL.

Consolidate data from multiple machines

One of the most common forwarding use cases is to consolidate data produced across numerous machines. Light forwarders running on machines generating data forward the data to a central Splunk indexer. Such forwarders ordinarily have little impact on their machines' performance. This diagram illustrates a common scenario, where light forwarders residing on machines running diverse operating systems send data to a single Splunk instance, which indexes and provides search capabilities across all the data:

The diagram illustrates a small deployment. In practice, the number of light forwarders in a data consolidation use case could number upwards into the thousands.

This type of use case is simple to configure:

1. Determine what data, originating from which machines, you need to access.

2. Install a Splunk instance, typically on its own server. This instance will function as the receiver. All indexing and searching will occur on it.

3. Enable the receiver through Splunk Web or the CLI. Using the CLI, enter this command from $SPLUNK_HOME/bin/:

./splunk enable listen <port> -auth <username>:<password>

For <port>, substitute the port you want the receiver to listen on.

4. If any of the forwarders will be running on a different operating system from the receiver, install the app for the forwarder's OS on the receiver. For example, assume the receiver in the diagram above is running on a Linux box. In that case, you'll need to install the Windows app on the receiver. You don't need to install the *NIX app. Since the receiver is on Linux, that app was already installed along with the rest of the Splunk instance.

After you have downloaded the relevant app, remove its inputs.conf file before enabling it, to ensure that its default inputs are not added to your indexer. For the Windows app, the location is: $SPLUNK_HOME/etc/apps/windows/default/inputs.conf.

5. Install a Splunk instance on each machine that will be generating data. These will become light forwarders that forward the data to the receiver.

6. Set up inputs for each forwarder. See Add data and configure inputs in this manual.

7. Configure each forwarder through Splunk Web or the CLI. Using the CLI from $SPLUNK_HOME/bin/, first enable each Splunk instance as a light forwarder:

./splunk enable app SplunkLightForwarder -auth <username>:<password>

Next, begin forwarding to the designated receiver:

./splunk add forward-server <host>:<port> -auth <username>:<password>

For <host>:<port>, substitute the host and port number of the receiver. For example, splunk_indexer.acme.com:9995.

Alternatively, if you have many forwarders, you can use an outputs.conf file to specify the receiver. For example:

[tcpout:my_indexers]
server= splunk_indexer.acme.com:9995

You can create this file once, then distribute copies of it to the $SPLUNK_HOME/etc/system/local/ location of each forwarder.

Set up load balancing

In load balancing, a Splunk forwarder distributes data across several receiving Splunk instances. Each receiver gets a portion of the total data, and together the receivers hold all the data. To access the full set of forwarded data, you will need to set up distributed searching across all the receivers. For information on distributed search, see What is distributed search? in this manual.

Load balancing enables horizontal scaling for improved performance. In addition, its automatic switchover capability ensures resiliency in the face of machine outages. If a machine goes down, the forwarder simply begins sending data to the next available receiver.

Load balancing can also be of use when monitoring data from network devices like routers. To handle syslog and other data generated across port 514, a single forwarder can monitor port 514 and distribute the incoming data across several Splunk indexers.
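The port 514 scenario might be sketched as follows. This assumes the devices send syslog over UDP and that the forwarder uses a network input stanza of the form [udp://<port>] in inputs.conf; the group name and receiver addresses are hypothetical.

```ini
# inputs.conf on the forwarder: listen for syslog traffic on UDP port 514.
# (UDP and the sourcetype value are assumptions for this sketch.)
[udp://514]
sourcetype = syslog

# outputs.conf on the forwarder: spread the incoming data across several indexers.
[tcpout:syslog_indexers]
server = 10.1.1.197:9997, 10.1.1.198:9997, 10.1.1.199:9997
autoLB = true
```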

Splunk forwarders can perform two types of load balancing:

• Automatic load balancing: Forwarder routes data to different servers based on a specified time interval, for example, switching the data stream every 30 seconds, from server A to server B to server C and then back to server A.

• Round-robin load balancing: Forwarder routes data to different servers, switching with each new event, for example, event 1 goes to server A, event 2 to server B, event 3 to server C, and event 4 back to server A.

For most purposes, automatic load balancing is recommended. It provides greater resiliency if a forwarder or receiver goes down. It also provides greater flexibility and easier configuration, because you can combine it with a DNS list. Round-robin load balancing can result in somewhat more even load balancing, because the forwarder switches receivers with each new event, but, in practice, any advantage diminishes at greater data volumes. In addition, round-robin requires that the forwarder perform parsing, consuming more RAM and CPU without improving the overall resiliency of the system. It is recommended only when you intend to distribute pre-indexing activity out to the edge network.

This diagram shows a distributed search scenario, in which three light forwarders are performing load balancing across three receivers:

Targets for automatic load balancing

When configuring the set of target receivers, you can employ either DNS or static lists.

DNS lists provide greater flexibility and simplified scale-up, particularly for large deployments. Through DNS, you can change the set of receivers without needing to re-edit each forwarder's outputs.conf file.

The main advantage of a static list is that it allows you to specify a different port for each receiver. This is useful if you need to perform load balancing across multiple receivers running on a single host. Each receiver can listen on a separate port.

Static list target

To use a static list for the target, you simply specify each of the receivers in the target group's [tcpout] stanza in the forwarder's outputs.conf file. In this example, the target group consists of three receivers, specified by IP address and port number:
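A sketch of a static-list target group, followed by the DNS-list alternative that the next paragraphs assume. The group names, ports, and the pairing of addresses with ports are hypothetical; the DNS name is the one used in the A records below.

```ini
# Static list: each receiver is named explicitly and may use its own port.
[tcpout:my_LB_indexers]
server = 10.10.10.1:9997, 10.10.10.2:9996, 10.10.10.3:9995
autoLB = true

# DNS list alternative: a single server name that resolves to several A records.
# All receivers behind the name must listen on the same port.
[tcpout:my_DNS_LB_indexers]
server = splunkreceiver.mycompany.com:9997
autoLB = true
```

You would normally use one form or the other for a given set of receivers, not both.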

In your DNS server, create a DNS A record for each host's IP address, referencing the server name you specified in outputs.conf. For example:

splunkreceiver.mycompany.com A 10.10.10.1
splunkreceiver.mycompany.com A 10.10.10.2
splunkreceiver.mycompany.com A 10.10.10.3

The Splunk forwarder will use the DNS list to load balance, sending data in intervals, first to 10.10.10.1, then to 10.10.10.2, then to 10.10.10.3, and then to 10.10.10.1 again. If a receiver is not available, the forwarder skips it and sends data to the next one on the list.

If you have a topology with many forwarders, the DNS list method allows you to update the set of receivers by making changes in just a single location, without touching the forwarders' outputs.conf files.

Configure automatic load balancing for horizontal scaling

To configure automatic load balancing, first determine your needs, particularly your horizontal scaling and failover requirements. Then develop a topology based on those needs, possibly including multiple forwarders as well as receivers and a search head to search across the receivers.

Assuming the topology of three forwarders and three receivers illustrated by the diagram at the start of this topic, set up automatic load balancing with these steps:

1. Install and enable a set of three Splunk instances as receivers. This example uses a DNS list to designate the receivers, so they must all listen on the same port. For example, if the port is 9997, enable each receiver by going to its $SPLUNK_HOME/bin/ location and using this CLI command:

./splunk enable listen 9997 -auth <username>:<password>

2. Install and enable the set of light forwarders. Once you've installed these Splunk instances, use this CLI command on each of them to enable forwarding:

./splunk enable app SplunkLightForwarder -auth <username>:<password>

3. Set up a DNS list with an A record for each receiver's IP address:

splunkreceiver.mycompany.com A 10.10.10.1
splunkreceiver.mycompany.com A 10.10.10.2
splunkreceiver.mycompany.com A 10.10.10.3

4. Create a single outputs.conf file for use by all the forwarders. This one specifies the DNS server name used in the DNS list and the port the receivers are listening on:
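A sketch of that file, using the DNS name and port from the steps above and the two load-balancing attributes this step refers to. The target group name is hypothetical.

```ini
[tcpout]
defaultGroup = my_LB_indexers

[tcpout:my_LB_indexers]
# The name resolves to the three A records; all receivers listen on port 9997.
server = splunkreceiver.mycompany.com:9997
# Automatic (time-based) load balancing, switching receivers every 40 seconds.
autoLB = true
autoLBFrequency = 40
```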

This outputs.conf file also uses the autoLB attribute to specify automatic (instead of round-robin) load balancing and the autoLBFrequency attribute to set a frequency of 40 seconds. Every 40 seconds, the forwarders will switch to the next receiver. The default frequency, which rarely needs changing, is 30 seconds.

5. Distribute the outputs.conf file to all the forwarders, placing it in each forwarder's $SPLUNK_HOME/etc/system/local/ directory.

Specify automatic load balancing from the CLI

You can also use the CLI to specify automatic load balancing. You do this when you start forwarding activity to a set of receivers, using this syntax:

./splunk add forward-server -method=autobalance indexer1:9991

Route and filter data

Forwarders can filter and route data to specific receivers based on criteria such as source, sourcetype, or patterns in the events themselves. For example, a forwarder can send all data from one group of hosts to one Splunk server and all other data to a second Splunk server. A forwarder can also look inside the events and filter or route accordingly. For example, you might want to inspect WMI event codes to filter or route Windows events. This topic describes a number of typical routing scenarios.

Besides routing to receivers, forwarders can also filter and route data to specific queues or discard the data altogether by routing to the null queue.

Only regular forwarders can route or filter data at the event level. Light forwarders do not have the ability to inspect individual events.

Here's a simple illustration of a forwarder routing data to three Splunk receivers:

This topic describes how to route event data to Splunk instances. See Forward data to third-party systems in this manual for information on routing to non-Splunk systems.

Configure routing

This is the basic pattern for defining most routing scenarios:

1. Determine what criteria to use for routing. How will you identify categories of events, and where will you route them?

2. Edit props.conf to add a TRANSFORMS-routing attribute to determine routing based on event metadata:

[<spec>]
TRANSFORMS-routing=<transforms_stanza_name>

<spec> can be:

• <sourcetype>, the sourcetype of an event

• host::<host>, where <host> is the host for an event

• source::<source>, where <source> is the source for an event

3. Edit transforms.conf to set the routing rules in a [<transforms_stanza_name>] stanza that sets REGEX, DEST_KEY, and FORMAT:

• <transforms_stanza_name> must match the name you defined in props.conf.

• Enter the regex rules in <routing_criteria> that determine which events get routed. This line is required. Use REGEX = . if you don't need additional filtering beyond the metadata specified in props.conf.

• DEST_KEY should be set to _TCP_ROUTING to send events via TCP. It can also be set to _SYSLOG_ROUTING or _HTTPOUT_ROUTING for other output processors.

• Set FORMAT to a <target_group> that matches the group name you defined in outputs.conf. A comma-separated list will clone events to multiple target groups.
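The transforms.conf stanza that these bullets describe can be sketched as follows. The angle-bracket names are the placeholders used in the bullets, not literal values.

```ini
# Name must match the value of TRANSFORMS-routing in props.conf.
[<transforms_stanza_name>]
# Required. Use "REGEX = ." to match every event selected by the props.conf stanza.
REGEX = <routing_criteria>
# _TCP_ROUTING sends matching events to tcpout target groups.
DEST_KEY = _TCP_ROUTING
# One or more target groups defined in outputs.conf; listing several clones the events.
FORMAT = <target_group>,<target_group>,...
```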

Examples later in this topic show how to use this syntax.

4. Edit outputs.conf to define the target group(s) for the routed data:

[tcpout:<target_group>]
server=<ip>:<port>

Note:

• Set <target_group> to match the name you specified in transforms.conf.

• Set the IP address and port to match the receiving server.

The use cases described in this topic generally follow this pattern.

Filter and route event data to target groups

In this example, the forwarder filters three types of events, routing them to different target groups. The forwarder filters and routes according to these criteria:

• Events with a sourcetype of "syslog" to a load-balanced target group

• Events containing the word "error" to a second target group

• All other events to a default target group

Here's how you do it:

1. Edit props.conf in $SPLUNK_HOME/etc/system/local to set two TRANSFORMS-routing attributes, one for syslog data and a default for all other data:

[default]
TRANSFORMS-routing=errorRouting

[syslog]
TRANSFORMS-routing=syslogRouting

2. Edit transforms.conf to set the routing rules for each routing transform:

[errorRouting]
REGEX=error
DEST_KEY=_TCP_ROUTING
FORMAT=errorGroup

[syslogRouting]
REGEX=.
DEST_KEY=_TCP_ROUTING
FORMAT=syslogGroup

Note: In this example, if a syslog event contains the word "error", it will route to syslogGroup, not errorGroup. This is due to the settings previously specified in props.conf. Those settings dictated that all syslog events be filtered through the syslogRouting transform, while all non-syslog (default) events be filtered through the errorRouting transform. Therefore, only non-syslog events get inspected for errors.

3. Edit outputs.conf to define the target groups:

[tcpout]
defaultGroup=everythingElseGroup

[tcpout:syslogGroup]
server=10.1.1.197:9996, 10.1.1.198:9997

[tcpout:errorGroup]
server=10.1.1.200:9999

[tcpout:everythingElseGroup]
server=10.1.1.250:6666

syslogGroup and errorGroup receive events according to the rules specified in transforms.conf. All other events get routed to the default group, everythingElseGroup.

Replicate a subset of data to a third-party system

This example uses data filtering to route two data streams. It forwards:

• All the data, in cooked form, to a Splunk indexer (10.1.12.1:9997)

• A replicated subset of the data, in raw form, to a third-party server (10.1.12.2:1234)

The example sends both streams as TCP. To send the second stream as syslog data, first route the data through an indexer.
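One way this scenario could be configured is sketched below, following the routing pattern from Configure routing. The group names, the transform name, and the regex that selects the subset are hypothetical; the addresses are the ones listed above.

```ini
# props.conf: run the subset transform against all events.
[default]
TRANSFORMS-routing = routeSubset

# transforms.conf: events matching the (hypothetical) pattern go to both groups.
[routeSubset]
REGEX = (SYSTEM|CONFIG|THREAT)
DEST_KEY = _TCP_ROUTING
FORMAT = Subsidiary,Everything

# outputs.conf: everything goes to the Splunk indexer by default, in cooked form.
[tcpout]
defaultGroup = Everything

[tcpout:Everything]
server = 10.1.12.1:9997

# The third-party server gets the replicated subset as raw (uncooked) data.
[tcpout:Subsidiary]
sendCookedData = false
server = 10.1.12.2:1234
```

Because FORMAT lists both groups, matching events are cloned rather than diverted, so the Splunk indexer still receives all the data.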

For more information, see Forward data to third party systems in this manual.

Filter event data and send to queues

You can eliminate unwanted data by routing it to nullQueue, Splunk's /dev/null equivalent. When you filter out data in this way, the filtered data is not forwarded or added to the Splunk index at all, and doesn't count toward your indexing volume.

Although similar to forwarder-based routing, queue routing can be performed by either a forwarder or a full Splunk instance. It does not use the outputs.conf file, just props.conf and transforms.conf.

Discard specific events and keep the rest

This example discards all sshd events in /var/log/messages by sending them to nullQueue:

1. In props.conf, set the TRANSFORMS-null attribute:

[source::/var/log/messages]
TRANSFORMS-null= setnull

2. Create a corresponding stanza in transforms.conf. Set DEST_KEY to "queue" and FORMAT to "nullQueue":

[setnull]
REGEX = \[sshd\]
DEST_KEY = queue
FORMAT = nullQueue

That does it.
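Before deploying a filter like this, it can help to check which events the regex would catch. The following is a rough Python sketch, not Splunk code: it applies the same pattern used in the setnull stanza to a couple of sample lines and reports the queue each would reach. The helper name route_event and the sample log lines are hypothetical.

```python
import re

# Same pattern as REGEX in the [setnull] stanza.
SETNULL_REGEX = re.compile(r"\[sshd\]")

def route_event(raw_event):
    """Return the queue an event would reach under the setnull rule.

    Hypothetical helper: events matching the regex go to nullQueue
    (discarded); everything else continues to indexQueue.
    """
    if SETNULL_REGEX.search(raw_event):
        return "nullQueue"
    return "indexQueue"

# Made-up sample lines, shaped so that one of them matches the pattern.
sample_events = [
    "Oct  6 09:52:01 host1 [sshd] Accepted password for admin",
    "Oct  6 09:52:05 host1 kernel: eth0 link up",
]
for event in sample_events:
    print(route_event(event), "<-", event)
```

If an event you expected to discard still reports indexQueue here, the regex is the first thing to revisit.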

Keep specific events and discard the rest

Here's the opposite scenario. In this example, you use two transforms to keep only the sshd events. One transform routes sshd events to indexQueue, while another routes all other events to nullQueue.

Note: Null queue transforms are processed last, even when, as shown here, they appear first in transforms.conf.

1. In props.conf:

[source::/var/log/messages]
TRANSFORMS-set= setnull,setparsing

2. In transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = \[sshd\]
DEST_KEY = queue
FORMAT = indexQueue

Note: The order of the stanzas doesn't matter in this example. Splunk processes the default, nullQueue transform as the last step, after processing all other transforms.

Filter WMI events

To filter on WMI events, you must use the [wmi] sourcetype stanza in props.conf. The following example uses regex to filter out two Windows event codes, 592 and 593:

In props.conf:

[wmi]
TRANSFORMS-wmi=wminull

In transforms.conf:

[wminull]
REGEX=(?m)^EventCode=(592|593)
DEST_KEY=queue
FORMAT=nullQueue
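The (?m) prefix in the regex matters because a WMI event spans several lines, and the EventCode field is rarely on the first one. The Python sketch below illustrates this with the same pattern. The sample event and its field layout are assumptions made for illustration, and Python's regex engine stands in for Splunk's.

```python
import re

# Same pattern as REGEX in the [wminull] stanza.
WMINULL_REGEX = re.compile(r"(?m)^EventCode=(592|593)")

# Hypothetical multi-line WMI event; the field layout is assumed.
sample_event = "20101006095200.000000\nCategory=1\nEventCode=592\nLogfile=Security"

def is_filtered(raw_event):
    """True if the event matches the transform and would go to nullQueue."""
    return WMINULL_REGEX.search(raw_event) is not None

print(is_filtered(sample_event))                        # event code 592
print(is_filtered(sample_event.replace("592", "528")))  # some other code

# Without (?m), ^ only anchors at the very start of the event text,
# so the same pattern no longer finds EventCode on the third line.
print(re.search(r"^EventCode=(592|593)", sample_event) is not None)
```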

Filter data by target index

Splunk provides a forwardedindex filter that allows you to specify whether data gets forwarded, based on the data's target index. For example, if you have one data input targeted to "index1" and another targeted to "index2", you can use the filter to forward only the data targeted to index1, while ignoring the index2 data. The forwardedindex filter uses whitelists and blacklists to specify the filtering. For information on setting up multiple indexes, see the topic "Set up multiple indexes".

Use the forwardedindex.<n>.whitelist|blacklist attributes in outputs.conf to specify which data should get forwarded on an index-by-index basis. You set the attributes to regexes that filter the target indexes. By default, the forwarder forwards data targeted for all external indexes, as well as the data for the _audit internal index. It does not forward data to other internal indexes. The default outputs.conf file specifies that behavior with these attributes:

Note: In previous releases you could achieve this result (internal index forwarding) by specifying the _TCP_ROUTING = * attribute/value in inputs.conf. This attribute/value pair no longer achieves that result. If you wish to reinstate the 4.0.x simpler behavior, set forwardedindex.filter.disable = true in outputs.conf instead.

Clone data

In cloning, the forwarder sends duplicate copies of data to multiple target groups of receivers. Each target group can be either a single receiving server or a load-balanced group of receivers.

Cloning has value for enabling a number of key use cases, such as:

• Providing data redundancy to promote data availability

• Geo-diverse dataset replication, for fast local access

• Migration from one Splunk instance to another (not migration of past data)

In this simple scenario, three forwarders send duplicate copies of data to two Splunk servers:

Enable data cloning

The most direct way to set up cloning is by editing outputs.conf. Simply create multiple target groups. Each target group will automatically receive all the forwarder's data. Here is an example of specifying two target groups in a single outputs.conf file:

[tcpout]
...

[tcpout:indexer1]
server=10.1.1.197:9997

[tcpout:indexer2]
server=10.1.1.200:9999

The forwarder will send duplicate data streams to the servers specified in both the indexer1 and indexer2 target groups.

Provide data redundancy

Data cloning provides a good solution for situations requiring data redundancy. You can use a forwarder to send all data to two or more target groups. If a server in one target group goes down, users can continue to search their data by switching to another target group.

Although the target groups can each consist of single Splunk receivers, the recommended approach is to set up target groups of multiple load-balanced receivers. That way, if a server within a target group goes down while receiving data, the forwarder will automatically start forwarding data to the next server in the group, ensuring that the target group in total still receives all forwarded data. This provides a measure of protection by helping to ensure that two complete sets of the cloned data will exist in your system.

This example outputs.conf file configures a forwarder to clone raw data to two load-balanced target groups of indexers, with the indexing servers specified directly in the target groups. You can also use DNS lists to specify the target group servers, as described in DNS list target in this manual.

The forwarder will send full data streams to both cloned_group1 and cloned_group2. The data will be load-balanced within each group, rotating among receivers every 30 seconds (the default frequency).

Specify cloning from the CLI

You can also use the CLI to specify cloning. You do this when you start forwarding activity to a set of receivers, using this syntax:

./splunk add forward-server <host>:<port> -method=clone

where <host>:<port> is the host and port number of the receiver.

This example sends cloned data to two receivers:

./splunk add forward-server -method=clone indexer1:9991

./splunk add forward-server -method=clone indexer2:9991

Forward data to third-party systems


Splunk can forward raw data to non-Splunk systems. It can send the data over a plain TCP socket or packaged in standard syslog. Because it is forwarding to a non-Splunk system, it can send only raw data.

By editing props.conf and transforms.conf, you can configure the forwarder to route data conditionally to third-party systems, in the same way that it routes data conditionally to other Splunk instances. You can filter the data by host, source, or sourcetype. You can also use regex to further qualify the data.

TCP data

To forward TCP data to a third-party system, edit the forwarder's outputs.conf file to specify the receiving server and port. You must also configure the receiving server to expect the incoming data stream on that port.

To filter the data first, edit the forwarder's props.conf and transforms.conf files as well.

Edit the configuration files

To forward data, edit outputs.conf:

• Specify target groups for the receiving servers.

• Specify the IP address and TCP port for each receiving server.

• Set sendCookedData to false, so that the forwarder sends raw data.

To filter the data, edit props.conf and transforms.conf:

• In props.conf, specify the host, source, or sourcetype of your data stream. Specify a transform to perform on the input.

• In transforms.conf, define the transform and specify _TCP_ROUTING. You can also use regex to further filter the data.

Forward all data

This example shows how to send all the data from a Splunk forwarder to a third-party system. Since you are sending all the data, you only need to edit outputs.conf:

[tcpout]
indexAndForward = true

[tcpout:fastlane]
server = 10.1.1.35:6996
sendCookedData = false

Forward a subset of data

This example shows how to filter a subset of data and send the subset to a third-party system:

1. Edit props.conf and transforms.conf to specify the filtering criteria.

In props.conf, apply the bigmoney transform to all host names beginning with nyc:

[host::nyc*]
TRANSFORMS-nyc = bigmoney

In transforms.conf, configure the bigmoney transform to specify _TCP_ROUTING as the DEST_KEY and the bigmoneyreader target group as the FORMAT:

[bigmoney]
REGEX = .
DEST_KEY=_TCP_ROUTING
FORMAT=bigmoneyreader

2. In outputs.conf, define the bigmoneyreader target group for the non-Splunk server, as well as a default target group to receive any other data. If you want to forward only the data specifically identified in props.conf and transforms.conf, set defaultGroup=nothing:

The forwarder will send all data from host names beginning with nyc to the non-Splunk server specified in the bigmoneyreader target group. It will send data from all other hosts to the server specified in the default-clone-group-192_168_1_104_9997 target group.

Syslog data

You can configure a forwarder to send data in standard syslog format. The forwarder sends the data through a separate output processor. You can also filter the data with props.conf and transforms.conf. You'll need to specify _SYSLOG_ROUTING as the DEST_KEY.

To forward syslog data, identify the third-party receiving server and specify it in a syslog target group in the forwarder's outputs.conf file.

Forward syslog data

The forwarder sends RFC 3164 compliant events to a TCP/UDP-based server and port, making the payload of any non-compliant data RFC 3164 compliant.

Note: If you have defined multiple event types for syslog data, the event type names must all include the string 'syslog'.

In outputs.conf, specify the syslog target group:

[syslog:<target_group>]
<attribute1> = <val1>
<attribute2> = <val2>
...

The target group stanza requires this attribute:

Required Attribute | Default | Value
server | n/a | This must be in the format <ipaddress_or_servername>:<port>. This is a combination of the IP address or servername of the syslog server and the port on which the syslog server is listening. Note that syslog servers use port 514 by default.

These attributes are optional:

Optional Attribute | Default | Value
type | udp | The transport protocol. Must be set to "tcp" or "udp".
priority | 13 | TCP priority. This must be in the format: <ddd>. This value will appear in the syslog header. Compute <ddd> as (<facility> * 8) + <severity>. If facility is 4 (security/authorization messages) and severity is 2 (critical conditions), priority value will be: (4 * 8) + 2 = 34.
syslogSourceType | n/a | This must be in the format sourcetype::syslog. The sourcetype for syslog messages.
timestampformat | "" | The format used when adding a timestamp to the header. This must be in the format: <%b %e %H:%M:%S>. See Configure timestamps in this manual for details.
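The priority arithmetic is easy to get wrong by hand, so a small helper can be useful when preparing a value for the priority attribute. This Python sketch simply implements the (facility * 8) + severity formula given above. The function name is hypothetical, and the range checks reflect the facility (0 to 23) and severity (0 to 7) codes defined by RFC 3164.

```python
def syslog_priority(facility, severity):
    """Compute the syslog priority value as (facility * 8) + severity."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be between 0 and 23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be between 0 and 7")
    return facility * 8 + severity

# Facility 4 (security/authorization), severity 2 (critical), as in the text.
print(syslog_priority(4, 2))

# Facility 1 (user-level), severity 5 (notice).
print(syslog_priority(1, 5))
```

The second call produces 13, which is the default listed for the priority attribute.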

Send a subset of data to a syslog server

This example shows how to configure Splunk to forward data from hosts whose names begin with "nyc" to a syslog server named "loghost.example.com" over port 514:

1. Edit props.conf and transforms.conf to specify the filtering criteria.

In props.conf, apply the send_to_syslog transform to all host names beginning with nyc:

[host::nyc*]
TRANSFORMS-nyc = send_to_syslog

In transforms.conf, configure the send_to_syslog transform to specify _SYSLOG_ROUTING as the DEST_KEY and the my_syslog_group target group as the FORMAT:

[send_to_syslog]
DEST_KEY = _SYSLOG_ROUTING
FORMAT = my_syslog_group

2. In outputs.conf, define the my_syslog_group target group for the non-Splunk server:

[syslog:my_syslog_group]
server = loghost.example.com:514

Encrypt and authenticate data with SSL


The communication between forwarder and receiver can use SSL authentication and encryption, or just SSL encryption.

To enable SSL, edit each forwarder's outputs.conf file and each receiver's inputs.conf file.

Enable SSL on the forwarder

You enable SSL in the forwarder's outputs.conf. If you are using SSL just for encryption, you can set SSL attributes at any stanza level: default (tcpout), target group, or server. If you are also using SSL for authentication, you must specify SSL attributes at the server level. Each receiving server needs a stanza that specifies its certificate names.

SSL attributes

This table describes the set of SSL attributes:

Attribute | Value
sslCertPath | Full path to client certificate file.
sslPassword | Password for the certificate. Default is "password".
sslRootCAPath | Path to root certificate authority file.
sslVerifyServerCert | Set to true or false. Default is "false", which enables SSL for encryption only. If set to "true", the forwarder will determine whether the receiving server is authenticated, checking sslCommonNameToCheck and altCommonNameToCheck for a match. If neither matches, authentication fails.
sslCommonNameToCheck | Server's common name. Set only if sslVerifyServerCert is "true". The forwarder checks the common name of the server's certificate against this value.
altCommonNameToCheck | Server's alternate name. Set only if sslVerifyServerCert is "true". The forwarder checks the alternate name of the server's certificate against this value.

Set SSL for encryption only

Add attribute/value pairs at the appropriate stanza level. Here, the attributes are specified at the tcpout level, so they set the SSL defaults for this forwarder:

You need to create a stanza for each receiver that the forwarder authenticates with.

Enable SSL on the receiver

You enable SSL in the receiver's inputs.conf. This involves two steps:

• Add an [SSL] stanza.

• Add listener stanzas for each port listening for SSL data.

Configure the [SSL] stanza

This table describes the attributes for the [SSL] stanza:

Attribute | Value
serverCert | Full path to server certificate file.
password | Password for the certificate, if any. If no password, leave blank or unset.
RootCA | Path to the root certificate authority file.
dhfile | Path to the dhfile.pem file. Optional.
requireClientCert | Set to true or false. Default is "false". If set to "true", the receiver will require a valid certificate from the client to complete the connection.

Edit the receiver's inputs.conf:

More about forwarders


Certain capabilities are disabled in forwarders and light forwarders. This section describes forwarder capabilities in detail.

Splunk forwarder details

All functions and modules of the Splunk regular forwarder are enabled by default, with the exception of the distributed search module. The file $SPLUNK_HOME/etc/apps/SplunkForwarder/default/default-mode.conf includes this stanza:

[pipeline:distributedSearch]
disabled = true

For a detailed view of the exact configuration, see the configuration files for the SplunkForwarder application in $SPLUNK_HOME/etc/apps/SplunkForwarder/default.

Splunk light forwarder details

Most features of Splunk are disabled in the Splunk light forwarder. Specifically, the Splunk light forwarder:

• Disables event signing and checking whether the disk is full ($SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/default-mode.conf).

• Limits internal data inputs to splunkd and metrics logs only, and makes sure these are forwarded ($SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/inputs.conf).

• Disables all indexing ($SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/indexes.conf).

• Does not use transforms.conf and does not fully parse incoming data, but the CHARSET, CHECK_FOR_HEADER, NO_BINARY_CHECK, PREFIX_SOURCETYPE, and sourcetype properties from props.conf are used.

• Disables the Splunk Web interface ($SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/web.conf).

• Limits throughput to 256KBps ($SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/limits.conf).

• Disables the following modules in $SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/default-mode.conf:

Events are records of activity in log files, stored in Splunk indexes. They are primarily what Splunk indexes. Events provide information about the systems that produce the log files. The term event data refers to the contents of a Splunk index.

Here's a sample event:

172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953

When Splunk indexes events, it:

• Configures character set encoding.

• Configures linebreaking for multi-line events.

• Identifies event timestamps (and applies timestamps to events if they do not exist).

• Extracts a set of useful standard fields such as host, source, and sourcetype.

• Improves data compression with segmentation.

• Dynamically assigns metadata to events, if specified.

• Anonymizes data if specified through sed or through configuration files.

For an overview of the Splunk indexing process, see the Indexing with Splunk chapter of this manual.

Configure character set encoding


Splunk allows you to configure character set encoding for your data sources. Splunk has built-in character set specifications to support internationalization of your Splunk deployment. Splunk supports 71 languages (including 20 that aren't UTF-8 encoded). You can retrieve a list of Splunk's valid character encoding specifications by using the iconv -l command on most *nix systems.

Splunk attempts to apply UTF-8 encoding to your sources by default. If a source doesn't use UTF-8 encoding or is a non-ASCII file, Splunk will try to convert data from the source to UTF-8 encoding unless you specify a character set to use by setting the CHARSET key in props.conf.

Note: If a source's character set encoding is valid, but some characters from the specification are not valid in the encoding you specify, Splunk escapes the invalid characters as hex values (for example: "\xF3").

Manually specify a character set to apply to an input by setting the CHARSET key in props.conf:

[spec]
CHARSET=<string>

For example, if you have a host that is generating data in Greek (called "GreekSource" in this example) and that uses ISO-8859-7 encoding, set CHARSET=ISO-8859-7 for that host in props.conf:

[host::GreekSource]
CHARSET=ISO-8859-7
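To make the conversion concrete, the following Python sketch shows what decoding ISO-8859-7 bytes and re-encoding them as UTF-8 looks like. It only illustrates the general idea of character set conversion and says nothing about how Splunk implements it internally. The sample text and the to_utf8 helper are hypothetical.

```python
# Greek text as it might arrive from the hypothetical GreekSource host.
greek_text = "Καλημέρα"
raw_bytes = greek_text.encode("iso-8859-7")

def to_utf8(data, charset):
    """Decode bytes using the source charset and re-encode them as UTF-8."""
    return data.decode(charset).encode("utf-8")

utf8_bytes = to_utf8(raw_bytes, "iso-8859-7")

# ISO-8859-7 uses one byte per Greek letter; UTF-8 uses two.
print(len(raw_bytes), len(utf8_bytes))
print(utf8_bytes.decode("utf-8"))
```

Decoding the same bytes with the wrong character set would either fail or yield unreadable text, which is why the CHARSET value has to match the source.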

Note: Splunk will only parse character encodings that have UTF-8 mappings. Some EUC-JP characters do not have a mapped UTF-8 encoding.

Automatically specify a character set

Splunk can automatically detect languages and proper character sets using its sophisticated character set encoding algorithm.

Configure Splunk to automatically detect the proper language and character set encoding for a particular input by setting CHARSET=AUTO for the input in props.conf. For example, if you want Splunk to automatically detect character set encoding for the host "my-foreign-docs", set CHARSET=AUTO for that host in props.conf:

[host::my-foreign-docs]
CHARSET=AUTO

If Splunk doesn't recognize a character set

If you want to use an encoding that Splunk doesn't recognize, train Splunk to recognize the character set by adding a sample file to the following directory:

$SPLUNK_HOME/etc/ngram-models/_<language>-<encoding>.txt

Once you add the character set specification file, you must restart Splunk. After you restart, Splunk can recognize sources that use the new character set, and will automatically convert them to UTF-8 format at index time.

For example, if you want to use the "vulcan-ISO-12345" character set, copy the specification file to the following path:

$SPLUNK_HOME/etc/ngram-models/_vulcan-ISO-12345.txt

Configure linebreaking for multi-line events


Overview of multi-line events and event linebreaking

Some events are made up of more than one line. Splunk handles most of these kinds of events correctly by default, but there are cases of multi-line events that Splunk doesn't recognize properly by default. These require special configuration to change Splunk's default linebreaking behavior.

Multi-line event linebreaking and segmentation limitations

Splunk does apply limitations to extremely large events when it comes to linebreaking and segmentation:

• Lines over 10,000 bytes: Splunk breaks lines over 10,000 bytes into multiple lines of 10,000 bytes each when it indexes them. It appends the field meta::truncated to the end of each truncated section. However, Splunk still groups these lines into a single event.

• Segmentation for events over 100,000 bytes: Splunk only displays the first 100,000 bytes of an event in the search results. Segments after those first 100,000 bytes of a very long line are still searchable, however.

• Segmentation for events over 1,000 segments: Splunk displays the first 1,000 individual segments of an event as segments separated by whitespace and highlighted on mouseover. It displays the rest of the event as raw text without interactive formatting.

Configuration

Many event logs have a strict one-line-per-event format, but some do not. Usually, Splunk can automatically figure out the event boundaries. However, if event boundary recognition is not working as desired, you can set custom rules by configuring props.conf.

To configure multi-line events, examine the format of the events. Determine a pattern in the events to set as the start or end of an event. Then, edit $SPLUNK_HOME/etc/system/local/props.conf, and set the necessary attributes for your data handling.

There are two ways to handle multi-line events:

• Break the event stream into real events. This is recommended, as it increases indexing speed significantly. Use LINE_BREAKER (see below).

• Break the event stream into lines and reassemble. This is slower but affords more robust configuration options. Use any linebreaking attribute besides LINE_BREAKER (see below).

Linebreaking general attributes

These are the props.conf attributes that affect linebreaking:

TRUNCATE = <non-negative integer>

• Change the default maximum line length (in bytes).

• Set to 0 if you never want truncation (very long lines are, however, often a sign of garbage data).

• Defaults to 10000 bytes.

LINE_BREAKER = <regular expression>

• If not set, the raw stream will be broken into an event for each line delimited by \r or \n.

• If set, the given regex will be used to break the raw stream into events.

• The regex must contain a matching group.

• Wherever the regex matches, the start of the first matched group is considered the first text NOT in the previous event.

• The end of the first matched group is considered the end of the delimiter and the next character is considered the beginning of the next event.

• For example, "LINE_BREAKER = ([\r\n]+)" is equivalent to the default rule.

• The contents of the first matching group will not occur in either the previous or next events.

• Note: There is a significant speed boost by using the LINE_BREAKER to delimit multi-line events rather than using line merging to reassemble individual lines into events.
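The role of the first matching group is easier to see with a small simulation. The Python sketch below follows the rules listed above: the text captured by the first group is treated as the delimiter and dropped, the text before it closes one event, and the text after it opens the next. This is an approximation written for illustration, not Splunk's implementation. The break_events helper, the sample data, and the second regex are all hypothetical.

```python
import re

def break_events(raw_stream, line_breaker):
    """Split raw_stream into events using the first group as the delimiter."""
    pattern = re.compile(line_breaker)
    events = []
    start = 0
    for match in pattern.finditer(raw_stream):
        # Everything up to the start of group 1 belongs to the previous event.
        events.append(raw_stream[start:match.start(1)])
        # The next event begins right after the end of group 1.
        start = match.end(1)
    events.append(raw_stream[start:])
    # Drop empty leftovers, such as the one after a trailing newline.
    return [event for event in events if event]

raw = "Path=/a\n  status=200\nPath=/b\n  status=404\n"

# Equivalent to the default rule: one event per line.
print(break_events(raw, r"([\r\n]+)"))

# Break only where a newline is followed by "Path=", keeping "Path=" in the
# next event because it lies outside the first group.
print(break_events(raw, r"([\r\n]+)Path="))
```

Note how text matched outside the first group stays in the following event; only the group itself is discarded.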

LINE_BREAKER_LOOKBEHIND = <integer> (100)

• Change the default lookbehind for the regex based linebreaker.

• When there is leftover data from a previous raw chunk, this is how far before the end of the raw chunk (with the next chunk concatenated) Splunk begins applying the regex.

SHOULD_LINEMERGE = <true/false>

• When set to true, Splunk combines several input lines into a single event, with configuration based on the attributes described below.

• Defaults to true.

Attributes available only when SHOULD_LINEMERGE = true

When SHOULD_LINEMERGE is set to true, these additional attributes have meaning:

AUTO_LINEMERGE = <true/false>

• Directs Splunk to use automatic learning methods to determine where to break lines in events.

• Defaults to true.

BREAK_ONLY_BEFORE_DATE = <true/false>

• When set to true, Splunk will create a new event if and only if it encounters a new line with a date.

• Defaults to false.

BREAK_ONLY_BEFORE = <regular expression>

• When set, Splunk will create a new event if and only if it encounters a new line that matches the regular expression.

• Defaults to empty.

MUST_BREAK_AFTER = <regular expression>

• When set, and the regular expression matches the current line, Splunk always creates a new event for the next input line.

• Splunk may still break before the current line if another rule matches.

• Defaults to empty.

MUST_NOT_BREAK_AFTER = <regular expression>

• When set and the current line matches the regular expression, Splunk will not break on any subsequent lines until the MUST_BREAK_AFTER expression matches.

• Defaults to empty.

MUST_NOT_BREAK_BEFORE = <regular expression>

• When set and the current line matches the regular expression, Splunk will not break the last event before the current line.

• Defaults to empty.

MAX_EVENTS = <integer>

• Specifies the maximum number of input lines that will be added to any event.

• Splunk will break after the specified number of lines are read.

• Defaults to 256.

Examples

Specify event breaks

[my_custom_sourcetype]
BREAK_ONLY_BEFORE = ^\d+\s*$

This example instructs Splunk to divide events in a file or stream by presuming any line that consists of all digits is the start of a new event, for any source whose source type was configured or determined by Splunk to be sourcetype::my_custom_sourcetype.
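The effect of this rule can be previewed on sample data. The Python sketch below groups lines into events, starting a new event only at lines that match the BREAK_ONLY_BEFORE pattern. It is a simplified model for illustration: it ignores MAX_EVENTS and the other line-merging attributes, and the merge_lines helper and the sample lines are hypothetical.

```python
import re

def merge_lines(lines, break_only_before):
    """Group lines into events, starting a new event only at matching lines."""
    pattern = re.compile(break_only_before)
    events = []
    for line in lines:
        # The very first line always opens an event, match or not.
        if pattern.search(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]

lines = [
    "1001",
    "user=alice action=login",
    "1002",
    "user=bob action=logout",
    "ip=10.0.0.5",
]
for event in merge_lines(lines, r"^\d+\s*$"):
    print(repr(event))
```

Here the all-digit lines 1001 and 1002 each open a new event, and the lines that follow are folded into the event above them.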

Merge multiple lines into a single event

The following log event contains several lines that are part of the same request. The differentiator between requests is "Path". For this example, assume that all these lines need to be shown as a single event entry.

This code tells Splunk to merge the lines of the event, and only break before the term Path=.

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around multi-line event processing.

Handle event timestamps


Look carefully at this sample event:

172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953

Notice the time information in the event: [01/Jul/2005:12:05:27 -0700]. This is what is known as a timestamp. Splunk uses timestamps to correlate events by time, create the histogram in Splunk Web, and set time ranges for searches. Most events contain timestamps, and in those cases where an event doesn't contain timestamp information, Splunk attempts to assign a timestamp value to the event at index time.
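As an illustration of what timestamp extraction involves, this Python sketch pulls the bracketed timestamp out of the sample event and parses it, including the UTC offset. It is not how Splunk recognizes timestamps; it only shows the pieces of information the sample timestamp carries. The extract_timestamp helper is hypothetical, and the month abbreviation is assumed to be parsed under an English locale.

```python
from datetime import datetime

sample_event = ('172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] '
                '"GET /trade/app?action=logout HTTP/1.1" 200 2953')

def extract_timestamp(raw_event):
    """Parse the bracketed access-log timestamp into an aware datetime."""
    start = raw_event.index("[") + 1
    end = raw_event.index("]", start)
    # day/month-abbreviation/year:hour:minute:second, then the UTC offset
    return datetime.strptime(raw_event[start:end], "%d/%b/%Y:%H:%M:%S %z")

ts = extract_timestamp(sample_event)
print(ts.isoformat())
print(ts.utcoffset())
```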

Most events do not require additional handling of timestamp formatting, but there are situations that require the involvement of a Splunk administrator to help set things right. In the case of some sources and distributed deployments, for example, the Splunk admin may have to reconfigure timestamp recognition and formatting. Other timestamp-handling activities that the admin might undertake include:

• Configuration of timestamp extraction for events with multiple timestamps

For more information about timestamps, see the Configure event timestamping chapter of this manual.

Extract default fields automatically


When Splunk indexes event data, it extracts by default a set of fields that are common to most events, and which are commonly used in Splunk searches and reports. These default fields include:

• host: Identifies the originating hostname or IP address of the network device that generated the event. Used to narrow searches to events that have their origins in a specific host.

• source: Identifies the filename or pathname from which the event was indexed. Used to filter events during a search, or as an argument in a data-processing command.

• sourcetype: Identifies the type of application, network, or device data that the event represents, such as access_log or syslog. A Splunk administrator can predefine source types, or they can be generated automatically by Splunk at index time. Use sourcetype to filter events during a search, or as an argument in a data-processing command.

For a full listing of the default fields that Splunk identifies during the indexing process, and examples of how they can be used in a search, see "Use default fields" in the User manual.

For detailed information on default field extraction, see "About default fields" in this manual.

Improve data compression with segmentation


Segmentation is what Splunk uses to break events up into searchable segments at index time, and again at search time. Segments can be classified as major or minor. To put it simply, minor segments are breaks within major segments. For example, the IP address 172.26.34.223 is, as a whole, a major segment. But this major segment can be broken down into minor segments such as 172 as well as groups of minor segments like 172.26.34.

Splunk enables a Splunk admin to define how detailed the event segmentation should be. This is important because index-time segmentation affects indexing and search speed, impacts disk compression, and affects your ability to use typeahead functionality. Search-time segmentation, on the other hand, can also affect search speed as well as your ability to create searches by selecting items from the results displayed in Splunk Web.

Index-time segmentation is set through segmenters.conf, while search-time segmentation is set in the search results page in Splunk Web, as described here.

For more information about "index time" and "search time," see Index time versus search time, in this manual.

Levels of event segmentation

There are three levels of segmentation that the Splunk admin can choose from for index time and search time:

• Inner segmentation breaks events down into the smallest minor segments possible. For example, when an IP address such as 172.26.34.223 goes through inner segmentation, it is broken down into 172, 26, 34, and 223. Setting inner segmentation at index time leads to very efficient indexes in terms of search speed, but it also impacts indexing speed and restricts the typeahead functionality (it will only be able to typeahead at the minor segment level).

• Outer segmentation is the opposite of inner segmentation. Under outer segmentation only major segments are indexed. In the previous example, the IP address would not be broken down into any components. If you have outer segmentation set at index time you will be unable to search on individual pieces of the IP address without using wildcard characters. Indexes created using outer segmentation tend to be marginally more efficient than those created with full segmentation, but are not quite as efficient as those created through inner segmentation.

• Full segmentation is in some respects a combination of inner and outer segmentation. Under full segmentation, the IP address is indexed both as a major segment and as a variety of minor segments, including minor segment combinations like 172.26 and 172.26.34. This is the least efficient indexing option, but it provides the most versatility in terms of searching.
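The difference between the three levels can be pictured by listing the segments each would produce for the IP address used above. The Python sketch below is a simplified model built only from the descriptions in this section. The actual segments Splunk produces are governed by segmenters.conf and may differ; all function names are hypothetical, and the period is assumed to be the only minor breaker.

```python
def inner_segments(token, minor_breaker="."):
    """Smallest minor segments only."""
    return token.split(minor_breaker)

def outer_segments(token):
    """The major segment only, with no breakdown."""
    return [token]

def full_segments(token, minor_breaker="."):
    """Major segment, each minor segment, and each leading combination."""
    parts = token.split(minor_breaker)
    prefixes = [minor_breaker.join(parts[:n]) for n in range(1, len(parts) + 1)]
    segments = []
    # Keep the first occurrence of each segment, preserving order.
    for segment in prefixes + parts:
        if segment not in segments:
            segments.append(segment)
    return segments

ip = "172.26.34.223"
print("inner:", inner_segments(ip))
print("outer:", outer_segments(ip))
print("full: ", full_segments(ip))
```

The longer list under full segmentation hints at why it is described as the least efficient option to index and the most versatile to search.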

Note: By default, index-time segmentation is set to a combination of inner and outer segmentation, and search-time segmentation is set to full segmentation.

For more information about changing the segmentation level, see Configure segmentation to manage disk usage in this manual.

A Splunk admin can define index time and search time segmentation rules that apply specifically to events with particular hosts, sources, or sourcetypes. If you run searches that involve a particular sourcetype on a regular basis, you could use this to improve the performance of those searches. Similarly, if you typically index a large number of syslog events, you could use this feature to help decrease the overall disk space that those events take up.

For details about how to set these special segmentation rules up, see Configure custom segmentation for a host, source, or source type in this manual.

Assign metadata to events dynamically


This feature allows you to dynamically assign metadata to files as they are being consumed by Splunk. Use this feature to specify source type, host, or other metadata dynamically for incoming data. This feature is useful mainly with scripted data -- either a scripted input or a pre-existing file processed by a script.

Important: Splunk does not recommend using dynamic metadata assignment with ongoing monitoring (tail) inputs. For more information about file inputs, refer to Monitor files and directories in this manual.

To use this feature, you append a single dynamic input header to your file and specify the metadata fields you want to assign values to. The metadata fields most likely to be of interest are sourcetype, host, and source. You can see the list of all available pipeline metadata fields in transforms.conf.spec.

You can use this method to assign metadata instead of editing inputs.conf, props.conf and transforms.conf.

Configure a single input file

To use this feature for an existing input file, edit the file (either manually or with a script) to add a single input header:

***SPLUNK*** <metadata field>=<string> <metadata field>=<string> ...

• Set <metadata field>=<string> to a valid metadata/value pair. You can specify mutiple pairs. For example, sourcetype=log4j host=swan. • Add the single header anywhere in your file. Any data following the header will be appended with the attributes and values you assign until the end of the file is reached. • Add your file to $SPLUNK_HOME/var/spool/splunk or any other directory being monitored by Splunk.

Configure with a script

In the more common scenario, you write a script to dynamically add an input header to your incoming data stream. Your script can also set the header dynamically based on the contents of the input file.
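As an illustration of such a script, here is a minimal Python sketch. The function names, the content rule in header_for, and the sample data are hypothetical; only the ***SPLUNK*** header syntax and the sourcetype=log4j host=swan pair come from this topic.

```python
HEADER_TOKEN = "***SPLUNK***"

def build_header(**metadata):
    """Return one dynamic input header line for the given metadata pairs."""
    pairs = " ".join(f"{field}={value}" for field, value in metadata.items())
    return f"{HEADER_TOKEN} {pairs}"

def add_header(data, **metadata):
    """Prepend the header so that it applies to all of the data that follows."""
    return build_header(**metadata) + "\n" + data

def header_for(data):
    """Hypothetical rule: pick the source type from the contents of the input."""
    looks_like_log4j = " INFO " in data or " ERROR " in data
    sourcetype = "log4j" if looks_like_log4j else "generic"
    return add_header(data, sourcetype=sourcetype, host="swan")

sample = "2010-10-06 09:52:00 INFO starting up\n"
print(header_for(sample), end="")
```

A script like this would write its result into a directory that Splunk monitors, such as $SPLUNK_HOME/var/spool/splunk.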

Anonymize data with sed

This utility allows you to anonymize your data by replacing or substituting strings in it at index time using a sed script.

Most UNIX users are familiar with sed, a UNIX utility which reads a file and modifies the input as specified by a list of commands. Now, you can use sed-like syntax to anonymize your data from props.conf.

Note: Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local.

Define the sed script in props.conf

In a props.conf stanza, use SEDCMD to indicate a sed script:

[<stanza_name>]
SEDCMD-<class> = <sed script>

The stanza_name is restricted to the host, source, or sourcetype that you want to modify with your anonymization or transform.

The sed script applies only to the _raw field at index time. Splunk currently supports the following subset of sed commands: replace (s) and character substitution (y).

Note: You need to restart Splunk to implement the changes you made to props.conf.

Replace strings with regex match

The syntax for a sed replace is:

SEDCMD-<class> = s/<regex>/<replacement>/flags

• regex is a PERL regular expression.
• replacement is a string to replace the regex match and uses "\n" for back-references, where n is a single digit.
• flags can be either: "g" to replace all matches or a number to replace a specified match.

Example

Let's say you want to index data containing social security numbers and credit card numbers. At index time, you want to mask these values so that only the last four digits are evident in your events. Your props.conf stanza may look like this:

Now, in your accounts events, social security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-xxxx-1234.
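To see the effect of a replace rule outside of Splunk, the following Python sketch applies two substitutions with back-references in the same spirit as s/<regex>/<replacement>/g. It is only an approximation of the behavior described here, not Splunk's implementation. The two patterns and the sample event are assumptions: a nine-digit ssn= value and a cc= value written as four dash-separated groups of four digits.

```python
import re

# Hypothetical patterns; the capture group keeps the last four digits.
SSN_RE = re.compile(r"ssn=\d{5}(\d{4})")
CC_RE = re.compile(r"cc=(?:\d{4}-){3}(\d{4})")

def mask(raw):
    """Mask all but the last four digits of every ssn= and cc= value."""
    # Pattern.sub() replaces every match, like the "g" flag of a sed replace.
    raw = SSN_RE.sub(r"ssn=xxxxx\1", raw)
    return CC_RE.sub(r"cc=xxxx-xxxx-xxxx-\1", raw)

print(mask("user=jdoe ssn=123456789 cc=1234-5678-9012-3456"))
```

The \1 in each replacement plays the role of the "\n" back-reference described above.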

Substitute characters

The syntax for a sed character substitution is:

SEDCMD-<class> = y/<string1>/<string2>/

which substitutes each occurrence of the characters in string1 with the characters in string2.

Example

Let's say you have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:

[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/

Now, if you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data at all. Splunk substituted "A" for each "a", "B" for each "b", and "C" for each "c".
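The character-for-character mapping of y/abc/ABC/ can be previewed with Python's str.translate, which applies the same kind of one-to-one table. This is a sketch for illustration only; the sample text is made up.

```python
# One-to-one character table: a -> A, b -> B, c -> C.
TABLE = str.maketrans("abc", "ABC")

def substitute(raw):
    """Replace each character of "abc" with its counterpart in "ABC"."""
    return raw.translate(TABLE)

print(substitute("a quick cab ride back"))  # A quiCk CAB ride BACk
```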

You may want to mask sensitive personal data that goes into logs. Credit card numbers and social security numbers are two examples of data that you may not want to index in Splunk. This page shows how to mask part of confidential fields so that privacy is protected but there is enough of the data remaining to be able to use it to trace events.

This example masks all but the last four characters of fields SessionId and Ticket number in an application server log.

Splunk uses timestamps to correlate events by time, create the timeline histogram in Splunk Web, and to set time ranges for searches. Timestamps are assigned to events at index time. Most events get a timestamp value assigned to them based on information in the raw event data. If an event doesn't contain timestamp information, Splunk attempts to assign a timestamp value to the event as it's indexed. Splunk stores timestamp values in the _time field (in UTC time format).

Timestamp processing is one of the key steps in event processing. For more information on event processing, see Configure event processing.

Considerations when adding new data

If your data turns out to require timestamp configuration beyond what Splunk does automatically, you must re-index that data once you've configured its timestamp extraction. It's a good idea to test a new data input in a "sandbox" Splunk instance (or just a separate index) before adding it to your production Splunk instance in case you have to clean it out and re-index it a few times to get it just right.

Precedence rules for timestamp assignment

Splunk uses the following precedence to assign timestamps to events:

1. Look for a time or date in the event itself using an explicit TIME_FORMAT if provided.

Use positional timestamp extraction for events that have more than one timestamp value in the raw data.

2. If no TIME_FORMAT is provided, or no match is found, attempt to automatically identify a time or date in the event itself.

Use positional timestamp extraction for events that have more than one timestamp value in the raw data.

3. If an event doesn't have a time or date, use the timestamp from the most recent previous event of the same source.

4. If no events in a source have a date, look in the source (or file) name (the event must still contain a time).

5. For file sources, if no time or date can be identified in the file name, use the modification time on the file.

6. If no other timestamp is found, set the timestamp to the current system time (at the event's index time).

Configure timestamps

Most events don't require any special timestamp handling. For some sources and distributed deployments, you may have to configure timestamp formatting to extract timestamps from events. Configure Splunk's timestamp extraction processor by editing props.conf. For a complete discussion of the timestamp configurations available in props.conf, see Configure timestamp recognition.

You can also configure Splunk's timestamp extraction processor to:

• Apply timezone offsets.
• Pull the correct timestamp from events with more than one timestamp.
• Improve indexing performance.

Finally, train Splunk to recognize new timestamp formats.

Configure timestamp recognition

Splunk uses timestamps to correlate events by time, create the histogram in Splunk Web, and to set time ranges for searches. Timestamps are assigned to events at index time.

Splunk assigns a timestamp to most events based on information in the raw event data. If an event doesn't contain timestamp information, Splunk attempts to assign a timestamp value to the event as it's indexed. Splunk stores timestamp values in the _time field (in UTC time format).

Most events don't require any special timestamp handling; you can just let Splunk handle it without any configuration.

Precedence rules for timestamp assignment

Splunk uses the following precedence to assign timestamps to events:

1. Look for a time or date in the event itself using an explicit TIME_FORMAT if provided.

Use positional timestamp extraction for events that have more than one timestamp value in the raw data.

2. If no TIME_FORMAT is provided, or no match is found, attempt to automatically identify a time or date in the event itself.

3. If an event doesn't have a time or date, use the timestamp from the most recent previous event of the same source.

4. If no events in a source have a time or date, look in the source (or file) name.

5. For file sources, if no time or date can be identified in the file name, use the modification time on the file.

6. If no other timestamp is found, set the timestamp to the current system time (the time at which the event is indexed by Splunk).
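The precedence list is a first-match-wins chain. The Python sketch below models only that ordering. The function and argument names are hypothetical, and each argument stands for a timestamp that some earlier extraction step either found or did not find (None).

```python
from datetime import datetime

def assign_timestamp(in_event=None, previous_event=None, in_source_name=None,
                     file_mtime=None, now=None):
    """Return (rule, timestamp) for the first candidate that is available."""
    candidates = [
        ("event", in_event),                     # rules 1 and 2
        ("previous event", previous_event),      # rule 3
        ("source name", in_source_name),         # rule 4
        ("file modification time", file_mtime),  # rule 5
    ]
    for rule, value in candidates:
        if value is not None:
            return rule, value
    # Rule 6: nothing else was found, so fall back to the current system time.
    return "current system time", (now if now is not None else datetime.now())

mtime = datetime(2010, 10, 6, 9, 52)
print(assign_timestamp(file_mtime=mtime))
```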

Configure timestamps

Most events don't require any special timestamp handling; you can just let Splunk handle it without any configuration.

For some sources and distributed deployments, you may have to configure timestamp formatting to extract timestamps from events. Configure Splunk's timestamp extraction processor by editing props.conf.

Configure how Splunk recognizes timestamps by editing props.conf. Splunk uses strptime() formatting to identify timestamp values in your events. Specify what Splunk recognizes as a timestamp by setting a strptime() format in the TIME_FORMAT= key.

Note: If your event has more than one timestamp, set Splunk to recognize the correct timestamp with positional timestamp extraction.

Use $SPLUNK_HOME/etc/system/README/props.conf.example as an example, or create your own props.conf. Make any configuration changes to a copy of props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Configure any of the following attributes in props.conf to set Splunk's timestamp recognition. Refer to $SPLUNK_HOME/etc/system/README/props.conf.spec for full specification of the keys.

• <spec> indicates what to apply timestamp extraction to. This can be one of the following:
  ♦ <sourcetype>, the sourcetype of an event.
  ♦ host::<host>, where <host> is the host of an event.
  ♦ source::<source>, where <source> is the source of an event.
• If an event contains data that matches the value of <spec>, then the timestamp rules specified in the stanza apply to that event.
• Add additional stanzas to customize timestamp recognition for any type of event.

DATETIME_CONFIG = <filename relative to $SPLUNK_HOME>

• To use a custom datetime.xml, specify the correct path to your custom file in all keys that refer to datetime.xml.
• Set DATETIME_CONFIG = NONE to prevent the timestamp processor from running.
• Set DATETIME_CONFIG = CURRENT to assign the current system time to each event as it's indexed.

MAX_TIMESTAMP_LOOKAHEAD = <integer>

• Specify how far (how many characters) into an event Splunk should look for a timestamp.
• Default is 150 characters.
• Set to 0 to assign current system time at an event's index time.

TIME_PREFIX = <regular expression>

• Use a regular expression that points to the space exactly before your event's timestamp.
  ♦ For example, if your timestamp follows the phrase Time=, your regular expression should match this part of the event.
• The timestamp processor only looks for a timestamp after the TIME_PREFIX in an event.
• Default is none (empty).

TIME_FORMAT = <strptime-style format>

• Specify a strptime() format string to extract the date.
• Set strptime() values in the order that matches the order of the elements in the timestamp you want to extract.
• Splunk's timestamp processor starts processing TIME_FORMAT immediately after a matching TIME_PREFIX value.
• Doesn't support in-event timezones.
• TIME_FORMAT starts reading after a matching TIME_PREFIX.
• The <strptime-style format> value must contain the hour, minute, month, and day.
• Default is empty.
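To check a candidate format string before putting it in TIME_FORMAT, you can try it against a sample timestamp. The sketch below uses Python's datetime.strptime, which implements the standard strptime() conversion specifications only; it is not Splunk's parser and does not cover Splunk's enhanced formats. The sample timestamp and the helper name are made up.

```python
from datetime import datetime

def parse(text, time_format):
    """Return the parsed datetime, or None if the format does not match."""
    try:
        return datetime.strptime(text, time_format)
    except ValueError:
        return None

sample = "2007-05-23 15:40:21"

# The elements appear in the same order as in the timestamp, so this matches.
print(parse(sample, "%Y-%m-%d %H:%M:%S"))

# The day and year are swapped relative to the timestamp, so this does not match.
print(parse(sample, "%d-%m-%Y %H:%M:%S"))
```

The second call shows why the order of the strptime() values must match the order of the elements in the timestamp.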

TZ = <timezone string>

• Specify a timezone setting using a value from the zoneinfo TZID database.
• For more details and examples learn how to configure timezone offsets.
• Default is empty.

MAX_DAYS_AGO = <integer>

• Specify the maximum number of days in the past (from the current date) for an extracted date to be valid.
• For example, if MAX_DAYS_AGO = 10 then dates that are older than 10 days ago are ignored.
• Default is 2000.

Note: You must configure this setting if your data is more than 2000 days old.

MAX_DAYS_HENCE = <integer>

• Specify the maximum number of days in the future (from the current date) for an extracted date to be valid.
• For example, if MAX_DAYS_HENCE = 3 then dates that are more than 3 days in the future are ignored.
• The default value (2) allows dates that are tomorrow.

Note: If your machines have the wrong date set or are in a timezone that is one day ahead, set this value to at least 3.
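Together, MAX_DAYS_AGO and MAX_DAYS_HENCE describe a window around the current date. The Python sketch below shows the idea with the default values from this topic. The function name is hypothetical, and treating the window edges as inclusive and measuring them in exact 24-hour days are assumptions; Splunk's exact boundary handling is not described here.

```python
from datetime import datetime, timedelta

def is_valid(extracted, now, max_days_ago=2000, max_days_hence=2):
    """Return True if the extracted date falls inside the accepted window."""
    earliest = now - timedelta(days=max_days_ago)
    latest = now + timedelta(days=max_days_hence)
    return earliest <= extracted <= latest

now = datetime(2010, 10, 6)
print(is_valid(datetime(2010, 10, 7), now))                    # tomorrow: accepted
print(is_valid(datetime(2010, 10, 10), now))                   # 4 days ahead: ignored
print(is_valid(datetime(2010, 9, 20), now, max_days_ago=10))   # 16 days old: ignored
```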

Enhanced strptime() support

Configure timestamp parsing in props.conf with the TIME_FORMAT= key. Splunk implements an enhanced version of Unix strptime() that supports additional formats (allowing for microsecond, millisecond, any time width format, and some additional time formats for compatibility). See the table below for a list of the additionally supported strptime() formats.

In previous versions, Splunk parsed timestamps using only the standard Linux strptime() conversion specifications. Now, in addition to standard Unix strptime() formats, Splunk's strptime() implementation supports recognition of the following date-time formats:

%N               For GNU date-time nanoseconds. Specify any sub-second parsing by providing the width: %3N = milliseconds, %6N = microseconds, %9N = nanoseconds.
%Q, %q           For milliseconds, microseconds for Apache Tomcat. %Q and %q can format any time resolution if the width is specified.
%I               For hours on a 12-hour clock format. If %I appears after %S or %s (like "%H:%M:%S.%l") it takes on the log4cpp meaning of milliseconds.
%+               For standard UNIX date format timestamps.
%v               For BSD and OSX standard date format.
%z, %::z, %:::z  GNU libc support.
%o               For AIX timestamp support (%o used as an alias for %Y).
%p               The locale's equivalent of AM or PM. (Note: there may be none.)

strptime() format expression examples

Here are some sample date formats with the strptime() expressions that handle them:

This configuration assumes that all timestamps from host::foo are in the same format. Configure your props.conf stanza to be as granular as possible to avoid potential timestamping errors.

Configure timestamps in other ways

You can also configure Splunk's timestamp extraction processor to:

• Apply timezone offsets.
• Pull the correct timestamp from events with more than one timestamp.
• Improve indexing performance.
• Train Splunk to recognize new timestamp formats.

In addition, you can use your browser's locale setting to configure how the browser formats Splunk timestamps. For information on setting the browser locale, see User language and locale.

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around timestamp recognition and configuration.

Improve Splunk's ability to recognize timestamps

Splunk recognizes most timestamps by default. For more information read How timestamps work. If Splunk doesn't recognize a particular timestamp, you can use the train dates command to teach Splunk the pattern. The output of train dates is a regular expression that you can add to datetime.xml and props.conf to configure the unique timestamp extraction.

The train command lets you interactively teach Splunk new patterns for timestamps, fields, and sourcetypes. For more information about train and the different arguments you can use with it, go to $SPLUNK_HOME/bin and refer to the train help page:

./splunk help train

Important: Use train dates only when you can't configure the timestamp with props.conf.

Steps to configure timestamps with train dates

To teach Splunk a new timestamp pattern, complete the following steps:

1. Copy a sampling of your timestamp data into a plain text file.

Splunk learns the pattern of the timestamp based on the patterns in this text file.

2. Run the train dates command.

This feature is interactive. When prompted, provide the path to the text file containing your timestamp data. The command produces a regular expression for your timestamp.

3. Create a custom datetime.xml.

Copy the output of the train command into a copy of the datetime.xml file.

Note: The default datetime.xml file is located in $SPLUNK_HOME/etc/datetime.xml. Do not modify this file; instead, copy the default datetime.xml into a custom application directory in $SPLUNK_HOME/etc/apps/ or $SPLUNK_HOME/etc/system/local/. Refer to the topic about applications in this manual for more information.

4. Edit your local props.conf.

Include the path to your custom datetime.xml file in the relevant stanzas.

./splunk [command]

Run the train dates command

The train command is an interactive CLI tool. For Splunk to learn a new date format, you need to explicitly provide a file and pattern. Afterwards, Splunk returns a string for you to add to datetime.xml.

1. To begin training Splunk to recognize a new timestamp, go to $SPLUNK_HOME/bin and type:

If the values are sufficient, Splunk displays:

Learned pattern.
----------------------------------------------------------------------------------
If you are satisfied that the timestamps formats have been learned, hit control-c.
----------------------------------------------------------------------------------

5. After you hit control-c, Splunk displays:

Patterns Learned.
It is highly recommended that you make changes to a copy of the default datetime.xml file.
For example, copy "/Applications/splunk/etc/datetime.xml" to "/Applications/splunk/etc/system/l
In that custom file, add the below timestamp definitions, and add the pattern names
to timePatterns and datePatterns list.
For more details, see http://www.splunk.com/doc/latest/admin/TrainTimestampRecognition
--------------------------------------------------------------------------------
<define name="trainwreck_1_date" extract="day,litmonth,year,">
<text><![CDATA[:\d+\s\w+\s(\d+)\s(\w+)\s(\d+)]]></text>

Configure timestamp assignment for events with multiple timestamps

If an event contains more than one recognizable timestamp, you can tell Splunk to use a particular timestamp. This is especially useful when indexing events that contain syslog host-chaining data.

Configure positional timestamp extraction by editing props.conf.

Configure positional timestamp extraction in props.conf

Configure Splunk to recognize a timestamp anywhere in an event by adding TIME_PREFIX = and MAX_TIMESTAMP_LOOKAHEAD = keys to a [<spec>] stanza in props.conf. Set a value for MAX_TIMESTAMP_LOOKAHEAD = to tell Splunk how far into an event to look for the timestamp. Set a value for TIME_PREFIX = to tell Splunk what pattern of characters to look for to indicate the beginning of the timestamp.

Note: Use $SPLUNK_HOME/etc/system/README/props.conf.example as an example, or create your own props.conf. Make any configuration changes to a copy of props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

To identify the timestamp: May 23 15:40:21 2007

Note: Optimize the speed of timestamp extraction by setting the value of MAX_TIMESTAMP_LOOKAHEAD = to look only as far into an event as needed for the timestamp you want to extract. In this example MAX_TIMESTAMP_LOOKAHEAD = is optimized to look 44 characters into the event.
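The interaction between a prefix pattern and a lookahead limit can be sketched in Python as follows. This models the idea only and is not Splunk's timestamp processor. The sample event, the prefix regular expression, the timestamp pattern, and the function name are all assumptions made for illustration, and the lookahead is counted from the start of the event, following the wording above ("how far into an event").

```python
import re

# Hypothetical pattern for timestamps shaped like "May 23 15:40:21 2007".
TIMESTAMP_RE = re.compile(r"[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}")

def find_timestamp(event, time_prefix, lookahead):
    """Return the timestamp that follows time_prefix within the lookahead window."""
    window = event[:lookahead]              # ignore everything past the limit
    prefix = re.search(time_prefix, window)
    if prefix is None:
        return None
    found = TIMESTAMP_RE.match(window, prefix.end())
    return found.group(0) if found else None

# Made-up event with two timestamps; the second one is the one wanted.
event = "1989/12/31 16:00:00 ed May 23 15:40:21 2007 ERROR UserManager - Exception thrown"
prefix = r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s"

print(find_timestamp(event, prefix, 44))
print(find_timestamp(event, prefix, 30))
```

In this made-up event the wanted timestamp ends at character 43, so a lookahead of 44 is just enough, while a lookahead of 30 cuts the timestamp off and nothing is found.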

Specify timezones of timestamps

If you're indexing data from different timezones, use timezone offsets to ensure that they're correctly correlated when you search. You can configure timezones based on the host, source, or source type of an event.

Configure timezones in props.conf. By default, Splunk applies timezones using these rules, in the following order:

1. Use the timezone in raw event data (for example, PST, -0800).

2. Use TZ if it is set in a stanza in props.conf and the event matches the host, source, or sourcetype specified by a stanza.

3. Use the timezone of the Splunk server that indexes the event.

Specify time zones in props.conf

Use $SPLUNK_HOME/etc/system/README/props.conf.example as an example, or create your own props.conf. Make any configuration changes to a copy of props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Configure time zones by adding a TZ = key to a timestamp configuration stanza for a host, source, or sourcetype in props.conf. The Splunk TZ = key recognizes zoneinfo TZIDs (see all the timezone TZIDs in the zoneinfo (TZ) database). Set a TZ = value to a TZID of the appropriate timezone for any host, source, or source type. The TZ for a host, source, or source type should be set to the timezone of the events coming from that host, source, or sourcetype.

Note that the timezone of the indexer is not configured in Splunk. As long as the time is set correctly on the host OS of the indexer, offsets to event timezones will be calculated correctly.

Examples

Events are coming to this indexer from New York City (in the US/Eastern timezone) and Mountain View, California (US/Pacific). To correctly handle the timestamps for these two sets of events, the props.conf for the indexer needs the timezone to be specified as US/Eastern and US/Pacific respectively.

The first example sets the timezone of events from host names that match the regular expression nyc.* to the US/Eastern timezone.

[host::nyc*]
TZ = US/Eastern

The second example sets the timezone of events from sources in the path /mnt/ca/... to the US/Pacific timezone.

[source::/mnt/ca/...]
TZ = US/Pacific
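To see why the TZ setting matters, consider two events that carry the same wall-clock time but come from the two locations. The Python sketch below converts both to UTC. As a simplifying assumption it uses fixed standard-time offsets (UTC-5 and UTC-8) in place of the zoneinfo entries, so it ignores daylight saving time; the function name and sample time are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed standard-time offsets stand in for the zoneinfo entries.
EASTERN = timezone(timedelta(hours=-5), "US/Eastern (standard time)")
PACIFIC = timezone(timedelta(hours=-8), "US/Pacific (standard time)")

def to_utc(naive, tz):
    """Interpret a timestamp without timezone information in tz, then convert to UTC."""
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)

wall_clock = datetime(2010, 1, 15, 9, 0, 0)   # "09:00:00" as written in both logs
nyc = to_utc(wall_clock, EASTERN)
ca = to_utc(wall_clock, PACIFIC)
print(nyc, ca, ca - nyc)
```

The two events are three hours apart even though their raw timestamps are identical, which is the difference the TZ key lets the indexer account for.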

zoneinfo (TZ) database

The zoneinfo database is a publicly maintained database of timezone values.

• UNIX versions of Splunk rely on a TZ database included with the UNIX distribution you're installing on. Most UNIX distributions store the database in the directory: /usr/share/zoneinfo.
• Solaris versions of Splunk store TZ information in this directory: /usr/share/lib/zoneinfo.
• Windows versions of Splunk ship with a copy of the TZ database.

Refer to the zoneinfo (TZ) database for values you can set as TZ = in props.conf.

Tune timestamp recognition for better indexing performance

Tune Splunk's timestamp extraction by editing props.conf. Adjust how far Splunk's timestamp processor looks into events, or turn off the timestamp processor to make indexing faster.

Note: Use $SPLUNK_HOME/etc/system/README/props.conf.example as an example, or create your own props.conf. Make any configuration changes to a copy of props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see how configuration files work.

Adjust timestamp lookahead

Timestamp lookahead determines how far (how many characters) into an event the timestamp processor looks for a timestamp. Adjust how far the timestamp processor looks by setting a value (the number of characters) for the MAX_TIMESTAMP_LOOKAHEAD = key in any timestamp stanza.

Note: You can set MAX_TIMESTAMP_LOOKAHEAD = to different values for each timestamp stanza.

The default number of characters that the timestamp processor looks into an event is 150. Set MAX_TIMESTAMP_LOOKAHEAD = to a lower value to speed up indexing. You should do this if your timestamps occur in the first part of your event.

If your events are indexed in real time, increase Splunk's overall indexing performance by turning off timestamp lookahead (set MAX_TIMESTAMP_LOOKAHEAD = 0). This causes Splunk to not look into events for a timestamp, and sets an event's timestamp to be its indexing time (using current system time).

Example:

This example tells the timestamp processor to look 20 characters into events from source foo.

[source::foo]
MAX_TIMESTAMP_LOOKAHEAD = 20
...

Turn off the timestamp processor

Turn off the timestamp processor entirely to significantly improve indexing performance. Turn off timestamp processing for events matching a host, source, or sourcetype specified by a timestamp stanza by adding a DATETIME_CONFIG = key to a stanza and setting the value to NONE. When timestamp processing is off, Splunk won't look for timestamps to extract from event data. Splunk will instead set an event's timestamp to be its indexing time (using current system time).

Example:

This example turns off timestamp extraction for events that come from the source foo.

If you've read the "Configure event processing" chapter in this manual, you know that Splunk automatically extracts a number of default fields for each event it processes prior to indexing. These default fields include index, which identifies the index in which the related event is located, linecount, which describes the number of lines the related event contains, and timestamp, which describes the point in time at which an event occurred. (As discussed in the "Configure event timestamping" chapter, you have a number of options when it comes to changing the manner in which sets of events are timestamped.)

Note: For a complete list of the default fields that Splunk identifies for each event prior to indexing, see "Use default fields" in the User manual.

This chapter focuses mainly on three important default fields: host, source, and sourcetype. Splunk identifies host, source, and sourcetype values for each event it processes. This chapter explains how Splunk does this, and shows you how you can override the automatic assignment of host and sourcetype values for events when it is necessary to do so.

This chapter also shows you how to have Splunk extract additional, custom fields at index time. It's important to note that this practice is strongly discouraged, however. Adding to the list of indexed fields can negatively impact indexing and search speed. In addition, it may require you to reindex your entire dataset in order to have those fields show up for previously indexed events. It's best to extract fields at search time whenever possible. For more information, see "Index time versus search time" in this manual.

For more information about index-time field extraction, see "Configure index-time field extraction" in this manual.

Defining host, source, and sourcetype

The host, source, and sourcetype fields are defined as follows:

• host - An event's host value is typically the hostname, IP address, or fully qualified domain name of the network host from which the event originated. The host value enables you to easily locate data originating from a specific device. For an overview of the methods Splunk provides for the override of automatic host assignment, see the "Host field overview" topic in this manual.
• source - The source of an event is the name of the file, stream, or other input from which the event originates. For data monitored from files and directories, the value of source is the full path, such as /archive/server1/var/log/messages.0 or /var/log/. The value of source for network-based data sources is the protocol and port, such as UDP:514.
• sourcetype - The source type of an event is the format of the data input from which it originates, such as access_combined or cisco_syslog. For an overview of how Splunk sets the source type value and the ways you can override automatic sourcetyping, see the "Override automatic source type assignment" topic in this manual.

Under what conditions should you override host and sourcetype assignment?

Much of the time, Splunk can automatically identify host and sourcetype values that are both correct and useful. But situations do come up that require you to intervene in this process and provide override values.

You may want to change your default host assignment when:

• you are bulk-loading archive data that was originally generated from a different host and you want those events to have that host value.
• your data is actually being forwarded from a different host (the forwarder will be the host unless you specify otherwise).
• you are working with a centralized log server environment, which means that all of the data received from that server will have the same host even though it originated elsewhere.

You may want to change your default sourcetype assignment when:

• you want to give all event data coming through a particular input or from a specific source the same source type, for tracking purposes.
• you want to apply source types to specific events coming through a particular input, such as events that originate from a discrete group of hosts, or even events that are associated with a particular IP address or userid.

There are also steps you can take to expand the range of source types that Splunk automatically recognizes, or to simply rename source types. See the "Source type field overview" section, below, for more information.

About hosts

An event's host field value is the name of the physical device from which the event originates. Because it is a default field, which means that Splunk assigns it to every event it indexes, you can use it to search for all events that have been generated by a particular host.

The host value can be an IP address, device hostname, or a fully qualified domain name, depending on whether the event was received through a file input, network input, or the computer hosting the instance of Splunk.

How Splunk assigns the host value

If no other host rules are specified for a source, Splunk assigns host a default value that applies to all data coming from inputs on a given Splunk server. The default host value is the hostname or IP address of the network host. When Splunk is running on the server where the event occurred (which is the most common case) this is correct and no manual intervention is required.

For more information, see "Set a default host for a Splunk server" in this manual.

Set a default host for a file or directory input

If you are running Splunk on a central log archive, or you are working with files forwarded from other hosts in your environment, you may need to override the default host assignment for events coming from particular inputs.

There are two methods for assigning a host value to data received through a particular input. You can define a static host value for all data coming through a specific input, or you can have Splunk dynamically assign a host value from a portion of the path or filename of the source. The latter method can be helpful when you have a directory structure that segregates each host's log archive in a different subdirectory.

For more information, see "Set a default host for a file or directory input" in this manual.
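The dynamic method amounts to picking one segment out of the source path. The following Python sketch shows the idea with a hypothetical helper and 1-based segment numbering. It illustrates the concept only and does not show how Splunk is configured to do this.

```python
def host_from_path(source, segment):
    """Return the path segment at the given 1-based position, or None."""
    parts = [part for part in source.split("/") if part]
    if 1 <= segment <= len(parts):
        return parts[segment - 1]
    return None

# Each host's log archive sits in its own subdirectory under /archive.
print(host_from_path("/archive/server1/var/log/messages.0", 2))  # server1
```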

Override default host values based on event data

You may have a situation that requires you to override host values based on event data. For example, if you work in a centralized log server environment, you may have several host servers that feed into that main log server. The central log server is called the reporting host. The system where the event occurred is called the originating host (or just the host). In these cases you need to define rules that override the automatic host assignments for events received from that centralized log host and replace them with distinct originating host values.

For more information, see "Override default host values based on event data" in this manual.

Tag host values

Tag host values to aid in the execution of robust searches. Tags enable you to cluster groups of hosts into useful, searchable categories.

For more information, see "About tags and aliases" in the Knowledge Manager manual.

About source types

Any common data input format can be a source type. Most source types are log formats. For example, the list of common source types that Splunk automatically recognizes includes:

Note: For a longer list of source types that Splunk automatically recognizes, see "List of pretrained sourcetypes" in this manual.

sourcetype is the name of the source type field. You can use the sourcetype field to find similar types of data from any source type. For example, you could search sourcetype=weblogic_stdout to find all of your WebLogic server events even when WebLogic is logging from more than one domain (or "host," in Splunk terms).

Source vs source type

The source is the name of the file, stream, or other input from which a particular event originates. Fordata monitored from files and directories, the value of source is the full path, such as/archive/server1/var/log/messages.0 or /var/log/. The value of source fornetwork-based data sources is the protocol and port, such as UDP:514.

Events with the same source type can come from different sources. For example, say you'remonitoring source=/var/log/messages and receiving direct syslog input from udp:514. If yousearch sourcetype=linux_syslog, Splunk will return events from both of those sources.

Methods Splunk uses for source type assignation and their precedence

Splunk employs a variety of methods to assign source types to event data at index time. As it processes event data, Splunk steps through these methods in a defined order of precedence. It starts with hardcoded source type configurations in inputs.conf and props.conf, moves on to rule-based source type association, and then works through methods like automatic source type recognition and automatic source type learning. This range of methods enables you to configure how Splunk applies source type values to specific kinds of events, while letting Splunk assign source type values to the remaining events automatically.

The following list discusses these methods in the order that Splunk typically uses them to assign source types to event data at index time:

1. Explicit source type specification based on the data input, as configured in inputs.conf stanzas:

[monitor://$PATH]
sourcetype=$SOURCETYPE

2. Explicit source type specification based on the data source, as configured in props.conf stanzas:

For information about setting up or removing source type recognition rules, see "Configure rule-based source type recognition" in this manual.

4. Automatic source type matching:

Splunk uses automatic source type recognition to match similar-looking files and, through that, assign a source type. It calculates signatures for patterns in the first few thousand lines of any file or stream of network input. These signatures identify things like repeating word patterns, punctuation patterns, line length, and so on. When Splunk calculates a signature, it compares it to previously seen signatures. If the signature appears to be a radically new pattern, Splunk creates a new source type for the pattern.

Note: At this stage in the source type assignation process, Splunk just matches incoming data with source types that it has learned previously. It doesn't create new source types for unique signatures until the final stage of source typing (step 6, below).

See "List of pretrained source types" in this manual for a list of the source types that Splunk can recognize out of the box. See "Train Splunk's source type autoclassifier" for more information about expanding the list of source types that Splunk can assign through automatic source type recognition.

5. Delayed rule-based source type association:

This works like rule-based associations (see above), except you create a delayedrule:: stanza in props.conf. This is a useful "catch-all" for source types, in case Splunk missed any with intelligent matching (see above).

A good use of delayed rule associations is for generic versions of very specific source types that are defined earlier with rule:: in step 3, above. For example, you could use rule:: to catch event data with specific syslog source types, such as "sendmail syslog" or "cisco syslog" and then have delayedrule:: apply the generic "syslog" source type to the remaining syslog event data.

For more information about setting up or removing delayed rules for source type recognition, see "Configure rule-based source type recognition" in this manual.

6. Automatic source type learning:

If Splunk is unable to assign a source type for the event using the preceding five methods, it creates a new source type for the event signature (see step 4, above). Splunk stores learned pattern information in sourcetypes.conf.
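The order of precedence described above can be pictured as a chain in which the first method that returns a value wins. The following Python sketch is only an illustration of that idea, not Splunk's implementation: the function names, the event dictionary keys (input_sourcetype, source_sourcetype, raw), the two sample rules, and the learned_sourcetype placeholder are all assumptions made for this example.

```python
import re
from typing import Callable, List, Optional

# Hypothetical stand-ins for the six methods. Each takes an event (a dict)
# and returns a source type, or None if the method does not apply.
Method = Callable[[dict], Optional[str]]

SYSLOG_PREFIX = re.compile(r"^\w{3} +\d+ \d\d:\d\d:\d\d ")

def from_input(event: dict) -> Optional[str]:
    # Step 1: sourcetype set explicitly for the data input (inputs.conf).
    return event.get("input_sourcetype")

def from_source(event: dict) -> Optional[str]:
    # Step 2: sourcetype set explicitly for the data source (props.conf).
    return event.get("source_sourcetype")

def from_rule(event: dict) -> Optional[str]:
    # Step 3: a specific rule:: association (sample rule, assumed).
    return "sendmail_syslog" if "sendmail[" in event["raw"] else None

def from_known_signature(event: dict) -> Optional[str]:
    # Step 4: automatic matching against known signatures (not modeled here).
    return None

def from_delayed_rule(event: dict) -> Optional[str]:
    # Step 5: a generic delayedrule:: catch-all (sample rule, assumed).
    return "syslog" if SYSLOG_PREFIX.search(event["raw"]) else None

def learn_new(event: dict) -> Optional[str]:
    # Step 6: fall back to a newly learned source type (placeholder name).
    return "learned_sourcetype"

PRECEDENCE: List[Method] = [
    from_input, from_source, from_rule,
    from_known_signature, from_delayed_rule, learn_new,
]

def assign_sourcetype(event: dict) -> str:
    # Walk the methods in order; the first one that answers wins.
    for method in PRECEDENCE:
        sourcetype = method(event)
        if sourcetype is not None:
            return sourcetype
    raise ValueError("no method assigned a source type")
```

Because the list is walked in order, an explicit setting on the input always beats a rule, and a delayed rule is consulted only after the earlier methods have declined.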

Set a default host for a Splunk server


An event's host value is the IP address, host name, or fully qualified domain name of the physical device on the network from which the event originates. Because Splunk assigns a host value at index time for every event it indexes, host value searches enable you to easily find data originating from a specific device.

Default host assignment

If you have not specified other host rules for a source (using the information in this and subsequent topics in this chapter), the default host value for an event is typically the hostname, IP address, or fully qualified domain name of the network host from which the event originated. When the event originates from the server on which Splunk is running (which is the most common case) the host assignment is correct, and there's no need for you to change anything. However, if your data is being forwarded from a different host, or if you're bulk-loading archive data, you may want to change the default host value for that data.

To set the default value of the host field, you can use Splunk Manager, or edit inputs.conf.

Set the default host value using Manager

Use Manager to set the default host value for a server:

1. In Splunk Web, click on the Manager link in the upper right-hand corner of the screen.

2. In Manager, click System settings under System configurations.

3. On the System settings page, click General settings.

4. On the General settings page, scroll down to the Index settings section and change the Default host name.

5. Save your changes.

This sets the value of the host field for all events that have not received another host name.

Set the default host value using inputs.conf

This host assignment is set in inputs.conf during Splunk installation. Modify the host entry by editing $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to make it easy to transfer your data customizations to other search servers.)

This is the format of the host assignment in inputs.conf:

host = <string>

• Set <string> to your chosen default host value. <string> defaults to the IP address or domain name of the host where the data originated.
• This is a shortcut for MetaData:Host = <string>. It sets the host of events from this input to be the specified string. Splunk automatically prepends host:: to the value when this shortcut is used.

Restart Splunk to enable any changes you have made to inputs.conf.

Override the default host value for data received from a specific input

If you are running Splunk on a central log archive, or you are working with files copied from other hosts in the environment, you may want to override the default host assignment for a particular input on a static or dynamic basis.

For more information, see "Set a default host value for an input" in this manual.

Override the default host value using event data

If you have a centralized log host sending events to Splunk, many servers may be involved. The central log server is called the reporting host. The system where the event occurred is called the originating host (or just the host). In this case you need to define rules that set the host field value based on the information in the events themselves.

For more information, see "Override default host values based on event data" in this manual.

Set a default host for a file or directory input


In certain situations you may want to explicitly set a host value for all data coming in to Splunk through a particular file or directory input. You can set the host statically or dynamically.

• To statically set the host means you're setting the same host for every event that comes to Splunk through a designated file or directory input.
• If you dynamically set the host value, Splunk extracts the host name from a portion of the source input using a regex or segment of the source's full directory path.

You can also assign host values to events coming through a particular file or directory input based on their source or sourcetype values (as well as other kinds of information). For more information, see "Override default host values based on event data" in this manual.

Note: Splunk currently does not enable the setting of default host values for event data received through TCP, UDP, or scripted inputs.

Statically setting the default host value for a file or directory input

This method applies a single default host value to each event received through a specific file or directory input.

Note: A static host value assignment only impacts new data coming in through the input with which it's associated. You cannot assign a default host value to data that has already been processed, split into events, and indexed.

If you need to assign a host value to data that's already been indexed, you need to tag the host value instead.

Via Splunk Web

You can statically define a host for a file or directory input whenever you add a new input of that type through the "Data inputs" page of Splunk Web's Manager interface:

1. In Splunk Web, click on the Manager link in the upper right-hand corner of the screen.

2. In Manager, click Data inputs under System configurations.

3. On the Data inputs page, select Files & Directories to go to the list page for that input type.

4. On the Files & directories page, you can either click the name of an input that you want to update, or click New to create a new file or directory input.

5. Once you're on the detail page for the file or directory input, select the Constant value option from the Set host dropdown.

6. Enter the static host value for the input in the Host field value field.

7. Save your changes.

For more information about inputs and input types, see "What Splunk can monitor" in the Admin guide.

Via configuration files

Edit inputs.conf to specify a host value for a monitored file or directory input. Include a host = attribute within the appropriate stanza.

[monitor://<path>]
host = $YOUR_HOST

Edit inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in the Admin manual.

For more information about inputs and input types, see "What Splunk can monitor" in the Admin manual.

Example of static host value assignment for an input

This example covers any events coming in from /var/log/httpd. Any events coming from this input will receive a host value of webhead-1.

[monitor:///var/log/httpd]
host = webhead-1

Dynamically setting the default host value for a file or directory input

Use this method if you want to dynamically extract the host value for a file or directory input, either from a segment of the source input path, or from a regular expression. For example, if you want to index an archived directory and the name of each file in the directory contains relevant host information, you can use Splunk to extract this information and assign it to the host field.

Via Splunk Web

Start by following the steps for setting up a static host assignment via Splunk Web, above. However, when you get to the Set host dropdown list on the input details page for a file or directory input, choose one of the following two values:

• Regex on path - Choose this option if you want to extract the host name via a regular expression. Enter the regex for the host you want to extract in the Regular expression field.
• Segment in path - Choose this option if you want to extract the host name from a segment in your data source's path. Enter the segment number in the Segment # field. For example, if the path to the source is /var/log/[host server name] and you want the third segment (the host server name) to be the host value, enter 3 into the Segment # field.

Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

Via configuration files

You can set up dynamic host extraction rules when you are configuring inputs.conf. Edit inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in the Admin manual.

Add host_regex = <regular expression> to override the host field with a value extracted using a regular expression.

[monitor://<path>]
host_regex = $YOUR_REGEX

The regular expression extracts the host value from the filename of each input. The first capturing group of the regex is used as the host.

Note: If the regex fails to match, the default host = attribute is set as the host.

Important: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

host_segment = <integer>

Define a host_segment instead of a host_regex if you want to override the host field with a value extracted using a segment of the data source path. For example, if the path to the source is /var/log/[host server name] and you want the third segment (the host server name) to be the host value, your input stanza would look like:

[monitor:///var/log/]
host_segment = 3

Note: If the <integer> value is not an integer, or is less than 1, Splunk sets the default host = attribute as the host.

Note: You cannot simultaneously specify a host_regex and host_segment.

Examples of dynamic host assignment for an input

This example uses regex on the file path to set the host:

[monitor:///var/log]
host_regex = /var/log/(\w+)

With that regex, all events from /var/log/foo.log are given a host value of foo.

This example uses the segment of the data source filepath to set the host:

[monitor://apache/logs/]
host_segment = 3

It sets the host value to the third segment of each source path under apache/logs/, that is, the path component that immediately follows logs/.
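The two extraction methods can be approximated in a few lines of Python. This is a hedged sketch of the behavior described in this topic (first capturing group for host_regex, the Nth path segment for host_segment, and a fall back to the default host otherwise), not Splunk's own code. The function names are hypothetical, and the fallback when a path has fewer segments than requested is an assumption.

```python
import re

def host_from_regex(source_path: str, pattern: str, default_host: str) -> str:
    """Return the first capturing group of pattern, else the default host."""
    match = re.search(pattern, source_path)
    if match is None or match.re.groups < 1 or match.group(1) is None:
        return default_host  # regex failed to match: keep the default host
    return match.group(1)

def host_from_segment(source_path: str, segment: int, default_host: str) -> str:
    """Return the Nth (1-based) segment of the path, else the default host."""
    parts = [part for part in source_path.split("/") if part]
    if not isinstance(segment, int) or segment < 1:
        return default_host  # invalid value: keep the default host
    if segment > len(parts):
        return default_host  # assumption: a too-short path also falls back
    return parts[segment - 1]
```

For example, host_from_regex("/var/log/foo.log", r"/var/log/(\w+)", "splunk-server") yields foo, and host_from_segment("/var/log/webhead-1/access.log", 3, "splunk-server") yields webhead-1.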

Override default host values based on event data


Splunk assigns default host names to your events based on data in those events. This topic shows you how to override specific default host assignments when these default assignments are incorrect.

Configuration

To set up host value overrides based on event data, you need to edit transforms.conf and props.conf. Edit these files in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information about configuration files in general, see "About configuration files" in this manual.

Fill in the stanza name and the regex fields with the correct values for your data.

Leave DEST_KEY = MetaData:Host to write a value to the host:: field. FORMAT = host::$1 writes the REGEX value into the host:: field.

Note: Name your stanza with a unique identifier (so it is not confused with an existing stanza in $SPLUNK_HOME/etc/system/default/transforms.conf).
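To make the roles of REGEX and FORMAT concrete, here is a small Python sketch of what such a transform does to one event: the regex's first capturing group is substituted for $1 in FORMAT, and the result is written to the host field. It is an illustration only. The regex shown is a hypothetical one that captures the last word of the event, the event dictionary layout (_raw, host) is an assumption, and the function name is made up for the example.

```python
import re

# Hypothetical values for illustration; use a regex that fits your data.
REGEX = r"\s(\w+)$"      # capture the last whitespace-separated word
FORMAT = "host::$1"      # write the captured value into the host:: field

def apply_host_transform(event: dict, regex: str = REGEX, fmt: str = FORMAT) -> dict:
    """Return a copy of event with host overridden when the regex matches."""
    match = re.search(regex, event["_raw"])
    if match is None:
        return event  # no match: the default host assignment stands
    value = fmt.replace("$1", match.group(1))
    # Strip the host:: key prefix to get the bare host value.
    host = value[len("host::"):] if value.startswith("host::") else value
    updated = dict(event)
    updated["host"] = host
    return updated
```

Applied to an event whose raw text ends in a host name, the sketch replaces the host that was assigned by default with the captured value, and leaves non-matching events untouched.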

Edits to props.conf

Create a stanza in $SPLUNK_HOME/etc/system/local/props.conf to map the transforms.conf regex to the source type in props.conf.

[<spec>]
TRANSFORMS-$name=$UNIQUE_STANZA_NAME

• <spec> can be:
  ♦ <sourcetype>, the sourcetype of an event
  ♦ host::<host>, where <host> is the host for an event
  ♦ source::<source>, where <source> is the source for an event
• $name is whatever unique identifier you want to give to your transform.
• $UNIQUE_STANZA_NAME must match the stanza name of the transform you just created in transforms.conf.

Note: Optionally add any other valid attribute/value pairs from props.conf when defining your stanza. This assigns the attributes to the <spec> you have set. For example, if you have custom line-breaking rules to set for the same <spec>, append those attributes to your stanza.

Example

Here is a set of events from the houseness.log file. They contain the host in the third position.

41602046:53 accepted fflanda
41602050:29 accepted rhallen
41602052:17 accepted fflanda

Create a regex to extract the host value and add it to a new stanza in $SPLUNK_HOME/etc/system/local/transforms.conf:

The transform above works with the following stanza in props.conf:

The above stanza has the additional attribute/value pair SHOULD_LINEMERGE = false. This specifies that Splunk should create new events at a newline.

Note: The additional -rhallen in the attribute TRANSFORMS-rhallen serves to differentiate this transform from other transforms.

The events now appear in Splunk Web as the following:

Handle incorrectly-assigned host values


At some point, you may discover that the host value for some of your events is set incorrectly. For example, you might be scraping some Web proxy logs into a directory directly on your Splunk server and add that directory as an input to Splunk without remembering to override the value of the host field, so that all of those events carry the host value of your Splunk server instead of their original host.

If something like that happens, here are your options, in order of complexity:

• Delete and reindex the entire index
• Use a search to delete the specific events that have the incorrect host value and reindex those events
• Tag the incorrect host values with a tag, and search with that
• Set up a static field lookup to look up the host, map it in the lookup file to a new field name, and use the new name in searches
• Alias the host field to a new field (such as temp_host), set up a static field lookup to look up the correct host name using the name temp_host, then have the lookup overwrite the original host with the new lookup value (using the OUTPUT option when defining the lookup)

Of these options, the last option will look the nicest if you can't delete and reindex the data, but deleting and reindexing the data will always give the best performance.

Override automatic source type assignment


You can override automatic source type assignment for event data that comes from specific inputs, or which has a particular source.

Note: While source type assignment by input seems like a simple way to handle things, it isn't very granular: when you use it, Splunk gives all event data from an input the same source type, even if they actually have different sources and hosts. If you want to bypass automatic source type assignment in a more targeted manner, arrange for Splunk to assign source types according to the event data source.

Override automated source type matching for an input

Use these instructions to override automated source type assignation and explicitly assign a single source type value to data coming from a specific input such as /var/log/.

Note: This only affects new data coming in after the override is set up. To correct the source types of events that have already been indexed, create a tag for the source type instead.

Through Splunk Web

When you define a data input in Manager, you can set a sourcetype value that Splunk applies to all incoming data from that input. Manager gives you the option of picking a sourcetype value from a list or entering a unique sourcetype value of your own.

To select a sourcetype value for an input, click the Manager link to go to the Splunk Manager page, select Data inputs and then drill down to the details page of the input for which you want to define a sourcetype.

Pick a sourcetype value from a list for an input

If the data from a particular input belongs to one of Splunk's pretrained source types, you can choose the sourcetype value that Splunk would otherwise assign automatically from a drop down list. For a description of Splunk's pretrained source types, see the reference list of pretrained source types in the "List of pretrained source types" topic, in this manual.

On the details page for the input that you're defining a source type for, select From list from Set source type. Then choose a pretrained sourcetype from the Select source type from list dropdown list.

Save your input settings. After that point, Splunk will assign the sourcetype that you've selected to all events it indexes for that input.

Manually enter a sourcetype value for an input

You can manually enter a sourcetype value for data that Splunk receives from a particular input.

On the details page for the input that you're defining a source type for, select Manual from the Set source type list, and then enter a source type in Source type.

Save your input settings. After that point, Splunk will assign the sourcetype that you've specified to all events it indexes for that input.

Through configuration files

When you configure inputs in inputs.conf, you can set a sourcetype as well. Edit inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in this manual.

Note: This only impacts new data coming in after your configuration change. If you want to correct the source types of events that have already been indexed, create a tag for the source type instead.

Include a sourcetype = attribute within the appropriate stanza in inputs.conf:

Override automatic source type matching for a source

Use these instructions to override automated source type assignation and explicitly assign a single source type value to data coming from a specific source.

Use these instructions to assign a source type based on a source through props.conf. Edit props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files".

Important: If you are forwarding data to one or more receivers, and want to set up an override of automatic source type matching for a specific source, you must set it up on the props.conf file for the forwarder. If you set it up on the receiver, the override will not take effect.

Note: This only impacts new data coming in after your configuration change. If you want to correct the source types of events that have already been indexed, create a tag for the source type instead.

Learn more about props.conf.

Through configuration files

Add a stanza for your source in props.conf. In the stanza, identify the source path, using regex syntax for flexibility if necessary. Then identify the source type by including a sourcetype = attribute:

[source::.../var/log/anaconda.log(.\d+)?]
sourcetype = anaconda

This example sets any events from sources containing the string /var/log/anaconda.log followed by any number of numeric characters to sourcetype = anaconda.

Splunk recommends that your stanza source path regexes (such as [source::.../web/....log]) be as specific as possible. It is HIGHLY recommended that you not have the regex end in "...". For example, don't do this:

[source::/home/fflanda/...]
sourcetype = mytype

This is dangerous. The above example tells Splunk to process gzip files in /home/fflanda as mytype files rather than gzip files.

It would be much better to write:

[source::/home/fflanda/....log(.\d+)?]
sourcetype = mytype
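The difference between the two stanzas is easier to see when the source patterns are turned into ordinary regular expressions. The Python sketch below does this under stated assumptions that this topic does not spell out: "..." is taken to match any number of any characters, "*" to match anything except a directory separator, a single "." to match a literal period, and the whole source name must match. The function names are hypothetical, and the translation ignores escaped characters, so treat it as a rough model rather than Splunk's matcher.

```python
import re

# Assumed translation of the source:: wildcards into regex fragments.
_TOKENS = {"...": ".*", "*": "[^/]*", ".": r"\."}

def source_pattern_to_regex(pattern: str) -> str:
    """Translate a source:: pattern into a Python regular expression."""
    # The alternation tries "..." first, so four dots become ".*" plus "\."
    return re.sub(r"\.\.\.|\*|\.", lambda m: _TOKENS[m.group(0)], pattern)

def source_matches(pattern: str, source: str) -> bool:
    """True if the whole source name matches the translated pattern."""
    return re.fullmatch(source_pattern_to_regex(pattern), source) is not None

# The broad pattern also claims compressed archives under the directory...
broad = source_matches("/home/fflanda/...", "/home/fflanda/archive/old.log.gz")
# ...while the specific pattern only accepts .log files and rotated copies.
narrow = source_matches(r"/home/fflanda/....log(.\d+)?", "/home/fflanda/archive/old.log.gz")
rotated = source_matches(r"/home/fflanda/....log(.\d+)?", "/home/fflanda/app/server.log.1")
```

Under these assumptions broad is True while narrow is False and rotated is True, which is why a pattern ending in "..." is discouraged.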

Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

Advanced source type overrides


This topic shows you how to configure Splunk to override sourcetypes on a per-event basis. It includes an example that demonstrates the use of transforms.conf in tandem with props.conf to override sourcetypes for events associated with a specific host, and goes on to show how you can do this for event data coming from a particular input or source.

For more information about performing basic source type overrides for event data that comes from specific inputs, or which has a particular source, see "Override automatic source type assignment" in this manual.

Configuration

To do this you'll set up two stanzas, one in transforms.conf, and another in props.conf. Edit these files in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

The transforms.conf stanza should follow this format:

• <unique_stanza_name> should reflect that it involves a sourcetype. You'll use this name later in the props.conf stanza.
• <your_regex> is a regular expression that identifies the events that you want to apply a custom sourcetype to (such as events carrying a particular hostname or other field value).
• <your_custom_sourcetype_value> is the sourcetype value that you want to apply to the regex-selected events.

Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

props.conf

Next you create a stanza in props.conf that references the transforms.conf stanza, as follows.

[<spec>]
TRANSFORMS-<class> = <unique_stanza_name>

• <spec> can be:
  ♦ <sourcetype>, the sourcetype value of an event.
  ♦ host::<host>, where <host> is the host value for an event.
  ♦ source::<source>, where <source> is the source value for an event.
• <class> is any name that you want to give to your stanza to identify it. In this case you might just use "sourcetype" to identify it as a sourcetype.
• <unique_stanza_name> is the name of your stanza from transforms.conf.
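The effect of pairing the two stanzas can be sketched in Python: every event that belongs to the props.conf <spec> is tested against the regex, and only the matching events receive the custom sourcetype. This is a simplified illustration, not Splunk's implementation. The function name, the event dictionary layout (_raw, sourcetype), and the sample regex with the host names web01 and web02 are assumptions made for the example.

```python
import re

def override_sourcetype(event: dict, your_regex: str, custom_sourcetype: str) -> dict:
    """Return a copy of event with the custom sourcetype if the regex matches."""
    if re.search(your_regex, event["_raw"]):
        return {**event, "sourcetype": custom_sourcetype}
    return event  # non-matching events keep the sourcetype they already have

# Hypothetical regex: syslog-style events whose host field is web01 or web02.
HOST_REGEX = r"^\w{3} +\d+ \d\d:\d\d:\d\d (web01|web02)\s"

events = [
    {"_raw": "Oct  6 09:52:01 web01 app: started", "sourcetype": "syslog"},
    {"_raw": "Oct  6 09:52:02 db01 app: started", "sourcetype": "syslog"},
]
retyped = [override_sourcetype(e, HOST_REGEX, "my_log") for e in events]
```

Only the first event is retyped here; the second keeps its original sourcetype because its host does not match the regex.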

Example - Sourcetyping events originating from different hosts, indexed from a single input

Let's say that you have a shared UDP input, UDP514. Your Splunk instance indexes a wide range of data from a number of hosts through this input. You've found that you need to apply a particular sourcetype (which, for the purposes of this example, we'll call "my_log") to data originating from three specific hosts (host1, host2, and host3) that reaches Splunk through UDP514.

To start, you can use the regex that Splunk typically uses to extract the host field for syslog events. You can find it in system/default/transforms.conf:

You can create two kinds of rules in props.conf: rules and delayed rules. The only difference between the two is the point at which Splunk checks them during the source typing process. As it processes each string of event data, Splunk uses several methods to determine source types:

• After checking for explicit source type definitions based on the event data input or source, Splunk looks at the rule:: stanzas defined in props.conf and tries to match source types to the event data based on the classification rules specified in those stanzas.
• If Splunk is unable to find a matching source type using the available rule:: stanzas, it tries to use automatic source type matching, where it tries to identify patterns similar to source types it has learned in the past.
• When that method fails, Splunk then checks the delayedrule:: stanzas in props.conf, and tries to match the event data to source types using the rules in those stanzas.

You can set up your system so that rule:: stanzas contain classification rules for specialized source types, while delayedrule:: stanzas contain classification rules for generic source types. This way the generic source types are applied to broad ranges of events that haven't qualified for more specialized source types. For example, you could use rule:: stanzas to catch event data with specific syslog source types, such as sendmail_syslog or cisco_syslog and then have a delayedrule:: stanza apply the generic syslog source type to remaining syslog event data.

Configuration

To set source typing rules, edit props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in this manual.

Create a rule by adding a rule:: or delayedrule:: stanza to props.conf. Provide a name for the rule in the stanza header, and declare the source type name in the body of the stanza. After the source type declaration, list the source type assignation rules. These rules use one or more MORE_THAN and LESS_THAN statements to find patterns in the event data that fit given regular expressions by specific percentages.

Note: You can specify any number of MORE_THAN and LESS_THAN statements in a source typing rule stanza. All of the statements must match a percentage of event data lines before those lines can be assigned the source type in question. For example, you could define a rule that assigns a specific source type value to event data where more than 10% match one regular expression and less than 10% match another regular expression.

Add the following to props.conf:

The MORE_THAN and LESS_THAN numerical values refer to the percentage of lines that contain the string specified by the regular expression. To match, a rule can be either MORE_THAN or LESS_THAN those percentages.

Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

Examples

The following examples come from $SPLUNK_HOME/etc/system/default/props.conf.

Postfix syslog files

# postfix_syslog sourcetype rule
[rule::postfix_syslog]
sourcetype = postfix_syslog
# If 80% of lines match this regex, then it must be this type
MORE_THAN_80=^\w{3} +\d+ \d\d:\d\d:\d\d .* postfix(/\w+)?\[\d+\]:

Delayed rule for breakable text

# breaks text on ascii art and blank lines if more than 10% of lines have
# ascii art or blank lines, and less than 10% have timestamps
[delayedrule::breakable_text]
sourcetype = breakable_text
MORE_THAN_10 = (^(?:---|===|\*\*\*|___|=+=))|^\s*$
LESS_THAN_10 = [: ][012]?[0-9]:[0-5][0-9]
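The percentage test behind MORE_THAN and LESS_THAN statements can be modeled in a few lines of Python. The sketch below reuses the two regular expressions from the examples above, but the function names and the sample lines are hypothetical, and treating the thresholds as strict comparisons (more than, less than) is an assumption. It illustrates the idea only and is not how Splunk evaluates rules internally.

```python
import re

def percent_matching(lines, regex):
    """Percentage of lines in which the regex is found (0.0 for no lines)."""
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if re.search(regex, line))
    return 100.0 * hits / len(lines)

def rule_matches(lines, more_than=(), less_than=()):
    """True only if every (percent, regex) statement holds for the lines."""
    return (all(percent_matching(lines, rx) > pct for pct, rx in more_than)
            and all(percent_matching(lines, rx) < pct for pct, rx in less_than))

POSTFIX = r"^\w{3} +\d+ \d\d:\d\d:\d\d .* postfix(/\w+)?\[\d+\]:"
ASCII_ART = r"(^(?:---|===|\*\*\*|___|=+=))|^\s*$"
TIMESTAMP = r"[: ][012]?[0-9]:[0-5][0-9]"

# Sample data (made up): five postfix lines and a short plain-text document.
postfix_lines = ["Oct  6 09:52:0%d mail postfix/smtpd[1234]: connect" % i for i in range(5)]
text_lines = ["Title", "=====", "", "some text", "more text"]

is_postfix = rule_matches(postfix_lines, more_than=[(80, POSTFIX)])
is_breakable = rule_matches(text_lines, more_than=[(10, ASCII_ART)],
                            less_than=[(10, TIMESTAMP)])
text_is_postfix = rule_matches(text_lines, more_than=[(80, POSTFIX)])
```

With this sample data, the postfix lines satisfy the postfix rule, the plain-text lines satisfy the breakable_text rule, and the plain-text lines do not satisfy the postfix rule.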

Rename source types


When configuring a source type in props.conf, you can rename the source type; several source types can share one name.

To rename the source type, add the following into your source type stanza:

rename = <string>

After renaming, you can search for the source type with:

sourcetype=<string>

Since one string can be used for multiple source types, to search for the original source type before renaming, use:

_sourcetype=<sourcetype>

Train Splunk's source type autoclassifier


Use these instructions to train Splunk to recognize a new source type, or give it new samples to better recognize a pre-trained source type. Autoclassification training enables Splunk to classify future event data with similar patterns as a specific source type. This can be useful when Splunk is indexing directories that contain data with a mix of source types (such as /var/log). Splunk ships "pre-trained," with the ability to assign sourcetype=syslog to most syslog files.

Note: Keep in mind that source type autoclassification training applies to future event data, not event data that has already been indexed.

You can also bypass auto-classification in favor of hardcoded configurations, and just override a sourcetype for an input, override a sourcetype for a source, or configure rule-based source type recognition.

You can also anonymize your file using Splunk's built in anonymizer utility.

If Splunk fails to recognize a common format, or applies an incorrect source type value, you should report the problem to Splunk support and send us a sample file.

Via the CLI

Here's what you enter to train source types through the CLI:

# splunk train sourcetype $FILE_NAME $SOURCETYPE_NAME

Fill in $FILE_NAME with the entire path to your file. $SOURCETYPE_NAME is the custom source type you wish to create.

It's usually a good idea to train on a few different samples for any new source type so that Splunk learns how varied a source type can be.

List of pretrained source types

Splunk ships pre-trained to recognize many different source types. A number of source types are automatically recognized, tagged and parsed appropriately. Splunk also contains a significant number of pre-trained source types that are not automatically recognized but can be assigned via Splunk Web or inputs.conf.

It's a good idea to use a pre-trained source type if it matches your data, as Splunk contains optimized indexing properties for pre-trained source types. However, if your data does not fit with any pre-trained source types, Splunk can index virtually any format of data without custom properties.

To find out what configuration information Splunk is using to index a given source type, you can use the btool utility to list out the properties. For more information on using btool, refer to "Command line tools for use with Support's direction" in this manual.

The following example shows how to list out the configuration for the tcp source type:

Specify source type settings in props.conf

There are source type specific settings in props.conf. Specify settings for a source type using the following attribute/value pairs. Add a sourcetype stanza to props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files, see "About configuration files".

Note: The following attribute/value pairs can only be set for a stanza that begins with [<$SOURCETYPE>]:

invalid_cause = <string>

• Can only be set for a [<sourcetype>] stanza.
• Splunk will not index any data with invalid_cause set.
• Set <string> to "archive" to send the file to the archive processor (specified in unarchive_cmd).
• Set to any other string to throw an error in the splunkd.log if running Splunklogger in debug mode.
• Defaults to empty.

unarchive_cmd = <string>

• Only called if invalid_cause is set to "archive".
• <string> specifies the shell command to run to extract an archived source.
• Must be a shell command that takes input on stdin and produces output on stdout.
• Defaults to empty.

LEARN_MODEL = <true/false>

• For known sourcetypes, the fileclassifier will add a model file to the learned directory.
• To disable this behavior for diverse sourcetypes (such as sourcecode, where there is no good exemplar to make a sourcetype) set LEARN_MODEL = false.
  ♦ More specifically, set LEARN_MODEL to false if you can easily classify your source by its name or a rule and there's nothing gained from trying to analyze the content.
• Defaults to empty.

maxDist = <integer>

• Determines how different a sourcetype model may be from the current file.
• The larger the value, the more forgiving.
• For example, if the value is very small (e.g., 10), then files of the specified sourcetype should not vary much.
• A larger value indicates that files of the given sourcetype vary quite a bit.
• Defaults to 300.

Configure index-time field extractions

We do not recommend that you add custom fields to the set of default fields that Splunk automatically extracts and indexes at index time, such as timestamp, punct, host, source, and sourcetype. Adding to this list of fields can negatively impact indexing performance and search times, because each indexed field increases the size of the searchable index. Indexed fields are also less flexible--whenever you make changes to your set of fields, you must re-index your entire dataset. For more information, see "Index time versus search time" in the Admin manual.

With those caveats, there are times when you may find a need to change or add to your indexed fields. For example, you may have situations where certain search-time field extractions are noticeably impacting search performance. This can happen, for example, if you commonly search a large event set with expressions like foo!=bar or NOT foo=bar, and the field foo nearly always takes on the value bar.

Conversely, you may want to add an indexed field if the value of a search-time extracted field exists outside of the field more often than not. For example, if you commonly search only for foo=1, but 1 occurs in many events that do not have foo=1, you may want to add foo to the list of fields extracted by Splunk at index time.

In general, you should try to extract your fields at search time. For more information see "Create search-time field extractions" in the Knowledge Manager manual.

• The <unique_stanza_name> is required for all transforms, as is the REGEX.
• REGEX is a regular expression that operates on your data to extract fields.
  ♦ Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify a FORMAT for simple field extraction cases.
  ♦ If the REGEX extracts both the field name and its corresponding value, you can use the following special capturing groups to skip specifying the mapping in the FORMAT attribute:

_KEY_<string>, _VAL_<string>

• For example, the following are equivalent:

Using FORMAT:

REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

Not using FORMAT:

REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)

• FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting, including any field names or values that you want to add. You don't need to specify the FORMAT if you have a simple REGEX with name-capturing groups.
  ♦ For index-time transforms, you use $n to specify the output of each REGEX match (for example, $1, $2, and so on).
  ♦ If the REGEX does not have n groups, the matching fails.
  ♦ FORMAT defaults to $1.
  ♦ The special identifier $0 represents what was in the DEST_KEY before the REGEX was performed (in the case of index-time field extractions the DEST_KEY is _meta). For more information, see "How Splunk builds indexed fields," below.
  ♦ For index-time field extractions, you can set up FORMAT in several ways. It can be a <field-name>::<field-value> setup like

FORMAT = $1::$2 $3::$4 (where the REGEX extracts both the field name and the field value)

However, you can also set up index-time field extractions that create concatenated fields:

FORMAT = ipaddress::$1.$2.$3.$4

• WRITE_META = true writes the extracted field name and value to _meta, which is where Splunk stores indexed fields. This attribute setting is required for all index-time field extractions, except for those where DEST_KEY = _meta (see the discussion of DEST_KEY, below).
  ♦ For more information about _meta and its role in indexed field creation, see "How Splunk builds indexed fields," below.
• DEST_KEY is required for index-time field extractions where WRITE_META = false or is not set. It specifies where Splunk sends the results of the REGEX.
  ♦ For index-time field extractions, DEST_KEY = _meta, which is where Splunk stores indexed fields. For other possible KEY values see the transforms.conf page in this manual.
  ♦ For more information about _meta and its role in indexed field creation, see "How Splunk builds indexed fields," below.
  ♦ When you use DEST_KEY = _meta you should also add $0 to the start of your FORMAT attribute. $0 represents the DEST_KEY value before Splunk performs the REGEX (in other words, _meta).
  ♦ Note: The $0 value is in no way derived from the REGEX.

• DEFAULT_VALUE is optional. The value for this attribute is written to DEST_KEY if the REGEX fails.
  ♦ Defaults to empty.
• SOURCE_KEY is optional. You use it to identify a KEY whose values the REGEX should be applied to.
  ♦ By default, SOURCE_KEY = _raw, which means it is applied to the entirety of all events.
  ♦ Typically used in conjunction with REPEAT_MATCH.
  ♦ For other possible KEY values see the transforms.conf page in this manual.
• REPEAT_MATCH is optional. Set it to true to run the REGEX multiple times on the SOURCE_KEY.
  ♦ REPEAT_MATCH starts wherever the last match stopped and continues until no more matches are found. Useful for situations where an unknown number of field/value matches are expected per event.
  ♦ Defaults to false.
• LOOKAHEAD is optional. Use it to specify how many characters to search into an event.
  ♦ Defaults to 256. You may want to increase your LOOKAHEAD value if you have events with line lengths longer than 256 characters.
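To make the interaction between REGEX and FORMAT more concrete, here is a minimal Python sketch of how $n references in an index-time FORMAT could be expanded. It only illustrates the rules listed above and is not Splunk's implementation. The function name expand_format, its previous parameter (standing in for the earlier DEST_KEY value, $0), and the sample events are hypothetical. Python's re module is used in place of Splunk's regex engine, so only numbered groups are shown.

```python
import re


def expand_format(regex, fmt, raw, previous=""):
    """Expand $0, $1, $2, ... in fmt using the first match of regex in raw.

    previous stands in for the DEST_KEY value before the REGEX runs ($0).
    Returns None if the regex does not match, or if fmt references a
    group number that the regex does not define.
    """
    match = re.search(regex, raw)
    if match is None:
        return None
    referenced = [int(n) for n in re.findall(r"\$(\d+)", fmt)]
    if any(n > match.re.groups for n in referenced):
        return None  # FORMAT asks for more groups than the REGEX has

    def substitute(ref):
        n = int(ref.group(1))
        # $0 is the prior DEST_KEY value; it is not derived from the regex.
        return previous if n == 0 else (match.group(n) or "")

    return re.sub(r"\$(\d+)", substitute, fmt)


# <field-name>::<field-value> pair taken entirely from the event
print(expand_format(r"([a-z]+)=([a-z]+)", "$1::$2", "user=alice"))
# Concatenated field built from four captured segments
print(expand_format(r"(\d+)\.(\d+)\.(\d+)\.(\d+)",
                    "ipaddress::$1.$2.$3.$4", "src 10.1.2.3 ok"))
```

The first call yields user::alice and the second yields ipaddress::10.1.2.3, which are the two FORMAT styles described in the FORMAT bullet.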

Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

Note: The capturing groups in your regex must identify field names that use ASCII characters (a-zA-Z0-9_-.). International characters will not work.

Link the new field to props.conf

To props.conf, add the following lines:

[<spec>]
TRANSFORMS-<value> = <unique_stanza_name>

• <spec> can be:
  ♦ <sourcetype>, the sourcetype of an event.
  ♦ host::<host>, where <host> is the host for an event.
  ♦ source::<source>, where <source> is the source for an event.
  ♦ Note: You can use regex-type syntax when setting the <spec>. Also, source and source type stanzas match in a case-sensitive manner while host stanzas do not. For more information, see the props.conf spec file.
• <value> is any value you want, to give your attribute its name-space.
• <unique_stanza_name> is the name of your stanza from transforms.conf.

Note: For index-time field extraction, props.conf uses TRANSFORMS-<class>, as opposed to EXTRACT-<value>, which is used for configuring search-time field extraction.

Add an entry to fields.conf for the new field

Add an entry to fields.conf for the new indexed field:

[<your_custom_field_name>]
INDEXED=true

• <your_custom_field_name> is the name of the custom field you set in the unique stanza that you added to transforms.conf.
• Set INDEXED=true to indicate that the field is indexed.

Note: If a field of the same name is extracted at search time, you must set INDEXED=false for the field. In addition, you must also set INDEXED_VALUE=false if events exist that have values of that field that are not pulled out at index time, but which are extracted at search time.

For example, say you're performing a simple <field>::1234 extraction at index time. This could work, but you would have problems if you also implement a search-time field extraction based on a regex like A(\d+)B, where the string A1234B yields a value for that field of 1234. This would turn up events for 1234 at search time that Splunk would be unable to locate at index time with the <field>::1234 extraction.

Restart Splunk for your changes to take effect

Changes to configuration files such as props.conf and transforms.conf won't take effect until you shut down and restart Splunk.

How Splunk builds indexed fields

Splunk builds indexed fields by writing to _meta. Here's how it works:

• _meta is modified by all matching transforms in transforms.conf that contain either DEST_KEY = _meta or WRITE_META = true.
• Each matching transform can overwrite _meta, so use WRITE_META = true to append to _meta.
  ♦ If you don't use WRITE_META, then start your FORMAT with $0.
• After _meta is fully built during parsing, Splunk interprets the text in the following way:
  ♦ The text is broken into units; each unit is separated by whitespace.
  ♦ Quotation marks (" ") group characters into larger units, regardless of whitespace.
  ♦ Backslashes ( \ ) immediately preceding quotation marks disable the grouping properties of quotation marks.
  ♦ Backslashes preceding a backslash disable that backslash.
  ♦ Units of text that contain a double colon (::) are turned into extracted fields. The text on the left side of the double colon becomes the field name, and the right side becomes the value.
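The interpretation rules above can be sketched as a small tokenizer. The following Python function is an illustration of those rules only, written under the assumption that they are applied exactly as listed; it is not Splunk's parser, and the name parse_meta and the sample _meta strings are hypothetical.

```python
def parse_meta(meta):
    """Split a _meta-style string into units and return the field pairs.

    Rules modelled (as described in the list above):
      - whitespace separates units
      - double quotes group characters, including whitespace, into one unit
      - a backslash before a quote or another backslash makes it literal
      - units containing '::' become name/value pairs
    """
    units, current, in_quotes, i = [], [], False, 0
    while i < len(meta):
        ch = meta[i]
        if ch == "\\" and i + 1 < len(meta) and meta[i + 1] in '"\\':
            current.append(meta[i + 1])  # escaped quote or backslash
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # toggle grouping, drop the quote
        elif ch.isspace() and not in_quotes:
            if current:
                units.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        units.append("".join(current))

    fields = {}
    for unit in units:
        name, sep, value = unit.partition("::")
        if sep:  # only units with a double colon become fields
            fields[name] = value
    return fields


print(parse_meta('host::web01 msg::"disk full" plain'))
```

In this sketch the unit plain is dropped because it has no double colon, while msg keeps its embedded space because of the quotation marks.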

Note: Indexed fields with regex-extracted values containing quotation marks will generally not work, and backslashes may also have problems. Fields extracted at search time do not have these limitations.

Here's an example of a set of index-time extractions involving quotation marks and backslashes to disable quotation marks and backslashes.

You would start by setting up a transform in transforms.conf named dnsRequest:

This transform defines a custom field named dns_requestor. It uses its REGEX to pull out the three segments of the dns_requestor value. Then it uses FORMAT to order those segments with periods between them, like a proper URL.

Note: This method of concatenating event segments into a complete field value is something you can only perform with index-time extractions; search-time extractions have practical restrictions that prevent it. If you find that you must use FORMAT in this manner, you will have to create a new indexed field to do it.

props.conf

Then, the next step would be to define a field extraction in props.conf that references the dnsRequest transform and applies it to events coming from the server1 source type:

[server1]
TRANSFORMS-dnsExtract = dnsRequest

fields.conf

Finally, you would enter the following stanza in fields.conf:

[dns_requestor]
INDEXED = true

Restart Splunk for your configuration file changes to take effect.

Extract fields from file headers at index time

Certain data sources and source types, such as CSV and MS Exchange log files, can have headers that contain field information. You can configure Splunk to automatically extract these fields during index-time event processing.

For example, a legacy CSV file--which is essentially a static table--could have a header row like

name, location, message, "start date"

which behaves like a series of column headers for the values listed afterwards in the file.

How automatic header-based field extraction works

When you enable automatic header-based field extraction for a specific source or source type, Splunk scans it for header field information, which it then uses for field extraction. If a source has the necessary header information, Splunk extracts fields using delimiter-based key/value extraction.

Splunk does this at index time by changing the source type of the incoming data to [original_sourcetype]-N, where N is a number. Next, it creates a stanza for this new source type in props.conf, defines a delimiter-based extraction rule for the static table header in transforms.conf, and then ties that extraction rule to the new source type back in its new props.conf stanza. Finally, at search time, Splunk applies the field transform to events from the source (the static table file).

You can use fields extracted by Splunk for filtering and reporting just like any other field by selecting them from the fields sidebar in the Search view (select Pick fields to see a complete list of available fields).

Note: Splunk will record the header line of a static table in a CSV or similar file as an event. To perform a search that gets a count of the events in the file without including the header event, you can run a search that identifies the file as the source while explicitly excluding the comma delimited list of header names that appears in the event. Here's an example:

Enable automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/<app_name>/local.

For more information on configuration files in general, see "About configuration files" in the Admin manual.

To turn on automatic header-based field extraction for a source or source type, add CHECK_FOR_HEADER=TRUE under that source or source type's stanza in props.conf.

Example props.conf entry for an MS Exchange source:

Set CHECK_FOR_HEADER=FALSE to turn off automatic header-based field extraction for a source or source type.

Important: Changes you make to props.conf (such as enabling automatic header-based field extraction) won't take effect until you restart Splunk.

Note: CHECK_FOR_HEADER must be in a source or source type stanza.

Changes Splunk makes to configuration files

If you enable automatic header-based field extraction for a source or sourcetype, Splunk adds stanzas to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/local/ when it extracts fields for that source or sourcetype.

Splunk creates a stanza in transforms.conf for each source type with unique header information matching a source type defined in props.conf. Splunk names each stanza it creates as [AutoHeader-N], where N is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2],...,[AutoHeader-N]). Splunk populates each stanza with transforms that extract the fields (using header information).

Here's the transforms.conf entry that Splunk would add for the MS Exchange source that was enabled for automatic header-based field extraction in the preceding example:

Splunk then adds new sourcetype stanzas to props.conf for each source with a unique name, fieldset, and delimiter. Splunk names the stanzas as [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.

For example, say you're indexing a number of CSV files. If each of those files has the same set of header fields and the same delimiter in transforms.conf, Splunk maps the events indexed from those files to a sourcetype of csv-1 in props.conf. But if that batch of CSV files also includes a couple of files with unique sets of fields and delimiters, Splunk gives the events it indexes from those files sourcetypes of csv-2 and csv-3, respectively. Events from files with the same source, fieldset, and delimiter in transforms.conf will have the same sourcetype value.

Note: If you want to enable automatic header-based field extraction for a particular source, and you have already manually specified a source type value for that source (either by defining the source type in Splunk Web or by directly adding the source type to a stanza in inputs.conf), be aware that setting CHECK_FOR_HEADER=TRUE for that source allows Splunk to override the source type value you've set for it with the sourcetypes generated by the automatic header-based field extraction process. This means that even though you may have set things up in inputs.conf so that all csv files get a sourcetype of csv, once you set CHECK_FOR_HEADER=TRUE, Splunk overrides that sourcetype setting with the incremental sourcetype names described above.

Here's the source type that Splunk would add to props.conf to tie the transform to the MS Exchange source mentioned earlier:

[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...

Note about search and header-based field extraction

Use a wildcard to search for events associated with source types that Splunk generated during header-based field extraction.

For example, a search for sourcetype="yoursource" looks like this:

sourcetype=yoursource*

Examples of header-based field extraction

These examples show how header-based field extraction works with common source types.

MS Exchange source file

This example shows how Splunk extracts fields from an MS Exchange file using automatic header-based field extraction.

This sample MS Exchange log file has a header containing a list of field names, delimited by spaces:

# Some previous automatic header-based field extraction

Note that Splunk automatically detects that the delim is a comma.

Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:

...
[CSV-1]
REPORT-AutoHeader = AutoHeader-2
...

Splunk extracts the following fields from each event:

100,21,this is a long file,nomore

• foo="100" bar="21" anotherfoo="this is a long file" anotherbar="nomore"

200,22,wow,o rly?

• foo="200" bar="22" anotherfoo="wow" anotherbar="o rly?"

300,12,ya rly!,no wai!

• foo="300" bar="12" anotherfoo="ya rly!" anotherbar="no wai!"
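The delimiter-based mapping from header names to values shown above can be sketched in a few lines of Python. This is an illustration only, not how Splunk implements header extraction. The function name extract_header_fields is hypothetical, and the header row foo,bar,anotherfoo,anotherbar is assumed from the field names in the results above. Unlike Splunk, which records the header line as an event, the sketch consumes the header and returns only the data rows.

```python
import csv
import io


def extract_header_fields(text, delimiter=","):
    """Map each data row to {header_name: value} using the first row as header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    # Skip blank lines; pair each value with the header name in the same column.
    return [dict(zip(header, row)) for row in reader if row]


sample = (
    "foo,bar,anotherfoo,anotherbar\n"      # assumed header row
    "100,21,this is a long file,nomore\n"
    "200,22,wow,o rly?\n"
    "300,12,ya rly!,no wai!\n"
)

for event in extract_header_fields(sample):
    print(event)
```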

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around extracting fields.

Add and manage users

About users and roles

If you're running Splunk Enterprise, you can create users with passwords and assign them to roles you have created. Splunk Free does not support user authentication.

Splunk comes with a single default user, the admin user. The default password for the admin user is changeme. As the password implies, you should change this password immediately upon installing Splunk.

About roles

A role contains a set of capabilities, like whether or not someone is allowed to add inputs or edit saved searches, etc. The various capabilities are listed in "Add users and assign roles" and in $SPLUNK_HOME/etc/system/README/authorize.conf.spec. Once a role exists, you can assign users to that role.

Additionally, whenever you create a user, you can automatically create a role for that user.

By default, Splunk comes with the following roles predefined:

• admin -- this role has the most capabilities assigned to it.
• power -- this role can edit all shared objects (saved searches, etc) and alerts, tag events, and other similar tasks.
• user -- this role can create and edit its own saved searches, run searches, edit its own preferences, create and edit event types, and other similar tasks.

Role names must use lowercase characters only. They cannot contain spaces, colons, or forward slashes.

Find existing users and roles

To locate an existing user or role in Manager, use the Search bar at the top of the Users or Roles page. Wildcards are supported. Splunk searches for the string you enter in all available fields by default. To search a particular field, specify that field. For example, to search only email addresses, type "email=<email address or address fragment>", or to search only the "Full name" field, type "realname=<name or name fragment>". To search for users in a given role, use "roles=".

Add users and assign roles

This topic describes how to create new users and change the properties (like password) of existing users. This topic also describes how to assign users to roles in Splunk's role-based access control system.

Note: Role names must use lowercase characters. For example: "admin", not "Admin".

Add and edit users via Splunk Web

• In Splunk Web, click Manager.
• Click Access controls.
• Click Users.
• Click New or edit an existing user.
• Specify new or changed information for this user.
• Assign this user to an existing role or roles and click Save.

When you create a user, you can create a role for that user as well. You can then edit that role to specify what access that user has to Splunk.

NB: Passwords with special characters that would be interpreted by the shell (for example '$' or '!') must be either escaped or single-quoted:

./splunk edit user admin -password 'fflanda$' -role admin -auth admin:changeme

or

./splunk edit user admin -password fflanda\$ -role admin -auth admin:changeme
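If you generate such commands from a script, you can let a library do the quoting rather than escaping by hand. The following Python sketch uses the standard library's shlex.quote to build the same command line as the examples above. The helper name build_edit_user_command is hypothetical, and the sketch only builds the string; it does not run Splunk.

```python
import shlex


def build_edit_user_command(user, password, role, auth):
    """Return a POSIX-shell-safe 'splunk edit user' command line as a string."""
    args = ["./splunk", "edit", "user", user,
            "-password", password,
            "-role", role,
            "-auth", auth]
    # shlex.quote single-quotes any argument containing shell metacharacters
    # and leaves safe arguments unchanged.
    return " ".join(shlex.quote(arg) for arg in args)


print(build_edit_user_command("admin", "fflanda$", "admin", "admin:changeme"))
```

The printed command matches the single-quoted form shown above. Note that shlex.quote targets POSIX shells, so this approach is an assumption that does not carry over to the Windows command prompt.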

Add and edit roles using Splunk Web

• In Splunk Web, click Manager.
• Click Access controls.
• Click Roles.
• Click New or edit an existing role.
• Specify new or changed information for this role. In particular, you can:
  ♦ restrict what data this role can search with a search filter (see "Search filter format" below for more information)
  ♦ restrict how large a window of time this role can search
  ♦ specify whether this role inherits capabilities and properties from any other roles
  ♦ choose individual capabilities for this role
  ♦ specify an index or indexes that this role will search by default
  ♦ specify whether this role is restricted to a specific index or indexes.
• Click Save.

Note: Members of multiple roles inherit capabilities and properties from the role with the loosest permissions.

Add and edit roles using authorize.conf

Configure roles by editing authorize.conf. Roles are defined by lists of capabilities. You can also use roles to create fine-grained access controls by setting a search filter for each role.

Caution: Do not edit or delete any roles in $SPLUNK_HOME/etc/system/default/authorize.conf. This could break your admin capabilities. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files".

Add the following attribute/value pairs to $SPLUNK_HOME/etc/system/local/authorize.conf:

You can include these attributes:

• role_$ROLE_NAME
  ♦ The name you want to give your role, for example security, compliance, ninja. Make sure the name is lowercase.
• $CAPABILITY1
  ♦ Any capability from the list of available capabilities. You can have any number of capabilities for a role.
• importRoles = <role>;<role>;...
  ♦ When set, the current role will inherit all the capabilities from <role>.
  ♦ Separate multiple roles, if any, with semicolons.
• srchFilter = <search>
  ♦ Use this field for fine-grained access controls. Searches for this role will be filtered by this expression.
• srchTimeWin = <string>
  ♦ Maximum time span (in seconds) of a search executed by this role.
• srchDiskQuota = <int>
  ♦ Maximum amount of disk space (MB) that can be taken by search jobs of a user that belongs to this role.
• srchJobsQuota = <int>
  ♦ Maximum number of concurrently running searches a member of this role can have.
• rtSrchJobsQuota = <number>
  ♦ Maximum number of concurrently running real-time searches a member of this role can have.
• srchIndexesDefault = <string>
  ♦ Semicolon delimited list of indexes to search when no index is specified.
  ♦ These indexes can be wildcarded, with the exception that '*' does not match internal indexes.
  ♦ To match internal indexes, start with '_'. All internal indexes are represented by '_*'.
• srchIndexesAllowed = <string>
  ♦ Semicolon delimited list of indexes this role is allowed to search.
  ♦ Follows the same wildcarding semantics as srchIndexesDefault.
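The wildcard semantics for srchIndexesDefault and srchIndexesAllowed can be illustrated with a short Python sketch. It models only the rules stated above: the list is semicolon delimited, '*' does not match internal indexes, and a pattern must start with '_' to match them. The function name index_matches is hypothetical, and Python's fnmatch stands in for whatever matching Splunk performs internally.

```python
from fnmatch import fnmatchcase


def index_matches(index, patterns):
    """Return True if index is covered by a semicolon-delimited pattern list."""
    for pattern in patterns.split(";"):
        pattern = pattern.strip()
        if not pattern:
            continue
        # Internal indexes (leading underscore) are only matched by
        # patterns that themselves start with an underscore.
        if index.startswith("_") and not pattern.startswith("_"):
            continue
        if fnmatchcase(index, pattern):
            return True
    return False


print(index_matches("main", "*"))            # public index, matched by '*'
print(index_matches("_internal", "*"))       # internal index, not matched by '*'
print(index_matches("_internal", "*;_*"))    # matched once '_*' is listed
```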

Note: You must reload authentication or restart Splunk after making changes to authorize.conf. Otherwise, your new roles will not appear in the Role list. To reload authentication, go to the Manager > Authentication section of Splunk Web. This refreshes the authentication caches, but does not boot current users.

Search filter format

The srchFilter/Search filter field can include any of the following search terms:

• source=
• host= and host tags
• index= and index names
• eventtype= and event type tags
• sourcetype=
• search fields
• wildcards
• use OR to use multiple terms, or AND to make searches more restrictive

Note: Members of multiple roles inherit properties from the role with the loosest permissions. In the case of search filters, if a user is assigned to roles with different search filters, they are all combined via OR.
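As a rough illustration of what "combined via OR" means for a user who belongs to several roles, the following Python sketch joins per-role search filters into one expression. It is a simplification: the function name combine_search_filters is hypothetical, every role is assumed to have a non-empty filter, and the parentheses are added by the sketch for readability rather than taken from Splunk's behavior.

```python
def combine_search_filters(filters):
    """Join the search filters of a user's roles into a single OR expression."""
    cleaned = [f.strip() for f in filters]
    if any(not f for f in cleaned):
        # How a role without a filter is treated is not modelled here.
        raise ValueError("this sketch assumes every role has a search filter")
    if len(cleaned) == 1:
        return cleaned[0]
    return " OR ".join(f"({f})" for f in cleaned)


print(combine_search_filters(["host=foo", "sourcetype=syslog"]))
```

A user in both roles can therefore see events that satisfy either filter, which is why adding a role can only widen, never narrow, what the user is able to search.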

The search terms cannot include:

Map a user to a role via Splunk Web

Once you've created a role in authorize.conf, map a user or users to it via Splunk Web.

• Click on the Manager link in the upper right-hand corner.
• Then, click the Users link.
• Edit an existing user or create a new one.
• Choose which role to map to from the Role list.
  ♦ Any custom roles you have created via authorize.conf will be listed here.

Example of creating a role in authorize.conf

This example creates the role "ninja", which inherits capabilities from the default "user" role. ninja has almost the same capabilities as the default "power" role, except it cannot schedule searches. In addition:

• The search filter limits ninja to searching on host=foo.
• ninja is allowed to search all public indexes (those that do not start with underscore) and will search the indexes mail and main if no index is specified in the search.
• ninja is allowed to run 8 search jobs and 8 real-time search jobs concurrently. (These counts are independent.)
• ninja is allowed to occupy up to 500MB total space on disk for all its jobs.

List of available capabilities

This list shows capabilities available for roles. Check authorize.conf for the most up-to-date version of this list. The admin role has all the capabilities in this list except for the "delete_by_keyword" capability.

Capability                Meaning
admin_all_objects         Has access to objects in the system (user objects, search jobs, etc.).
change_authentication     Can change authentication settings and reload authentication.
change_own_password       Can change own user password.
delete_by_keyword         Can use the "delete" search operator.
edit_deployment_client    Can change deployment client settings.
edit_deployment_server    Can change deployment server settings.
edit_dist_peer            Can add and edit peers for distributed search.
edit_forwarders           Can change forwarder settings.
edit_httpauths            Can edit and end user sessions.
edit_input_defaults       Can change default hostnames for input data.
edit_monitor              Can add inputs and edit settings for monitoring files.
edit_roles                Can edit roles and change user/role mappings.
edit_scripted             Can create and edit scripted inputs.
edit_search_server        Can edit general distributed search settings like timeouts, heartbeats, and blacklists.
edit_server               Can edit general server settings like server name, log levels, etc.
edit_splunktcp            Can change settings for receiving TCP inputs from another Splunk instance.
edit_splunktcp_ssl        Can list or edit any SSL-specific settings for Splunk TCP input.
edit_tcp                  Can change settings for receiving general TCP inputs.
edit_udp                  Can change settings for UDP inputs.
edit_user                 Can create, edit, or remove users.
edit_web_settings         Can change settings for web.conf.
get_metadata              Enables the "metadata" search processor.
get_typeahead             Enables typeahead.
indexes_edit              Can change index settings like file size and memory limits.
license_tab               Can access and change the license.
list_forwarders           Can show forwarder settings.
list_httpauths            Can list user sessions.
list_inputs               Can list various inputs, including input from files, TCP, UDP, scripts, etc.
request_remote_tok        Can get a remote authentication token.
rest_apps_management      Can edit settings in the python remote apps handler.
rest_apps_view            Can list properties in the python remote apps handler.
rest_properties_get       Can get information from the services/properties endpoint.
rest_properties_set       Can edit the services/properties endpoint.
restart_splunkd           Can restart Splunk through the server control handler.
rtsearch                  Can run real-time searches.
schedule_search           Can schedule saved searches.
search                    Can run searches.
use_file_operator         Can use the "file" search operator.

Set up user authentication with Splunk

Splunk ships with support for three types of authentication systems:

• Splunk's own built-in system
• LDAP
• A scripted authentication API for use with an external authentication system, such as PAM or RADIUS.

Splunk with an Enterprise license comes with Splunk's built-in authentication enabled by default. Splunk's authentication allows you to add users, assign them to roles, and give those roles custom permissions as needed for your organization. Splunk's built-in system always takes precedence over any external systems, including LDAP.

If your enterprise uses LDAP and you'd like to use the users and groups defined there instead of using Splunk's authentication system, see "Set up user authentication with LDAP". To use scripted authentication to connect with a system such as PAM or RADIUS, see "Configure Splunk to use scripted authentication".

Set up user authentication with LDAP

Splunk ships with support for three types of authentication systems:

• Splunk's own built-in system, described in "Add users and assign roles".
• LDAP, described in the topic you're now reading.
• A scripted authentication API for use with an external authentication system, such as PAM or RADIUS, described in "Configure Splunk to use scripted authentication".

Splunk supports LDAP v2 and v3, but does not support LDAP referrals. LDAP v3 is the default protocol used. Check the Splunk Community Wiki for information about ways to authenticate against an LDAP server that returns referrals (such as Active Directory).

Overview of the process

This topic provides procedures to do the following:

• Configure Splunk to use LDAP authentication

• Map existing LDAP groups to Splunk roles

In addition, see how to import your CA if your LDAP server requires it for SSL use.

Be sure to read "Things to know about Splunk and LDAP" at the end of this topic before proceeding.

User Management

You cannot add, edit, or delete LDAP users with Splunk. Instead, you must manage users within your LDAP server. For example:

• To add an LDAP user to a Splunk role, add the user to the LDAP group on your LDAP server.
• To change a user's role membership, change the LDAP group that the user is a member of on your LDAP server.
• To remove a user from a Splunk role, remove the user from the LDAP group on your LDAP server.

Note: Beginning with 4.1, Splunk automatically checks LDAP membership information when a user attempts to log into Splunk. You no longer need to reload the authentication configuration when adding or removing users.

Configure LDAP

This topic describes how to configure LDAP through Splunk Web. If you want to configure LDAP by editing authentication.conf, you can see complete configuration examples in the Configuration file reference and the Splunk Community Wiki topic "Authenticate against an LDAP server that returns referrals".

If you are configuring authentication via the configuration file and wish to switch back to the default Splunk authentication, the simplest way is to move the existing authentication.conf file out of the way (rename to *.disabled is fine) and restart Splunk. This will retain your previous configuration unchanged if you expect to return to it later.

Determine your User and Group Base DN

Before you map your LDAP settings in Splunk, figure out your user and group base DN, or distinguished name. The DN is the location in the directory where authentication information is stored. If group membership information for users is kept in a separate entry, enter a separate DN identifying the subtree in the directory where the group information is stored. If your LDAP tree does not have group entries, you can set the group base DN to the same as the user base DN to treat users as their own group. This requires further configuration, described later.

If you are unable to get this information, please contact your LDAP Administrator for assistance.

Set up LDAP via Splunk Web

First, set LDAP as your authentication strategy:

1. Click Manager in Splunk Web.

2. Under System configurations, click Access controls.

3. Click Authentication method.

4. Select the LDAP radio button.

5. Click Configure Splunk to work with LDAP.

6. Click New.

7. Enter an LDAP strategy name for your configuration.

8. Enter the Host name of your LDAP server. Be sure that your Splunk Server can resolve the hostname.

9. Enter the Port that Splunk should use to connect to your LDAP server.

• Splunk uses this attribute to locate user information.

• Note: You must set this attribute for authentication to work.

• Note: This is recommended to return only applicable users. For example, (department=IT).
• Default value is empty, meaning no user entry filtering.

15. Enter the User name attribute that contains the user name.

• Note: The username attribute cannot contain whitespace. The username must be lowercase.
• In Active Directory, this is sAMAccountName.
• The value uid should work for most other configurations.

16. Enter the Real name attribute (common name) of the user.

• Typical values are displayName or cn (common name).

17. Enter the Group mapping attribute.

• This is the user entry attribute whose value is used by group entries to declare membership.
• The default is dn for Active Directory; set this attribute only if groups are mapped using some other attribute besides user DN.
• For example, a typical attribute used to map users to groups is uid.

18. Enter the Group base DN. You can specify multiple group base DN entries by separating them with semicolons.

• This is the location of the user groups in LDAP.
• If your LDAP environment does not have group entries, you can treat each user as its own group:
  ♦ Set groupBaseDN to the same value as userBaseDN. This means you will search for groups in the same place as users.
  ♦ Next, set the groupMemberAttribute and groupMappingAttribute to the same attribute as userNameAttribute. This means the entry, when treated as a group, will use the username value as its only member.
  ♦ For clarity, you should probably also set groupNameAttribute to the same value as userNameAttribute.

19. Enter the Group base filter for the object class you want to filter your groups on.

• Note: This is recommended to return only applicable groups. For example, (department=IT).
• Default value is empty, meaning no group entry filtering.

20. Enter the Group name attribute.

• This is the group entry attribute whose value stores the group name.
• This is usually cn.

21. Enter the Group member attribute.

• This is the group attribute whose values are the group's members.
• This is typically member or memberUid.
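For the case in step 18 where the directory has no group entries, the relationship between the settings can be written down as a small Python sketch. The function name and the sample DN are hypothetical and used only for illustration; the setting names are the ones listed in the steps above. This only derives the values; it does not write authentication.conf.

```python
def user_as_group_settings(user_base_dn, user_name_attribute):
    """Return LDAP settings that treat every user entry as its own group."""
    return {
        "userBaseDN": user_base_dn,
        "userNameAttribute": user_name_attribute,
        # Search for groups in the same place as users.
        "groupBaseDN": user_base_dn,
        # The entry, viewed as a group, uses its own username as its only member.
        "groupMemberAttribute": user_name_attribute,
        "groupMappingAttribute": user_name_attribute,
        # Set for clarity, so each "group" is named after its user.
        "groupNameAttribute": user_name_attribute,
    }


# Hypothetical base DN and attribute, for illustration only.
settings = user_as_group_settings("ou=People,dc=example,dc=com", "uid")
for key, value in sorted(settings.items()):
    print("%s = %s" % (key, value))
```

Deriving every group setting from the two user settings makes it harder for them to drift apart when one is changed later.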

Map existing LDAP groups to Splunk roles

Once you have configured Splunk to authenticate via your LDAP server, map your existing LDAP groups to any roles you have created. If you do not use groups, you can map users individually.

Note: You can map either users or groups, but not both. If you are using groups, all users you want to access Splunk must be members of an appropriate group. Groups inherit capabilities from the highest level role they're a member of.

All users that can log in are visible in the Users page in Splunk Manager. Assign roles to groups in the group mapping page under Access controls in Splunk Manager.

Test your LDAP configuration

If you find that your Splunk install is not able to successfully connect to your LDAP server, try these troubleshooting steps:

You can set up a stanza to map any custom roles you have created in authorize.conf to LDAP groups you have enabled for Splunk access in authentication.conf:

[roleMap]
admin = SplunkAdmins
itusers = ITAdmins

Map users directly

If you need to map users directly to a Splunk role, you can do so by setting the groupBaseDN to the value of userBaseDN. Also, set the attributes for groupMappingAttribute, groupMemberAttribute, and groupNameAttribute to the same attribute as userNameAttribute. For example:

Converting from Splunk built-in authentication to LDAP

Usernames in Splunk's built-in authentication system always take precedence over the same usernames in LDAP. So, if you have converted from Splunk's built-in authentication system to LDAP, you might need to delete users from Splunk's built-in system to ensure that you're using LDAP credentials. This is only necessary if usernames are the same in both systems.

If your LDAP usernames are the same as the names you previously used in the built-in system, saved searches should work without any conversion.

If you have existing saved searches created when your system was using Splunk's built-in authentication and you'd like to transfer them to an LDAP user of a different name, edit the metadata:

1. Modify $SPLUNK_HOME/etc/apps/<app_name>/metadata/local.meta and swap the owner = <username> field under each savedsearch permission stanza to the corresponding LDAP username and save your changes.

2. Restart Splunk for your changes to take effect.
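If many saved searches need a new owner, the edit in step 1 can be scripted. The following Python sketch works on the text of a local.meta file and rewrites owner lines only inside saved-search stanzas. It assumes those stanzas are named [savedsearches/<name>]; check your own local.meta before relying on that. The function name, the sample stanzas, and the usernames are hypothetical. Back up the file first, and restart Splunk afterward as described in step 2.

```python
import re


def reassign_owner(meta_text, old_owner, new_owner):
    """Rewrite 'owner = old_owner' lines inside saved-search stanzas only."""
    out = []
    in_savedsearch = False
    for line in meta_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Assumption: saved-search stanzas are named [savedsearches/<name>].
            in_savedsearch = stripped.startswith("[savedsearches/")
        elif in_savedsearch:
            match = re.match(r"^(\s*owner\s*=\s*)(\S+)\s*$", line)
            if match and match.group(2) == old_owner:
                line = match.group(1) + new_owner
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file content, for illustration only.
sample = """[savedsearches/Errors%20last%20hour]
owner = jsmith

[eventtypes/failed_login]
owner = jsmith
"""
updated = reassign_owner(sample, "jsmith", "john.smith")
print(updated)
```

Only the saved-search stanza changes owner here; the other stanza keeps its original owner.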

Things to know about Splunk and LDAP

When configuring Splunk to work with your LDAP instance, note the following:

• Entries in Splunk Web and authentication.conf are case sensitive.

• Splunk only works with one LDAP server at a time.
• Any user explicitly created locally using Splunk native authentication will have precedence over an LDAP user of the same name. For example, if the LDAP server has a user with a cname of 'admin' and the default Splunk user of the same name is present, the Splunk user will win. Only the local password will be accepted, and upon login the roles mapped to the local user will be in effect.
• The number of LDAP groups Splunk Web can display for mapping to roles is limited to the number your LDAP server can return in a query.
  ♦ To prevent Splunk from listing unnecessary groups, use the groupBaseFilter. Example: groupBaseFilter = (|(cn=SplunkAdmins)(cn=SplunkPowerUsers)(cn=Help Desk))
  ♦ If you must role map more than the maximum number of groups, you can edit authentication.conf directly:

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around LDAP authentication with Splunk.

Set up user authentication with external systems


Splunk ships with support for three types of authentication systems:

• Splunk's own built-in system, described in Add users and assign roles.
• LDAP, described in Set up user authentication with LDAP.
• A scripted authentication API for use with an external authentication system, such as PAM or RADIUS, described in this topic.

How scripted authentication works

In scripted authentication, a user-generated Python script serves as the middleman between the Splunk server and an external authentication system such as PAM or RADIUS.

The API consists of a few functions that handle communications between Splunk and the authentication system. You need to create a script with handlers that implement those functions.

To use your authentication system with Splunk, make sure the authentication system is running and then do the following:

• Create the Python authentication script.

Splunk provides several example authentication scripts and associated configuration files, including one set for RADIUS and another for PAM. There is also a simple script called dumbScripted.py, which focuses on the interaction between the script and Splunk.

You can use an example script and configuration file as the starting point for creating your own script. You must modify them for your environment.

You can find these examples in $SPLUNK_HOME/share/splunk/authScriptSamples/. That directory also contains a README file with information on the examples, as well as additional information on setting up the connection between Splunk and external systems.

Important: Splunk does not provide support for these scripts, nor does it guarantee that they will fully meet your authentication and security needs. They are meant to serve as examples that you can modify or extend as needed.

Create the authentication script

You must create a Python script that implements these authentication functions:

• userLogin
• getUserInfo
• getUsers

The Splunk server will call these functions as necessary, either to authenticate user login or to obtain information on a user's roles.

The script can optionally also include a handler for this function:

• getSearchFilter

This table summarizes the authentication functions, their arguments, and their return values:

Function: getUserInfo
Description: Return a user's information, including name and role(s).
Argument string: --username=<username>
Note the following:
• userInfo must specify a sem
• <userId> is deprecated; you s
• <username> is required.
• <realname> is optional, but its
• <roles> is required. To return
  For example: admin:power
• This example returns just the ro
  --status=success --userInfo=;docsp

Function: getUsers
Description: Return information for all Splunk users.
Argument string: none
Return value string:
  --status=success|fail --userInfo=<
  --userInfo=<userId>;<username>;<re
  --userInfo=<userId>;<username>;<re
Note the following:
• See getUserInfo for informa
  information.
• Separate each user's informatio

Function: getSearchFilter
Description: Optional. Returns the filters applied specifically to this user, along with those applied to the user's roles. The filters are OR'd together.
Argument string: --username=<username>
Return value string: --status=success|fail --search_fil

Note: User-based search filters are optional and not recommended. A better approach is to assign search filters to roles and then assign users to the appropriate roles.

See the example scripts for detailed information on how to implement these functions.
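As a rough illustration of the handler pattern, here is a minimal Python sketch of userLogin and getUserInfo. It is not one of the shipped sample scripts. The in-memory USERS table, its passwords, and all function names other than userLogin and getUserInfo are hypothetical stand-ins for a real external system. The sketch assumes that userLogin receives --username and --password arguments, and that userInfo has the form <userId>;<username>;<realname>;<roles>, with an empty userId and colon-separated roles, as the summary above suggests. getUsers is left out because the summary above does not fully show how multiple users are separated.

```python
import sys

# Hypothetical user store standing in for the external system (PAM, RADIUS, ...).
USERS = {
    "alice": {"password": "alicepw", "realname": "Alice", "roles": ["admin", "super"]},
    "bob": {"password": "bobpw", "realname": "", "roles": ["user"]},
}


def parse_args(lines):
    """Turn lines such as '--username=alice' into {'username': 'alice'}."""
    args = {}
    for line in lines:
        line = line.strip()
        if line.startswith("--") and "=" in line:
            key, value = line[2:].split("=", 1)
            args[key] = value
    return args


def user_login(args):
    user = USERS.get(args.get("username", ""))
    ok = user is not None and user["password"] == args.get("password")
    return "--status=success" if ok else "--status=fail"


def format_user_info(name):
    # Assumption: <userId> is left empty and roles are joined with colons.
    user = USERS[name]
    return "--userInfo=;%s;%s;%s" % (name, user["realname"], ":".join(user["roles"]))


def get_user_info(args):
    name = args.get("username", "")
    if name not in USERS:
        return "--status=fail"
    return "--status=success " + format_user_info(name)


HANDLERS = {"userLogin": user_login, "getUserInfo": get_user_info}

if __name__ == "__main__" and len(sys.argv) > 1:
    handler = HANDLERS.get(sys.argv[1])
    if handler is None:
        print("--status=fail")
    else:
        # Arguments arrive on stdin, one per line; the reply goes to stdout.
        print(handler(parse_args(sys.stdin.readlines())))
```

Keeping argument parsing and output formatting in small functions makes each handler easy to check on its own before connecting the script to Splunk.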

Test the script

Since the communication between Splunk and the script occurs via stdin and stdout, you can test the script interactively in your command shell, without needing to call it from Splunk. Be sure to send one argument per line and end each function call with an EOF (Ctrl-D).
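The same kind of check can also be automated. The Python sketch below runs a script as a child process, passes the function name as a command-line argument, writes one argument per line to stdin, and reads the reply from stdout. The helper call_function and the stand-in stub script are hypothetical and exist only to make the sketch self-contained; point the helper at your own script instead.

```python
import os
import subprocess
import sys
import tempfile


def call_function(script_path, function_name, **args):
    """Run 'python <script> <function>' with one --key=value per stdin line."""
    stdin_text = "".join("--%s=%s\n" % (k, v) for k, v in args.items())
    result = subprocess.run(
        [sys.executable, script_path, function_name],
        input=stdin_text,  # closing the pipe plays the role of Ctrl-D
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout.strip()


# Stand-in script, hypothetical, used only so this sketch can run by itself.
STUB = (
    "import sys\n"
    "args = dict(l.strip()[2:].split('=', 1) for l in sys.stdin if l.startswith('--'))\n"
    "ok = sys.argv[1] == 'userLogin' and args.get('password') == 'secret'\n"
    "print('--status=success' if ok else '--status=fail')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    stub_path = os.path.join(tmp, "example.py")
    with open(stub_path, "w") as handle:
        handle.write(STUB)
    good = call_function(stub_path, "userLogin", username="alice", password="secret")
    bad = call_function(stub_path, "userLogin", username="alice", password="nope")

print(good)
print(bad)
```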

Test each function individually, using this pattern:

> python [script] [function name]

The following example shows a debugging session that does some simple testing of a fictional script called "example.py", with two users "alice" and "bob". "alice" is a member of the "admin" and "super" roles, and "bob" is a member of the "user" role.

Set cache durations

To significantly speed authentication performance when using scripted authentication, make use of Splunk's authentication caching capability. You do so by adding the optional [cacheTiming] stanza. Each script function (except getSearchFilter) has a settable cacheTiming attribute, which turns on caching for that function and specifies its cache duration. For example, to specify the cache timing for the getUserInfo function, use the getUserInfoTTL attribute. Caching for a function occurs only if its associated attribute is specified.

The cacheTiming settings specify the frequency at which Splunk calls your script to communicate with the external authentication system. You can specify time in seconds (s), minutes (m), hours (h), days (d), etc. Typically, you'll limit the cache frequency to seconds or minutes. If a unit is not specified, the value defaults to seconds. So, a value of "5" is equivalent to "5s".

This example shows typical values for the caches:

[cacheTiming]
userLoginTTL = 10s
getUserInfoTTL = 1m
getUsersTTL = 2m

You'll want to set userLoginTTL to a low value, since this setting determines login requirements.
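To compare cache durations written with different units, it can help to convert them all to seconds. This Python sketch covers only the four units named above (s, m, h, d) plus the bare-number case, which defaults to seconds. The function name is hypothetical, and this is an illustration of the rule, not Splunk's own parser.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def ttl_to_seconds(value):
    """Convert '10s', '1m', '2h', '1d' or a bare '5' to a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd]?)\s*", value)
    if match is None:
        raise ValueError("unsupported cache duration: %r" % value)
    number, unit = match.groups()
    # A missing unit means seconds.
    return int(number) * UNIT_SECONDS[unit or "s"]


cache_timing = {"userLoginTTL": "10s", "getUserInfoTTL": "1m", "getUsersTTL": "2m"}
seconds = {name: ttl_to_seconds(v) for name, v in cache_timing.items()}
print(seconds)
```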

To refresh all caches immediately, use the CLI command reload auth:

./splunk reload auth

Note: This command does not boot current users off the system.

You can also refresh caches in Splunk Web:

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations, click Access controls.

3. Click Authentication method.

4. Click Reload authentication configuration to refresh the caches.

Each specified function, except getUsers, has a separate cache for each user. So, if you have 10 users logged on and you've specified the getUserInfoTTL attribute, the getUserInfo function will have 10 user-based caches. The getUsers function encompasses all users, so it has a single, global cache.

Edit pamauth for PAM authentication

If you're using PAM and you're unable to authenticate after following the steps in the example directory's README, edit /etc/pam.d/pamauth and add this line:

auth sufficient pam_unix.so

Use single sign-on (SSO) with Splunk


Use this topic to configure Splunk to work with your enterprise's SSO solution.

How Splunk works with SSO

Configuring Splunk to work with SSO requires that any Splunk instance to be accessed via SSO is secured behind an HTTP proxy. The HTTP proxy you configure is then responsible for handling authentication and is the sole entity capable of communicating with Splunk Web (and the underlying Splunk server process, splunkd). Splunk's SSO implementation currently supports IIS and Apache HTTP proxies.

How authentication is handled

Splunk's SSO implementation expects that your user authentication is handled outside of Splunk by a web proxy. The web proxy server must be configured to authenticate against your external authentication system. Once a user has been authenticated by the proxy, the proxy must insert the authenticated user's username as a REMOTE_USER header in all HTTP requests forwarded to Splunk Web.

Splunk accepts incoming HTTP requests which include a REMOTE_USER header from a trusted proxy. If the user in the REMOTE_USER header is not currently authenticated by Splunk, an authentication request is made to Splunk via a trusted authentication endpoint the splunkd process provides. If Splunk returns a valid session key, Splunk Web can begin making secure requests as the user set in the REMOTE_USER header. All subsequent requests from the proxy to Splunk Web must include the REMOTE_USER header. If REMOTE_USER is not provided in every request, the REMOTE_USER is assumed to not be authenticated and will receive a Splunk login screen.

Note: If your proxy uses some other remote user header name besides REMOTE_USER, you can change the name of the header through the remoteUser attribute in web.conf, as described below.

Important: Splunk's SSO implementation supports logging in to Splunk Web only, not the command line interface or directly to splunkd endpoints.

Configure Splunk to work with SSO

Setting up SSO on Splunk requires several steps:

• Set up a proxy server.

Set up a proxy server

Splunk's SSO implementation supports most proxy servers, including Apache and IIS HTTP proxies. The proxy server must handle its own authentication and must insert the authorized username into a REMOTE_USER (or equivalent) header for all HTTP requests it forwards to Splunk.

For more information on configuring an Apache or IIS proxy server, refer to this material:

• For Apache, see http://httpd.apache.org/docs/2.0/mod/mod_proxy.html

• For IIS, the Splunk SSO implementation has been tested with Helicon's ISAPI_Rewrite 3.0. Get it at: http://www.helicontech.com/download-isapi_rewrite3.htm

Note: The Helicon ISAPI must include the proxy module. Do not use the 'lite' version of Helicon ISAPI, as it does not have the required proxy module.

Edit server.conf

Edit $SPLUNK_HOME/etc/system/local/server.conf to set the value of trustedIP to the IP address that will make the secure requests to the splunkd process. Typically, this is the address of your Splunk Web interface. If on the same machine, you can use the localhost IP (127.0.0.1). If you host Splunk Web on a different interface, use the IP for that interface.

Note: You can only configure splunkd to trust one Splunk Web IP.

Example:

[general]
serverName = myserver
trustedIP = 127.0.0.1

[sslConfig]
sslKeysfilePassword = $1&Jf96BQ0bdFG0

Edit web.conf

In the [settings] stanza of $SPLUNK_HOME/etc/system/local/web.conf, set one or more of the following attributes:

Attribute: trustedIP
Required? yes
Default: n/a
Value: Set this to the IP address of the authenticating proxy or proxies. Specify a single address or a comma-separated list of addresses; IP ranges and netmask notation are not supported.

Attribute: root_endpoint
Required? no
Default: the proxy's root
Value: If you host Splunk Web behind a proxy that does not place Splunk Web at the proxy's root, change this setting to reflect the offset from the root. For example, if your proxy hosts Splunk Web at "fflanda.com:9000/splunk", set the value to /splunk.

Attribute: remoteUser
Required? no
Default: REMOTE_USER
Value: Sets the remote user header. Most proxies forward the authenticated username in an HTTP header called REMOTE_USER. However, some may use a different header, such as REMOTE-USER (with a hyphen instead of an underscore). If the proxy you are using does not use REMOTE_USER, specify the HTTP header that Splunk Web should look for.

Attribute: tools.proxy.on
Required? no
Default: false
Value: For Apache 1.x proxies only. Set this attribute to "true". This configuration instructs CherryPy (the Splunk Web HTTP server) to look for an incoming X-Forwarded-Host header and to use the value of that header to construct canonical redirect URLs that include the proper host name. For more information, refer to the CherryPy documentation on running behind an Apache proxy. This setting is only necessary for Apache 1.1 proxies. For all other proxies, the setting must be "false", which is the default.

Attribute: SSOMode
Required? no
Default: permissive
Value: Specifies the SSO mode for Splunk Web. The value is either "permissive" or "strict":
• Permissive mode honors incoming requests from IPs not specified in the trustedIP setting but refuses to use SSO authentication if it receives requests from these unsupported IPs.
• Strict mode completely shuts down all requests unless they originate from an IP address specified in the trustedIP setting.
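The interaction between trustedIP, the remote user header, and SSOMode can be summarized as a small decision function. This Python sketch is a reading of the descriptions above, not Splunk Web's actual code. The function name, the return labels, and the sample IP addresses are hypothetical, and the header lookup is simplified to an exact-name match.

```python
def sso_decision(client_ip, headers, trusted_ips, sso_mode="permissive",
                 remote_user_header="REMOTE_USER"):
    """Return ('sso', username), ('login', None) or ('reject', None)."""
    if client_ip not in trusted_ips:
        # Strict mode refuses the request outright; permissive mode still
        # serves it, but without SSO, so the user sees the login screen.
        return ("reject", None) if sso_mode == "strict" else ("login", None)
    username = headers.get(remote_user_header)
    if not username:
        # No user header on this request: treat the user as not authenticated.
        return ("login", None)
    return ("sso", username)


trusted = {"10.1.1.5"}  # hypothetical proxy address
print(sso_decision("10.1.1.5", {"REMOTE_USER": "omarlittle"}, trusted))
print(sso_decision("192.168.0.9", {"REMOTE_USER": "omarlittle"}, trusted))
print(sso_decision("192.168.0.9", {"REMOTE_USER": "omarlittle"}, trusted, "strict"))
```

The untrusted-IP check comes first so that a user header supplied by an untrusted source is never honored.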

Set up users in Splunk that match users in your authentication system

You must create or map users in Splunk that have the same username as the users authenticating via the proxy. For example, if you configure LDAP to use a name field that resolves as "omarlittle", Splunk assumes that the proxy will set REMOTE_USER to "omarlittle". Alternatively, you can create a Splunk user with a username of "omarlittle" to allow this user to log in via SSO using native Splunk authentication. For information about creating users in Splunk, refer to "Add users and assign roles" in this manual.

Debug issues

Splunk provides a URI for debugging any problems with SSO. This URI is located at:

http://YourSplunkServer:8000/debug/sso

This page provides diagnostic information about headers, usernames, and more.

Delete user accounts using the CLI


Remove all the user data (user accounts) from your Splunk installation by typing ./splunk clean followed by the userdata argument. This deletes all the user accounts other than the default user accounts included with Splunk (admin, power, user).

To remove all of the user accounts in the system:

./splunk clean userdata

To remove the user accounts in the system and force Splunk to skip the confirmation prompt:

./splunk clean userdata -f

User language and locale


When a user logs in, Splunk automatically uses the language that the user's browser is set to. To switch languages, change the browser's locale setting. Locale configurations are browser-specific.

Splunk detects locale strings. A locale string contains two components: a language specifier and a localization specifier. This is usually presented as two lowercase letters and two uppercase letters linked by an underscore. For example, "en_US" means US English and "en_GB" means British English. When looking for a suitable translation, Splunk first tries to find an exact match for the whole locale, but will fall back to just the language specifier if the entire setting is not available. For example, translations for "fr" answer to requests for "fr_CA" and "fr_FR".
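The lookup order described above (exact locale first, then the language specifier alone) can be sketched in a few lines of Python. The function name is hypothetical, and the "fr" entry in the sample set is included only to mirror the example above.

```python
def pick_translation(requested, available):
    """Return the best available locale for a request such as 'fr_CA', or None."""
    if requested in available:
        return requested  # exact match on the whole locale
    language = requested.split("_", 1)[0]
    if language in available:
        return language  # fall back to the language specifier only
    return None


# Hypothetical set of installed translations, for illustration only.
available = {"en_GB", "en_US", "ja_JP", "zh_CN", "zh_TW", "fr"}
print(pick_translation("fr_CA", available))
print(pick_translation("en_GB", available))
print(pick_translation("de_DE", available))
```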

The user's locale also affects how dates, times, numbers, etc., are formatted, as different countries have different standards for formatting these entities.

Splunk provides built-in support for these locales:

en_GB
en_US
ja_JP
zh_CN
zh_TW

How browser locale affects timestamp formatting

By default, timestamps in Splunk are formatted according to the browser locale. If the browser is configured for US English, the timestamps are presented in American fashion: MM/DD/YYYY:HH:MM:SS. If the browser is configured for British English, then the timestamps will be presented in the traditional European date format: DD/MM/YYYY:HH:MM:SS.

Override the browser locale

The locale that Splunk uses for a given session can be changed by modifying the URL that you use to access Splunk. Splunk URLs follow the form http://host:port/locale/.... For example, when you access Splunk to log in, the URL may appear as http://hostname:8000/en-US/account/login for US English. To use British English settings, you can change the locale string to http://hostname:8000/en-GB/account/login. This session then presents and accepts timestamps in British English format for its duration.
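If you build Splunk Web links in a script, the locale segment can be swapped programmatically. This Python sketch assumes the locale is always the first path segment, as in the URL form shown above. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def with_locale(url, locale):
    """Replace the first path segment of a Splunk Web URL with another locale."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # The path starts with "/", so segments[0] is "" and segments[1] is the locale.
    if len(segments) < 2 or not segments[1]:
        raise ValueError("URL has no locale segment: %r" % url)
    segments[1] = locale
    return urlunsplit(parts._replace(path="/".join(segments)))


british = with_locale("http://hostname:8000/en-US/account/login", "en-GB")
print(british)  # http://hostname:8000/en-GB/account/login
```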

Requesting a locale for which the Splunk interface has not been localized results in the message: Invalid language Specified.

Refer to "Translate Splunk" in the Developer Manual for more information about localizing Splunk.

Configure user session timeouts


The amount of time that elapses before a Splunk user's session times out depends on the interaction among three timeout settings:

• The splunkweb session timeout.
• The splunkd session timeout.
• The browser session timeout.

The splunkweb and splunkd timeouts determine the maximum idle time in the interaction between browser and Splunk. The browser session timeout determines the maximum idle time in interaction between user and browser.

The splunkweb and splunkd timeouts generally have the same value, as the same Manager field sets both of them. To set the timeout in the Manager:

1. Click Manager in the upper right-hand corner of Splunk Web.

2. Under System configurations, click System settings.

3. Click General settings.

4. In the System timeout field, enter a timeout value.

5. Click Save.

This sets the user session timeout value for both splunkweb and splunkd. Initially, they share the same value of 60 minutes. They will continue to maintain identical values, if you change the value through the Manager.

If, for some reason, you need to set the timeouts for splunkweb and splunkd to different values, you can do so by editing their underlying configuration files, web.conf (tools.session.timeout attribute) and server.conf (sessionTimeout attribute). For all practical purposes, there's no reason to give them different values. In any case, if the user is using Splunk Web (splunkweb) to access the Splunk instance (splunkd), the smaller of the two timeout attributes prevails. So, if tools.session.timeout in web.conf has a value of "90" (minutes), and sessionTimeout in server.conf has a value of "1h" (1 hour; 60 minutes), the session will time out after 60 minutes.

In addition to setting the splunkweb/splunkd session value, you can also specify the timeout for the user browser session by editing the ui_inactivity_timeout value in web.conf. The Splunk browser session will time out once this value is reached. The default is 60 minutes. If ui_inactivity_timeout is set to less than 1, there's no timeout -- the session will stay alive while the browser is open.

The countdown for the splunkweb/splunkd session timeout does not begin until the browser session reaches its timeout value. So, to determine how long the user has before timeout, add the value of ui_inactivity_timeout to the smaller of the timeout values for splunkweb and splunkd. For example, assume the following:

• splunkweb timeout: 15m

• splunkd timeout: 20m

• browser (ui_inactivity_timeout) timeout: 10m

The user session stays active for 25m (15m+10m). After 25 minutes of no activity, the user will be prompted to log in again.
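The arithmetic in this example can be written as a short Python sketch. The function name is hypothetical, and all values are in minutes. It follows the rule stated above: the browser timeout runs first, then the smaller of the splunkweb and splunkd timeouts.

```python
def minutes_until_logout(splunkweb_timeout, splunkd_timeout, ui_inactivity_timeout):
    """Return the idle time, in minutes, before the user must log in again."""
    if ui_inactivity_timeout < 1:
        # No browser timeout: the session stays alive while the browser is open.
        return None
    # The smaller of the two server-side timeouts prevails.
    session_timeout = min(splunkweb_timeout, splunkd_timeout)
    return ui_inactivity_timeout + session_timeout


print(minutes_until_logout(15, 20, 10))  # 25
```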

Note: If you change a timeout value, either in the Manager or in configuration files, you must restart Splunk for the change to take effect.

Manage indexes

About managing indexes

When you add data to Splunk, Splunk processes it and stores it in an index. By default, data you feed to Splunk is stored in the main index, but you can create and specify other indexes for Splunk to use for different data inputs.

Indexes are stored in directories, which are located in $SPLUNK_HOME/var/lib/splunk. An index is a collection of directories. Index directories are also called buckets and are organized by age. For detailed information on index storage, see "How Splunk stores indexes".

In addition to the main index, Splunk comes preconfigured with a number of internal indexes. Internal indexes are named starting with an underscore (_). The internal indexes store audit, indexing volume, Splunk logging, and other data. You can see a full list of indexes in Splunk Web if you click on the Manager link in the upper right hand of Splunk Web and then click Indexes:

• main: the default Splunk index. All processed data is stored here unless otherwise specified.
• _internal: this index includes internal logs and metrics from Splunk's processors.
• sampledata: a small amount of sample data is stored here for training purposes.
• _audit: events from the file system change monitor, auditing, and all user search history.

Read on in this section for information about ways to manage the indexing process, including:

• Setting up multiple indexes, moving indexes, removing index data

If you're interested in the indexing process

Refer to:

• The section How indexing works in this manual.

• The section "How Splunk stores indexes" in this manual.
• The section Set up and use summary indexes in the Knowledge Manager manual, for information on working with extremely large datasets.
• The topic about Search performance on the Community Wiki.

Set up multiple indexes


Splunk ships with an index called main that, by default, holds all your events. By default, Splunk also creates a number of other indexes for use by its internal systems, as well as for additional Splunk features such as summary indexing and event auditing.

Splunk with an Enterprise license lets you add an unlimited number of additional indexes. The main index serves as the default index for any input or search command that doesn't specify an index, although you can change the default. You can add indexes using Splunk Web, Splunk's CLI, or indexes.conf.

Why have multiple indexes?

There are several key reasons for having multiple indexes:

• To control user access.

The main reason you'd set up multiple indexes is to control user access to the data that's in them. When you assign users to roles, you can limit user searches to specific indexes based on the role they're in.

In addition, if you have different policies for retention for different sets of data, you might want to send the data to different indexes and then set a different archive or retention policy for each index.

Another reason to set up multiple indexes has to do with the way Splunk search works. If you have both a high-volume/high-noise data source and a low-volume data source feeding into the same index, and you search mostly for events from the low-volume data source, the search speed will be slower than necessary, because Splunk also has to search through all the data from the high-volume source. To mitigate this, you can create dedicated indexes for each data source and route data from each source to its dedicated index. Then, you can specify which index to search on. You'll probably notice an increase in search speed.

Specify an index or indexes to search

When Splunk searches, it targets the default index (by default, main) unless otherwise specified. If you have created a new index, or want to search in any index that is not default, you can specify the index in your search:

index=hatch userid=henry.gale

This searches in the hatch index for the userid=henry.gale.

You can also specify an alternate default index for a given role to search when you create or edit that role.

Create and edit indexes

You can create or edit indexes with Splunk Web, the Splunk CLI, or directly, via indexes.conf.

Note: When setting the maximum size (maxDataSize), you should use "auto_high_volume" for high volume indexes (such as the main index), otherwise use "auto".

3. When you've set the values you want, click Save. The index is created. You must restart Splunk when you create a new index or edit the properties of an existing index.

You can edit an index by clicking on the index name in the Indexes section of Manager in Splunk Web. If you edit the properties of an existing index, you must restart Splunk.

Properties that you cannot change are grayed out. To change these properties, use indexes.conf.

Note: Some index properties are configurable only if you create or edit indexes through the indexes.conf file. Check the indexes.conf topic for a complete list of properties.

Use the CLI

To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command.

Important: You must stop Splunk before you edit the properties of an existing index. You do not need to stop Splunk to create a new index.

To add or edit a new index called "fflanda" using the CLI:

./splunk [add|edit] index fflanda

You can also specify a value for any option in indexes.conf by passing it as a flag (for example, -dir) to the [add|edit] index <name> command.

You must restart Splunk when you create a new index or edit the properties of an existing index.

Edit indexes.conf

To add a new index, you add a stanza to indexes.conf in $SPLUNK_HOME/etc/system/local, identified by the name of the new index. See configuration details and examples in the indexes.conf topic.

Note: The most accurate and up-to-date list of settings available for a given configuration file is in the .spec file for that configuration file. You can find the latest version of the .spec and .example files in the Configuration file reference in this manual, or in $SPLUNK_HOME/etc/system/README.

Disable or delete an index

You can disable the use of an index in Splunk Web. To do this, navigate to Manager > Indexes and click Disable to the right of the index you want to disable.

To delete an index, edit indexes.conf and remove its stanza. You cannot delete an index with Splunk Web or the CLI.

Important: You must stop Splunk before deleting an index.

Route events to specific indexes

Just as you can route events to specific queues, you can also route events to specific indexes.

By default, Splunk sends all events to the index called main. However, you may want to send specific events to other indexes. For example, you might want to segment data or to send event data from a noisy source to an index that is dedicated to receiving it. You can route data locally or route data you are receiving from remote sources or Splunk instances.

Note: When you place data in an alternate index, you must specify the index in your search with the index= command when you want to search that index:

index=foo

Send all events from a data input to a specific index

To configure routing for all events from a particular data input to an alternate index, add the following to the appropriate stanza in inputs.conf.

Add the following stanza to $SPLUNK_HOME/etc/system/local/transforms.conf:

Note the following:

• <transforms_name> must match the <transforms_name> identifier you specified in props.conf.

• <your_custom_regex> must provide a match for the attribute you identified earlier.

• DEST_KEY must be set to the index attribute _MetaData:Index.

• <alternate_index_name> specifies the alternate index that the events will route to.

Example

In this example, we route events of windows_snare_log sourcetype to the appropriate index based on their log types. "Application" logs will go to an alternate index, while all other log types, such as "Security", will go to the default index.

To make this determination, we use props.conf to direct events of windows_snare_log sourcetype through the transforms.conf stanza named "AppRedirect", where a regex then looks for the log type, "Application". Any event with a match on "Application" in the appropriate location is routed to the alternate index, "applogindex". All other events go to the default index.

Add this stanza to $SPLUNK_HOME/etc/system/local/transforms.conf:

This stanza processes the events directed here by props.conf. Events that match the regex, by containing the string "Application" in the specified location, get routed to the alternate index, "applogindex". All other events route to the default index.
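The routing decision described in this example can be sketched in a few lines of Python. This is only an illustration of the match-then-route logic, not Splunk's implementation: the pattern below is a simplified, hypothetical stand-in for the regex in the AppRedirect stanza, and the sample events are made up.

```python
import re

DEFAULT_INDEX = "main"           # Splunk's default index, per the text above
ALTERNATE_INDEX = "applogindex"  # alternate index named in the example

# Hypothetical, simplified pattern. A real regex would anchor "Application"
# to the log-type position in the event instead of matching it anywhere.
APP_LOG_PATTERN = re.compile(r"\bApplication\b")


def choose_index(raw_event: str) -> str:
    """Return the index an event would be routed to under this toy rule."""
    if APP_LOG_PATTERN.search(raw_event):
        return ALTERNATE_INDEX
    return DEFAULT_INDEX


if __name__ == "__main__":
    # Made-up sample events, for illustration only.
    for event in ("host1 MSWinEventLog Application 42 service started",
                  "host1 MSWinEventLog Security 17 user logon"):
        print(choose_index(event), "<-", event)
```

Events that do not match fall through to the default index, which mirrors the behavior described above.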

Set limits on disk usage

There are several methods for controlling disk space used by Splunk. Most disk space will be used by Splunk's indexes, which include the compressed raw data. If you run out of disk space, Splunk will stop indexing. You can set a minimum free space limit to control how low you will let free disk space fall before indexing stops. Indexing will resume once space exceeds the minimum.

Set minimum free disk space

You can set a minimum amount of space to keep free on the disk where indexed data is stored. If the limit is reached, the server stops indexing data until more space is available. The default minimum is 2000MB.

Note:

• Splunk will not clear any of its own disk space with this method. It will simply pause for more space to become available.

• Events can be lost if they are not written to a file during such a pause.

You can set minimum free disk space through Splunk Web, the CLI, or the server.conf configuration file.
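As a rough illustration of the threshold rule, the following Python sketch compares free space on a filesystem with a minimum. It is not how Splunk performs the check: the function names are hypothetical, a megabyte is assumed to be 1024 * 1024 bytes, and the comparison is assumed to be strict.

```python
import shutil

DEFAULT_MIN_FREE_MB = 2000  # default minimum stated in the text


def free_mb(path: str) -> int:
    """Free space on the filesystem holding `path`, in whole megabytes.

    Assumes 1 MB = 1024 * 1024 bytes.
    """
    return shutil.disk_usage(path).free // (1024 * 1024)


def indexing_paused(free_space_mb: int,
                    min_free_mb: int = DEFAULT_MIN_FREE_MB) -> bool:
    """True when free space is below the minimum (assumed strict comparison)."""
    return free_space_mb < min_free_mb


if __name__ == "__main__":
    available = free_mb(".")
    print(f"{available} MB free; paused: {indexing_paused(available)}")
```

A check like this can be useful in a monitoring script that warns you before the limit is reached.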

• Click Manager in the upper right corner of Splunk Web.

• Enter your desired minimum free disk space in megabytes.

• Click Save.

Restart Splunk for your changes to take effect.

From the command line interface (CLI)

You can set the minimum free disk space via Splunk's CLI. To use the CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command. Here, you set the minimum free disk space to 20,000MB (20GB):

# splunk set minfreemb 20000

# splunk restart

In server.conf

You can also set the minimum free disk space in the server.conf file. The relevant stanza/attribute is this:

[diskUsage]
minFreeSpace = <num>

Note that <num> represents megabytes. The default is 2000.
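If you want to check the configured value from a script, a minimal sketch with Python's standard configparser module might look like this. It assumes the file can be read as simple INI text, which holds for the stanza shown above but is not guaranteed for every Splunk configuration file; the sample text and the function name are hypothetical.

```python
import configparser

# Hypothetical file contents, matching the stanza/attribute shown above.
SAMPLE_SERVER_CONF = """
[diskUsage]
minFreeSpace = 20000
"""


def read_min_free_space(conf_text: str, default_mb: int = 2000) -> int:
    """Return minFreeSpace in megabytes, or the documented default of 2000."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep attribute names case-sensitive
    parser.read_string(conf_text)
    return parser.getint("diskUsage", "minFreeSpace", fallback=default_mb)


if __name__ == "__main__":
    print(read_min_free_space(SAMPLE_SERVER_CONF))  # 20000
```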

Control database storage

The indexes.conf file contains index configuration settings. You can control disk storage usage by specifying maximum index size or maximum age of data. When one of these limits is reached, the oldest indexed data will be deleted (the default) or archived. You can archive the data by using a predefined archive script or creating your own.

For detailed instructions on how to use indexes.conf to set maximum index size or age, see "Set a retirement and archiving policy".

For information on creating archive scripts, see "Archive indexed data".

Why you might care

You might not care, actually. Splunk handles indexed data by default in a way that gracefully ages the data through several stages. After a long period of time, typically several years, Splunk removes old data from your system. You might well be fine with the default scheme it uses.

However, if you're indexing large amounts of data, have specific data retention requirements, or otherwise need to carefully plan your aging policy, you've got to read this topic. Also, to back up your data, it helps to know where to find it. So, read on....

How Splunk ages data

Each of the index directories is known as a bucket. To summarize so far:

• A Splunk index resides across many age-designated index directories.

• An index directory is a bucket.

A bucket moves through several stages as it ages:

• hot

• warm

• cold

• frozen

As buckets age, they "roll" from one stage to the next. Newly indexed data goes into a hot bucket, which is a bucket that's both searchable and actively being written to. After the hot bucket reaches a certain size, it becomes a warm bucket, and a new hot bucket is created. Warm buckets are searchable, but are not actively written to. There are many warm buckets.

Once Splunk has created some maximum number of warm buckets, it begins to roll the warm buckets to cold based on their age. Always, the oldest warm bucket rolls to cold. Buckets continue to roll to cold as they age in this manner. After a set period of time, cold buckets roll to frozen, at which point they are either archived or deleted. By editing attributes in indexes.conf, you can specify the bucket aging policy, which determines when a bucket moves from one stage to the next.
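The rolling rules above can be modeled with a small Python simulation. This is a toy model for building intuition, not Splunk's behavior: it keeps a single hot bucket, measures a bucket's age from its newest event (an assumption), and uses the hypothetical names ToyIndex, index_event and expire.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bucket:
    newest_time: int  # epoch seconds of the newest event in the bucket
    size_mb: int = 0


@dataclass
class ToyIndex:
    max_data_size_mb: int   # size at which the hot bucket rolls to warm
    max_warm_count: int     # maximum number of warm buckets
    frozen_after_secs: int  # age at which a cold bucket rolls to frozen
    hot: Bucket = field(default_factory=lambda: Bucket(newest_time=0))
    warm: List[Bucket] = field(default_factory=list)    # oldest first
    cold: List[Bucket] = field(default_factory=list)    # oldest first
    frozen: List[Bucket] = field(default_factory=list)

    def index_event(self, size_mb: int, now: int) -> None:
        """Write to the hot bucket, then apply the size and count rules."""
        self.hot.size_mb += size_mb
        self.hot.newest_time = now
        if self.hot.size_mb >= self.max_data_size_mb:
            self.warm.append(self.hot)          # hot rolls to warm
            self.hot = Bucket(newest_time=now)  # a new hot bucket is created
        while len(self.warm) > self.max_warm_count:
            self.cold.append(self.warm.pop(0))  # oldest warm rolls to cold

    def expire(self, now: int) -> None:
        """Roll cold buckets that are past the age limit to frozen."""
        still_cold = []
        for bucket in self.cold:
            if now - bucket.newest_time > self.frozen_after_secs:
                self.frozen.append(bucket)      # archived or deleted
            else:
                still_cold.append(bucket)
        self.cold = still_cold


if __name__ == "__main__":
    idx = ToyIndex(max_data_size_mb=100, max_warm_count=2, frozen_after_secs=1000)
    for t in range(5):
        idx.index_event(size_mb=100, now=t)
    idx.expire(now=1002)
    print(len(idx.warm), len(idx.cold), len(idx.frozen))
```

The three constructor arguments correspond loosely to the size, count, and age limits that the bucket aging policy lets you set.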

Here are the stages that buckets age through:

Bucket stage | Description | Searchable?
-------------|-------------|------------
Hot | Contains newly indexed data. Open for writing. One or more hot buckets for each index. | Yes.
Warm | Data rolled from hot. There are many warm buckets. | Yes.
Cold | Data rolled from warm. There are many cold buckets. | Yes, but only when the search specifies a time range included in these files.
Frozen | Data rolled from cold. Splunk deletes frozen data by default, but you can also archive it. | No.

The collection of buckets in a particular stage is sometimes referred to as a database or "db": the "hot db", the "warm db", the "cold db", etc.

What the index directories look like

Each bucket occupies its own subdirectory within a larger database directory. Splunk organizes the directories to distinguish between hot/warm/cold buckets. In addition, the bucket directory names are based on the age of the data.

Here's the directory structure for the default index:

Bucket type | Default location | Notes
------------|------------------|------
Hot | $SPLUNK_HOME/var/lib/splunk/defaultdb/db/* | There can be multiple hot subdirectories. Each hot bucket occupies its own subdirectory, which uses this naming convention: hot_v1_<ID>
Warm | $SPLUNK_HOME/var/lib/splunk/defaultdb/db/* | There are multiple warm subdirectories. Each warm bucket occupies its own subdirectory, which uses this naming convention: db_<newest_time>_<oldest_ti[...] where <newest_time> and <oldest_time> are timestamps indicating the age of the data within. The timestamps are expressed in epoch time (in seconds). For example, db_1223658000_122365440[...] is a warm bucket containing data from October 10, 2008, covering the period of 9am-10am.
Cold | $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/* | There are multiple cold subdirectories. When warm buckets roll to cold, they get moved into this directory, but are not renamed.
Frozen | N/A: Data deleted, or archived into a directory structure of your design. | Deletion is the default; archiving is accomplished through a user-created script.
Thawed | $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/* | Location for data that has been archived and later thawed. See "Restore archived data" for information on restoring archived data to a "thawed" state.

The paths for hot/warm and cold directories are configurable, so you can store cold buckets in a separate location from hot/warm buckets. See "Use multiple partitions for index data".
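To read the time range out of a warm or cold bucket directory name, you can decode the epoch timestamps. The Python sketch below assumes the name has the form db_<newest_time>_<oldest_time>_<ID>; the trailing ID segment and the sample name are assumptions made for illustration, and parse_bucket_name is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BucketName(NamedTuple):
    newest: datetime
    oldest: datetime
    bucket_id: str


def parse_bucket_name(name: str) -> BucketName:
    """Split an assumed db_<newest>_<oldest>_<id> name into its parts."""
    prefix, newest, oldest, bucket_id = name.split("_", 3)
    if prefix != "db":
        raise ValueError(f"not a warm/cold bucket directory name: {name!r}")

    def to_utc(secs: str) -> datetime:
        # Epoch seconds are interpreted as UTC.
        return datetime.fromtimestamp(int(secs), tz=timezone.utc)

    return BucketName(to_utc(newest), to_utc(oldest), bucket_id)


if __name__ == "__main__":
    # Hypothetical directory name covering a one-hour span.
    bucket = parse_bucket_name("db_1223658000_1223654400_7")
    print(bucket.oldest.isoformat(), "to", bucket.newest.isoformat())
```

The newest timestamp in this sample, 1223658000, decodes to 17:00 UTC on October 10, 2008, which is 10am in US Pacific Daylight Time.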

Caution: All index locations must be writable.

Configure your indexes

You configure indexes in indexes.conf. You can edit a copy of indexes.conf in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/. Do not edit the copy in $SPLUNK_HOME/etc/system/default. For information on configuration files and directory locations, see "About configuration files".

This table lists the key indexes.conf attributes affecting buckets and what they configure. It also provides links to other topics that show how to use these attributes. For the most detailed information on these attributes, as well as others, always refer to "indexes.conf".

Attribute | What it configures | Default
----------|--------------------|--------
homePath | The path that contains the hot and warm buckets. | $SPLUNK_HOME/var/lib/splunk/defaultdb/db/
coldPath | The path that contains the cold buckets. | $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/
thawedPath | The path that contains any thawed buckets. | $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/
maxDataSize | Determines rolling behavior, hot to warm. The maximum size for a hot bucket. When a hot bucket reaches this size, it rolls to warm. This attribute also determines the approximate size for all buckets. | Depends; see indexes.conf.
maxWarmDBCount | Determines rolling behavior, warm to cold. The maximum number of warm buckets. When the maximum is reached, warm buckets begin rolling to cold. | 300
maxTotalDataSizeMB | Determines rolling behavior, cold to frozen. The maximum size of an index. When this limit is reached, cold buckets begin rolling to frozen. | 500000 (MB)
frozenTimePeriodInSecs | Determines rolling behavior, cold to frozen. Maximum age for a bucket, after which it rolls to frozen. | 188697600 (in seconds; approx. 6 years)
coldtoFrozenScript | Script to run just before a cold bucket rolls to frozen. | Default behavior is to log the bucket's directory name and then delete it once it rolls.
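Because frozenTimePeriodInSecs is expressed in seconds, it helps to convert between seconds and days when planning a retention period. The following Python sketch does that arithmetic; the helper names are hypothetical and the 90-day figure is only an example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DEFAULT_FROZEN_SECS = 188_697_600  # default quoted in the table above


def secs_to_days(secs: int) -> float:
    """Convert a retention period in seconds to days."""
    return secs / SECONDS_PER_DAY


def days_to_secs(days: int) -> int:
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


if __name__ == "__main__":
    days = secs_to_days(DEFAULT_FROZEN_SECS)
    print(days)                      # 2184.0 days
    print(round(days / 365, 2))      # 5.98, that is, roughly 6 years
    print(days_to_secs(90))          # 7776000 seconds for a 90-day policy
```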

Use multiple partitions for index data

Splunk can use multiple disks and partitions for its index data. It's possible to configure Splunk to use many disks/partitions/filesystems on the basis of multiple indexes and bucket types, so long as you mount them correctly and point to them properly from indexes.conf. However, we recommend that you use a single high performance file system to hold your Splunk index data for the best experience.

If you do use multiple partitions, the most common way to arrange Splunk's index data is to keep the hot/warm buckets on the local machine, and to put the cold buckets on a separate array or disks (for longer term storage). You'll want to run your hot/warm buckets on a machine with fast read/write partitions, since most searching will happen there. Cold buckets should be located on a reliable array of disks.

Configure multiple partitions

1. Set up partitions just as you'd normally set them up in any operating system.

2. Mount the disks/partitions.

3. Edit indexes.conf to point to the correct paths for the partitions. You set paths on a per-index basis, so you can also set separate partitions for different indexes. Each index has its own [<index>] stanza, where <index> is the name of the index. These are the settable path attributes:

• homePath = <path on server>

♦ This is the path that contains the hot and warm databases for the index.

♦ Caution: The path must be writable.

• coldPath = <path on server>

♦ This is the path that contains the cold databases for the index.

♦ Caution: The path must be writable.

• thawedPath = <path on server>

♦ This is the path that contains any thawed databases for the index.

Buckets and Splunk administration

When you're administering Splunk, it helps to understand how Splunk stores indexes across buckets. In particular, several admin activities require a good understanding of buckets:

For information on setting a retirement and archiving policy, see "Set a retirement and archiving policy". You can base the retirement policy on either size or age of data.

For information on how to archive your indexed data, see "Archive indexed data". For information on archive signing, see "Configure archive signing". To learn how to restore data from archive, read "Restore archived data".

To learn how to back up your data, read "Back up indexed data". This topic also discusses how to manually roll hot buckets to warm (so that you can then back them up). Also, see "Best practices for backing up" on the Community Wiki.

Troubleshoot your buckets

This section tells you how to deal with an assortment of bucket problems. We're starting small, but we'll add new issues as they arise.

Recover invalid hot buckets

A hot bucket becomes an invalid hot (invalid_hot_<ID>) bucket when Splunk detects that the metadata files (Sources.data, Hosts.data, SourceTypes.data) are corrupt or incorrect. Incorrect data usually signifies incorrect time ranges; it can also mean that event counts are incorrect.

Splunk ignores invalid hot buckets. Data does not get added to such buckets, and they cannot be searched. Invalid buckets also do not count when determining bucket limit values such as maxTotalDataSizeMB. This means that invalid buckets do not negatively affect the flow of data through the system, but it also means that they can result in disk storage that exceeds the configured maximum value.

To recover an invalid hot bucket, use the recover-metadata command:

1. Make backup copies of the metadata files, Sources.data, Hosts.data,

2. Rebuild the metadata from the raw data information:

3. If successful, rename the bucket as it would normally be named.

For more information

For more information on buckets, see "indexes.conf" in this manual and "Understanding buckets" on the Community Wiki.

Configure segmentation to manage disk usage

Segmentation is how Splunk breaks events up during indexing into usable chunks, called tokens. A token is a piece of information within an event, such as an error code or a user ID. The level of segmentation you choose can increase or decrease the size of the chunks.

Segmentation can affect indexing and searching speed, as well as disk space usage. You can change the level of segmentation to improve indexing or searching speed, although this is not typically necessary.

You can adjust segmentation rules to provide better index compression or improve the usability for a particular data source. If you want to change Splunk's default segmentation behavior, edit segmenters.conf. Once you have set up rules in segmenters.conf, tie them to a specific source, host or sourcetype via props.conf. Segmentation modes other than inner and full are not recommended.

Edit all configuration files in $SPLUNK_HOME/etc/system/local, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: You can enable any number of segmentation rules applied to different hosts, sources, and/or sourcetypes in this manner.

There are many different ways you can configure segmenters.conf, and you should figure out what works best for your data. Specify which segmentation rules to use for specific hosts, sources, or sourcetypes by using props.conf and segmentation. Here are the main types of index-time segmentation:

Full segmentation

Splunk is set to use full segmentation by default. Full segmentation is the combination of inner and outer segmentation.

Inner segmentation

Inner segmentation is the most efficient segmentation setting for both search and indexing, while still retaining the most search functionality. It does, however, make typeahead less comprehensive. Switching to inner segmentation does not change search behavior at all.

To enable inner segmentation, set SEGMENTATION = inner for your source, sourcetype, or host in props.conf. Under these settings, Splunk indexes smaller chunks of data. For example, user.id=foo is indexed as user id foo.

Outer segmentation

Outer segmentation is the opposite of inner segmentation. Instead of indexing only the small tokens individually, outer segmentation indexes entire terms, yielding fewer, larger tokens. For example, "10.1.2.5" is indexed as "10.1.2.5", meaning you cannot search on individual pieces of the phrase. You can still use wildcards, however, to search for pieces of a phrase. For example, you can search for "10.1*" and you will get any events that have IP addresses that start with "10.1". Also, outer segmentation disables the ability to click on different segments of search results, such as the 48.15 segment of the IP address 48.15.16.23. Outer segmentation tends to be marginally more efficient than full segmentation, while inner segmentation tends to be much more efficient.

To enable outer segmentation, set SEGMENTATION = outer for your source, sourcetype, or host in props.conf. Also for search to behave properly, add the following stanza to $SPLUNK_HOME/etc/system/local/segmenters.conf, so that the search system knows to search for larger tokens:

The most space-efficient segmentation setting is to disable segmentation completely. This has significant implications for search, however. By setting Splunk to index with no segmentation, you restrict searches to time, source, host, and sourcetype. You must pipe your searches through the search command to further restrict results. Use this setting only if you do not need any advanced search capabilities.

To disable segmentation, set SEGMENTATION = none for your source, sourcetype, or host in props.conf. Searches for keywords in this source, sourcetype, or host will return no results. You can still search for indexed fields.
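To make the difference between the segmentation types concrete, here is a rough Python model of how one piece of text might be tokenized under each. It is a simplification, not Splunk's segmenter: the breaker character sets are assumptions chosen only to reproduce the two examples in the text (user.id=foo and 10.1.2.5), and the function names are hypothetical.

```python
import re
from typing import List

# Assumed breaker sets, for illustration only.
MAJOR_BREAKERS = r"[\s,;]+"      # separate whole terms
MINOR_BREAKERS = r"[.=:/_\-@]+"  # split a term into smaller pieces


def outer_tokens(text: str) -> List[str]:
    """Whole terms only: fewer, larger tokens."""
    return [t for t in re.split(MAJOR_BREAKERS, text) if t]


def inner_tokens(text: str) -> List[str]:
    """Only the small pieces inside each term."""
    pieces: List[str] = []
    for term in outer_tokens(text):
        pieces.extend(p for p in re.split(MINOR_BREAKERS, term) if p)
    return pieces


def full_tokens(text: str) -> List[str]:
    """Inner and outer combined, without duplicates, in first-seen order."""
    return list(dict.fromkeys(outer_tokens(text) + inner_tokens(text)))


if __name__ == "__main__":
    print(inner_tokens("user.id=foo"))  # ['user', 'id', 'foo']
    print(outer_tokens("10.1.2.5"))     # ['10.1.2.5']
    print(full_tokens("10.1.2.5"))      # ['10.1.2.5', '10', '1', '2', '5']
```

In this model, full segmentation produces the most tokens, which is consistent with it taking the most index space of the three.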

Splunk Web segmentation for search results

Splunk Web has settings for segmentation in search results. These have nothing to do with index-time segmentation. Splunk Web segmentation affects browser interaction and can speed up search results. To set search-result segmentation:

1. Perform a search. Look at the results.

2. Click Options... above the returned set of events.

3. In the Event Segmentation dropdown box, choose from the available segmentation types: full, inner, outer, or raw. The default is "full".

Configure custom segmentation for a host, source, or source type

By default, Splunk fully segments events to allow for the most flexible searching. To learn more about segmentation in general, refer to this page about segmentation.

If you know how you want to search for or process events from a specific host, source, or sourcetype, you can configure custom segmentation for that specific type of event. Configuring custom segmentation for a given host, source, or sourcetype improves indexing and search performance and can reduce index storage size.

Configure custom segmentation in props.conf

Configure custom segmentation for events of a host, source, or sourcetype by adding the SEGMENTATION and SEGMENTATION-<segment selection> attributes to the appropriate stanza in props.conf. Assign values to the attributes using rules for index-time and search-time (Splunk Web) segmentation defined in segmenters.conf.

Add your stanza to $SPLUNK_HOME/etc/system/local/props.conf. Specify the following

[<spec>] can be:

• <sourcetype>: A sourcetype in your event data.

• This specifies the segmentation rule ("segmenter") from segmenters.conf to use at index time.

SEGMENTATION-<segment selection> = <segmenter>

• This setting affects how search results appear in Splunk Web; it does not change the index-time segmentation.

• This specifies that Splunk Web should use the specified segmenter (from segmenters.conf) for the given <segment selection> choice. The <segment selection> choices appear as segmentation types that the user can select when viewing search results in Splunk Web.

• Default <segment selection> choices are: all, inner, outer, and raw.

• Do not change the set of default <segment selection> choices, unless you have some overriding reason for doing so. In order for a changed set of <segment selection> choices to appear in Splunk Web, you will first need to edit the Splunk Web UI, which you probably will not want to attempt to do. You can, however, change the segmenter that a given <segment selection> calls.

<segmenter>

• This is a segmentation rule defined in segmenters.conf.

• Pre-defined default rules include: inner, outer, none, and full.

• You can create your own custom rule by editing $SPLUNK_HOME/etc/system/local/segmenters.conf.

• For more information on configuring segmenters.conf, see this page.

Example

The following example can increase search performance and reduce the size of syslog events in your index.

Add the following to the [syslog] source type stanza in props.conf:

[syslog]
SEGMENTATION = inner
SEGMENTATION-all = inner

This changes the segmentation of all events that have a sourcetype of syslog to inner segmentation, both at index time (through the SEGMENTATION attribute) and at search time in Splunk Web (through the SEGMENTATION-<segment selection> attribute).

a small amount of index data here and without it your index may appear to vanish.

6. Start the server:

> .\splunk start

The Splunk Server picks up where it left off, reading from, and writing to, the new copy of the index.

Caution: Do not try to break up and move parts of an index manually. If you must subdivide an existing index, contact Splunk Support for assistance.

Remove indexed data from Splunk

You can remove data from indexes in two ways:

• Delete events from future searches with the delete operator.

• Remove all data from one or more indexes with the CLI clean command.

Caution: Removing data is irreversible. Use caution when choosing what events to remove from searches, or what data to remove from your Splunk indexes. If you want to get your data back, you must re-index the applicable data source(s).

Delete data from future searches with the "delete" operator

Splunk provides the special operator delete to delete data from future searches. Before using the delete operator, read this section carefully.

Who can delete?

The delete operator can only be accessed by a user with the "delete_by_keyword" capability. By default, Splunk ships with a special role, "can_delete", that has this capability (and no others). The admin role does not have this capability by default. Splunk recommends you create a special user that you log into when you intend to delete index data.

For more information, refer to "Add users and assign roles" in this manual.

How to delete

To use the delete operator, run a search that returns the events you want deleted. Make sure that this search returns ONLY events you want to delete, and no other events.

For example, if you want to remove the events you've indexed from a source called /fflanda/incoming/cheese.log so that they no longer appear in searches, do the following:

1. Disable or remove that source so that it no longer gets indexed.

2. Search for events from that source in your index:

source="/fflanda/incoming/cheese.log"

3. Look at the results to confirm that this is the data you want to delete.

4. Once you've confirmed that this is the data you want to delete, pipe the search to delete:

source="/fflanda/incoming/cheese.log" | delete

See the page about the delete operator in the Search Reference Manual for more examples.

Piping a search to the delete operator marks all the events returned by that search so that they are never returned by any future search. No user (even with admin permissions) will be able to see this data when searching with Splunk.

Note: Piping to delete does not reclaim disk space.

The delete operator also does not update the metadata of the events, so any metadata searches will still include the events although they are not searchable. The main All indexed data dashboard will still show event counts for the deleted sources, hosts, or sourcetypes.

Remove data from indexes with the CLI "clean" command

To delete index data permanently from your disk, use the CLI clean command. This command completely deletes the data in one or all indexes, depending on whether you provide an <index_name> argument. Typically, you run clean before re-indexing all your data.

How to use the "clean" command

Here are the main ways to use the clean command:

• To access the help page for clean, type:

./splunk help clean

• To permanently remove event data from all indexes, type:

./splunk clean eventdata

• To permanently remove event data from a single index, type:

./splunk clean eventdata <index_name>

where <index_name> is the name of the targeted index.

• Add the -f parameter to force clean to skip its confirmation prompts.

Examples

Note: You must stop Splunk before you run the clean command:

./splunk stop

This example removes event data from all indexes:

./splunk clean eventdata

This example removes event data from the _internal index and forces Splunk to skip the confirmation prompt:

./splunk clean eventdata _internal -f
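If you script the clean command, it can help to build the argument list in one place so that the index name and the -f flag are added consistently. The Python sketch below only assembles the arguments shown in this topic; clean_eventdata_args is a hypothetical helper, and nothing here stops Splunk or runs the command for you.

```python
from typing import List, Optional


def clean_eventdata_args(index_name: Optional[str] = None,
                         force: bool = False) -> List[str]:
    """Build the argument list for './splunk clean eventdata'.

    With no index_name the command targets all indexes, as described above.
    """
    args = ["./splunk", "clean", "eventdata"]
    if index_name:
        args.append(index_name)
    if force:
        args.append("-f")  # skip the confirmation prompt
    return args


if __name__ == "__main__":
    print(" ".join(clean_eventdata_args()))
    print(" ".join(clean_eventdata_args("_internal", force=True)))
```

The resulting list can be passed to subprocess.run from the $SPLUNK_HOME/bin/ directory, after you have stopped Splunk.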

Optimize indexes

While Splunk is indexing data, one or more instances of the splunk-optimize process will run intermittently, merging index files together to optimize performance when searching the data. The splunk-optimize process can use a significant amount of CPU, but should not consume it indefinitely, only for short amounts of time. You can alter the number of concurrent instances of splunk-optimize by changing the value set for maxConcurrentOptimizes in indexes.conf, but this is not typically necessary.

splunk-optimize should only run on hot buckets. You can run it on warm buckets manually, if you find one with a large number of .tsidx files (more than 25):

./splunk-optimize <directory>

If splunk-optimize does not run often enough, search efficiency will be affected.
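To find warm buckets that might benefit from a manual run, you can count the .tsidx files in each bucket directory. This Python sketch assumes the .tsidx files sit directly inside the bucket directory, which is an assumption rather than something stated in this topic; the function names are hypothetical.

```python
from pathlib import Path

TSIDX_THRESHOLD = 25  # "more than 25", per the text above


def count_tsidx(bucket_dir: str) -> int:
    """Count .tsidx files directly inside a bucket directory."""
    return sum(1 for p in Path(bucket_dir).glob("*.tsidx") if p.is_file())


def is_optimize_candidate(bucket_dir: str,
                          threshold: int = TSIDX_THRESHOLD) -> bool:
    """True when the bucket holds more .tsidx files than the threshold."""
    return count_tsidx(bucket_dir) > threshold


if __name__ == "__main__":
    # Reports on the current directory; point this at a bucket directory.
    print(count_tsidx("."), is_optimize_candidate("."))
```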

For more information on buckets, see "How Splunk stores indexes".

Define alerts

How alerting works

Alerts are searches you've configured to run on a schedule and send you their results. Use alerts to notify you of changes in your data, network infrastructure, file system or other devices you're monitoring. Alerts can be sent via email or RSS, or trigger a shell script. You can turn any saved search into an alert.

An alert consists of:

• a schedule for performing the search

• conditions for triggering an alert

• actions to perform when the triggering conditions are met

Enable alerts

Set up an alert at the time you create a saved search, or enable an alert on any existing saved search. Configure both basic and advanced conditional alerts for searches by:

• Scheduling and defining alerts for saved searches through Splunk Web (if you have permission to edit them).

• Entering or updating saved search configurations in savedsearches.conf. For more information, see "Set up alerts in savedsearches.conf", in this chapter.

Specify default email alert action settings

To specify the mail host, email format, subject, sender, and whether or not the results of the alert should be included inline:

You can also set default settings for alert actions (including scripted alerts) by making changes directly to alert_actions.conf.

PDF report settings

On the Email alert settings page, select Use PDF report server to open the PDF report settings section. This is where you enable the ability to have .pdf printouts of report results sent as attachments with alert emails.

Note: You must have the PDF Printer app set up on a central Linux host before you can enable the PDF printing functionality here. For more information see "Configure PDF printing for Splunk Web" in the Installation manual.

Scripted alerts

Alerts can also trigger shell scripts. When you configure an alert, specify a script you've written. You can use this feature to send alerts to other applications. Learn more about configuring scripted alerts.

You can use scripted alerts to send syslog events, or SNMP traps.

Considerations

When configuring alerts, keep the following in mind:

• Too many alerts/saved searches running at once may slow down your system -- depending on the hardware, 20-30 alerts running at once should be OK. If the searches your alerts are based on are complex, you should make the interval longer and spread the searches out more.

• Set a time frame for alerts that makes sense -- if the search takes longer than 4-5 minutes to run, don't set it to run every five minutes.

• You must have a mail server running on the LAN that the Splunk server can connect to. Splunk does not authenticate against the mail server.

• Read more about best practices for alert configuration on the Splunk Community Wiki.

Set up alerts in savedsearches.conf

Configure alerts with savedsearches.conf. Use the $SPLUNK_HOME/etc/system/README/savedsearches.conf.example as an example, or create your own savedsearches.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see how configuration files work.

Follow these steps:

1. Create a saved search.

2. Schedule the search.

3. Define alert conditions.

4. Configure alert actions.

You can set up an alert at the time you create a saved search, or add the alert configurations to your saved search stanza later.

Note: You must have email enabled on your Splunk server for alerts to be sent out. Alternatively, your Splunk server must be able to contact your email server. Configure email settings in Manager.

Create a saved search

First, set up a saved search, either via Splunk Web or savedsearches.conf.

Schedule the search

Next, schedule your search. This means your search runs on a schedule that you specify. For example, you can arrange to have Splunk run your search every hour, or every day at midnight.

To schedule a search via savedsearches.conf, add the following attribute/value pairs to your saved search stanza:

userid = <integer>

• UserId of the user who created this saved search.

♦ Splunk needs this information to log who ran the search, and create editing capabilities in Splunk Web.

• Possible values: Any Splunk user ID.

• User IDs are found in $SPLUNK_HOME/etc/passwd.

♦ Look for the first number on each line, right before the username.

♦ For example 2:penelope....

enableSched = < 0 | 1 >

• Set this to 1 to enable the schedule for the search.

• Defaults to 0.

cron_sched = <cron string>

• The cron schedule used to execute the search.

• For example, */5 * * * * causes the search to execute every five minutes.

Note: Cron scheduling lets you use standard cron notation to define your scheduled search interval. In particular, cron can accept this type of notation: 00,20,40 * * * *, which runs the search every hour at hh:00, hh:20, hh:40. Along the same lines, a cron of 03,23,43 * * * * runs the search every hour at hh:03, hh:23, hh:43. Splunk recommends that you schedule your searches so that they're staggered over time. This reduces system load. Running all of them (*/20) every 20 minutes means they would all launch at hh:00 (20, 40) and might slow your system every 20 minutes.
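When you have several searches on the same interval, you can generate staggered cron strings instead of writing each by hand. The Python sketch below produces the notation shown in the note; staggered_cron is a hypothetical helper and is not part of Splunk.

```python
def staggered_cron(offset_min: int, every_min: int = 20) -> str:
    """Cron string that fires every `every_min` minutes, shifted by `offset_min`."""
    if not 0 < every_min <= 60 or 60 % every_min != 0:
        raise ValueError("every_min must divide 60 evenly")
    if not 0 <= offset_min < every_min:
        raise ValueError("offset_min must be at least 0 and less than every_min")
    minutes = range(offset_min, 60, every_min)
    return ",".join(f"{m:02d}" for m in minutes) + " * * * *"


if __name__ == "__main__":
    # Give each of three searches its own offset within the 20-minute window.
    for offset in (0, 3, 7):
        print(staggered_cron(offset))
```

For example, staggered_cron(3) returns 03,23,43 * * * *, the second schedule from the note above.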

max_concurrent = <integer>

• The maximum number of concurrent instances of this search the scheduler is allowed to run.

• Defaults to 1.

Set up alert conditions

Next, define alert conditions for the scheduled search. When Splunk runs a scheduled search, these are the conditions that trigger an alert action (such as an email) when they are met.

• Basic conditional alerts trigger alert actions when set thresholds in the number of events, sources, or hosts in your results are exceeded.

• Advanced conditional alerts are based on the results of a conditional search that is evaluated against the results of the scheduled search. If the conditional search returns one or more events, the alert is triggered.

Define a basic conditional alert

Define a threshold number of events, sources, or hosts. If the alert conditions are met when the search is run, Splunk notifies you via email or triggers a shell script. You can also set counttype = always if you want the alert action (such as an email) to be triggered each time the scheduled search runs.

counttype = <string>

• Set the type of count for alerting.

• Possible values: number of events, number of hosts, number of sources, and always.

• Used in conjunction with the relation and quantity attributes, except when set to always.

• Use counttype = always to trigger the alert action each time the scheduled search is run.

quantity = <integer>

• Number to compare against the given counttype.

So if you have the following:

counttype = number of events

Splunk alerts you if your search results have risen by 25 since the last time the search ran.

For more information about configuring alert actions, see the "Configure alert actions" subtopic, below.

Define an advanced conditional alert

If you'd rather define an advanced conditional alert, you use the alert_condition attribute in place of counttype, relation, and quantity.
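As a rough mental model, the decision that a basic conditional alert makes can be sketched as follows. This is not Splunk's implementation. The relation names used here ("greater than", "less than", "equal to", "rises by", "drops by") and the exact comparison used for "rises by" are assumptions for illustration; check savedsearches.conf.spec for the values your version accepts.

```python
import operator

# Assumed relation names that compare the current count directly
# against the configured quantity.
ABSOLUTE_RELATIONS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
}


def should_alert(counttype, relation=None, quantity=None, current=0, previous=0):
    """Decide whether a basic conditional alert would fire (illustrative only)."""
    if counttype == "always":
        # counttype = always ignores relation and quantity.
        return True
    if relation in ABSOLUTE_RELATIONS:
        return ABSOLUTE_RELATIONS[relation](current, quantity)
    # Assumption: relative relations compare against the previous run's count.
    if relation == "rises by":
        return current - previous >= quantity
    if relation == "drops by":
        return previous - current >= quantity
    raise ValueError(f"unknown relation: {relation!r}")


# 125 events now versus 100 on the previous run is a rise of 25.
print(should_alert("number of events", "rises by", 25, current=125, previous=100))
```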

alert_condition = <string>

• In <string>, enter a search. Splunk will evaluate this secondary search on the artifacts of the saved search to determine whether to trigger an alert action.

• Alert actions are triggered if this secondary search yields a non-empty search result list.

For an in-depth discussion of a use case for advanced conditional alerting over basic conditional alerting, see "Set alert conditions for scheduled searches" in the User Manual. This topic discusses alert setup using Manager, but the underlying principles are the same.

For more information about configuring alert actions, see the following subtopic, "Configure alert actions."

Configure alert actions

You can configure three different kinds of alert actions--actions that happen when alert conditions are met--for your scheduled searches. These alert actions are notification by email, notification by RSS, and the triggering of a shell script.

To enable or disable an alert action for a particular scheduled, alerted search, add the following to the search definition:

action.<action_name> = 0 | 1

• Indicates whether the alert action is enabled or disabled for a particular saved search. Set to 0 (disabled) by default.

• action_name can be email, script, or rss.

Global defaults for all alert actions are configured in alert_actions.conf (or via Splunk Manager). You can override these defaults at the individual search level in savedsearches.conf. If you don't need to override the alert action defaults, all you need to do is indicate which alert actions are enabled for a given scheduled search (see above).

To set a parameter for an alert action, the syntax is as follows:

action.<action_name>.<parameter> = <value>

The parameter options for each <action_name> are defined in the following sections.

Notification by email

Use the email action to have Splunk contact stakeholders when the scheduled search triggers an alert:

action.email = 1

The email action has a number of parameters. Defaults can be set for all of these parameters in alert_actions.conf, with the exception of the action.email.to parameter, which should be set for each scheduled search that uses the email alert action.

action.email.to = <email list>

• The email addresses to which Splunk will send the email, arranged in a comma-delimited list.

• This parameter is not set at the alert_actions.conf level. You must define it for every email alert action that you configure.

action.email.from = <email address>

• The email address that is used as the sender's address.

• Default is splunk@$LOCALHOST (or whatever is set for from in alert_actions.conf).

action.email.subject = <string>

• The subject of the alert email.

• Default is SplunkAlert-<savedsearchname> (or whatever is set for subject in alert_actions.conf).

action.email.sendresults = <bool>

• Specify whether to include the search results in the email. The results can be attached or included in the body of the email (see the action.email.inline parameter, below).

• Default is false (or whatever is set for sendresults in alert_actions.conf).

• Note: When you are using an advanced conditional alert, be aware that only the results of the original search are included with the email. The results of the triggering conditional search are discarded.

action.email.inline = <true | false>

• Specify whether the search results are included in the body of the alert mail.

• Default is false (or whatever is set for inline in alert_actions.conf).

action.email.mailserver = <string>

• The address of the MTA server that sends the alert emails.

• Default is $LOCALHOST (or whatever is set for mailserver in alert_actions.conf).

action.email.preprocess_results = <search-string>

• An optional search string to preprocess results before emailing them. Usually one would set this up to filter out unwanted internal fields.

• Default is an empty string (or whatever is set for preprocess_results in alert_actions.conf).

Note: You can also arrange to have .pdf printouts of dashboards delivered by email on a set schedule. For more information, see "Schedule delivery of dashboard PDF printouts via email" in this manual.

There are settings for this feature in alert_actions.conf. For example, you can identify the URL of the PDF report server, and the report paper size and orientation.

Important: Use of the .pdf printout feature requires the setup of the PDF Printer app on a central Linux host. If you don't have this set up, contact a system administrator. For more information see "Configure PDF printing for Splunk Web" in the Installation manual.

The following is an example of what an email alert looks like:

Create an RSS feed

Use the rss action to have Splunk alert you via RSS when the scheduled search triggers an alert:

action.rss = 1

Whenever the alert conditions are met for a scheduled search that has Create an RSS feed selected, Splunk sends a notification out to its RSS feed. The feed is located at http://[splunkhost]:[port]/rss/[saved_search_name]. So, if you're running a search titled "errors_last15" on a Splunk instance that is located on localhost and uses port 8000, the correct link for the RSS feed would be http://localhost:8000/rss/errors_last15.

You can also access the RSS feed for a scheduled search through the Searches and reports page in Manager. If a scheduled search has been set up to provide an RSS feed for alerting searches, when you look it up on the Searches and reports page, you will see an RSS symbol in the RSS feed column:

You can click on this symbol to go to the RSS feed.

Note: The RSS feed for a scheduled search will not display any searches until the search has run on its schedule and the alerting conditions that have been defined for it have been met. If you set the search up to alert each time it's run (by setting Perform actions to always), you'll see searches in the RSS feed after the first time the search runs on its schedule.

Warning: The RSS feed is exposed to any user with access to the webserver that displays it. Unauthorized users can't follow the RSS link back to the Splunk application to view the results of a particular search, but they can see the summarization displayed in the RSS feed, which includes the name of the search that was run and the number of results returned by the search.
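The feed address pattern can be expressed as a small helper. This is a minimal sketch: the function name rss_feed_url is hypothetical, and percent-encoding the search name is an assumption added for names that contain spaces or other special characters.

```python
from urllib.parse import quote


def rss_feed_url(splunk_host, port, saved_search_name):
    """Build http://[splunkhost]:[port]/rss/[saved_search_name]."""
    # Assumption: special characters in the search name are percent-encoded.
    return f"http://{splunk_host}:{port}/rss/{quote(saved_search_name, safe='')}"


print(rss_feed_url("localhost", 8000, "errors_last15"))
```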

Trigger a shell script

Use the script action to have Splunk run a shell script when the scheduled search triggers an alert:

action.script = 1

The script action has a filename parameter which is usually defined at the individual search level, although a default filename can also be set in alert_actions.conf:

action.script.filename = <script filename>

• The filename of the shell script that you want Splunk to run. The script should live in $SPLUNK_HOME/bin/scripts/.

Example - Basic conditional alert configuration

This example is for a saved search titled "sudoalert." It runs a search for events containing the term "sudo" on a 12 minute interval. If a scheduled "sudoalert" run results in greater than 10 events, alert actions are triggered that send the results via email and post them to an RSS feed.
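One way to assemble such a stanza is sketched below in Python, which prints text suitable for savedsearches.conf. Treat it as a hedged illustration rather than the manual's own example: the helper render_stanza is hypothetical, the relation value "greater than" is an assumed spelling, and the email address is a placeholder.

```python
def render_stanza(name, settings):
    """Render a configuration stanza as '[name]' followed by 'key = value' lines."""
    lines = [f"[{name}]"]
    lines += [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


sudoalert = {
    "search": "sudo",
    "enableSched": 1,
    "cron_sched": "*/12 * * * *",            # every 12 minutes
    "counttype": "number of events",
    "relation": "greater than",              # assumed relation value
    "quantity": 10,
    "action.email": 1,
    "action.email.to": "admin@example.com",  # placeholder recipient
    "action.email.sendresults": "true",
    "action.rss": 1,
}

print(render_stanza("sudoalert", sudoalert))
```

The printed text follows the attribute/value layout described in this topic; review it against savedsearches.conf.spec before using it on a real instance.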

Enable summary indexing

Summary indexing is an additional kind of alert action that you can configure for any scheduled search. You use summary indexing when you need to perform analysis/reports on large amounts of data over long timespans, which typically can be quite time consuming, and a drain on performance if several users are running similar searches on a regular basis.

With summary indexing, you define a scheduled search that computes sufficient statistics (a summary) for events covering a time slice. Each time Splunk runs the search it saves the results into a summary index that you've designated. You can then search and report on this smaller (and thus faster) summary index instead of working with the much larger dataset that the summary index is based on.

Note: Do not attempt to set up a summary index until you have read and understood "Use summary indexing for increased reporting efficiency" in the Knowledge Manager manual.

For more information about configuring summary index searches in savedsearches.conf, see "Configure summary indexes" in the Knowledge Manager Manual.

Configure scripted alerts

Configure scripted alerts with savedsearches.conf. Use $SPLUNK_HOME/etc/system/README/savedsearches.conf.example as an example, or create your own savedsearches.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files".

Script options

Your alert can trigger a shell script, which must be located in $SPLUNK_HOME/bin/scripts. Use the following attribute/value pairs:

action_script = <string>

• Your search can trigger a shell script.

• Specify the name of the shell script to run.

• Place the script in $SPLUNK_HOME/bin/scripts.

If you want to run a script written in a different language (for example Perl, Python, or VBScript), you must specify the interpreter you want Splunk to use in the first line of your script, following the #!. For example:

To run a Perl script:

---- myscript.pl ----

#!/path/to/perl............

To use Python to interpret the script file:

---- myscript.py -----

#!/path/to/python..........

For an example on how scripts can be configured to work with alerts, see send SNMP traps.

Example

You can configure Splunk to send alerts to syslog. This is useful if you already have syslog set up to send alerts to other applications, and you want Splunk's alerts to be included.

Check the Splunk Wiki for information about the best practices for using UDP when configuring Syslog input.

Write a script that calls logger (or any other program that writes to syslog). Your script can call any number of the variables your alert returns.

Create the following script and make it executable:

logger $5

Put your script in $SPLUNK_HOME/bin/scripts.
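If you prefer Python to a shell script, the same idea can be sketched as follows. This is a minimal, hedged example: it assumes a Unix host (the standard syslog module is not available on Windows), it assumes the fifth argument carries the trigger reason as in the shell version, and the interpreter path on the first line must match your system.

```python
#!/usr/bin/env python
import sys


def trigger_reason(argv):
    """Return the fifth script argument, the value the shell version reads as $5."""
    return argv[5] if len(argv) > 5 else ""


def main(argv):
    import syslog  # Unix-only standard library module

    # Write the trigger reason at NOTICE priority.
    syslog.syslog(syslog.LOG_NOTICE, trigger_reason(argv))


if __name__ == "__main__":
    main(sys.argv)
```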

Now write an alert that calls your script. See Set Up Alerts for information on alert configuration. Configure the alert to call your script by specifying the path in the Trigger shell script field of the alert.

Edit your saved search to call the script. If your script is in $SPLUNK_HOME/bin/scripts you don't have to specify the full path.

This logs the trigger reason to syslog:

Check out this excellent topic on troubleshooting alert scripts on the Splunk Community Wiki.

Send SNMP traps to other systems

You can use Splunk as a monitoring tool to send SNMP alerts to other systems such as a Network Systems Management console.

If you're interested in sending SNMP traps on Windows, check this Community Wiki topic.

Configuration

Requirements

• Perl is required to run the script below.

• Net-SNMP package is required in order to use the /usr/bin/snmptrap command - if you have another way of sending an SNMP trap from a shell script then modify as needed.

• Admin access to the $SPLUNK_HOME/bin/scripts directory of your Splunk install.

• For security reasons, scripts must reside in $SPLUNK_HOME/bin/scripts.

Create shell script

♦ For security reasons, scripts must reside in this directory. Create the directory if it doesn't already exist.

♦ Copy the code below into sendsnmptrap.pl.

• Run chmod +x sendsnmptrap.pl to make it executable.

• Change the Host:Port of the SNMP trap handler, paths to external commands splunk and snmptrap, and the user/password if necessary.

• The Perl script will work on MS Windows systems with Perl. However, on some Windows systems, Perl may not be installed, or Perl scripts may not be configured to be directly executable via Splunk. In these cases, you may find it easier to send SNMP traps using a Windows CMD script.

#!/usr/bin/perl
#
# sendsnmptrap.pl: A script for Splunk alerts to send an SNMP trap.
#

Configure your alert to call a shell script

• Create a saved search. Read about setting up saved searches for more information.

• Turn your saved search into an alert. Read about setting alert conditions from scheduled searches for more information.

• Set up your alert so that it calls your shell script by specifying the name of your script which resides in $SPLUNK_HOME/bin/scripts:

Set up backups and retention policies

What you can back up

Splunk data falls into two major categories:

• Indexed event data, including both the compressed raw data and the indexes that access it

• Configuration data, including user data

How much space you will need

This topic describes how to estimate the size of your Splunk index and associated data so that you can plan your storage capacity requirements.

When Splunk indexes your data, the resulting data falls into two categories: the compressed, persisted raw data and the indexes that point to this data. With a little experimentation, you can estimate how much disk space you will need.

Typically, the compressed, persisted data amounts to approximately 10% of the raw data that comes into Splunk. The associated indexes range in size anywhere from 10% to 110% of the data they access. This value is affected strongly by the number of unique terms in the data. Depending on the data's characteristics, you might want to tune your segmentation settings. For an introduction to how segmentation works and how it affects index size, you can also watch this video on segmentation by one of Splunk's lead developers.

The best way to get an idea of your space needs is to experiment by installing a copy of Splunk and indexing a representative sample of your data, and then checking the sizes of the resulting directories in defaultdb.

To do this, first index your sample. Then:

1. Go to $SPLUNK_HOME/var/lib/splunk/defaultdb/db.

2. Run du -shc hot_v*/rawdata to determine the size of the compressed, persisted raw data. Typically, this amounts to about 10% of the size of the original sample data set.

3. Run du -ch hot_v* and look at the last total line to see the size of the index.

4. Add the two values together.

This is the total size of the index and associated data for the sample you indexed. You can now use this to extrapolate the size requirements for your Splunk index and rawdata directories over time.

Back up indexed data

This topic discusses backing up Splunk indexed data. It first gives an overview of how your indexed data moves through Splunk, then describes a basic backup strategy based on common or default Splunk index configurations. Finally, it provides options for setting or changing the retirement policy for your Splunk index data.

The default values and policies described in this topic are set in indexes.conf. If you have a more complex index configuration, or have unusual data volumes, you can refer there for detailed information and options. Before modifying any configuration file, read "About configuration files".

For more information on backing up indexed data, see "Best practices for backing up" on the Community Wiki.

For information on setting a data retirement and archiving policy, see "Set a retirement and archiving policy".
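The extrapolation can be sketched as a simple ratio calculation. The function name and all of the sizes below are hypothetical; substitute the values you measured with du for your own sample.

```python
def estimate_storage(sample_raw_bytes, rawdata_bytes, index_bytes,
                     daily_raw_bytes, retention_days):
    """Extrapolate disk usage from a measured sample.

    sample_raw_bytes: size of the original sample before indexing
    rawdata_bytes:    measured size of the compressed rawdata (step 2)
    index_bytes:      measured size of the index (step 3)
    """
    # Disk bytes consumed per byte of incoming raw data (step 4).
    ratio = (rawdata_bytes + index_bytes) / sample_raw_bytes
    return ratio * daily_raw_bytes * retention_days


GB = 1024 ** 3
# Hypothetical measurements: a 10 GB sample produced 1 GB of rawdata and
# 3 GB of index. Plan for 5 GB of new data per day, kept for 90 days.
estimate = estimate_storage(10 * GB, 1 * GB, 3 * GB, 5 * GB, 90)
print(f"Estimated storage: {estimate / GB:.0f} GB")
```

The estimate is only as good as the sample: data with many unique terms produces larger indexes, so re-measure if the mix of sources changes.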

How data ages

When Splunk is indexing, the data moves through a series of stages based on policies that you define. At a high level, the default behavior is as follows:

When data is first indexed, it is put into a "hot" database, or bucket.

The data remains in the hot bucket until the policy conditions are met for it to be reclassified as "warm" data. This is called "rolling" the data into the warm bucket. By default, this happens when a hot bucket reaches a specified size or age. When a hot bucket is rolled, its directory is renamed, and it becomes a warm bucket. It is safe to back up the warm buckets.

Next, when you reach a specified number of warm buckets, the oldest bucket becomes a cold bucket, thus maintaining a constant number of warm buckets. (If your colddb directory is located on another fileshare, the buckets are moved there and deleted from the warm db directory.) The default number of warm buckets is 300.

Finally, at a time based on your defined policy requirements, the bucket will roll from cold to "frozen". By default, Splunk deletes frozen buckets. If you need to archive or otherwise preserve the data, you can provide a script that performs actions on the bucket prior to deletion.

Summary:

• hot bucket - Currently written to; non-incrementally changing; do not back this up.

• warm bucket - Rolled from hot; added to incrementally; can be safely backed up; consists of multiple warm buckets.

• cold bucket - Rolled from warm; buckets are moved to another location.

• frozen bucket - Default policy is to delete.

For detailed information on how buckets work and where they are stored, see "How Splunk stores indexes".

Choose your backup strategy

The general recommendation is to schedule backups of your warm buckets regularly, using the incremental backup utility of your choice.

Hot buckets can only be backed up by taking a snapshot of the files, using a tool like VSS (on Windows/NTFS), ZFS snapshots (on ZFS), or a snapshot facility provided by the storage subsystem. If you do not have such a facility available, the data within the hot bucket can only be backed up after it has rolled to a warm bucket.

Splunk rolls a hot bucket to a warm bucket based on the policy defined in indexes.conf. By default, the main index rolls a hot bucket when it reaches a certain size. (While it is possible to force a roll of a hot bucket to a warm bucket, this is not recommended, as each forced roll permanently decreases search performance over the data. In cases where hot data needs to be backed up, a snapshot backup is the preferred method.)

You can set retirement and archiving policy by controlling the size of indexes or buckets or the age of the data.

The sizes, locations, and ages of index files are set in indexes.conf. See "How Splunk stores indexes" for detailed information on buckets and indexes.conf.
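To feed a backup tool only the warm buckets, you first need a list of them. The sketch below is an illustration under stated assumptions: it assumes warm bucket directories have names beginning with db_ while hot buckets begin with hot_v (as in the du examples in this manual). Verify the naming on your own installation before relying on it; the function name warm_bucket_dirs is hypothetical.

```python
import os


def warm_bucket_dirs(db_path):
    """List subdirectories of db_path whose names start with 'db_'.

    Assumption: such directories are warm buckets; 'hot_v*' directories
    and plain files are skipped.
    """
    return sorted(
        entry.path
        for entry in os.scandir(db_path)
        if entry.is_dir() and entry.name.startswith("db_")
    )
```

For example, calling warm_bucket_dirs on the db directory of an index returns the paths to hand to your incremental backup utility.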

Caution: All index locations must be writable.

Recommendations for recovery

If you experience a non-catastrophic disk failure (for example you still have some of your data, but Splunk won't run), Splunk recommends that you move the index directory aside and restore from a backup rather than restoring on top of a partially corrupted datastore. Splunk will automatically create hot directories on startup as necessary and resume indexing. Monitored files and directories will pick up where they were at the time of the backup.

Rolling buckets manually from hot to warm

To roll the buckets of an index manually from hot to warm, use the following command, replacing <index_name> with the name of the index you want to roll:

From the CLI

From the search bar

This has been deprecated and cannot be used from the search bar any longer

Back up configuration information

All of Splunk's configuration information is contained in configuration files. To back up the set of configuration files, make an archive or copy of $SPLUNK_HOME/etc/. This directory, along with its subdirectories, contains all the default and custom settings for your Splunk install, and all apps, including saved searches, user accounts, tags, custom source type names, and other configuration information.

Copy this directory to a new Splunk instance to restore. You don't have to stop Splunk to do this.
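A minimal sketch of such a backup in Python follows. The function name backup_config, the archive naming scheme, and the destination directory are all hypothetical; any archiving tool that preserves the directory tree works equally well.

```python
import os
import tarfile
import time


def backup_config(splunk_home, dest_dir):
    """Archive <splunk_home>/etc into a timestamped .tar.gz under dest_dir."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = os.path.join(dest_dir, f"splunk-etc-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to "etc" so the archive can be unpacked
        # directly into another $SPLUNK_HOME.
        tar.add(os.path.join(splunk_home, "etc"), arcname="etc")
    return archive_path
```

To restore, unpack the archive into the $SPLUNK_HOME of the target instance.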

For more information about configuration files, including the structure of the underlying directories, read "About configuration files".

Set a retirement and archiving policy

Configure data retirement and archiving policy by controlling the size of indexes or the age of data in indexes.

Splunk stores indexed data in buckets. For a discussion of buckets and how Splunk uses them, see "How Splunk stores indexes".

Splunk index buckets go through four stages of retirement. When indexed data reaches a frozen state, Splunk deletes it. (Splunk deletes all frozen data by default. You must specify an archiving script to avoid losing frozen data.)

Retirement stage   Description                                                  Searchable?
Hot                Open for writing. One or more hot buckets for each index.   Yes.
Warm               Data rolled from hot. There are many warm buckets.           Yes.
Cold               Data rolled from warm. There are many cold buckets.          Yes.
Frozen             Data rolled from cold. Eligible for deletion.                N/A: Splunk deletes frozen data by default.

Splunk defines the sizes, locations, and ages of indexes and their buckets in indexes.conf.

Edit a copy of indexes.conf in $SPLUNK_HOME/etc/system/local/, or in your own custom application directory in $SPLUNK_HOME/etc/apps/. Do not edit the copy in $SPLUNK_HOME/etc/system/default. For information on configuration files and directory locations, see "About configuration files".

Note: To configure data, all index locations must be writable.

Remove files beyond a certain size

If an index grows bigger than a specified maximum size, the oldest data is rolled to frozen, which means it gets immediately deleted unless you have created a script to archive the data, as described in "Archive indexed data". The default maximum size for an index is 500000 MB. To change the maximum size, edit this line in indexes.conf:

maxTotalDataSizeMB = <non-negative number>

For example:

[main]
maxTotalDataSizeMB = 2500000

Note: Make sure that the data size you specify for maxTotalDataSizeMB is expressed in megabytes.

Restart Splunk for the new setting to take effect. Depending on how much data there is to process, it can take some time for Splunk to begin to move buckets out of the index to conform to the new policy. You might see high CPU usage during this time.

Remove data beyond a certain age

Splunk ages out data by buckets. Specifically, when the most recent data in a particular bucket reaches the configured age, the entire bucket is rolled.

Splunk also rolls buckets when they reach a maximum size. If you are indexing a large volume of events, bucket size is less of a concern for retirement policy because the buckets will fill quickly. You can reduce bucket size by setting a smaller maxDataSize in indexes.conf so they roll faster. But note that it takes longer to search more small buckets than fewer large buckets. To get the results you are after, you will have to experiment a bit to determine the right size. Due to the structure of the index, there isn't a direct relationship between time and data size.

To remove data beyond a specified age, set frozenTimePeriodInSecs in indexes.conf to the number of seconds to elapse before the data gets erased. The default value is 188697600 seconds, or approximately 6 years. This example configures Splunk to cull old events from its index when they become more than 180 days (15552000 seconds) old:

[main]
frozenTimePeriodInSecs = 15552000

Note: Make sure that the time you specify for frozenTimePeriodInSecs is expressed in seconds.
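Because the value must be given in seconds, a quick conversion helps avoid mistakes. The following Python sketch shows the arithmetic behind the two figures quoted in this topic; the helper name days_to_seconds is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days):
    """Convert a retention period in days to a frozenTimePeriodInSecs value."""
    return days * SECONDS_PER_DAY


print(days_to_seconds(180))          # 15552000, the 180-day example
print(188697600 // SECONDS_PER_DAY)  # 2184 days, the default (about 6 years)
```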

Restart Splunk for the new setting to take effect. Depending on how much data there is to process, it can take some time for Splunk to begin to move buckets out of the index to conform to the new policy. You might see high CPU usage during this time.

I changed the archive policy and restarted but it's not working

If you changed your archive policy to be more restrictive because you've run out of disk space, you may notice that events haven't started being archived according to your new policy. This is most likely because you must free up some space so the process has room to run. Stop Splunk, clear out ~5GB of disk space, and then start Splunk again (refer to Start Splunk in this manual for details on stopping and starting Splunk). After a while (exactly how long depends on how much data there is to process) you should see INFO entries about BucketMover in splunkd.log showing that buckets are being archived.

Archive data

If you want to archive your frozen data instead of deleting it, you must create an archiving script, as described in "Archive indexed data". You can later restore the archived data, as described in "Restore archived data".

Archive indexed data

Set up Splunk to archive your data automatically as it ages. To do this, configure indexes.conf to call archiving scripts located in $SPLUNK_HOME/bin. Edit a copy of indexes.conf in $SPLUNK_HOME/etc/system/local/, or in your own custom application directory in $SPLUNK_HOME/etc/apps/. Do not edit the copy in $SPLUNK_HOME/etc/system/default. For information on configuration files and directory locations, see "About configuration files".

Caution: By default, Splunk deletes all frozen data. To avoid losing your data, you must specify a valid coldToFrozenScript in indexes.conf.

For detailed information on data storage in Splunk, see "How Splunk stores indexes".

Sign your archives

Splunk supports archive signing; configuring this allows you to verify integrity when you restore an archive.

Use Splunk's index aging policy to archive

Splunk rotates old data out of the index based on your data retirement policy. Data moves through several stages, which correspond to file directory locations. Data starts out in the hot database, located as subdirectories under $SPLUNK_HOME/var/lib/splunk/defaultdb/db/. Then, data moves through the warm database, also located as subdirectories under $SPLUNK_HOME/var/lib/splunk/defaultdb/db. Eventually, data is aged into the cold database $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb.

Finally, data reaches the frozen state. Splunk erases frozen index data once it is older than frozenTimePeriodInSecs in indexes.conf. The coldToFrozenScript (also specified in indexes.conf) runs just before the frozen data is erased. The default script simply writes the name of the directory being erased to the log file $SPLUNK_HOME/var/log/splunk/splunkd_stdout.log. If you want to archive frozen data instead of erasing it, substitute your own archiving script.

To substitute your own script, add the following stanza to indexes.conf:

[<index>]
coldToFrozenScript = <script>

Note the following:

• <index> specifies which index to archive.

• <script> specifies the archiving script.

♦ Define the <script> path relative to $SPLUNK_HOME/bin. The script needs to be located in that directory or a subdirectory.

Splunk ships with two archiving scripts in the $SPLUNK_HOME/bin directory. You can modify these (or you can create your own):

• compressedExport.sh: Export with tsidx files compressed as gz.

• flatfileExport.sh: Export as a flat text file (not recommended for current performance and resource issues -- it can take a long time, and use a lot of RAM, 2-3GB, while running).

Note: If using one of these scripts, modify it to specify the archive location for your installation. By default, the location is set to opt/tmp/myarchive. Also, rename the script or move it to another location to avoid having changes overwritten when you upgrade Splunk. These are example scripts and should not be applied to a production instance without editing to suit your environment and testing extensively.
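If you write your own archiving script, its core job is to copy the bucket directory somewhere safe before Splunk erases it. The Python sketch below illustrates that step only. It assumes that Splunk passes the path of the bucket directory as the first script argument; ARCHIVE_ROOT is a hypothetical location that you must change; and unlike the shipped scripts, it does no compression. Test any such script extensively before using it on a production instance.

```python
import os
import shutil
import sys

ARCHIVE_ROOT = "/path/to/archive"  # hypothetical archive location


def archive_bucket(bucket_dir, archive_root):
    """Copy a bucket directory into archive_root, keeping its directory name."""
    bucket_name = os.path.basename(os.path.normpath(bucket_dir))
    target = os.path.join(archive_root, bucket_name)
    if os.path.exists(target):
        # Refuse to overwrite an existing archive copy.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(bucket_dir, target)
    return target


# Assumption: the bucket path arrives as the only command-line argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    archive_bucket(sys.argv[1], ARCHIVE_ROOT)
```

An uncaught exception makes the script exit with a non-zero status, which is a reasonable way to signal that archiving did not complete.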

Windows Users

Windows users use this notation:

coldToFrozenScript = <script> "$DIR"

Note: Enclose the path in double quotes (") if it contains a space.

For <script>, you can use one of these example scripts:

• WindowsCompressedExport.bat. Download the example script here.

• WindowsFlatfileExport.bat (not recommended for current performance and resource issues -- it can take a long time, and use a lot of RAM, 2-3GB, while running). Download the example script here.

Note: Rename the script or move it to another location to avoid having changes overwritten when you upgrade Splunk. These are example scripts and should not be applied to a production instance without editing to suit your environment and testing extensively.

Examples

The following configuration will archive main index frozen buckets in d:\myarchive.

Change the parameter for "dest_base" in WindowsFlatfileExport.bat or WindowsCompressedExport.bat.

set dest_base=d:\myarchive

In c:\Program Files\Splunk\etc\system\local\indexes.conf specify:

Restore archived indexed data

Restore archived data by moving the archive into the thawed directory, $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb. You can restore an archive to a Splunk server regardless of operating system with some restrictions -- data generated on 64-bit systems is not likely to work well on 32-bit systems, and data cannot be moved from PowerPC or Sparc systems to x86 or x86-64 systems or vice versa. Data in thaweddb is not subject to the server's index aging scheme (hot > warm > cold > frozen). You can put old archived data in thawed for as long as you need. When the data is no longer needed, simply delete it or move it out of thawed.

The details of how to restore archived data depend on how it was archived. You can restore archived data to any index or instance of Splunk, with the caveat that you do not introduce bucket ID conflicts to your index. Archived data does not need to be restored to its pre-archival location.

Here is an example of safely moving a previously saved archive bucket to thawed.
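The move itself can be sketched in Python as follows. This is a hedged illustration, not the procedure for every archive format: it assumes the archived bucket is an uncompressed directory, the function name thaw_bucket is hypothetical, and the check only detects a directory name clash, which is a weaker test than a true bucket ID conflict.

```python
import os
import shutil


def thaw_bucket(archived_bucket, thaweddb_dir):
    """Move an archived bucket directory into thaweddb without overwriting."""
    name = os.path.basename(os.path.normpath(archived_bucket))
    target = os.path.join(thaweddb_dir, name)
    if os.path.exists(target):
        # A bucket with this name is already present; stop rather than merge.
        raise FileExistsError(f"{target} already exists; resolve the conflict first")
    shutil.move(archived_bucket, target)
    return target
```

Archives produced by a compressing export script may need additional steps before the data is usable.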

Authentication includes SSL and HTTPS, user-based access controls (known as roles) and LDAP.

SSL/HTTPS

You can configure SSL for both Splunk's back-end (splunkd talking to the browser) and the front-end (HTTPS when logging into Splunk Web). To set up SSL for Splunk's back-end, see instructions here. To enable HTTPS for Splunk Web, follow these instructions.

Configure roles

You no longer have to use Splunk's default roles of Admin, Power or User. While these roles remain built into Splunk, you can now define your own roles out of a list of capabilities. Create flexible roles for Splunk users either in Manager or by editing authorize.conf.

Learn more about configuring roles (in the "Add and manage users" section of this manual).

Learn more about configuring LDAP (in the "Add and manage users" section of this manual).

Scripted authentication

Use scripted authentication to tie Splunk's authentication into an external authentication system, such as RADIUS or PAM.

Learn more about scripted authentication (in the "Add and manage users" section of this manual).

Audit

Splunk includes audit features to allow you to track the reliability of your data. Watch files anddirectories with the file system change monitor, monitor activities within Splunk (such as searchesor configuration changes) with audit events, cryptographically sign audit events events with auditevent signing, and block sign any data entering your Splunk index with IT data signing.

File system change monitor

You can use the file system change monitor in Splunk Preview to watch any directory or file. Splunk indexes an event any time the file system undergoes any sort of change or someone edits the watched files. The file system change monitor's behavior is completely configurable through inputs.conf.

Learn more about how to configure the file system change monitor.

Audit events

Watch your Splunk instance by monitoring audit events. Audit events are generated whenever anyone accesses any of your Splunk instances, including any searches, configuration changes or administrative activities. Each audit event contains information that shows you what changed, where and when, and who implemented the change. Audit events are especially useful in distributed Splunk configurations for detecting configuration and access control changes across many Splunk servers.

Learn more about how audit events work.

Audit event signing

If you are using Splunk with an Enterprise license, you can configure audit events to be cryptographically signed. Audit event signing adds a sequential number (for detecting gaps in data to reveal tampering), and appends an encrypted hash signature to each audit event.

Configure auditing by setting stanzas in audit.conf and inputs.conf.

Learn more about audit event signing.

IT data signing

If you are using Splunk with an Enterprise license, you can configure Splunk to verify the integrity of IT data as it is indexed. If IT data signing is enabled, Splunk creates a signature for blocks of data as it is indexed. Signatures allow you to detect gaps in data or tampered data.

Learn more about IT data signing.

Secure access to Splunk with HTTPS

You can enable HTTPS via Splunk Web or web.conf. You can also enable SSL through separate configurations. Splunk can listen on HTTPS or HTTP, but not both.

Important: If you are using Firefox 3, enabling SSL for a Splunk deployment may result in an "invalid security exception" being displayed in the browser. Refer to this workaround documentation for more information.

Enable HTTPS using Splunk Web

To enable HTTPS in Splunk Web, navigate to Manager > System settings > General Settings and select the Yes radio button underneath the Enable SSL (HTTPS) in Splunk Web setting.

Note: You must restart Splunk to enable the new settings. Also, you must now begin the URL you use to access Splunk Web with "https://".

Enable HTTPS by editing web.conf

To enable HTTPS, modify web.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see how configuration files work.

[settings]
httpport = <port number>
enableSplunkWebSSL = true

• httpport
  ♦ Set the port number to your HTTPS port.
• enableSplunkWebSSL
  ♦ Set this key to true to enable SSL for Splunk Web.
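The two keys above can be checked with a short script before you restart Splunk. The following is a minimal sketch in Python, assuming the file is simple enough for the standard configparser module to read. The function name https_settings, the sample text and port 8443 are hypothetical, and the sketch does not reproduce how Splunk layers default and local configuration files.

```python
import configparser

# Hypothetical web.conf contents; 8443 is an example port, not a Splunk default.
SAMPLE_WEB_CONF = """
[settings]
httpport = 8443
enableSplunkWebSSL = true
"""


def https_settings(conf_text):
    """Return (port, ssl_enabled) read from the [settings] stanza of conf_text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names case-sensitive, as written in the file
    parser.read_string(conf_text)
    if not parser.has_section("settings"):
        return None, False
    settings = parser["settings"]
    port = settings.get("httpport")
    ssl_enabled = settings.get("enableSplunkWebSSL", "false").strip().lower() == "true"
    return port, ssl_enabled


port, ssl_enabled = https_settings(SAMPLE_WEB_CONF)
scheme = "https" if ssl_enabled else "http"
print(f"Splunk Web would be reached at {scheme}://localhost:{port}")
```

Because Splunk listens on HTTPS or HTTP but not both, the scheme printed here is the only one that would work for that port after the restart.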

Once you have made the changes to web.conf, you must restart Splunk for the changes to take effect.

Change HTTPS certificates by editing web.conf

The certificates used for SSL between Splunk Web and the client browser are located in $SPLUNK_HOME/share/splunk/certs/.

Important: Splunk STRONGLY recommends that you DO NOT use the default Splunk Web certificate. Use of the default Splunk Web certificate will not result in confidential data transmission.

The certificates to use for Splunk Web HTTPS are specified in web.conf under the [settings] stanza.

$ ./bin/splunk restart splunkweb

• You can also use the above procedure to generate a new self-signed certificate if your self-signed certificate has expired.

Secure access to your Splunk server with SSL

Overview

The Splunk management port (default 8089) supports both SSL and plain text connections. SSL is turned on by default for communications among Splunk servers. Distributed search will often perform better with SSL enabled because of its built-in data compression.

To make changes to SSL settings, edit server.conf.

Important: If you are using Firefox 3, enabling SSL for a Splunk deployment may result in an "invalid security exception" being displayed in the browser. Refer to this workaround documentation for more information.

Note: This only enables SSL for Splunk's back-end communication. To turn on SSL for the browser, see "Secure access to Splunk with HTTPS".

Working with SSL settings

When the Splunk server is turned on for the first time, the server generates a certificate for that instance. This certificate is stored in the $SPLUNK_HOME/etc/auth/ directory by default.

• keyfile = Certificate for this Splunk instance (created on Splunk start-up by default - if the certCreateScript tag is present).

Note: The path to the keyfile is relative to the caPath setting. If your keyfile is kept outside $SPLUNK_HOME, you must specify a full (absolute) path outside of $SPLUNK_HOME to reach it.

• keyfilePassword = Password for the pem file store. It is set to password by default.
• caCertFile = The name of the certificate authority file.
• caPath = Path where the Splunk certificates are stored. Default is $SPLUNK_HOME/etc/auth.
• certCreateScript = Script for creating and signing server certificates.
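The rule that keyfile is relative to caPath unless it is an absolute path can be illustrated as follows. This is a minimal sketch in Python, not Splunk's own resolution code; the function name resolve_keyfile and the example paths are hypothetical.

```python
import posixpath


def resolve_keyfile(ca_path, keyfile):
    """Resolve keyfile against ca_path; an absolute keyfile is returned unchanged."""
    # posixpath.join discards ca_path when keyfile is already absolute.
    return posixpath.normpath(posixpath.join(ca_path, keyfile))


# Hypothetical values for illustration only.
print(resolve_keyfile("/opt/splunk/etc/auth", "server.pem"))
print(resolve_keyfile("/opt/splunk/etc/auth", "/secure/keys/server.pem"))
```

The first call yields a path inside the caPath directory, while the second keeps the absolute path, which is the case the note above describes for key files stored outside $SPLUNK_HOME.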

With the default script enabled, on startup, Splunk will generate a certificate in the caPath directory.

Deactivate SSL

To deactivate SSL, set enableSplunkdSSL to FALSE.

To disable SSLv2 and tell the HTTP server to only accept connections from SSLv3 clients, include the supportSSLV3Only attribute and set it to TRUE. By default, this setting is FALSE.

Generate a new root certificate

By default, all Splunk servers use the same root certificate. This allows Splunk instances to connect to each other out of the box.

Important: The default Splunk root certificate (which can be found in $SPLUNK_HOME/etc/auth/ca.pem) uses a private key that every other user of Splunk in the world has access to. Possession of a certificate authority's private key will allow attackers to generate certificates that are signed by the trusted authority, which would defeat attempts to control authentication via PKI. This is only important if you wish to use SSL authentication functionality.

The script $SPLUNK_HOME/bin/genRootCA.sh (%SPLUNK_HOME%\bin\genRootCA.bat on Windows) allows you to create a root certificate to be used in creating subsequent server and web certificates. Run this script when you want to regenerate the certificates Splunk uses. It generates cacert.pem (public key) and ca.pem (public/private password protected PEM). When you run it, it checks to see if certs are already in place, and if they are, prompts you to overwrite them. It then wraps these files into an X509-formatted cert. Distribute cacert.pem to clients as desired and keep ca.pem in a secure location.

genRootCA.sh Example for the *nix platforms

The following example generates a new root certificate and private key pair at $SPLUNK_HOME/etc/auth/ca.pem.

Note: If Splunk is installed anywhere but /opt/splunk, you will need to set the environment variable OPENSSL_CONF to the path to your Splunk installation's openssl.cnf.

$ export OPENSSL_CONF=$SPLUNK_HOME/openssl/openssl.cnf
$ cd $SPLUNK_HOME
$ ./bin/genRootCA.sh -d ./splunk/etc/auth/
There is ca.pem in this directory. If you choose to replace the CA then splunk servers will req
new certs signed by this CA before they can interact with it.
Do you wish to replace the CA ? [y/N]y
rm: cacert.pem: No such file or directory
This script will create a root CA
It will output two files. ca.pem cacert.pem
Distribute the cacert.pem to all clients you wish to connect to you.
Keep ca.pem for safe keeping for signing other clients certs
Remember your password for the ca.pem you will need to later to sign other client certs
Your root CA will expire in 10 years

genRootCA.bat Example for the Windows platform

The following example generates a new root certificate and private key pair at %SPLUNK_HOME%\etc\auth. Make sure that the OPENSSL_CONF environment variable points to the Splunk installation's openssl.cnf. Also note that the path following the -d option, which specifies the destination directory for the generated key pair, is a DOS-style path and does not contain spaces.

>cd "c:\Program Files\Splunk\bin"
>set OPENSSL_CONF=c:\Program Files\Splunk\openssl.cnf
>splunk.exe cmd cmd.exe /c genRootCA.bat -d c:\progra~1\Splunk\etc\auth
C:\Program Files\Splunk\bin>splunk.exe cmd cmd.exe /c genRootCA.bat -d c:\progra~1\Splunk\etc\a
There is ca.pem in this directory. If you choose to replace the CA then splunk
servers will require new certs signed by this CA before they can interact with it.
Do you wish to replace the CA ? [y/N]y
Deleting certs cacert.pem and ca.pem
del /f /q cacert.pem
del /f /q ca.pem
This script will create a root CA.
It will output two files: ca.pem cacert.pem.
Distribute the cacert.pem to all clients you wish to connect to you.
Keep ca.pem for safe keeping for signing other clients certs.
Remember your password for the ca.pem you will need to later to sign other client certs.
Your root CA will expire in 10 years.
"C:\Program Files\Splunk\bin\openssl.exe" req -newkey rsa:1024 -passout pass:password -subj /co
localityName=SanFrancisco/organizationName=SplunkInc/commonName=SplunkCA/organizationName=Splun

Generate a new signed certificate and private key pair

By default, all Splunk servers use a certificate signed by the common root certificate discussed above. This allows Splunk instances to connect to each other out of the box.

Important: Splunk STRONGLY recommends that you DO NOT use the default self-signed certificate. Use of this default certificate will not result in confidential transmission of data.

$SPLUNK_HOME/bin/genSignedServerCert.sh allows you to create a new private key and server certificate using the current Splunk root certificate.

This shell script is a wrapper for the Python script that Splunk runs to generate certificates when you start it for the first time. genSignedServerCert.sh creates a CSR (certificate signing request), self-signs it, and outputs a signed private key and certificate pair.

genSignedServerCert.sh Example

The following example will generate a new private key and new server certificate for the server example.splunk.com, signed against the local Splunk root certificate.

* Create certificate server2.pem signed by the root CA
* Store the server2.pem key file locally with your client/server application
* Enter a secret pass phrase when requested
* The pass phrase is used to access server2.pem in your application
* Enter the application's host name as the Common Name when requested
* Enter the root CA pass phrase (Getting CA Private Key) to sign the key file
* The key file will expire after one year or sooner when the root CA expires
Generating a 1024 bit RSA private key
...........................++++++
....................++++++
writing new private key to 'server2.pemkey.pem'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:CA
Locality Name (eg, city) []:SanFrancisco
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Splunk Inc.
Organizational Unit Name (eg, section) []:Security
Common Name (eg, YOUR name) []:example.splunk.com

Generate a new signed certificate and private key pair on Windows

On Windows run genSignedServercert.py.

* Create certificate server2.pem signed by the root CA
* Store the server2.pem key file locally with your client/server application
* Enter a secret pass phrase when requested
* The pass phrase is used to access server2.pem in your application
* Enter the application's host name as the Common Name when requested
* Enter the root CA pass phrase (Getting CA Private Key) to sign the key file
* The key file will expire after one year or sooner when the root CA expires
Loading 'screen' into random state - done
Generating a 1024 bit RSA private key
.................++++++
......................................................++++++
writing new private key to 'server2key.pem'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
Verify failure
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:CA
Locality Name (eg, city) []:San Francisco
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Splunk, Inc.
Organizational Unit Name (eg, section) []:Splunk Customer Support
Common Name (eg, YOUR name) []:Splunk Support
Email Address []:support@splunk.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:<password>
An optional company name []:

Generate a CSR (Certificate Signing Request)

If your organization requires that your Splunk deployment use a certificate signed by an external CA, or you otherwise wish to use certificates signed by a root certificate other than the default Splunk authority, you can use the following procedure to generate the CSR to send to the CA:

You are prompted for the following X.509 attributes of the certificate:

• Country Name: Use the two-letter code without punctuation for country, for example: US or GB.
• State or Province: Spell out the state completely; do not abbreviate the state or province name, for example: California
• Locality or City: The Locality is the city or town name, for example: Oakland. Do not abbreviate. For example: Los Angeles, not LA; Saint Louis, not St. Louis.
• Company: If your company or department contains an &, @, or any other non-alphanumeric symbol that requires you to use the shift key, you must spell out the symbol or omit it. For example, Fflanda & Rhallen Corporation would be Fflanda Rhallen Corporation or Fflanda and Rhallen Corporation.
• Organizational Unit: This field is optional, but you can specify it to help identify certificates registered to an organization. The Organizational Unit (OU) field is the name of the department or organization unit making the request. To skip the OU field, press Enter.
• Common Name: The Common Name is the Host + Domain Name, for example www.company.com or company.com. This must exactly match the host name of the server where you intend to deploy the certificate.

This creates a private key ([certificate name].key), which is stored locally on your server, and a CSR ([certificate name].csr), which contains the public key associated with the private key. You can then use this information to request a signed certificate from an external CA.

To copy and paste the information into your CA's enrollment form, open the .csr file in a text editor and save it as a .txt file.

Note: Do not use Microsoft Word; it can insert extra hidden characters that alter the contents of the CSR.

Generate a CSR (Certificate Signing Request) on Windows

This is very similar to the method described above, but it requires an extra step to set the ENV variable OPENSSL_CONF:

• Open up a Command Prompt window and navigate to $SPLUNK_HOME\bin
• Set the OPENSSL_CONF ENV variable:
  C:\Program Files\Splunk\bin>set OPENSSL_CONF=C:\Program Files\Splunk\openssl.cnf
• Verify the variable has been set correctly:
  >echo %OPENSSL_CONF%
• Run the command to generate the CSR:
  >openssl.exe req -new -key "C:\Program Files\Splunk\etc\auth\server.pem" -out server.csr -passin pass:password
• As above, you are then prompted for the following X.509 attributes of the certificate.

Distribute certificates to your search peers

When you enable distributed search on a Splunk instance and restart it, keys are generated in $SPLUNK_HOME/etc/auth/distServerKeys/

Distribute the files $SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem and private.pem from one host to the others that will participate in distributed search.

Support for different keys from multiple Splunk instances

Any number of Splunk instances can have their own unique certificates stored on other instances for authentication. The instances can store keys in $SPLUNK_HOME/etc/auth/distSearchKeys/<peer_name>/<trusted|private>.pem

For example: if you have Splunk instances A and B and they both have different keys and want to search Splunk instance C, do the following:

• On peer C, create $SPLUNK_HOME/etc/auth/distSearchKeys/A/ and

Configure archive signing

Use archive signing to sign your Splunk data as it is archived (moved from colddb to frozen). This lets you verify integrity when you restore an archive. Configure the size of the slice by setting your archiving policies.

How archive signing works

Data is archived from the colddb to frozen when either:

• the size of your index reaches a maximum that you specify.

• data in your index reaches a certain age.

Specify archiving policies to define how your data is archived.

Splunk ships with two standard scripts, but you may use your own. Data is archived from the colddb to frozen with a coldToFrozen script that you specify. The coldToFrozen script tells Splunk how to format your data (gz, raw, etc.), and where to archive it. Archive signing happens after the coldToFrozen script formats your data into its archive format, and then the data is moved to the archive location that you specified according to your archive policy.

An archive signature is a hash signature of all the data in the data slice.

To invoke archive signing, use the standalone signtool utility. Add signtool -s <path_of_archive> to the coldToFrozen script anywhere after the data formatting lines, but before the lines that copy your data to your archive. See the section below on configuring coldToFrozen scripts.

Verify archived data signatures

Configure coldToFrozen scripts

Configure any coldToFrozen script by adding a line for the signtool utility.

Note: If you use a standard Splunk archiving script, either rename the script or move it to another location (and specify that location in indexes.conf) to avoid having changes overwritten when you upgrade Splunk.

Standard Splunk archiving scripts

The two standard archiving scripts that are shipped with Splunk are shown below with archive signing.

Splunk's two archiving scripts are:

compressedExport.sh

This script exports files with the tsidx files compressed as gz.

#!/bin/sh
gzip $1/*.tsidx
signtool -s <path_to_archive> # replace this with the path to the archive you want signed
cp -r $1 /opt/tmp/myarchive # replace this with your archive directory

flatfileExport.sh

This script exports each splunk 'source' event stream as a flat text file.

Note: flatfileExport.sh is currently not recommended because of performance and resource issues we hope to address in the future. It can take a long time (tens of minutes to hours) and use a lot of RAM (2-3 GB) while running.

Your own custom scripts

You can also use your own scripts to move data from cold to frozen.

Sign or verify your data slices

Use signtool, located in $SPLUNK_HOME/bin, to sign data slices as they are archived or verify the integrity of an archive.

Syntax

To sign:

signtool [-s | --sign] archive_path

To verify:

signtool [-v | --verify] archive_path

Configure IT data block signing

IT data signing helps you certify the integrity of your IT data. If you enable IT data signing and index some data, Splunk tells you if that data is ever subsequently tampered with at the source. For example, if you have enabled IT data signing and index a log file in Splunk, Splunk will show you if anyone removes or edits some entries from that log file on the original host. You can thus use Splunk to determine whether your data has been tampered with.

Note: Signing IT data is different than signing Splunk audit events. IT data signing refers to signing external IT data while it is indexed by Splunk; audit events are events that Splunk's auditing feature generates and stores in the audit index.

How IT data signatures work

Splunk takes external IT data (typically in the form of log files), and applies digital signatures and signature verification to show whether indexed or archived data has been modified since the index was initially created.

A signature for a block of IT data involves three things:

• A hash is generated for each individual event.
• The events are grouped into blocks of a size you specify.
• A digital signature is generated and applied to each block of events.

Note: Splunk can encrypt the hash to create a digital signature if you have configured the public and private keys in audit.conf. See "Configure audit event signing" for details.

This digital signature is stored in a database you specify and can be validated as needed. Splunk can demonstrate data tampering or gaps in the data by validating the digital signature at a later date. If the signature does not match the data, an unexpected change has been made.
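The hash, group and sign steps, and the later validation, can be sketched in Python as follows. This is an illustration under stated assumptions, not Splunk's implementation: HMAC-SHA256 with a shared key stands in for the digital signature, and the function names sign_blocks and verify_blocks are hypothetical.

```python
import hashlib
import hmac


def sign_blocks(events, block_size, key):
    """Hash each event, group the hashes into blocks, and sign each block."""
    signatures = []
    for start in range(0, len(events), block_size):
        block = events[start:start + block_size]
        event_hashes = [hashlib.sha256(e.encode("utf-8")).digest() for e in block]
        # Assumption: HMAC-SHA256 stands in for the real digital signature.
        signature = hmac.new(key, b"".join(event_hashes), hashlib.sha256).hexdigest()
        signatures.append(signature)
    return signatures


def verify_blocks(events, block_size, key, stored_signatures):
    """Return one boolean per block: True if the block still matches its signature."""
    current = sign_blocks(events, block_size, key)
    results = [hmac.compare_digest(c, s) for c, s in zip(current, stored_signatures)]
    # Blocks with no counterpart (events added or removed wholesale) count as failures.
    results += [False] * abs(len(current) - len(stored_signatures))
    return results


log_lines = [f"event {i}" for i in range(5)]
stored = sign_blocks(log_lines, 2, b"example-key")

log_lines[3] = "event 3 (edited)"  # simulate tampering in the second block
print(verify_blocks(log_lines, 2, b"example-key", stored))
```

Only the block that contains the edited event fails validation, which is why the block size determines how precisely a change can be located.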

Configure IT data signing

This section explains how to enable and configure IT data signing. You enable and configure IT data signing for each index individually, and then specify one central database for all the signing data.

You configure IT data signing in indexes.conf. Edit this file in $SPLUNK_HOME/etc/system/local/ or in your custom application directory, in $SPLUNK_HOME/etc/apps/. Do not edit the copy in default. For more information on configuration files in general, see "About configuration files".

You can:

• Enable IT data signing and specify the number of events contained in your IT data signatures.
• Disable IT data signing.
• Specify the database to store signing data in.

Note: You must configure audit event signing by editing audit.conf to have Splunk encrypt the hash signature of the entire data block.

Enable IT data signing and specify the number of events in an IT data signature

By default, IT data signing is disabled for all indexes.

To enable IT data signing, set the blockSignSize attribute to an integer value greater than 0. This attribute specifies the number of events that make up a block of data to apply a signature to. You must set this attribute for each index using IT data signing.

This example enables IT data signing for the main index and sets the number of events per signature block to 100:

[main]
blockSignSize=100
...

Note: the maximum number of events for the blockSignSize attribute is 2000.

You now must reindex your data for this change to take effect:

./splunk stop
./splunk clean all
./splunk start

Disable IT data signing

To disable IT data signing, set the blockSignSize attribute to 0 (the default). This example turns IT data signing off for the main index:

[main]
blockSignSize=0
...

Specify the signature database

The IT data signature information for each index with IT data signing enabled is stored in the signature database. Set the value of the blockSignatureDatabase attribute to the name of the database where Splunk should store IT signature data. This is a global setting that applies to all indexes:

blockSignatureDatabase=<database_name>

The default database name is _blocksignature.

View the integrity of IT data

To view the integrity of indexed data at search time, open the Show source window for results of a search. To bring up the Show source window, click the drop-down arrow at the left of any search result. Select Show source and a window will open displaying the raw data for each search result.

The Show source window displays information as to whether the block of IT data has gaps, has been tampered with, or is valid (no gaps or tampering).

The status shown for types of events are:

• Valid
• Tampered with
• Has gaps in data

Performance implications

Because of the additional processing overhead, indexing with IT data signing enabled can negatively affect indexing performance. Smaller blocks mean more blocks to sign, and larger blocks require more work on display. Experiment with block size to determine optimal performance, as small events can effectively use slightly larger blocks. The block size setting is a maximum; you may have smaller blocks if you are not indexing enough events to fill a block in a few seconds. This allows incoming events to be signed even when the indexing rate is very slow.

• Turning IT data signing ON slows indexing.
• Setting the blockSignSize attribute to high integer values (such as 1000) slows indexing performance.
• For best performance, set blockSignSize to a value near 100.
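To get a feel for the trade-off when experimenting, the number of signatures a given setting produces can be estimated with simple arithmetic. The sketch below is in Python; the function name minimum_blocks and the event count are hypothetical. Because blockSignSize is a maximum, the result is a lower bound: slow indexing can produce more, smaller blocks.

```python
import math


def minimum_blocks(event_count, block_sign_size):
    """Lower bound on the number of signed blocks for event_count events."""
    if block_sign_size <= 0:
        raise ValueError("blockSignSize must be greater than 0 when signing is enabled")
    return math.ceil(event_count / block_sign_size)


# Hypothetical workload of one million events at three block sizes.
for size in (10, 100, 1000):
    print(size, minimum_blocks(1_000_000, size))
```

This only counts signatures. It does not model the extra work that larger blocks require when results are displayed, so it is a starting point for measurement rather than a tuning rule.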

Cryptographically sign audit events

Splunk creates audit trail information (by creating and signing audit events) when you have auditing enabled. Audit event signing is only available if you are running Splunk with an Enterprise license.

How audit event signing works

The audit processor signs audit events by applying a sequence number ID to the event, and by creating a hash signature from the sequence ID and the event's timestamp. Once you've enabled audit signing, you can search for gaps in the sequence of these numbers and find out if your data has been tampered with.

Hash encryption

For each processed audit event, Splunk's auditing processor computes an SHA256 hash on all of the data. The processor then encrypts the hash value and applies Base64 encoding to it. Splunk then compares this value to whatever key (your private key, or the default keys) you specify in audit.conf.
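The hash, encrypt and encode sequence can be sketched in Python as follows. This is not Splunk's code: the payload layout (sequence ID, timestamp and event text joined with "|") is an assumption, HMAC-SHA256 stands in for encryption with the private key, and the function name sign_audit_event is hypothetical.

```python
import base64
import hashlib
import hmac


def sign_audit_event(sequence_id, timestamp, event_text, key):
    """Return a Base64-encoded signature for one audit event."""
    # Assumed payload layout; the real field order and separators are not documented here.
    payload = f"{sequence_id}|{timestamp}|{event_text}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Assumption: a keyed HMAC stands in for encrypting the hash with the private key.
    protected = hmac.new(key, digest, hashlib.sha256).digest()
    return base64.b64encode(protected).decode("ascii")


signature = sign_audit_event(1, "2010-10-06T09:52:00", "user=admin action=search", b"example-key")
print(signature)
```

Any change to the sequence ID, timestamp or event text yields a different signature, which is what lets a later check flag the event as tampered.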

Configure audit event signing

Configure the following settings of Splunk's auditing feature through audit.conf:

• Turn on and off audit event signing.

• Set default public and private keys.

Configure audit.conf

Create your own audit.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see how configuration files work.

Generate your own keys using genAuditKeys.py in $SPLUNK_HOME/bin/:

# ./splunk cmd python genAuditKeys.py

This creates your private and public keys, $SPLUNK_HOME/etc/auth/audit/private.pem and $SPLUNK_HOME/etc/auth/audit/public.pem. To use these keys, set privateKey and publicKey to the path to your keys in your $SPLUNK_HOME/etc/system/local/audit.conf:

Note: If the [auditTrail] stanza is missing, audit events are still generated, but not signed. If the publicKey or privateKey values are missing, audit events will be generated but not signed.

Search to detect gaps in your data

Note: The functionality described in this section is not yet available as of version 4.0.4, and will be delivered in an upcoming maintenance release.

Once you've configured audit event signing, the sequence number ID that the audit processor assigns to each event lets you detect gaps in data, which can identify tampering with the system. You can search the audit events to determine if gaps are detected:

index=_audit | audit

The field that contains the status of the event is called "validity". Values can be:

• VALIDATED - no gap before this event and event signature matches
• TAMPERED - event signature does not match
• NO SIGNATURE - the signature was not found
• NO PUBLIC KEY - cannot validate

The field that contains the gap status is called "gap". Values can be:

The information within the first set of brackets ([ ]) is the hashed and signed data. The string in the second set of brackets is the hash signature.

What activity generates an audit event?

Audit events are generated from monitoring:

• all files in Splunk's configuration directory $SPLUNK_HOME/etc/*
  ♦ files are monitored for add/change/delete using the file system change monitor.
• system start and stop.
• users logging in and out.
• adding / removing a new user.
• changing a user's information (password, role, etc).
• execution of any capability in the system.
  ♦ capabilities are listed in authorize.conf

Audit event storage

If you have configured Splunk as a forwarder in a distributed setting, audit events are forwarded like any other event. Signing can happen on the forwarder, or on the receiving Splunk instance.

Audit event processing

The file audit.conf tells the audit processor whether or not to encrypt audit events. As audit events are generated, Splunk's auditing processor assigns a sequence number to the event and stores the event information in a SQLite database. If there is no user information specified when the event is generated, Splunk uses the currently signed in user information. Finally, if audit event signing is set, Splunk hashes and encrypts the event.

Search for audit events

Search audit events in Splunk Web or in Splunk's CLI. To do this, pipe your searches to the audit command. The audit search command is most useful if audit event signing has been configured. However, if you want to search for all audit events where audit event signing has not been configured (or to skip integrity validation), you may search the whole audit index.

• To search for all audit events, specify the _audit index:

index=_audit

This search returns all audit events.

• Pipe your search to the audit command:

index=_audit | audit

This search returns the entire audit index, and processes the audit events it finds through the audit command.

Narrow your search before piping to the audit command. However, you can only narrow the time range, or constrain by a single host. This is because each host has its own ID number sequence. Since sequential IDs exist to enable detection of gaps in audit events, narrowing a search across multiple hosts causes false gap detection.

The field that contains the status of the event is called "validity". Values can be:

• VALIDATED - no gap before this event and event signature matches
• TAMPERED - event signature does not match
• NO SIGNATURE - the signature was not found

The field that contains the gap status is called "gap". Values can be:

• TRUE - a gap was found
• FALSE - no gap was found
• N/A - no id was found
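The per-host sequence check behind the gap field can be sketched in Python as follows. This is an illustration only: the function name gap_status and the input format are hypothetical, and treating the first event seen for a host as FALSE is an assumption, not documented behavior.

```python
def gap_status(events):
    """events: list of (host, sequence_id or None) tuples in search order.

    Returns one of "TRUE", "FALSE" or "N/A" per event.
    """
    last_seen = {}  # most recent sequence ID seen for each host
    statuses = []
    for host, sequence_id in events:
        if sequence_id is None:
            statuses.append("N/A")
            continue
        previous = last_seen.get(host)
        # Assumption: the first event seen for a host is not reported as a gap.
        if previous is not None and sequence_id != previous + 1:
            statuses.append("TRUE")
        else:
            statuses.append("FALSE")
        last_seen[host] = sequence_id
    return statuses


sample = [("hostA", 1), ("hostB", 7), ("hostA", 2), ("hostB", 9), ("hostA", None)]
print(gap_status(sample))
```

Tracking the last ID separately for each host is the point of the sketch: comparing hostB's IDs against hostA's would report gaps that do not exist, which is the false detection described above.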

Configure event hashing

Event hashing provides a lightweight way to detect if events have been tampered with between index time and search time.

Event hashes aren't cryptographically secure. Someone could tamper with an event if they have physical access to a machine's file system.

You should use event hashing only if you don't have the capability to run Splunk's IT data block signing feature; individual event hashing is more resource intensive than data block signing.

How event hashing works

When event hashing is enabled, Splunk hashes events with a SHA256 hash just before index time. When each event is displayed at search time, a hash is calculated and compared to that event's index time hash. If the hashes match, the event is decorated in the search results as "valid". If the hashes don't match, the event is decorated as "tampered". (For the CLI, the value of the decoration is stored in the internal field _decoration.)
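The compare-at-search-time logic can be sketched in Python as follows. The function names index_time_hash and decorate and the sample events are hypothetical; how Splunk stores the index-time hash is not shown here.

```python
import hashlib


def index_time_hash(raw_event):
    """SHA256 hash of the raw event text, as computed just before indexing."""
    return hashlib.sha256(raw_event.encode("utf-8")).hexdigest()


def decorate(raw_event, stored_hash):
    """Recompute the hash at search time and compare it with the stored one."""
    return "valid" if index_time_hash(raw_event) == stored_hash else "tampered"


original = "Oct 06 09:52:01 host1 sshd[42]: session opened"
stored = index_time_hash(original)

print(decorate(original, stored))
print(decorate("Oct 06 09:52:01 host1 sshd[42]: session closed", stored))
```

Because the hash is unkeyed, anyone who can rewrite both the event and its stored hash can defeat the check, which is why event hashes are described above as not cryptographically secure.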

Configure event hashing by editing $SPLUNK_HOME/etc/system/local/audit.conf. Set up event hashing filters that whitelist or blacklist events based on their host, source, or sourcetype.

• A whitelist is a set of criteria that events must match to be hashed. If events don't match, they aren't hashed.
• A blacklist is a set of criteria that events must match to NOT be hashed. If events don't match, then they are hashed.
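The whitelist and blacklist definitions above can be expressed as a small decision function. The Python sketch below is an illustration only. It assumes that an event matches a filter when any one of its listed host, source or sourcetype values matches, which may differ from Splunk's actual matching rule, and it does not model how several filters combine. The names matches and should_hash are hypothetical.

```python
def matches(event, criteria):
    """True if any listed field value matches the event (assumed matching rule)."""
    return any(event.get(field) in values for field, values in criteria.items())


def should_hash(event, filter_kind, criteria):
    """Decide whether a single whitelist or blacklist filter lets the event be hashed."""
    hit = matches(event, criteria)
    if filter_kind == "whitelist":
        return hit       # hashed only when the event matches
    if filter_kind == "blacklist":
        return not hit   # hashed only when the event does not match
    raise ValueError("filter_kind must be 'whitelist' or 'blacklist'")


event = {"host": "web01", "source": "/var/log/messages", "sourcetype": "syslog"}
criteria = {"host": {"web01", "web02"}}

print(should_hash(event, "whitelist", criteria))
print(should_hash(event, "blacklist", criteria))
```

The same criteria give opposite results under the two filter kinds, which is the only difference between a whitelist and a blacklist in this model.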

See more on configuring event hashing below.

Configure event hashing

Turn on event hashing by adding an [eventHashing] stanza to audit.conf. If you want to add filters to event hashing, list each filter for which you have a filterSpec stanza in a comma-separated list in the filters = key.

Configure filtering

Set up filters for event hashing in audit.conf. Create a stanza after the [eventHashing] stanza to define a filter. Specify the details of each filter using comma-separated lists of hosts, sources, and sourcetypes.

./splunk search " * | top _decoration"

The resulting output:

_decoration count percent

Splunk recommends using the following standards to harden your Splunk instances. Following these standards will reduce Splunk's attack surface and mitigate the risk and impact of most vulnerabilities.

Service accounts

• Practice the principle of least privilege by running Splunk as an unprivileged user rather than a privileged account such as root or Administrator.

  ♦ On Unix or Linux, use the "splunk" user that is created via the PKG or RPM packages, or create your own user that only has privilege and ownership over $SPLUNK_HOME
  ♦ On Windows, the local system context is often the best choice. However, if you require communication to occur via a Windows communication channel (e.g. WMI), use an account with restricted access.

Splunk components

• Disable all unnecessary Splunk components

  ♦ For single-server Splunk deployments:
    ◊ Splunk forwarders should not run Splunk Web and should not be configured to receive data on TCP or UDP ports or from other Splunk instances
  ♦ For multi-server Splunk deployments:
    ◊ Splunk search heads should not receive data on TCP or UDP ports or from other Splunk instances
    ◊ Splunk indexers in a distributed search environment should not run Splunk Web if users are not logging in to them to search
    ◊ Splunk forwarders should not run Splunk Web and should not be configured to receive data on TCP or UDP ports or from other Splunk instances

Network access

• Do not place Splunk on a network segment that is Internet-facing (i.e. Splunk should not be accessible directly via the Internet)
  ♦ Remote users will still be able to access Splunk via a Virtual Private Network
• Use a host-based firewall to restrict access to Splunk's web, management, and data ports
  ♦ End users and administrators will need to access Splunk Web (TCP port 8000 by default)
  ♦ Search heads will need to access their search peers on the Splunk management port (TCP port 8089 by default)
  ♦ Deployment clients will need to access the deployment server on the Splunk management port (TCP port 8089 by default)
  ♦ Forwarders will need to access the Splunk index server's data port (TCP port 9997 by default)
  ♦ Remote CLI calls use the Splunk management port. Consider restricting this port to local calls only via a host firewall.
  ♦ In most cases, it is not recommended to allow access to Splunk forwarders on any port
• Install Splunk on an isolated network segment that only trustworthy machines can access
• Do not permit Splunk access to the Internet unless access to Splunkbase or inline documentation is a requirement

Operating System

• Splunk strongly recommends hardening the operating system of all Splunk servers
  ♦ If your organization does not have internal hardening standards, Splunk recommends the CIS hardening benchmarks
  ♦ At the very least, limit shell/command line access to your Splunk servers

Availability and reliability

• Configure redundant instances of Splunk, both indexing a copy of the same data
• Back up Splunk data and configurations on a regular basis
• Execute a periodic recovery test by attempting to restore Splunk from backup

Physical security

• Secure physical access to all Splunk servers

• Ensure that end users of Splunk practice sound physical and endpoint security
  ♦ Set a short time-out for user sessions in Splunk Web via Manager

Confidentiality and integrity

• Use SSL encryption on Splunk's web, management, and data ports

Authentication

• Upon installing Splunk, change the default "admin" password the first time that you log in
• Do not use the default root certificate or server certificates that ship with Splunk
  ♦ Either generate a unique root certificate and self-signed server certificates, use your enterprise root certificate authority, or use a third-party certificate authority
  ♦ Make sure that you back up the private keys in a secure location
• Use SSL authentication between forwarders and indexers
• Use LDAP or other third-party systems to control authentication to Splunk
  ♦ When using LDAP, make sure that your LDAP implementation enforces:
    ◊ Strong password requirements for length and complexity
    ◊ A low incorrect attempt threshold for password lockout

Authorization

• Protect access to Splunk features and data by using Splunk's role-based access control
  ♦ Practice the principle of least privilege by only granting access to features and data based on business justification
  ♦ Use an approval process to validate access requests to Splunk features and data

Auditing

• Perform a periodic review of Splunk's access and audit logs

• Perform a periodic review of the Splunk server's audit and security logs

Configuration management

• Use a configuration management tool such as Subversion to provide version control for Splunk configurations
• Integrate Splunk configuration changes into your existing change management framework
• Configure Splunk to monitor its own configuration files and alert on changes

Client browser

• Use a current version of a supported browser such as Firefox or Internet Explorer

• Use a client-side JavaScript blocker such as NoScript on Firefox or Internet Explorer 8 Filters to help protect against XSS, XSRF, and similar exploits
• Ensure that users have the latest version of Flash installed

Splunk deployments range in size from departmental, single server installations, where one Splunk instance handles all tasks ranging from data input to indexing to search and reporting, to enterprise distributed deployments with data flowing between thousands of instances of Splunk forwarders, indexers, and search heads. Splunk provides valuable tools to handle distributed deployments of any size. Chief among them is the Splunk deployment server and its features.

The deployment server is Splunk's technology for pushing out configurations and content to distributed Splunk instances. A key use case for the deployment server is to manage configuration for groups of forwarders. For example, if you have several sets of forwarders, each set residing on a different machine type, you can use the deployment server to push out different content according to machine type. Similarly, in a distributed search environment, you can use a deployment server to push out content to sets of indexers. If you want an overview of the different ways you can design a Splunk deployment in your organization, check out the deployment information in the Community Wiki.

The first several topics in this section explain how to configure a deployment server and its clients. Topics then follow that show how to employ this technology for specific use cases.

The big picture (in words and diagram)

In a Splunk deployment, you use a deployment server to push out content and configurations (collectively called deployment apps) to deployment clients, grouped into server classes.

A deployment server is a Splunk instance that acts as a centralized configuration manager, collectively managing any number of Splunk instances, called "deployment clients". Any Splunk instance -- even one indexing data locally -- can act as a deployment server.

A deployment client is a Splunk instance remotely configured by a deployment server. A Splunk instance can be both a deployment server and client at the same time. Each Splunk deployment client belongs to one or more server classes.

A server class is a set of deployment clients, grouped by some set of configuration characteristics, so that they can be managed as a unit. You can group clients by application, OS, type of data, or any other feature of your Splunk deployment. To update the configuration for a set of clients, the deployment server pushes out configuration files to all or some members of a server class. Besides configuration settings, you can use the deployment server to push out any sort of content. Server classes are configured on the deployment server.

This diagram provides a conceptual overview of the relationship between a deployment server and its set of deployment clients and server classes:

In this example, each deployment client is a Splunk forwarder that belongs to two server classes, one for its OS and the other for its geographical location. The deployment server maintains the list of server classes and uses those server classes to determine what content to forward to each client. For an example of how to implement this type of arrangement to govern the flow of content to clients, see "Deploy several standard forwarders".

A deployment app is a set of deployment content (including configuration files) deployed as a unit to clients of a server class. A deployment app might consist of just a single configuration file, or it can consist of many files. Depending on filtering criteria, an app might get deployed to all clients in a server class or to a subset of clients. Over time, an app can be updated with new content and then redeployed to its designated clients. The deployment app can be an existing Splunk app, or one developed solely to group some content for deployment purposes.

Note: The term "app" has a somewhat different meaning in the context of the deployment server from its meaning in the general Splunk context. For more information on Splunk apps, see "What are apps and add-ons?".

For more information on deployment servers, server classes, and deployment apps, see "Define server classes". For more information on deployment clients, see "Configure deployment clients".

A multi-tenant environment means that you have more than one deployment server running on the same Splunk instance, and each deployment server is serving content to its own set of deployment clients. For information about multi-tenant environments, see "Deploy in multi-tenant environments".

Key terms

Here's a recap of the key definitions:

Term: deployment server
Meaning: A Splunk instance that acts as a centralized configuration manager. It pushes configuration updates to other Splunk instances.

Term: deployment client
Meaning: A remotely configured Splunk instance. It receives updates from the deployment server.

Term: server class
Meaning: A deployment configuration shared by a group of deployment clients. A deployment client can belong to multiple server classes.

Term: deployment app
Meaning: A unit of content deployed to one or more members of a server class.

Term: multi-tenant environment
Meaning: A deployment environment involving multiple deployment servers.

Communication between deployment server and clients

The deployment client periodically polls the deployment server, identifying itself. The deployment server then reviews the information in its configuration to find out if there is something new or updated to push out to that particular client. If there is new content to deploy to a given deployment client, the deployment server tells the client exactly what it should retrieve. The deployment client then retrieves the new content and treats it according to the instructions specified for the server class it belongs to--maybe it should restart, run a script, or just wait until someone tells it to do something else.
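The poll-then-retrieve exchange can be modeled with a small in-memory Python sketch. Everything here is illustrative: the class and method names are hypothetical, and the integer version markers are an assumption standing in for however the deployment server actually detects new or updated content.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeploymentServer:
    # server class name -> {deployment app name: version marker}
    server_classes: Dict[str, Dict[str, int]]
    # client name -> names of the server classes it belongs to
    membership: Dict[str, List[str]]

    def check_in(self, client_name: str, installed: Dict[str, int]) -> Dict[str, int]:
        """Tell a polling client which apps are new or updated for it."""
        wanted: Dict[str, int] = {}
        for server_class in self.membership.get(client_name, []):
            wanted.update(self.server_classes.get(server_class, {}))
        return {app: ver for app, ver in wanted.items() if installed.get(app) != ver}


@dataclass
class DeploymentClient:
    name: str
    installed: Dict[str, int] = field(default_factory=dict)

    def poll(self, server: DeploymentServer) -> List[str]:
        """Identify ourselves, then retrieve whatever the server points us to."""
        to_fetch = server.check_in(self.name, self.installed)
        self.installed.update(to_fetch)  # stands in for the actual download
        return sorted(to_fetch)


server = DeploymentServer(
    server_classes={"linux": {"unix_inputs": 1}, "emea": {"emea_outputs": 1}},
    membership={"fwd1": ["linux", "emea"]},
)
client = DeploymentClient("fwd1")
first = client.poll(server)    # both apps are new to the client
second = client.poll(server)   # nothing changed since the last poll
server.server_classes["linux"]["unix_inputs"] = 2
third = client.poll(server)    # only the updated app is retrieved
print(first, second, third)
```

Note that the client initiates every exchange; the server only answers with what the client should fetch.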

Plan a deployment

If you've got Splunk instances serving a variety of different populations within your organization, chances are their configurations vary depending on who uses them and for what purpose. You might have some number of Splunk instances serving the helpdesk team, configured with a specific app to accelerate troubleshooting of Windows desktop issues. You might have another group of Splunk instances in use by your operations staff, set up with a few different apps designed to emphasize tracking of network issues, security incidents, and email traffic management. A third group of Splunk instances might serve the Web hosting group within the operations team.

Rather than having to manage and maintain these divergent Splunk instances one at a time, you can put them into groups based on their use, identify the configurations and apps needed by each group, and then use the deployment server to update their various apps and configurations as needed.

In addition to grouping Splunk instances by use, there are other useful types of groupings you can specify. For example, you might group Splunk instances by OS or hardware type, by version, or by geographical location or timezone.

Configuration overview

For the great majority of deployment server configurations, perform these steps:

1. Designate one of your Splunk servers as the deployment server. A deployment server can also be a deployment client, either of itself or of a different deployment server.

2. Group the deployment clients into server classes. A server class defines the clients that belong to it and what content gets pushed out to them. Each deployment client can belong to multiple server classes.

3. Create a serverclass.conf file on the deployment server. It specifies the server classes and the location of the deployment apps. Refer to "Define server classes" in this manual for details.

4. Create the directories for your deployment apps, and put the content to be deployed into those directories. Refer to "Deploy apps and configurations" in this manual for details.

5. Create a deploymentclient.conf for each deployment client. It specifies what deployment server the client should communicate with, the specific location on that server from which it should pick up content, and where it should put it locally. Refer to "Configure deployment clients" in this manual for details.

6. For more complex deployments with multiple deployment servers, create a tenants.conf file on one of the deployment servers. This allows you to define multiple deployment servers on a single Splunk instance and redirect incoming client requests to a specific server according to rules you specify. Refer to "Deploy in multi-tenant environments" in this manual for more information about configuring tenants.conf. Most deployment server topologies don't require that you touch tenants.conf, however.

For an example of an end-to-end configuration, see "Deploy several standard forwarders".

Note: The deployment server and its deployment clients must agree in the SSL setting for their splunkd management ports. They must all have SSL enabled, or they must all have SSL disabled. To configure SSL on a Splunk instance, set the enableSplunkdSSL attribute in server.conf to "true" or "false".
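The SSL agreement rule lends itself to a quick consistency check before rollout. The following Python sketch assumes you have already collected each instance's enableSplunkdSSL value by some means of your own; the function name and the inventory dictionary are hypothetical.

```python
from typing import Dict, List


def mismatched_clients(server_ssl: bool, client_ssl: Dict[str, bool]) -> List[str]:
    """Return the clients whose enableSplunkdSSL value differs from the server's."""
    return sorted(name for name, enabled in client_ssl.items() if enabled != server_ssl)


# Hypothetical inventory: client name -> enableSplunkdSSL value
inventory = {"fwd-east": True, "fwd-west": False, "idx-01": True}
print(mismatched_clients(True, inventory))  # clients that would fail to agree
```

An empty result means the deployment server and all listed clients agree on the setting.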

Restart or reload?

The first time you configure the deployment server and its clients, you'll need to restart all instances of Splunk. When you restart the deployment server, it automatically deploys any new content to its clients. Later on, to deploy new or updated content without restarting, you can use the CLI reload command, as described in "Deploy apps and configurations" in this manual.

Define server classes


A server class defines a deployment configuration shared by a group of deployment clients. It defines both the criteria for being a member of the class and the set of content to deploy to members of the class. This content (encapsulated as "deployment apps") can consist of Splunk apps, system configurations, and other related content, such as scripts, images, and supporting material. You can define different server classes to reflect the different requirements, OSes, machine types, or functions of your deployment clients.

You define server classes in serverclass.conf on the deployment server. Create one in $SPLUNK_HOME/etc/system/local. For information about configuration files, including an explanation of their basic structure, see "About configuration files" in this manual.

If you have multiple server classes, you might want to define a "global" server class that applies to all deployment clients by default. You can then override various aspects of it as needed by defining more specific server classes. For example, if you have a mix of Windows and Linux forwarders sending data to the same indexer, you might want to specify that all forwarders get a common outputs.conf file, but that Windows forwarders get one inputs.conf file and Linux forwarders a different one. In that case, you could specify the outputs.conf in the global server class and then create separate Windows and Linux server classes for the different inputs.conf files.

In addition to defining attributes and content for specific server classes, you can also define attributes that pertain just to a single app within a server class.

Important: All configuration information is evaluated numerically and then alphabetically (0-9, then a-z), so nomenclature matters.

A deployment client has its own configuration, defined in deploymentclient.conf. The information in deploymentclient.conf tells the deployment client where to go to get the content that the server class it belongs to says it should have.

The next section provides a reference for the server class configuration settings. You might want to read it while referring to the set of simple example configurations presented later in this topic. In addition, there are several longer and more complete examples presented later in this manual, including "Deploy several standard forwarders".

What you can define for a server class

You can specify settings for a global server class, as well as for individual server classes or apps within server classes. There are three levels of stanzas to enable this:

Stanza: [global]
Meaning: The global server class.
Scope: Attributes defined here pertain to all server classes.

Stanza: [serverClass:<serverClassName>]
Meaning: Individual server class. A serverClass is a collection of apps.
Scope: Attributes defined here pertain to just the server class <serverClassName>.

Stanza: [serverClass:<serverClassName>:app:<appName>]
Meaning: App within server class.
Scope: Attributes defined here pertain to just the specified deployment app <appName> within the specified <serverClassName>. To indicate all apps within <serverClassName>, <appName> can be the wildcard character: *, in which case it will cause all content in the repositoryLocation to be added to this serverClass.

Attributes in more specific stanzas override less specific stanzas. Therefore, an attribute defined in a [serverClass:<serverClassName>] stanza will override the same attribute defined in [global].
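The override rule amounts to a most-specific-first lookup, sketched here in Python. The resolve function and the in-memory dictionary are hypothetical stand-ins for a parsed serverclass.conf, and the wildcard app stanza is deliberately left out of the sketch.

```python
from typing import Dict, Optional

Conf = Dict[str, Dict[str, str]]


def resolve(conf: Conf, attribute: str, server_class: str,
            app: Optional[str] = None) -> Optional[str]:
    """Look up an attribute, checking the most specific stanza first."""
    stanzas = []
    if app is not None:
        stanzas.append(f"serverClass:{server_class}:app:{app}")
    stanzas.append(f"serverClass:{server_class}")
    stanzas.append("global")
    for stanza in stanzas:
        if attribute in conf.get(stanza, {}):
            return conf[stanza][attribute]
    return None  # not set at any level


conf = {
    "global": {"restartSplunkd": "false", "stateOnClient": "enabled"},
    "serverClass:Linux": {"restartSplunkd": "true"},
    "serverClass:Linux:app:unix": {"stateOnClient": "noop"},
}
print(resolve(conf, "restartSplunkd", "Linux"))          # overridden by the server class
print(resolve(conf, "restartSplunkd", "Windows"))        # falls back to [global]
print(resolve(conf, "stateOnClient", "Linux", "unix"))   # overridden by the app stanza
```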

The attributes are definable for each stanza level, unless otherwise indicated. Here are the most common ones:

Attribute: repositoryLocation
Default: $SPLUNK_HOME/etc/deployment-apps
What it's for: The location on the deployment server where the content to be deployed for this server class is stored.

Attribute: targetRepositoryLocation
Default: $SPLUNK_HOME/etc/apps
What it's for: The location on the deployment client where the content to be deployed for this server class should be installed. You can override this in deploymentclient.conf on the deployment client.

Attribute: continueMatching
Default: true
What it's for: If set to false, the deployment server will look through the list of server classes in this configuration file and stop when it matches the first one to a client. If set to true, the deployment server will continue to look and match. This option is available because you can define multiple, layered sets of server classes. A serverClass can override this property and stop the matching.

Attribute: endpoint
Default: $deploymentServerUri$/services/s
What it's for: The HTTP location from which content can be downloaded by a deployment client. The deployment server fills in the variable substitutions itself, based on information received from the client. You can provide any URI here, as long as it uses the same variables. In most cases, this attribute does not need to be specified.

Attribute: filterType
Default: whitelist
What it's for: Set to "whitelist" or "blacklist".

This determines the order of execution of filters. If filterType is whitelist, all whitelist filters are applied first, followed by blacklist filters. If filterType is blacklist, all blacklist filters are applied first, followed by whitelist filters.

The whitelist setting indicates a filtering strategy that pulls in a subset:

• Items are not considered to match the stanza by default.
• Items that match any whitelist entry, and do not match any blacklist entry, are considered to match the stanza.
• Items that match any blacklist entry are not considered to match the stanza, regardless of whitelist.

The blacklist setting indicates a filtering strategy that rules out a subset:

• Items are considered to match the stanza by default.
• Items that match any blacklist entry, and do not match any whitelist entry, are considered to not match the stanza.
• Items that match any whitelist entry are considered to match the stanza.

You can override this value at the serverClass and serverClass:app levels. If you specify whitelist at the global level, and then specify blacklist for an individual server class, the setting becomes blacklist for that server class, and you have to provide another filter in that server class definition to replace the one you overrode.

Attribute: whitelist.<n>, blacklist.<n>
Default: n/a
What it's for: <n> is a number starting at 0, and incrementing by 1. Set the attribute to ipAddress, hostname, or clientName:

• ipAddress is the IP address of the deployment client. Can use wildcards, such as 10.1.1.*
• hostname is the host name of the deployment client. Can use wildcards, such as *.splunk.com.
• clientName is a logical or tag name that can be assigned to a deployment client in deploymentclient.conf. clientName takes precedence over ipAddress or hostname when matching a client to a filter.

This will cause all hosts in

When filterType is blacklist:

This will cause only the 'web' and 'linux' hosts to match the server class. No other hosts will match.

You can override this value at the serverClass and serverClass:app levels.

Important: Overriding one type of filter (whitelist/blacklist) causes the other to be overridden too. If, for example, you override the whitelist, the blacklist will not be inherited from the parent; you must provide one in the stanza.

Attribute: stateOnClient
Default: enabled
What it's for: Set to "enabled", "disabled", or "noop". This setting specifies whether the deployment client receiving an app should enable or disable the app once it is installed. The "noop" value is for apps that do not require enablement; for example, apps containing only Splunk knowledge, such as event or source types.

Attribute: machineTypes
Default: n/a
What it's for: Matches any of the machine types in a comma-separated list.

This setting lets you use the hardware type of the deployment client as a filter. This filter will be used only if a client could not be matched using the whitelist/blacklist filters. The easiest way to ensure that Splunk uses machineTypes as the filter is to add this setting to the top of your serverclass.conf file:

[global]
blacklist.0=*

Note: machineTypes will have no effect if used in conjunction with whitelist.0=*. This is because whitelist.0=* causes a match on all clients, and machineTypes only gets used if no clients are matched through whitelist or blacklist filters.

The value for machineTypes is a comma-separated list of machine types; for example, linux-i686, linux-x86_64, etc. Each machine type is a specific string designated by the hardware platform itself.

The method for finding this string on the client varies by platform, but if the deployment client is already connected to the deployment server, you can determine the string's value by using this Splunk CLI command on the deployment server:

This setting will match any of the

Note: Be sure to include the 's' at the end of "machineTypes".

Attribute: restartSplunkWeb
Default: false
What it's for: Set to "true" or "false". Determines whether the client's Splunk Web restarts after the installation of a server class or app.

Attribute: restartSplunkd
Default: false
What it's for: Set to "true" or "false". Determines whether the client's splunkd restarts after the installation of a server class or app.

Note: The most accurate and up-to-date list of settings available for a given configuration file is in the .spec file for that configuration file. You can find the latest version of the .spec and .example files for serverclass.conf in "serverclass.conf" in the Configuration file reference in this manual, or in $SPLUNK_HOME/etc/system/README.
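The two filterType strategies described for server classes reduce to a pair of boolean rules, which this Python sketch spells out. It is an approximation under stated assumptions: matches_stanza is a hypothetical name, the client is represented by a plain list of its identifiers (IP address, host name, client name), the precedence of clientName is not modeled, and Python's fnmatch also honors ? and [...] patterns, which the text above does not mention.

```python
from fnmatch import fnmatchcase
from typing import List


def matches_stanza(client_ids: List[str], filter_type: str,
                   whitelist: List[str], blacklist: List[str]) -> bool:
    """Apply the whitelist or blacklist strategy to one client."""
    def hit(patterns: List[str]) -> bool:
        return any(fnmatchcase(cid, pat) for cid in client_ids for pat in patterns)

    in_white, in_black = hit(whitelist), hit(blacklist)
    if filter_type == "whitelist":
        # No match by default; any blacklist hit vetoes a whitelist hit.
        return in_white and not in_black
    if filter_type == "blacklist":
        # Match by default; a blacklist hit excludes unless a whitelist hit rescues.
        return in_white or not in_black
    raise ValueError(f"unknown filterType: {filter_type!r}")


client = ["10.1.1.7", "web1.splunk.com"]
print(matches_stanza(client, "whitelist", ["*.splunk.com"], []))
print(matches_stanza(client, "whitelist", ["*.splunk.com"], ["10.1.1.*"]))
print(matches_stanza(client, "blacklist", ["*.splunk.com"], ["*"]))
```

The last call shows why a global blacklist.0=* combined with a narrower whitelist admits only the whitelisted clients.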

Examples

Here are several examples of defining server classes in the serverclass.conf file:

# Example 1
# Matches all clients and includes all apps in the server class

[global]
whitelist.0=*
# whitelist matches all clients.
[serverClass:AllApps]
[serverClass:AllApps:app:*]
# a server class that encapsulates all apps in the repositoryLocation -
# in this case, $SPLUNK_HOME/etc/apps

[global]
# blacklist.0=* at the global level ensures that the machineTypes filter
# invoked later will apply.
blacklist.0=*

[serverClass:AppsByMachineType]
# Include all machineTypes used by apps in this server class.
# It is important to have a general filter here and a more specific
# filter at the app level. An app is matched only if the server class
# it is contained in was also successfully matched.
machineTypes=windows-intel, linux-i686, linux-x86_64

[serverClass:AppsByMachineType:app:SplunkDesktop]
# Deploy this app only to Windows boxes.
machineTypes=windows-intel
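The interplay between the class-level and app-level machineTypes filters in the configuration above can be traced with a short Python sketch. The helper names are hypothetical, and the sketch assumes the whitelist/blacklist filters have already failed to match the client (as the global blacklist.0=* arranges), so that machineTypes is the filter in effect.

```python
from typing import List


def parse_machine_types(value: str) -> List[str]:
    """Split a comma-separated machineTypes value into individual types."""
    return [item.strip() for item in value.split(",") if item.strip()]


def app_is_deployed(client_type: str, class_types: str, app_types: str) -> bool:
    """An app matches only if its containing server class matched as well."""
    class_match = client_type in parse_machine_types(class_types)
    app_match = client_type in parse_machine_types(app_types)
    return class_match and app_match


CLASS_TYPES = "windows-intel, linux-i686, linux-x86_64"  # AppsByMachineType
DESKTOP_TYPES = "windows-intel"                          # SplunkDesktop app

for machine in ("windows-intel", "linux-i686"):
    print(machine, app_is_deployed(machine, CLASS_TYPES, DESKTOP_TYPES))
```

A Windows client receives SplunkDesktop; a Linux client matches the server class but not the app.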

This topic describes the options for setting up deployment clients when using the Splunk deployment server functionality. A deployment client belongs to one or more server classes. Two configuration files share primary responsibility for determining how deployment functions:

• The deployment server has a serverclass.conf file, which specifies the server classes it deploys to. A server class defines what content a given deployment client should download. See "Define server classes" for details on how to configure this file.
• Each deployment client has a deploymentclient.conf file, which specifies what deployment server it should contact to get content, and where to put the content once it is downloaded. The current topic describes how to configure this file.

For information about configuration files, including an explanation of their basic structure, see "About configuration files".

The next section provides a reference for the deployment client configuration settings. You might want to read it while referring to the set of simple example configurations presented later in this topic. In addition, there are several longer and more complete examples presented later in this manual, including "Deploy several standard forwarders".

What you can define for a deployment client

To enable a Splunk instance as a deployment client, create a deploymentclient.conf in $SPLUNK_HOME/etc/system/local.

You can also enable a deployment client through the CLI. See "Enable and disable deployment clients using the CLI" later in this topic.

The deploymentclient.conf file provides two stanzas:

Stanza: [deployment-client]
Meaning: Includes a number of configuration attributes. Most importantly, this stanza specifies where to place downloaded content.

Stanza: [target-broker:deploymentServer]
Meaning: Specifies the location of this client's deployment server. "deploymentServer" is the default name for a deployment server.

These are the main attributes available in the [deployment-client] stanza:

Attribute: disabled
Default: false
What it's for: Set to "true" or "false". If "true", it disables the deployment client.

Attribute: clientName
Default: deploymentClient
What it's for: A name, or "tag," that the deployment server can use to filter on. It takes precedence over hostnames.

Attribute: workingDir
Default: $SPLUNK_HOME/var/run/depl
What it's for: A temporary folder used by the deployment client to download server classes and applications.

Attribute: repositoryLocation
Default: $SPLUNK_HOME/etc/apps
What it's for: The repository location where apps are installed after being downloaded from a deployment server.

Note that, for each app that is downloaded, the deployment server may also specify a repository location to install it. The deployment client will use the serverRepositoryLocationPolicy attribute to determine which location to use.

Attribute: serverRepositoryLocationPolicy
Default: acceptSplunkHome
What it's for: Set to "acceptSplunkHome", "acceptAlways", or "rejectAlways":

• acceptSplunkHome - Accept the repository location supplied by the deployment server, if and only if it is rooted by $SPLUNK_HOME.
• acceptAlways - Always accept the repository location supplied by the deployment server.
• rejectAlways - Always reject the server-supplied repository location. Instead, use the repositoryLocation specified in this configuration file.

Attribute: endpoint
Default: $deploymentServerUri$/service
What it's for: The HTTP endpoint from which content should be downloaded.

Note: The deployment server can specify a different endpoint from which to download each set of content (individual apps, etc). The deployment client uses the serverEndpointPolicy attribute to determine which value to use.

$deploymentServerUri$ resolves to targetUri, defined in the [target-broker] stanza. $serviceClassName$ and $appName$ mean what they say.

Attribute: serverEndpointPolicy
Default: acceptAlways
What it's for: Set to "acceptAlways" or "rejectAlways":

• acceptAlways - Always accept the endpoint supplied by the deployment server.
• rejectAlways - Always reject the endpoint supplied by the deployment server. Instead, use the endpoint defined by the endpoint attribute.

Attribute: phoneHomeIntervalInSecs
Default: 60
What it's for: A number that determines how frequently the deployment client should check for new content.

This is the attribute available in the [target-broker:deploymentServer] stanza:

Attribute: targetUri
Default: n/a
What it's for: Set to <Deployment_server_URI>:<Mgmt_port>. Specifies the deployment server connection information. The management port is typically 8089.

Note: The most accurate and up-to-date list of settings available for a given configuration file is in the .spec file for that configuration file. You can find the latest version of the .spec and .example files in the Configuration file reference in this manual, or in $SPLUNK_HOME/etc/system/README.
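The serverRepositoryLocationPolicy and serverEndpointPolicy attributes both decide between a server-supplied value and a local one. The following Python sketch shows one reading of those rules. The function names are hypothetical, paths are assumed to be already expanded POSIX-style paths, and falling back to the local value when acceptSplunkHome rejects a location (or when the server supplies nothing) is an assumption, since the reference above does not say what happens in those cases.

```python
import posixpath
from typing import Optional


def choose_repository_location(policy: str, server_location: Optional[str],
                               local_location: str, splunk_home: str) -> str:
    """Pick the install location according to serverRepositoryLocationPolicy."""
    if server_location is None or policy == "rejectAlways":
        return local_location
    if policy == "acceptAlways":
        return server_location
    if policy == "acceptSplunkHome":
        home = posixpath.normpath(splunk_home)
        candidate = posixpath.normpath(server_location)
        rooted = candidate == home or candidate.startswith(home + "/")
        return server_location if rooted else local_location
    raise ValueError(f"unknown policy: {policy!r}")


def choose_endpoint(policy: str, server_endpoint: Optional[str],
                    local_endpoint: str) -> str:
    """Pick the download endpoint according to serverEndpointPolicy."""
    if policy == "acceptAlways":
        return server_endpoint if server_endpoint is not None else local_endpoint
    if policy == "rejectAlways":
        return local_endpoint
    raise ValueError(f"unknown policy: {policy!r}")


home = "/opt/splunk"  # hypothetical $SPLUNK_HOME
local = "/opt/splunk/etc/apps"
print(choose_repository_location("acceptSplunkHome", "/opt/splunk/etc/custom", local, home))
print(choose_repository_location("acceptSplunkHome", "/tmp/apps", local, home))
```

Normalizing the path before the prefix check keeps a location such as /opt/splunk/../tmp from being treated as rooted in $SPLUNK_HOME.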

Examples

Here are several examples of defining deployment clients through the deploymentclient.conf file:

# Example 1
# Deployment client receives apps, placing them into the same repositoryLocation locally,
# relative to $SPLUNK_HOME, that it picked them up from. This is typically
# $SPLUNK_HOME/etc/apps.
# There is nothing in [deployment-client], because the deployment client is not overriding
# the value set on the deployment server side.

# Example 2
# Deployment server keeps apps to be deployed in a non-standard location on the server side
# (perhaps for organization purposes).
# Deployment client receives apps and places them in the standard location.
# Note: Apps deployed to any location other than $SPLUNK_HOME/etc/apps on the deployment
# client side will not be recognized and run.
# This configuration rejects any location specified by the deployment server and replaces
# it with the standard client-side location.

Enable and disable deployment clients using the CLI

To enable a deployment client, run the following command from the Splunk CLI:

./splunk set deploy-poll <IP address or hostname>:<port>

Include the IP address/hostname and management port of the deployment server. The management port is typically 8089.

Now restart the deployment client to make the change take effect.

To disable a deployment client, run the following command:

./splunk disable deploy-client

Deploy in multi-tenant environments


Note: It is recommended that you work with Splunk Professional Services when designing a multi-tenant deployment. It is best not to edit tenants.conf on your own.

A multi-tenant deployment server topology means that you have more than one deployment server running on the same Splunk instance, and each deployment server is serving content to its own set of deployment clients. (You can also achieve the same effect by using two Splunk instances, each with its own configuration.)

Use tenants.conf to redirect incoming requests from deployment clients to another deployment server or servers. The typical reason for doing this is to offload your splunkd's HTTP server -- having many deployment clients hitting the splunkd HTTP server at once to download apps and configurations can overload the deployment server. More than 400 connections at one time have been shown to bog down splunkd's HTTP server, but this figure does not take into account hardware or the size of the package the client is downloading, which is constrained by available bandwidth.

To set up multiple deployment servers on a single Splunk instance, you:

• Create a tenants.conf containing a whitelist or blacklist that tells deployment clients which deployment server instance to use.
• Create a separate instance of serverclass.conf for each deployment server, named for that deployment server, like so: <tenantName>-serverclass.conf.
• For each deployment client, configure deploymentclient.conf the way you would if there were just one deployment server.

For information about configuration files, including an explanation of their basic structure, see About configuration files.

What you can define in tenants.conf

You identify the different deployment servers as "tenants" in tenants.conf on the Splunk instance that will host these deployment servers. There isn't a tenants.conf file by default, so you must create one in $SPLUNK_HOME/etc/system/local and define the tenants in it.

For each tenant, create a stanza with the heading [tenant:<tenantName>] with these attributes:

Attribute: filterType
What it's for: Set to "whitelist" or "blacklist". Determines the type of filter to use. Deployment clients use the filter to determine which deployment server to access.
Default: whitelist

Attribute: whitelist.<n>, blacklist.<n>
What it's for: <n> is a number starting at 0 and incrementing by 1. The client stops looking at the filter when the sequence of <n> values breaks. Set the attribute to ipAddress, hostname, or clientName:

• ipAddress is the IP address of the deployment client. Can use wildcards, such as 10.1.1.*
• hostname is the host name of the deployment client. Can use wildcards, such as *.splunk.com.
• clientName is a logical or tag name that can be assigned to a deployment client in deploymentclient.conf. clientName takes precedence over ipAddress or hostname when matching a client to a filter.

Default: n/a

Example

Here is an example of defining two tenants in the tenants.conf file:

# Define two tenants - dept1 and dept2.
# Deployment server configuration for dept1 will be in a matching dept1-serverclass.conf
# Deployment server configuration for dept2 will be in a matching dept2-serverclass.conf

[tenant:dept1]
whitelist.0=*.dept1.splunk.com

[tenant:dept2]
whitelist.0=*.dept2.splunk.com
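The filter rules described above can be sketched in Python. This is only an illustration, not Splunk's implementation: the names `collect_filters`, `match_tenant`, and `serverclass_file` are hypothetical, shell-style wildcard matching through `fnmatch` is assumed to approximate the wildcards shown in the attribute table, and only whitelist filters are modeled.

```python
from fnmatch import fnmatch

# Hypothetical in-memory form of the two tenants from the example above.
TENANTS = {
    "dept1": {"whitelist.0": "*.dept1.splunk.com"},
    "dept2": {"whitelist.0": "*.dept2.splunk.com"},
}

def collect_filters(attrs, kind="whitelist"):
    """Read kind.0, kind.1, ... and stop at the first missing number."""
    patterns = []
    n = 0
    while f"{kind}.{n}" in attrs:
        patterns.append(attrs[f"{kind}.{n}"])
        n += 1
    return patterns

def match_tenant(client, tenants=TENANTS):
    """Return the first tenant whose whitelist matches the client, else None."""
    for name, attrs in tenants.items():
        if any(fnmatch(client, pattern) for pattern in collect_filters(attrs)):
            return name
    return None

def serverclass_file(tenant):
    """Per-tenant server class file name, following <tenantName>-serverclass.conf."""
    return f"{tenant}-serverclass.conf"
```

Under these assumptions, a client named `fwd01.dept2.splunk.com` (a made-up host name) maps to the `dept2` tenant, whose server classes would live in `dept2-serverclass.conf`. A filter numbered `whitelist.3` with no `whitelist.2` before it is never read, which is what "the sequence breaks" means in this sketch.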

Deploy apps and configurations


After configuring the deployment server and clients, two steps remain:

1. Put the new or updated deployment content into deployment directories.

2. Inform the clients that it's time to download new content.

Create directories for deployment apps and place content in them

The location of deployment directories is configurable by means of the repositoryLocation attribute in serverclass.conf, as described in "Define server classes". The default location for deployment directories is $SPLUNK_HOME/etc/deployment-apps. Each app must have its own subdirectory, with the same name as the app itself, as specified in serverclass.conf.

This example creates a deployment directory in the default repository location, for an app named "fwd_to_splunk1":

mkdir -p $SPLUNK_HOME/etc/deployment-apps/fwd_to_splunk1/default

Place the content for each app into the app's subdirectory. To update the app with new or changed content, just add or overwrite the files in the directory.

Inform clients of new content

When you first configure the deployment server, and whenever you update its configuration by editing serverclass.conf, you'll need to restart or reload it for the changes to take effect. The clients will then pick up any new or changed content in configuration and in apps. You may use the CLI reload command instead of restarting the server.

To use Splunk's CLI, change to the $SPLUNK_HOME/bin/ directory and use the ./splunk command. (On Windows, you do not need to include the ./ beforehand.)

This command checks all server classes for changes and notifies the relevant clients:

./splunk reload deploy-server

This command notifies and updates only the server class you specify:

./splunk reload deploy-server -class <server class>

For example:

./splunk reload deploy-server -class www

In this example, the command notifies and updates only the clients that are members of the www server class.

Once a client receives new configurations, it restarts splunkd and enables the relevant apps, if configured to do so in serverclass.conf.

Confirm the deployment update

To confirm that all clients received the configuration correctly, run this command from the deployment server:

./splunk list deploy-clients

This lists all the deployment clients and specifies the last time they were successfully synced.

Example: deploy a light forwarder


This example walks through the configuration needed to deploy an app, in this case, the Splunk light forwarder.

On the deployment server

1. Copy the SplunkLightForwarder app from $SPLUNK_HOME/etc/apps to the deployment directory, $SPLUNK_HOME/etc/deployment-apps on the deployment server.

2. Edit serverclass.conf in /system/local on the deployment server. Add a server class

Note the following:

• The [global] stanza is required. It contains any settings that should be globally applied.
  ♦ In the [global] stanza, whitelist.0=* signifies that all of the deployment server's clients match all server classes defined in this configuration file. In this example, there is just a single server class.
• The server class name is "lightforwarders". You can call your server classes anything you want.
  ♦ In the [serverClass:lightforwarders] stanza, whitelist.0=* signifies that all clients match the lightforwarders server class.
• The [serverClass:lightforwarders:app:SplunkLightForwarder] stanza contains settings specific to the SplunkLightForwarder app on the lightforwarders server class.
  ♦ stateOnClient specifies that this app should be enabled on the client when it is deployed.
  ♦ restartSplunkd specifies that when this app is deployed, splunkd should be restarted.

See "Define server classes" for details on how to configure this file.

On the deployment client

Edit deploymentclient.conf in /system/local on the deployment client to tell the client how to contact the deployment server:

Note the following:

• deploymentServer is the default name for a deployment server.

• <IP:port> is the IP address and port number for this client's deployment server.

The file points the client to the deployment server located at IP:port. There, it will pick up the Splunk light forwarder app, enable it, and restart. See "Configure deployment clients" for details on how to configure this file.

Extended example: deploy several standard forwarders


What we're aiming for

A common use for deployment servers is to manage forwarder configuration files. In some distributed environments, forwarders can number into the hundreds, and the deployment server greatly eases the work of configuring and updating them. This example shows how to use the deployment server to initially configure a set of dissimilar forwarders. A follow-up example, in "Example: add an input to forwarders", shows how to use the deployment server to update the forwarders' configurations with new inputs.

The example sets up the following distributed environment, in which a deployment server deploys configurations for three forwarders sending data to two indexers:

• The deployment server Fflanda-SPLUNK3 (10.1.2.4) manages deployment for these

Here's the basic setup:

For information on monitoring files, such as message logs, see "Monitor files and directories" in this manual.

Overview of the setup

Here's an overview of the setup process (the detailed steps follow in the next section):

On the deployment server:

1. Create the set of server classes and apps for the deployment clients (forwarders) with serverclass.conf. You'll create two server classes to represent the two OS types (Windows, Linux). For each server class, you'll also create two deployment apps, for a total of four apps. The apps encapsulate:

• The type of input -- the data that the forwarder will monitor (Windows event logs or Linux messages).
• The type of output -- the indexer the forwarder will send data to (SPLUNK1 or SPLUNK2).

This configuration results in each forwarder belonging to a server class and receiving two apps: one for its inputs and one for its outputs.

2. Create directories to hold the deployment apps.

3. Create configuration files (outputs.conf and inputs.conf) to deploy to the forwarders. These files constitute the deployment apps and reside in the app directories.

4. Restart the deployment server.

On each Splunk indexer that will be receiving data from forwarders:

1. Enable receiving through the Splunk CLI.

2. Restart the receiver.

On each forwarder/deployment client:

1. Create a deploymentclient.conf file that points to the deployment server.

2. Restart the forwarder.

The rest is Splunk magic. After a short delay (while the forwarders receive and act upon their deployed content), Windows event logs begin flowing from Fflanda-WIN1 to Fflanda-SPLUNK1, and /var/log/messages data begins flowing from Fflanda-LINUX1 and Fflanda-LINUX2 to Fflanda-SPLUNK2.
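The relationship between forwarders, server classes, and apps in this overview can be sketched as a small Python lookup. It is only a model of the intended result, not anything Splunk runs. The app name `fwd_to_splunk2` is an assumption made by analogy with `fwd_to_splunk1`, and the function name `apps_for` is hypothetical. The other names come from this example.

```python
# Two server classes, each with one input app and one output app.
# "fwd_to_splunk2" is an assumed name; the others appear in this example.
SERVER_CLASSES = {
    "Fflanda-WIN": ["winevt", "fwd_to_splunk1"],
    "Fflanda-LINUX": ["linmess", "fwd_to_splunk2"],
}

# The server class that each forwarder (deployment client) belongs to.
MEMBERSHIP = {
    "Fflanda-WIN1": "Fflanda-WIN",
    "Fflanda-LINUX1": "Fflanda-LINUX",
    "Fflanda-LINUX2": "Fflanda-LINUX",
}

def apps_for(forwarder):
    """Return the apps a forwarder receives through its server class."""
    return SERVER_CLASSES[MEMBERSHIP[forwarder]]

for forwarder in MEMBERSHIP:
    print(forwarder, "->", ", ".join(apps_for(forwarder)))
```

Each forwarder ends up with exactly two apps, one for inputs and one for outputs, and the two server classes together account for the four apps mentioned in step 1.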

# Server class for Windows

# App for inputting Windows event logs
# This app is only for clients in the server class Fflanda-WIN
[serverClass:Fflanda-WIN:app:winevt]
# Enable the app and restart Splunk, after the client receives the app
stateOnClient=enabled
restartSplunkd=true

# App for forwarding to SPLUNK1
# This app is only for clients in the server class Fflanda-WIN
[serverClass:Fflanda-WIN:app:fwd_to_splunk1]
stateOnClient=enabled
restartSplunkd=true

7. Create $SPLUNK_HOME/etc/deployment-apps/linmess/default/inputs.conf with the

For information on monitoring files, such as message logs, see "Monitor files and directories" in this manual.

8. Restart Splunk.

Note: Because the deployment server in this example is newly configured, it requires a restart for its configuration to take effect. When clients poll the server for the first time, they'll get all the content designated for them. To deploy subsequent content, you generally do not need to restart the server. Instead, you just invoke the Splunk CLI reload command on the server, as described in "Deploy apps and configurations". By doing so, you ensure that the server will inform its clients of content changes. However, whenever you edit serverclass.conf, you must always restart the deployment server for the configuration changes to take effect.

On each receiver, Fflanda-SPLUNK1 and Fflanda-SPLUNK2:

1. Install Splunk, if you haven't already done so.

2. Run the following CLI command:

./splunk enable listen 9997 -auth <username>:<password>

This specifies that the receiver will listen for data on port 9997. With proper authorization, any forwarder can now send data to the receiver by designating the receiver's IP address and port number. You must enable receivers before you enable the forwarders that will be sending data to them.

For information on enabling receivers, see "Enable forwarding and receiving" in this manual.

3. Restart Splunk.

On each forwarder, Fflanda-WIN1, Fflanda-LINUX1, and Fflanda-LINUX2:

1. Install Splunk, if you haven't already done so.

2. Create $SPLUNK_HOME/etc/system/local/deploymentclient.conf with the following settings:

[deployment-client]

[target-broker:deploymentServer]
# Specify the deployment server that the client will poll.
targetUri= 10.1.2.4:8089

See "Configure deployment clients" for details on how to configure this file.

3. Restart Splunk.

Each forwarder will now poll the deployment server, download its configuration files, restart, and begin forwarding data to its receiving indexer.

For a follow-up example showing how to use the deployment server to update forwarder configurations, see "Example: Add an input to forwarders".

What the communication between the deployment server and its clients looks like

Using the above example, the communication from Fflanda-WIN1 to Fflanda-SPLUNK3 on port 8089 would look like this:

Fflanda-WIN1: Hello, I am Fflanda-WIN1.

Fflanda-SPLUNK3: Hello, Fflanda-WIN1. I have been expecting to hear from you. I have you down as a member of the Fflanda-WIN server class, and you should have the fwd_to_splunk1 (checksum=12345) and winevt (checksum=12378) apps.

Fflanda-WIN1: Hmmm, I don't have those configs. Using this connection I just opened up to you, can I grab the configs from you?

Fflanda-SPLUNK3: Sure! I have them ready for you.

Fflanda-WIN1: Thanks! I am going to back off a random number of seconds between 1 and 60 (in case you have a lot of clients that are polling you at the moment) ... OK, now send me the files.

Fflanda-SPLUNK3: Done! You now have fwd_to_splunk1-timestamp.bundle and winevt-timestamp.bundle.

Fflanda-WIN1: Awesome! I am going to store them in my $SPLUNK_HOME/etc/apps directory. Now I am going to restart myself, and when I come back up I am going to read the configurations that you sent me directly out of the .bundle files, which I know are just tar balls with a different extension.

A couple of minutes go by....

Fflanda-WIN1: Hello, I am Fflanda-WIN1.

Fflanda-SPLUNK3: Hello, Fflanda-WIN1. I have been expecting to hear from you. I have you down as a member of the Fflanda-WIN server class, and you should have the fwd_to_splunk1 (checksum=12345) and winevt (checksum=12378) apps.

Fflanda-WIN1: Hmmm, I already have both of those, but thanks anyway!

Later on, an admin modifies the winevt/inputs.conf file on Fflanda-SPLUNK3 to disable the collection of system event logs, and then runs the CLI command splunk reload deploy-server to force the deployment server to rescan serverclass.conf and the app directories. The next time Fflanda-WIN1 talks to Fflanda-SPLUNK3, it goes like this:

Fflanda-WIN1: Hello, I am Fflanda-WIN1.

Fflanda-SPLUNK3: Hello, Fflanda-WIN1. I have been expecting to hear from you. I have you down as a member of the Fflanda-WIN server class, and you should have the fwd_to_splunk1 (checksum=12345) and winevt (checksum=13299) apps.

Fflanda-WIN1: Hmmm, I know I have those configs, but the checksum I have for the winevt configs is different than the one you just told me about. Using this connection I just opened up to you, can I grab the updated winevt config from you?

Fflanda-SPLUNK3: Sure! I have it ready for you.

Fflanda-WIN1: Thanks! I am going to back off a random number of seconds between 1 and 60 (in case you have a lot of clients that are polling you at the moment) ... OK, now send me the updated config.

Fflanda-SPLUNK3: Done! You now have winevt-newer_timestamp.bundle.

Fflanda-WIN1: Awesome! I am going to store it in my $SPLUNK_HOME/etc/apps directory and move the old winevt.bundle I had out of the way. Now I am going to restart myself, and when I come back up, I'll have the most up-to-date config.
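The decision the client makes in the conversation above can be sketched in Python. This is a simplified model, not the actual client code: the function names `apps_to_download` and `backoff_seconds` are hypothetical, and the checksums are the illustrative values used in the dialogue.

```python
import random

def apps_to_download(server_apps, client_apps):
    """Return names of apps whose checksum is missing or different on the client."""
    return sorted(
        name for name, checksum in server_apps.items()
        if client_apps.get(name) != checksum
    )

def backoff_seconds(rng=random):
    """Pick a random delay between 1 and 60 seconds, inclusive."""
    return rng.randint(1, 60)

# What the server advertises after the admin changed winevt/inputs.conf.
server = {"fwd_to_splunk1": 12345, "winevt": 13299}
# What the client downloaded on its first poll.
client = {"fwd_to_splunk1": 12345, "winevt": 12378}

stale = apps_to_download(server, client)
if stale:
    delay = backoff_seconds()
    print(f"wait {delay}s, then fetch: {', '.join(stale)}")
```

With these values only `winevt` is fetched, because the `fwd_to_splunk1` checksum still matches. The random delay spreads out downloads when many clients poll at the same moment.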

Example: add an input to forwarders


The previous topic, "Extended example: deploy several standard forwarders", described setting up a deployment environment to manage a set of forwarders. It showed how to configure a new deployment server to deploy content to a new set of deployment clients. The current example follows on directly from there, using the configurations created in that topic. It shows how to update a forwarder configuration file and deploy the updated file to a subset of forwarders, defined by a server class.

Overview of the update process

This example starts with the set of configurations and Splunk instances created in the topic "Extended example: deploy several standard forwarders". The Linux forwarders now need to start monitoring data from a second source. To accomplish this:

1. Edit the inputs.conf file for the Linux server class to add the new source, overwriting the previous version in its apps directory.

2. Use the CLI to reload the deployment server, so that it becomes aware of the change and can deploy it to the appropriate set of clients.

You need to make changes only on the deployment server. When the deployment clients in the Linux server class next poll the server, they'll be notified of the new inputs.conf file. They'll download the file, enable it, restart Splunk, and immediately begin monitoring the second data source.

Detailed configuration steps

1. Edit $SPLUNK_HOME/etc/deployment-apps/linmess/default/inputs.conf to add a

2. Use Splunk CLI to reload the deployment server:

./splunk reload deploy-server -class Fflanda-LINUX

Once this command has been run, the deployment server notifies the clients that are members of the Fflanda-LINUX server class of the changed file. Since the change doesn't affect the Fflanda-WIN server class, its members don't need to know about it.

Set up distributed search

What is distributed search?

In distributed search, Splunk servers send search requests to other Splunk servers and merge the results back to the user. In a typical scenario, one Splunk server searches indexes on several other servers.

These are some of the key use cases for distributed search:

• Horizontal scaling for enhanced performance. Distributed search provides horizontal scaling by distributing the indexing and searching loads across multiple indexers, making it possible to search and index large quantities of data.

• Access control. You can use distributed search to control access to indexed data. In a typical situation, some users, such as security personnel, might need access to data across the enterprise, while others need access only to data in their functional area.

• Managing geo-dispersed data. Distributed search allows local offices to access their own data, while maintaining centralized access at the corporate level. Chicago and San Francisco can look just at their local data; headquarters in New York can search its local data, as well as the data in Chicago and San Francisco.

• Maximizing data availability. Distributed search, combined with load balancing and cloning from forwarders, is a key component of high availability solutions.

The Splunk instance that does the searching is referred to as the search head. The Splunk instances that do the indexing are called search peers or indexer nodes. Together, the search head and search peers constitute the nodes in a distributed search cluster.

A search head can also index and serve as a search peer. However, in performance-based use cases, such as horizontal scaling, it is recommended that the search head only search and not index. In that case, it is referred to as a dedicated search head.

A search head by default runs its searches across all search peers in its cluster. You can limit a search to one or more search peers by specifying the splunk_server field in your query. See Search across one or more distributed servers in the User manual.

Some search scenarios

This diagram shows a simple distributed search scenario for horizontal scaling, with one search head searching across three peers:

In this diagram showing a distributed search scenario for access control, a "security" department search head has visibility into all the indexing search peers. Each search peer also has the ability to search its own data. In addition, the department A search peer has access to both its data and the data of department B:

Finally, this diagram shows the use of load balancing and distributed search to provide high availability access to data:

For more information on load balancing, see Set up load balancing in this manual.

For information on Splunk distributed searches and capacity planning, see Dividing up indexing and searching in the Installation manual.

What search heads send to search peers

When initiating a distributed search, the search head distributes its knowledge objects to its search peers. Knowledge objects include saved searches, event types, and other entities used in searching across indexes. The search head needs to distribute this material to its search peers so that they can properly execute queries on its behalf. See What is Splunk knowledge? for detailed information on knowledge objects.

The indexers use the search head's knowledge to execute queries on its behalf. When executing a distributed search, the indexers are ignorant of any local knowledge objects. They have access only to the search head's knowledge.

The process of distributing search head knowledge means that the indexers by default receive nearly the entire contents of all the search head's apps. This set of data is referred to as the distributed bundle. If an app contains large binaries that do not need to be shared with the indexers, you can reduce the size of the bundle by means of the [replicationWhitelist] stanza in distsearch.conf. See Limit distributed bundle size in this manual.

The distributed bundle gets distributed to $SPLUNK_HOME/var/run/searchpeers/ on each search peer.

Because the search head distributes its knowledge, search scripts should not hardcode paths to resources. The distributed bundle will reside at a different location on the search peer's file system, so hardcoded paths will not work properly.

User authorization

All authorization for a distributed search originates from the search head. At the time it sends the search request to its search peers, the search head also distributes the authorization information. It tells the search peers the name of the user running the search, the user's role, and the location of the distributed authorize.conf file containing the authorization information.

Licenses for distributed deployments

Each indexer node in a distributed deployment requires a unique license key.

Search heads performing no indexing or only summary indexing can use the forwarder license. If the search head performs any other type of indexing, it requires a unique key.

See Search head license in the Installation manual for a detailed discussion of licensing issues.

Cross-version compatibility

All search nodes must be running Splunk 4.x to participate in the distributed search. Distributed search is not backwards or forwards compatible with Splunk 3.x.

Install a dedicated search head


Distributed search is enabled by default on every Splunk instance, with the exception of forwarders. This means that every Splunk server can function as a search head to a specified group of indexers, referred to as search peers.

In some cases, you might want a single Splunk instance to serve as both a search head and a search peer. In other cases, however, you might want to set up a dedicated search head. A dedicated search head performs only searching; it does not do any indexing. If you do not intend to perform any indexing on your search head, you can license it as a dedicated search head.

Note: If you do want to use a Splunk instance as both a search head and a search peer, or otherwise perform indexing on the search head, just install the search head as a regular Splunk instance with a normal license, as described in "About Splunk licenses" in the Installation manual.

To install a dedicated search head, follow these steps:

1. Determine your hardware needs by reading this topic in the Installation manual.

2. Install Splunk, as described in the topic in the Installation manual specific to your operating system.

3. Install a dedicated search head license (identical to a forwarder license), as described here in the Installation manual.

Note: Use the forwarder license for dedicated search heads only. If the search head also performs indexing, it will need a full Splunk license.

4. Establish distributed search from the search head to all the indexers, or "search peers", you want it to search. See "Configure distributed search" for how to do this.

5. Log in to your search head and do a search for *. Look at your results and check the splunk_server field. Verify that all your search peers are listed in that field.

6. Set up the authentication method you want to use on the search head, just as you would for any other Splunk instance. Do not set up any indexing on your search head, since that will violate its license.

Configure distributed search


Distributed search is available on every Splunk server, with the exception of forwarders. This means that every Splunk server can function as a search head to a specified group of indexers, referred to as search peers. The distributed search capability is enabled by default.

To activate distributed search, a Splunk instance that you designate as a search head just needs to add search peers. Ordinarily, you do this by specifying each search peer manually. A number of other configuration options are available, but ordinarily you do not need to alter their default values.

No configuration is necessary on the search peers themselves to make them available to search heads. Access is controllable through public key authentication.

• A search head must maintain a list of search peers, or it will have nothing to search on. A dedicated search head does not have additional data inputs, beyond the default ones.
• A search peer must have specified data inputs, or it will have nothing to index. A search peer does not maintain a list of other search peers.

These roles are not necessarily distinct. A Splunk instance can, and frequently does, function simultaneously as both a search head and a search peer.

You can set up a distributed search head via Splunk Web, the Splunk CLI, or by editing the distsearch.conf configuration file. Splunk Web is the recommended configuration method for most purposes.

Use Splunk Web

The main step in configuring a distributed search head is to specify its search peers. You can add these either manually (the usual case) or by enabling automatic discovery.

4. Specify "Yes" for the option: "Broadcast to other Splunk servers?"

5. Change any other settings as needed and click Save.

Use the CLI

To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command.

Enable distributed search

Distributed search is enabled by default, so this step is ordinarily not required:

splunk enable dist-search -auth admin:changeme

Add a search peer manually

Use the splunk add search-server command to add a search peer. When you run this command, make sure you specify the splunkd management port of the peer server. By default, this is 8089, although it might be different for your deployment.

Be sure to provide credentials for both the local and the remote machines. Use the -auth flag for your local credentials and the -remoteUsername and -remotePassword flags to specify the remote credentials (in this example, for search peer 10.10.10.10):

Specify automatic discovery

splunk enable discoverable -auth admin:changeme

Each of these commands will elicit a response from the server indicating success and the need to restart the server for the changes to take effect.

Edit distsearch.conf

In most cases, the settings available through Splunk Web provide sufficient options for configuring distributed search environments. Some advanced configuration settings, however, are only available through distsearch.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

For the detailed specification and examples, see distsearch.conf.

For more information on configuration files in general, see "About configuration files".

Note: As of 4.1, the blacklistNames and blacklistURLs attributes no longer have any effect on distributed search behavior.

Distribute the key files

If you add search peers via Splunk Web or the CLI, Splunk automatically handles authentication. However, if you add peers by editing distsearch.conf, you must distribute the key files manually.

After enabling distributed search on a Splunk instance (and restarting the instance), you will find the keys in this location: $SPLUNK_HOME/etc/auth/distServerKeys/

Distribute the file $SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem on the search head to $SPLUNK_HOME/etc/auth/distServerKeys/<searchhead_name>/trusted.pem on the indexer nodes.
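As a small aid for scripting the copy, the destination path described above can be computed in Python. This is a sketch under stated assumptions: `/opt/splunk` stands in for $SPLUNK_HOME, `sh1` is a made-up search head name, and `peer_key_path` is a hypothetical helper, not part of Splunk.

```python
from pathlib import PurePosixPath

def peer_key_path(splunk_home, searchhead_name):
    """Path where a search peer expects the named search head's trusted.pem."""
    return (
        PurePosixPath(splunk_home)
        / "etc" / "auth" / "distServerKeys"
        / searchhead_name / "trusted.pem"
    )

# Assumed install location and search head name, for illustration only.
print(peer_key_path("/opt/splunk", "sh1"))
```

The function only builds the path. Copying the key file to each indexer node is still a manual step, or one for whatever file transfer tool you already use.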

Support for keys from multiple Splunk instances

Any number of Splunk instances can have their certificates stored on other instances for authentication. The instances can store keys in $SPLUNK_HOME/etc/auth/distServerKeys/<peername>/trusted.pem

For example, if you have Splunk search heads A and B and they both need to search Splunk index node C, do the following:

1. On C, create $SPLUNK_HOME/etc/auth/distServerKeys/A/ and

Limit distributed bundle size

The distributed knowledge bundle is the data that the search head replicates to each search peer to enable its searches. For information on the contents and purpose of this bundle, see "What search heads send to search peers".

To limit the size of the distributed bundle, you can create a replication whitelist. To do this, edit distsearch.conf and specify a [replicationWhitelist] stanza:

[replicationWhitelist]
<name> = <whitelist_regex>
...

All files that satisfy the whitelist regex will be included in the bundle that the search head distributes to its search peers. If multiple regexes are specified, the bundle will include the union of those files.

In this example, the knowledge bundle will include all files with extensions of either ".conf" or ".spec":

[replicationWhitelist]
allConf = *.conf
allSpec = *.spec

The names, such as allConf and allSpec, are used only for layering. That is, if you have both a global and a local copy of distsearch.conf, the local copy can be configured so that it overrides only one of the regexes. For instance, assume that the example shown above is the global copy. Assume you then specify a whitelist in your local copy like this:

[replicationWhitelist]
allConf = *.foo.conf

The two conf files will be layered, with the local copy taking precedence. Thus, the search head will distribute only files that satisfy these two regexes:

allConf = *.foo.conf
allSpec = *.spec

For more information on attribute layering in configuration files, see "Attribute precedence" in this manual.
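The layering example above amounts to a merge by attribute name, which can be sketched in Python. This only models the outcome described in the text. It is not how Splunk reads configuration files, and the function name `layer` is hypothetical.

```python
def layer(global_stanza, local_stanza):
    """Merge two copies of a stanza; local values override global ones by name."""
    merged = dict(global_stanza)
    merged.update(local_stanza)
    return merged

# The [replicationWhitelist] stanza from the global and local copies above.
global_whitelist = {"allConf": "*.conf", "allSpec": "*.spec"}
local_whitelist = {"allConf": "*.foo.conf"}

effective = layer(global_whitelist, local_whitelist)
for name, pattern in effective.items():
    print(f"{name} = {pattern}")
```

The local `allConf` replaces the global one because the names match, while `allSpec` passes through unchanged. A local entry under a new name would be added to the set rather than replacing anything.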

Manage distributed server names

The name of each search head and search peer is determined by its serverName attribute, specified in server.conf. serverName defaults to the server's machine name.

In a distributed search cluster, all nodes must have unique names. The serverName has three specific uses:

• For authenticating search heads. When search peers are authenticating a search head, they look for the search head's key file in /etc/auth/distServerKeys/<searchhead_name>/trusted.pem.
• For identifying search peers in search queries. serverName is the value of the splunk_server field that you specify when you want to query a specific node. See "Search across one or more distributed search peers" in the User manual.
• For identifying search peers in search results. serverName gets reported back in the splunk_server field.

Note: serverName is not used when adding search peers to a search head. In that case, you identify the search peers through their domain names or IP addresses.

The only reason to change serverName is if you have multiple instances of Splunk residing on a single machine, and they're participating in the same distributed search cluster. In that case, you'll need to change serverName to distinguish them.
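Because every node in the cluster needs a unique serverName, it can help to check a planned list of names for collisions before adding peers. The following Python sketch does only that: `duplicate_server_names` is a hypothetical helper, and the names in the example are made up.

```python
from collections import Counter

def duplicate_server_names(names):
    """Return the serverName values that appear more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)

# Two Splunk instances on one machine would both default to the machine name.
planned = ["searchhead1", "indexer1", "indexer2", "indexer1"]
print(duplicate_server_names(planned))
```

Any name the function returns belongs to more than one instance, so at least one of those instances needs a different serverName in server.conf.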

Use distributed search


From the user standpoint, specifying and running a distributed search is essentially the same as running any other search. Behind the scenes, the search head distributes the query to its search peers and consolidates the results when presenting them to the user.

Your users do not have the ability to specify which search peers participate in a search. They do need to be aware of the distributed search configuration to troubleshoot.

Perform distributed searches

In general, you specify a distributed search through the same set of commands as for a local search. However, Splunk provides several additional commands and options to assist with controlling and limiting a distributed search.

A search head by default runs its searches across all search peers in its cluster. You can limit a search to one or more search peers by specifying the splunk_server field in your query. See Search across one or more distributed servers in the User manual.

The search command localop is also of use in defining distributed searches. It enables you to limit the execution of subsequent commands to the search head. See the description of localop in the Search Reference for details and an example.

In addition, the lookup command provides a local argument for use with distributed searches. If set to true, the lookup occurs only on the search head; if false, the lookup occurs on the search peers as well. This is particularly useful for scripted lookups, which replicate lookup tables. See the description of lookup in the Search Reference for details and an example.

Troubleshoot the distributed search

This table lists some of the more common search-time error messages associated with distributedsearch:

Error message                 Meaning
status=down                   The specified remote peer is not available.
status=not a splunk server    The specified remote peer is not a Splunk server.
duplicate license             The specified remote peer is using a duplicate license.
certificate mismatch          Authentication with the specified remote peer failed.

When a user runs a search in Splunk, it is created as a "job" in the system. This job also includes the artifacts (like search results) that are returned by a given search. Users can pause and resurrect their own jobs in the Job Manager. As an admin, you can manage the jobs of all users in the system.

To access the Jobs manager, click Jobs in the upper right of Splunk Web.

Note: The number of jobs shown in parentheses next to the Jobs link is the number of jobs that the user you're logged in as is currently running, not the number of jobs running on the system as a whole, even if you're logged in as admin.

You can also manage jobs through the command line of your OS.

Restrict the jobs users can run

To restrict how many jobs a given user can run, and how much space their job artifacts can take up, define a role with these restrictions and assign the user to that role. You can do this at a very high level of granularity; each user in your system can have their own role.

Create a capability in a copy of authorize.conf in $SPLUNK_HOME/etc/system/local and give it appropriate values for:

• srchDiskQuota: Maximum amount of disk space (MB) that can be taken by search jobs of a user that belongs to this role.
• srchJobsQuota: Maximum number of concurrently running searches a member of this role can have.

For more information, refer to the topic about creating roles in this manual.
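As a rough illustration of the two quota attributes, the following Python sketch renders a stanza that could be pasted into a local copy of authorize.conf. The stanza name (role_limited_searcher), the assumption that role stanzas are named role_<name>, and the quota values are all illustrative and not taken from this manual; check authorize.conf.spec for the exact stanza syntax before using it.

```python
import configparser
import io


def render_role_stanza(role_name, disk_quota_mb, jobs_quota):
    """Render an authorize.conf-style stanza carrying the two search quotas.

    Assumption: role stanzas are named "role_<name>". Verify against
    authorize.conf.spec for your version.
    """
    parser = configparser.ConfigParser()
    # configparser lowercases option names by default; keep them as written.
    parser.optionxform = str
    parser[f"role_{role_name}"] = {
        "srchDiskQuota": str(disk_quota_mb),   # MB of disk for job artifacts
        "srchJobsQuota": str(jobs_quota),      # concurrent searches per member
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical role name and limits, for illustration only.
stanza = render_role_stanza("limited_searcher", disk_quota_mb=100, jobs_quota=3)
print(stanza)
```

Generating the stanza rather than typing it by hand is only a convenience; the resulting text is what matters, and it takes effect only after it is placed in $SPLUNK_HOME/etc/system/local/authorize.conf.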

Autopause long-running jobs

To handle inadvertently long-running search jobs, Splunk provides an autopause feature. The feature is enabled by default only for summary dashboard clicks, to deal with the situation where users mistakenly initiate "all time" searches.

When autopause is enabled for a particular search view, the search view includes an autopause countdown field during a search. If the search time limit has been reached, an information window will appear to inform the user that the search has been paused. It offers the user the option of resuming or finalizing the search. By default, the limit before autopause is 30 seconds.

Autopause is configurable only by view developers. It is not a system-wide setting nor is it configurable by role. The autopause feature can be enabled or disabled by editing the appropriate view. See "How to turn off autopause" in the Developer manual. Also, see the host, source, and sourcetypes links on the summary dashboard for examples of autopause implementation.

Manage jobs in Splunk Web

As the admin user, you can manage jobs run by all other users on the system. You can access these jobs through the Jobs manager in Splunk Web.

Note: The number of jobs shown in parentheses next to the Jobs link is the number of jobs that the user you're logged in as is currently running, not the number of jobs running on the system as a whole, even if you're logged in as admin.

The Jobs manager launches a pop-up window showing all the jobs currently running on the system.

Use the controls to save, pause, resume, delete, and finalize jobs on the system. Select the checkbox to the left of the item you want to act on, and click the relevant button at the bottom of the page.

Unsaved search jobs expire within a set period of time after they complete. This means that the artifacts (including their results) are removed from the filesystem and cannot be retrieved unless you've explicitly saved the job.

The default lifespan for manually run search jobs is 15 minutes (search jobs resulting from scheduled searches typically have much shorter lifespans). The Expires column tells you how much time each listed job has before it is deleted from the system. If you want to be able to review a search job after that expiration point, or share it with others, save it.

Manage jobs in the OS

When Splunk is running a job, it manifests itself as a process in the OS called splunkd search. You can use Manager to act on this job, or you can manage the job's underlying processes at the OS command line.

There will be two processes for each search job; the second one is a 'helper' process used by the splunkd process to do further work as needed. The main job is the one using system resources. The helper process will die on its own if you kill the main process.

The process info includes:

• the search string (search=)
• the job ID for that job (id=)
• the ttl, or length of time that job's artifacts (the output it produces) will remain on disk and available (ttl=)
• the user who is running the job (user=)
• what role(s) that user belongs to (roles=)

When a job is running, its data is being written to $SPLUNK_HOME/var/run/splunk/dispatch/<job_id>/

Scheduled jobs (scheduled saved searches) include the saved search name as part of the directory name.

The value of ttl for a process determines how long the data remains in this spot, even after you kill a job. When you kill a job from the OS, you might want to look at its job ID before killing it if you also want to remove its artifacts.

Use the filesystem to manage jobs

Splunk allows you to manage jobs via creation and deletion of items in that job's artifact directory:

• To cancel a job, go into that job's artifact directory and create a file called 'cancel'.
• To preserve that job's artifacts (and ignore its ttl setting), create a file called 'save'.
• To pause a job, create a file called 'pause', and to unpause it, delete the 'pause' file.
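The marker files described above can also be created from a script. The sketch below is a minimal Python illustration that exercises the idea against a temporary directory standing in for $SPLUNK_HOME; the helper names and the sample job ID are hypothetical, and the script assumes that an empty file is sufficient as a marker.

```python
from pathlib import Path
import tempfile

# Relative location of job artifact directories under $SPLUNK_HOME.
DISPATCH_SUBDIR = Path("var/run/splunk/dispatch")


def job_dir(splunk_home, job_id):
    """Return the artifact directory for a job ID."""
    return Path(splunk_home) / DISPATCH_SUBDIR / job_id


def set_marker(directory, name):
    """Create an empty marker file: 'cancel', 'save', or 'pause'."""
    (directory / name).touch()


def clear_marker(directory, name):
    """Delete a marker file if it exists (used to unpause a job)."""
    (directory / name).unlink(missing_ok=True)  # requires Python 3.8+


# Demonstration against a throwaway directory, not a real Splunk install.
with tempfile.TemporaryDirectory() as fake_home:
    artifacts = job_dir(fake_home, "1286380000.42")  # hypothetical job ID
    artifacts.mkdir(parents=True)

    set_marker(artifacts, "pause")
    paused = (artifacts / "pause").exists()

    clear_marker(artifacts, "pause")
    resumed = not (artifacts / "pause").exists()

    set_marker(artifacts, "save")
    saved = (artifacts / "save").exists()

print(paused, resumed, saved)
```

On a real system you would pass the actual $SPLUNK_HOME path and the job ID shown in the process info (id=), and run the script as a user with write access to the dispatch directory.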

Use Splunk's command line interface (CLI)

About the CLI

You can use the Splunk CLI to monitor, configure, and execute searches on your Splunk server. Your Splunk role configuration dictates what actions (commands) you can execute. Most actions require you to be a Splunk administrator.

How to access the CLI

To access Splunk CLI, you need either:

• Shell access to a Splunk server, or

• Permission to access the correct port on a remote Splunk server.

If you have administrator or root privileges, you can simplify CLI usage by adding the top level directory of your Splunk installation to your shell path. The $SPLUNK_HOME variable refers to the top level directory. Set a SPLUNK_HOME environment variable and add $SPLUNK_HOME/bin to your shell's path.

This example works for Linux/BSD/Solaris users who installed Splunk in the default location:

# export SPLUNK_HOME=/opt/splunk
# export PATH=$SPLUNK_HOME/bin:$PATH

This example works for Mac users who installed splunk in the default location:

If you have administrator privileges, you can use the CLI not only to search but also to configure and monitor your Splunk server (or servers). The CLI commands used for configuring and monitoring Splunk are not search commands. Search commands are arguments in the search and dispatch CLI commands.

You can find all CLI documentation in the CLI help reference. For the list of CLI commands, type:

./splunk help commands

Or, access the help page about Splunk search commands with:

./splunk help search-commands

• For more information, see "Get help with the CLI".
• For details on syntax for searching using the CLI, refer to "About CLI Searches" in the Search Reference Manual.

Note for Mac users

Mac OS X requires you to have superuser level access to run any command that accesses system files or directories. Run CLI commands using sudo or "su -" for a new shell as root. The recommended method is to use sudo. (By default the user "root" is not enabled but any administrator user can use sudo.)

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around using the CLI.

Get help with the CLI

Find a complete CLI help reference by using the command help.

Access the default CLI help page by typing the following in the command line while Splunk is running:

./splunk help

Access help on a specific CLI command, or topic by typing:

./splunk help command name | topic name

For example, access a help page about Splunk CLI commands:

./splunk help commands

Or, access a help page about Splunk search commands:

./splunk help search-commands

Note: Notice the dash (-) between the words "search" and "commands". This is because the Splunk CLI interprets spaces as breaks. Use dashes between multiple words for topic names that are more than one word.

Working with the CLI on Windows

To access CLI help and run CLI commands on Windows, be sure to run cmd.exe as administrator first. It's also not necessary to type the ./ before each "splunk" command.

Some commands require authentication or target host information

Use the auth and uri parameters with any CLI command.

auth

Use auth with commands that require authentication to execute. auth is useful if you need to run a command that requires different permissions to execute than the currently logged in user has.

Note: auth must be the last parameter specified in a CLI command argument.

Syntax:

./splunk command object [-parameter value]... -auth username:password

uri

Use uri to send commands to another Splunk server.

Syntax:

./splunk command object [-parameter value]... -uri specified-server

Specify the target Splunk server with the following format:

[http|https]://name_of_server:management_port

Example: The following example returns search results from the remote "splunkserver" on port 8089.

Search a remote server

For details on syntax for searching using the CLI, refer to "About CLI searches" in the Search Reference Manual.

View apps installed on a remote server

The following example returns the list of apps that are installed on the remote "splunkserver".

./splunk display app -uri https://splunkserver:8089

Change your default URI value

You can set a default URI value using the SPLUNK_URI environment variable. If you change this value to be the URI of the remote server, you do not need to include the uri parameter each time you want to access that remote server.
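When CLI commands are driven from a script, it can help to assemble the argument list in one place so that the URI format and the "auth goes last" rule are applied consistently. The following Python sketch only builds the argument list and does not run Splunk; the function names are hypothetical and the credentials are placeholders.

```python
def management_uri(server, port, scheme="https"):
    """Build a target URI in the form [http|https]://name_of_server:management_port."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    return f"{scheme}://{server}:{port}"


def build_cli_args(command, obj, params=None, uri=None, auth=None):
    """Return an argument list for the splunk CLI.

    -auth is appended after every other parameter, because auth must be
    the last parameter specified.
    """
    args = ["./splunk", command, obj]
    for name, value in (params or {}).items():
        args += [f"-{name}", str(value)]
    if uri:
        args += ["-uri", uri]
    if auth:
        username, password = auth
        args += ["-auth", f"{username}:{password}"]
    return args


# Placeholder server name and credentials, for illustration only.
argv = build_cli_args(
    "display", "app",
    uri=management_uri("splunkserver", 8089),
    auth=("admin", "changeme"),
)
print(" ".join(argv))
```

The resulting list can be handed to subprocess.run() on a machine where the splunk binary is in the current directory. On Windows, drop the leading ./ as noted earlier.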

Type "help [object|topic]" to get help on a specific object, or topic.

Configuration file reference

admon.conf

The following are the spec and example files for admon.conf.

admon.conf.spec

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This file contains potential attribute/value pairs to use when configuring Windows active
# directory monitoring.
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

[<stanza name>]
    * There can be multiple configurations for any given Domain Controller, so this is a unique name related to that particular set of configuration.

targetDC = <string>
    * Fully qualified domain name. This can also be empty, in which case it will obtain the local computer DC and bind to its root DN.

startingNode = <string>
    * Specify a path to the directory tree in AD where to start monitoring, or else if left empty it will start at the root of the directory tree

monitorSubtree = <int 0|1>
    * Given the DC path, monitor subtree instead of a single level

disabled = <int 0|1>
    * Enables or disables this particular configuration

admon.conf.example

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This file contains an example configuration for monitoring changes
# to the Windows active directory monitor. Refer to admon.conf.spec for details.
# The following is an example of a active directory monitor settings.
#
# To use one or more of these configurations, copy the configuration block into
# admon.conf in $SPLUNK_HOME/etc/apps/windows/local/. You must restart Splunk to enable configu
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

[default]
monitorSubtree = 1
disabled = 0

[NearestDC]
targetDc =
startingNode =

alert_actions.conf

The following are the spec and example files for alert_actions.conf.

alert_actions.conf.spec

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This file contains possible attributes and values for configuring global saved search actions
# in alert_actions.conf. Saved searches are configured in savedsearches.conf.
#
# There is an alert_actions.conf in $SPLUNK_HOME/etc/system/default/. To set custom configurat
# place an alert_actions.conf in $SPLUNK_HOME/etc/system/local/. For examples, see
# alert_actions.conf.example. You must restart Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

################################################################################
# Global options: these settings do not need to be prefaced by a stanza name
# If you do not specify an entry for each attribute, Splunk will use the default value.
################################################################################

maxresults = <int>
    * Set the global maximum number of search results sent via alerts.
    * Defaults to 100.

hostname = <string>
    * Set the hostname that is displayed in the link sent in alerts.
    * This is useful when the machine sending the alerts does not have a FQDN.
    * Defaults to current hostname (set in Splunk) or localhost (if none is set).

maxtime = <int>[mshd]
    * the maximum amount of time the execution of an action should be allowed before the action is
    * Defaults to 5m
    * Defaults to 1m for: rss

################################################################################
# EMAIL: these settings are prefaced by the [email] stanza name
################################################################################

[email]
    * Set email notification options under this stanza name.
    * Follow this stanza name with any number of the following attribute/value pairs.
    * If you do not specify an entry for each attribute, Splunk will use the default value.

reportServerURL = <url>
    * The URL of the PDF report server, if one is setup and available on the network
    * For a default locally installed report server, the url would be http://localhost:8091/
    * Defaults to false

preprocess_results = <search-string>
    * a search string to preprocess results before emailing them. Usually the pre processing
    * consists of filtering out unwanted internal field
    * Defaults to empty string

################################################################################
# RSS: these settings are prefaced by the [rss] stanza
################################################################################

[rss]
    * Set rss notification options under this stanza name.
    * Follow this stanza name with any number of the following attribute/value pairs.
    * If you do not specify an entry for each attribute, Splunk will use the default value.

################################################################################
# script:
################################################################################
[script]
command = <string>
    * command template to be realized with information from the saved search that
    * triggered the script action.

################################################################################
# summary_index: these settings are prefaced by the [summary_index] stanza
################################################################################
[summary_index]
command = <string>
    * command template to be realized with information from the saved search that
    * triggered the summary indexing action.

################################################################################
# populate_lookup: these settings are prefaced by the [populate_lookup] stanza
################################################################################
[populate_lookup]
command = <string>
    * command template to be realized with information from the saved search that
    * triggered the populate lookup action.
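The maxtime attribute above takes an integer followed by a one-letter unit (<int>[mshd]). As a hedged illustration of how such a value might be interpreted, the Python sketch below converts it to seconds. It assumes the letters stand for minutes, seconds, hours, and days and that the unit is mandatory; neither point is spelled out in the spec text, and the function name is hypothetical.

```python
import re

# Assumed meaning of the unit letters in <int>[mshd].
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

_MAXTIME = re.compile(r"(\d+)([mshd])")


def maxtime_to_seconds(value):
    """Convert a maxtime-style value such as '5m' into seconds."""
    match = _MAXTIME.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not in <int>[mshd] form: {value!r}")
    count, unit = match.groups()
    return int(count) * UNIT_SECONDS[unit]


print(maxtime_to_seconds("5m"))  # the documented global default
print(maxtime_to_seconds("1m"))  # the documented default for rss
```

A check like this can be useful when validating a local alert_actions.conf before restarting Splunk, since a malformed value would otherwise only surface at run time.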

alert_actions.conf.example

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This is an example alert_actions.conf. Use this file to configure alert actions for saved se
#
# To use one or more of these configurations, copy the configuration block into alert_actions.c
# in $SPLUNK_HOME/etc/system/local/. You must restart Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

[email]
# keep the search artifacts around for 24 hours
ttl = 86400

# if no @ is found in the address the hostname of the current machine is appended

# keep the search artifacts around for 24 hours

# make sure the following keys are not added to marker (command, ttl, maxresults, _*)
command = summaryindex addtime=true index="$action.summary_index._name{required=yes}$" file="$n

app.conf

The following are the spec and example files for app.conf.

app.conf.spec

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This file contains defines configuration for a custom app inside Splunk.
# There is no default app.conf. Instead, an app.conf is placed inside the /default
# directory of each app. For examples, see app.conf.example.
# You must restart Splunk to reload manual changes to app.conf.
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

#
# Define how this app appears in Launcher (and online on Splunkbase)
#

[launcher]

version = <version string>
* Version numbers are a number followed by a sequence of numbers or dots.
* Pre-release versions can append a space and a single-word suffix like "beta2". Examples:
* 1.2
* 11.0.34
* 2.0 beta
* 1.3 beta2
* 1.0 b2
* 12.4 alpha
* 11.0.34.234.254

description = <string>
* short explanatory string which is displayed underneath the app's title in Launcher.
* descriptions should be 200 characters or less, because most users won't read long descriptions

author = <name>
* for apps you intend to post to Splunkbase, enter the username of your splunk.com account
* for internal-use-only apps, include your full name and/or contact info (e.g. email

# Your app can include an icon which will show up next to your app
# in Launcher and on Splunkbase. You can also include a screenshot,
# which will show up on Splunkbase when the user views info about your
# app before downloading it. Icons are recommended, although not required.
# Screenshots are optional.
#
# There is no setting in app.conf for these images. Instead, images for
# icon and screenshot should be placed in the /appserver/static dir of
# your app. They will automatically be detected by Launcher and Splunkbase.
# eg. /appserver/static/appIcon.png, (the capital "I" is required!)
# /appserver/static/screenshot.png
# app icon image must be 36px x 36px, in PNG format
# app screenshot must be 623px x 350px, in PNG format

#
# [package] defines upgrade-related metadata, and will be
# used in future versions of Splunk to streamline app upgrades.
#

[package]

id = <appid>
* id should be omitted for internal-use-only apps which are not intended to be uploaded to Splunkbase
* id is required for all new apps uploaded to Splunkbase. Future versions of Splunk will use appid to correlate locally-installed apps and the same app on Splunkbase (e.g. to notify users about app updates)
* id must be the same as the folder name in which your app lives in $SPLUNK_HOME/etc/apps
* id must adhere to cross-platform folder-name restrictions:
  - must contain only letters, numbers, "." (dot), and "_" (underscore) characters
  - must not end with a dot character
  - must not be any of the following names: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9

#
# Set install settings for this app
#

[install]

state = disabled | enabled
* Set whether app is disabled or enabled.
* If an app is disabled, its configs are ignored.
* By default, apps are enabled.

state_change_requires_restart = true | false
* Set whether removing/enabling/disabling app requires a restart of Splunk.
* Defaults to true.

is_configured = true | false
* whether the application's custom setup has been performed
* Defaults to false

build = integer-build-number
* must increment this whenever you change files in /appserver/static
* every release must change both "version" and "build" settings
* ensures browsers don't see cached copies of old static files in
* new versions of your app
* note that build is a single integer, unlike version which can be
* a more complex string like 1.5.18

#
# Set UI-specific settings for this app
#

[ui]
is_visible = true | false
* Indicates if this app should be visible/navigable as a UI app
* Apps require at least 1 view to be available from the UI

is_manageable = true | false
* Indicates if Splunk Manager should be used to manage this app
* Defaults to true

label = <string>
* Defines the name of the app shown in the Splunk GUI and Launcher
* Must be between 5 and 80 characters.
* Must not include "Splunk For" prefix.
* Label is required.
* Examples of good labels:
* IMAP
* SQL Server Integration Services
* FISMA Compliance

#
# Set custom configuration file settings for this app
#

[config:$STRING]
* Name your stanza.
* Preface with config:.
* Set $STRING to any arbitrary identifier.

targetconf = <$CONFIG_FILE>
* Target configuration file for changes.
* There can be only one.
* Any configuration file that is included in the application.
* For example indexes, for indexes.conf.

targetstanza = <$STANZA_NAME>
* Stanza name from application.

targetkey = <$ATTRIBUTE>
* Attribute to set.

targetkeydefault = <$VALUE>
* Default setting for attribute.
* Can be empty for no default.
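The version and id rules in the [launcher] and [package] stanzas of app.conf.spec are precise enough to check mechanically before packaging an app. The following Python sketch is one possible reading of those rules, not Splunk's own validation logic. The function names are hypothetical, and comparing reserved names case-insensitively is an added assumption.

```python
import re

# A number followed by numbers or dots, optionally a space and one word.
VERSION_RE = re.compile(r"\d[\d.]*( \w+)?")

# Only letters, numbers, dots and underscores.
APP_ID_RE = re.compile(r"[A-Za-z0-9._]+")

RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_valid_version(version):
    """Check a [launcher] version string against the documented pattern."""
    return VERSION_RE.fullmatch(version) is not None


def is_valid_app_id(app_id):
    """Check a [package] id against the documented folder-name restrictions."""
    if APP_ID_RE.fullmatch(app_id) is None:
        return False
    if app_id.endswith("."):
        return False
    # Assumption: reserved names are rejected regardless of letter case.
    return app_id.upper() not in RESERVED_NAMES


documented_examples = [
    "1.2", "11.0.34", "2.0 beta", "1.3 beta2",
    "1.0 b2", "12.4 alpha", "11.0.34.234.254",
]
print(all(is_valid_version(v) for v in documented_examples))
print(is_valid_app_id("my_app.v2"), is_valid_app_id("CON"))
```

The label rule is deliberately left out of the sketch: the spec asks for 5 to 80 characters but lists the four-character "IMAP" as a good label, so the intended check is unclear.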

app.conf.example

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# The following are example app.conf configurations. Configure properties for your custom appli
#
# There is NO DEFAULT app.conf.
#
# To use one or more of these configurations, copy the configuration block into
# props.conf in $SPLUNK_HOME/etc/system/local/. You must restart Splunk to enable configuration
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

[launcher]
author=<author of app>
description=<textual description of app>
version=<version of app>

audit.conf

The following are the spec and example files for audit.conf.

audit.conf.spec

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This file contains possible attributes and values you can use to configure auditing
# and event signing in audit.conf.
#
# There is NO DEFAULT audit.conf. To set custom configurations, place an audit.conf in
# $SPLUNK_HOME/etc/system/local/. For examples, see audit.conf.example. You must restart
# Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

[eventHashing]
    * This stanza turns on event hashing -- every event is SHA256 hashed.
    * The indexer will encrypt all the signatures in a block.
    * Follow this stanza name with any number of the following attribute/value pairs.

filters=mywhitelist,myblacklist...
    * (Optional) Filter which events are hashed.
    * Specify filtername values to apply to events.
    * NOTE: The order of precedence is left to right. Two special filters are provided by d
    * blacklist_all and whitelist_all, use them to terminate the list of your filters. For
    * if your list contains only whitelists, then terminating it with blacklist_all will re
    * signing of only events that match any of the whitelists. The default implicit filter
    * terminator is whitelist_all

privateKey=/some/path/to/your/private/key/private_key.pem
publicKey=/some/path/to/your/public/key/public_key.pem
    * You must have a private key to encrypt the signatures and a public key to decrypt the
    * Set a path to your own keys
    * Generate your own keys using openssl in $SPLUNK_HOME/bin/.

queuing=<true | false>
    * Turn off sending audit events to the indexQueue -- tail the audit events instead.
    * If this is set to 'false', you MUST add an inputs.conf stanza to tail the audit log.
    * Defaults to 'true.'

audit.conf.example

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This is an example audit.conf. Use this file to configure auditing and event hashing.
#
# There is NO DEFAULT audit.conf.
#
# To use one or more of these configurations, copy the configuration block into audit.conf
# in $SPLUNK_HOME/etc/system/local/. You must restart Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

# If this stanza exists, audit trail events will be cryptographically signed.

# You must have a private key to encrypt the signatures and a public key to decrypt them.
# Generate your own keys using openssl in $SPLUNK_HOME/bin/.

# EXAMPLE #1 - hash all events:

[eventHashing]

# This performs a SHA256 hash on every event other than ones going the _audit index (which are
# handled their own way).
# NOTE: All you need to enable hashing is the presence of the stanza 'eventHashing'.

# EXAMPLE #3 - multiple blacklisting

# DO NOT hash all events with the following, sources, sourcetypes and hosts - they are all
# blacklisted. All other events are hashed.

# EXAMPLE #4 - whitelisting

[filterspec:event_whitelist:mywhitelist]
sourcetype=syslog
#source=aa, bb (these can be added as well)
#host=xx, yy

[filterspec:event_blacklist:nothingelse]
#The 'all' tag is a special boolean (defaults to false) that says match *all* events
all=True

[eventSigning]
filters=mywhitelist, nothingelse

# Hash ONLY those events which are of sourcetype 'syslog'. All other events are NOT hashed.
# Note that you can have a list of filters and they are executed from left to right for every e
# If an event passed a whitelist, the rest of the filters do not execute. Thus placing
# the whitelist filter before the 'all' blacklist filter says "only hash those events which
# match the whitelist".

authentication.conf
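The left-to-right filter behavior described in the comments above can be modeled in a few lines. The Python sketch below is a simplified illustration, not Splunk's implementation. It assumes that the first filter matching an event decides the outcome (a whitelist match means "hash", a blacklist match means "do not hash") and that an event matching no filter falls through to the implicit whitelist_all terminator mentioned in audit.conf.spec. The function name and the sample events are hypothetical.

```python
def should_hash(event, filters):
    """Decide whether an event is hashed.

    filters is an ordered list of (kind, predicate) pairs, where kind is
    "whitelist" or "blacklist" and predicate takes the event dict.
    """
    for kind, matches in filters:
        if matches(event):
            # First matching filter wins; later filters are not evaluated.
            return kind == "whitelist"
    # Implicit terminator: whitelist_all.
    return True


# Mirrors EXAMPLE #4: whitelist syslog, then blacklist everything else.
mywhitelist = ("whitelist", lambda e: e.get("sourcetype") == "syslog")
nothingelse = ("blacklist", lambda e: True)  # corresponds to all=True
filters = [mywhitelist, nothingelse]

print(should_hash({"sourcetype": "syslog"}, filters))
print(should_hash({"sourcetype": "access_combined"}, filters))
```

Reversing the two filters in the list would blacklist every event before the whitelist is consulted, which is why the order in the filters attribute matters.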

The following are the spec and example files for authentication.conf.

authentication.conf.spec

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5## This file contains possible attributes and values for configuring authentication via# authentication.conf.## There is an authentication.conf in $SPLUNK_HOME/etc/system/default/. To set custom configura# place an authentication.conf in $SPLUNK_HOME/etc/system/local/. For examples, see# authentication.conf.example. You must restart Splunk to enable configurations.#

336# To learn more about configuration files (including precedence) please see the documentation# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

[authentication] * Follow this stanza name with any number of the following attribute/value pairs.

authSettings = <string> * Key to look up the specific configurations of chosen authentication system. * <string> is the name of the stanza header [<authSettingsKey>]. * This is used by LDAP and Scripted Authentication.

###################### LDAP settings#####################

[<authSettings-key>] * Follow this stanza name with the following attribute/value pairs.

port = <integer> * OPTIONAL - The port that Splunk should use to connect to your LDAP server. * Defaults to port 389 for non-SSL and port 636 for SSL

bindDN = <string> * Distinguished name of the user that will be retrieving the LDAP entries * This user needs to have read access to all LDAP users and groups you wish to use in Splun * Optional, but usually required due to LDAP security settings. * Leave this blank if your LDAP entries can be retrieved with anonymous bind

userBaseFilter = <string> * OPTIONAL - The LDAP search filter you wish to use when searching for users * Highly recommended, especially when there are many entries in your LDAP user subtrees * When used properly, search filters can significantly speed up LDAP queries * Example that matches users in the IT or HR department: * userBaseFilter = (|(department=IT)(department=HR)) * See RFC 2254 for more detailed information on search filter syntax * This defaults to no filtering.

337groupBaseDN = <string> * REQUIRED - Distinguished names of LDAP entries whose subtrees contain the groups * Enter a ';' delimited list to search multiple trees. * If your LDAP environment does not have group entries, there is a configuration that can t * Set groupBaseDN to the same as userBaseDN, which means you will search for groups in * Next, set the groupMemberAttribute and groupMappingAttribute to the same attribute as * This means the entry, when treated as a group, will use the username value as its * For clarity, you should probably also set groupNameAttribute to the same as userNameA

groupBaseFilter = <string> * OPTIONAL - The LDAP search filter you wish to use when searching for groups * Like userBaseFilter, this is highly recommended to speed up LDAP queries * See RFC 2254 for more information * This defaults to no filtering

userNameAttribute = <string> * REQUIRED - User entry attribute whose value is the username * NOTE: This attribute should use case insensitive matching for its values, and the values * Users are case insensitive in Splunk * In Active Directory, this is 'sAMAccountName' * A typical attribute for this is 'uid'

groupMappingAttribute = <string> * OPTIONAL - User entry attribute whose value is used by group entries to declare membershi * Groups are often mapped with user DN, so this defaults to 'dn' * Set this if groups are mapped using a different attribue * Usually only needed for OpenLDAP servers. * A typical attribute used to map users to groups is 'uid' * For example, assume a group declares that one of its members is 'splunkuser' * This implies that every user with 'uid' value 'splunkuser' will be mapped to that

groupNameAttribute = <string> * REQUIRED - Group entry attribute whose value stores the group name * A typical attribute for this is 'cn' (common name) * Recall that if you are configuring LDAP to treat user entries as their own group, user en

groupMemberAttribute = <string> * REQUIRED - Group entry attribute whose values are the groups members * Typical attributes for this are 'member' and 'memberUid' * For example, consider the groupMappingAttribute example above using groupMemberAttribute * To declare 'splunkuser' as a group member, its attribute 'member' must have the value

charset = <string>
* OPTIONAL - ONLY set this for an LDAP setup that returns non-UTF-8 encoded data. LDAP is s
* Follows the same format as CHARSET in props.conf (see props.conf.spec)
* An example value would be "latin-1"
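
As a rough illustration of how the group and user attributes above fit together, the following fragment sketches part of an LDAP settings stanza. The stanza name 'ldaphost', the DNs, and the filter value are hypothetical. The attribute values simply follow the typical values and defaults mentioned above. Other settings that an LDAP stanza needs are omitted, so this is not a complete configuration.

```
# Hypothetical stanza name and directory layout, for illustration only
[ldaphost]
# Subtree that holds the group entries
groupBaseDN = ou=Groups,dc=example,dc=com
# Optional filter to narrow the group search
groupBaseFilter = (objectclass=groupOfNames)
# User entry attribute that holds the username
userNameAttribute = uid
# Groups list their members by user DN (the default)
groupMappingAttribute = dn
# Group entry attribute that holds the group name
groupNameAttribute = cn
# Group entry attribute that lists the members
groupMemberAttribute = member
```

With these values, a group entry would declare a member by placing that user's DN in its 'member' attribute.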

#####################
# Map roles
#####################

[roleMap]
* Follow this stanza name with several Role to Group mappings as defined below.

<RoleName> = <LDAP group string>

* Maps a Splunk role (from authorize.conf) to LDAP groups
* This list is semi-colon delimited (no spaces).
* List several of these attribute value pairs to map all Splunk roles to Groups
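
A minimal sketch of a role mapping, assuming the Splunk roles 'admin', 'power', and 'user' are defined in authorize.conf. The LDAP group names on the right-hand side are hypothetical. Note that multiple groups for one role are separated by a semicolon with no spaces.

```
[roleMap]
# <RoleName> = <LDAP group string>; the group names below are hypothetical
admin = SplunkAdmins
power = SplunkPowerUsers;SplunkLeads
user = SplunkUsers
```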

#####################
# Scripted authentication
#####################

[<authSettings-key>]
* Follow this stanza name with the following attribute/value pairs:

scriptPath = <string>
* REQUIRED - Full path to the script, including the path to the program that runs it (p
* ex: "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/etc/system/bin/$MY_SCRIPT"
* Note that if a path contains spaces, it must be quoted. Our example above handles the

scriptSearchFilters = <boolean>
* OPTIONAL - Only set this to 1 to call the script to add search filters.
* 0 disables (default)
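
The two attributes above might be combined as in the following sketch. The stanza name 'script' is hypothetical and stands in for whatever authSettings key your configuration uses. The scriptPath value repeats the example given above, with $MY_SCRIPT as a placeholder for your script's file name.

```
# Hypothetical stanza name; it must match your authSettings key
[script]
# Both the interpreter path and the script path are quoted
scriptPath = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/etc/system/bin/$MY_SCRIPT"
# Set to 1 only if the script should be called to add search filters
scriptSearchFilters = 1
```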

# Cache timing:
# Use these settings to adjust the maximum frequency at which Splunk calls your script function
# Caching is disabled by default

# All timeouts can be expressed in seconds or as a search-like time range

getUsersTTL = <time range string>

* Timeout for getUsers.

userLoginTTL = <time range string>

* Timeout for userLogin.
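
As an illustration only, the timeouts above could be set as follows. The values are arbitrary and assume that search-like time ranges such as '30s' and '10min' are accepted. This excerpt does not state which stanza the settings belong to, so none is shown.

```
# Illustrative values only; caching is disabled unless these are set
getUsersTTL = 10min
userLoginTTL = 30s
```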

authentication.conf.example

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.1.5
#
# This is an example authentication.conf. Use this file to configure LDAP or toggle between LD
# and Splunk's native authentication system.
#
# To use one or more of these configurations, copy the configuration block into authentication.
# in $SPLUNK_HOME/etc/system/local/. You must restart Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see the documentation
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles