Extract fields from files with structured data

Many structured data files, such as comma-separated value (CSV) files and Internet Information Services (IIS) web server logs, have information in the file header that can be extracted as fields during indexing. You can configure Splunk Enterprise and the Splunk universal forwarder to automatically extract these values into fields that can be searched. For example, a CSV file starts with a header row that contains column headers for the values in subsequent rows.

Use Splunk Web to extract fields from structured data files

When you upload or monitor a structured data file, Splunk Web loads the "Set Source type" page. This page lets you preview how your data will be indexed. See The 'Set Source type' page.

1. From the Add Data page in Splunk Web, choose Upload or Monitor as the method that you want to use to add data.

2. Specify the structured data file that you want to add. Splunk Web loads the "Set Source type" page and sets the source type of the data based on its interpretation of that data. For example, if you upload a CSV file, it sets the source type to csv.

3. Review the events in the preview pane on the right side of the page. The events are formatted based on the current source type.

4. If the events appear to be formatted correctly, click "Next" to proceed to the "Modify input settings" page. Otherwise, configure event formatting by modifying the timestamp, event breaking, and delimited settings until the previewed events look the way that you want.

5. If you don't want to save the settings as a new source type, return to Step 4. Otherwise, click the "Save As" button to save the settings as a new source type.

6. In the dialog that appears, type a name and description for the new source type.

7. Select the category for the source type from the "Category" drop-down.

8. Select the app context that the new source type applies to from the "App" drop-down.

9. Click "Save" to save the source type.

10. Return to Step 4 to proceed to the "Modify input settings" page.

Structured data files with large numbers of columns might not display all extracted fields in Splunk Search

If you index a structured data file with a large number of columns (for example, a CSV file with 300 columns), you might later find that the Search app does not appear to return or display all of the fields for that file. Splunk software has indexed all of the fields correctly; the problem occurs because of a configuration setting that controls how Splunk software extracts fields at search time.

Before Splunk software displays fields in Splunk Web, it must first extract those fields by performing a search-time field extraction. By default, Splunk software automatically extracts at most 100 fields at search time. You can raise this limit by editing limits.conf in $SPLUNK_HOME/etc/system/local and setting limit in the [kv] stanza to a number that is higher than the number of columns in the structured data file. For example:

[kv]
limit = 300

If you work with a lot of large CSV files, you might want to configure the setting to a number that reflects the largest number of columns you expect your structured data files to have.

Use configuration files to enable automatic header-based field extraction

You can also use a combination of inputs.conf and props.conf to extract fields from structured data files. Edit these files in $SPLUNK_HOME/etc/system/local/ or in your own custom app directory, $SPLUNK_HOME/etc/apps/<app_name>/local/. Inputs.conf specifies the files you want to monitor and the source type to apply to the events they contain, and props.conf defines the source types themselves. If you have Splunk Enterprise, you can edit the settings on indexers or on machines where you run the Splunk universal forwarder. You must restart Splunk Enterprise for changes to inputs.conf and props.conf to take effect. If you have Splunk Cloud and want to configure the extraction of fields from structured data, use the Splunk universal forwarder.
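As a minimal sketch of how the two files work together, the following inputs.conf stanza assigns a source type to a set of monitored CSV files, and the props.conf stanza of the same name turns on header-based extraction for that source type. The monitored path and the source type name example_csv are placeholders chosen for illustration, not values that Splunk software requires.

```ini
# inputs.conf: monitor CSV files in a placeholder directory and assign them
# the hypothetical source type "example_csv"
[monitor:///opt/example/exports/*.csv]
sourcetype = example_csv

# props.conf: define the "example_csv" source type so that Splunk software
# extracts fields from each file's header row at index time
[example_csv]
INDEXED_EXTRACTIONS = CSV
```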

Props.conf attributes for structured data

To configure field extraction for files that contain headers, modify the following attributes in props.conf. For additional attributes in props.conf, review the props.conf specification file.

INDEXED_EXTRACTIONS = <CSV|W3C|TSV|PSV|JSON>
Specifies the type of file and the extraction or parsing method that Splunk software uses on the file.
Note: If you set INDEXED_EXTRACTIONS = JSON, make sure that you have not also set KV_MODE = json for the same source type. Otherwise, Splunk software extracts the JSON fields twice: once at index time and again at search time.
Default: not set

PREAMBLE_REGEX
Some files contain preamble lines. This attribute contains a regular expression that Splunk software uses to identify and ignore any matching lines.
Default: not set

FIELD_HEADER_REGEX
A regular expression that specifies a pattern for a prefixed header line. Splunk software parses the first matching line into header fields. The header itself starts after the matching pattern; the text that the pattern matches is not included in the parsed header fields. You can specify special characters in this attribute.
Default: not set

FIELD_DELIMITER
Specifies the character that delimits, or separates, fields in the monitored file or source. You can specify special characters in this attribute.
Default: not set

FIELD_QUOTE
Specifies the character to use for quotes in the specified file or source. You can specify special characters in this attribute.
Default: not set

HEADER_FIELD_DELIMITER
Specifies the character that delimits, or separates, field names in the header line. You can specify special characters in this attribute. If HEADER_FIELD_DELIMITER is not set, FIELD_DELIMITER applies to the header line.
Default: not set

HEADER_FIELD_QUOTE
Specifies the character used for quotes around field names in the header line. You can specify special characters in this attribute. If HEADER_FIELD_QUOTE is not set, FIELD_QUOTE applies to the header line.
Default: not set

HEADER_FIELD_LINE_NUMBER
Specifies the line number of the line within the file that contains the header fields. If set to 0, Splunk software attempts to locate the header fields within the file automatically.
Default: 0

TIMESTAMP_FIELDS = field1,field2,...,fieldn
In some CSV and other structured files, the timestamp of an event spans multiple delimited fields. This attribute lists, separated by commas, all of the fields that together make up the timestamp.
Default: Splunk Enterprise tries to extract the timestamp of the event automatically.

FIELD_NAMES
Some CSV and other structured files are missing a header line. This attribute specifies the header field names.
Default: not set

MISSING_VALUE_REGEX
If Splunk software finds data in the structured data file that matches this regular expression, it treats the value of that field in that row as empty.
Default: not set
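The following props.conf sketch combines several of these attributes. The source type names, column names, and file layouts described in the comments are hypothetical; consult the props.conf specification file for the exact behavior of each attribute in your version.

```ini
# Hypothetical CSV file that starts with comment lines beginning with "#"
# and splits each event's time across a "date" column and a "time" column
[example_csv_with_preamble]
INDEXED_EXTRACTIONS = CSV
# Ignore preamble lines that begin with "#"
PREAMBLE_REGEX = ^#
# Build the event timestamp from both columns
TIMESTAMP_FIELDS = date,time

# Hypothetical pipe-delimited file that has no header line
[example_psv_no_header]
INDEXED_EXTRACTIONS = PSV
# Supply the field names that a header line would otherwise provide
FIELD_NAMES = host,status,bytes
```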

Special characters or values are available for some attributes

You can use special characters or values such as spaces, vertical and horizontal tabs, and form feeds in some attributes. The following list shows each special value and its representation in props.conf:

form feed: \f
space: space or ' '
horizontal tab: \t or tab
vertical tab: \v
whitespace: whitespace
none: none or \0
file separator: fs or \034
group separator: gs or \035
record separator: rs or \036
unit separator: us or \037

You can use these special characters for the following attributes only:

FIELD_DELIMITER

FIELD_HEADER_REGEX

FIELD_QUOTE
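As an illustrative sketch, the following props.conf stanza uses the special value tab with FIELD_DELIMITER, together with a FIELD_HEADER_REGEX pattern for a header line that carries a prefix. The source type name and the "Fields:" prefix are hypothetical, and the sketch assumes that setting FIELD_DELIMITER replaces the default comma delimiter when INDEXED_EXTRACTIONS = CSV.

```ini
# props.conf: hypothetical source type for a tab-delimited file whose header
# line looks like this (<TAB> stands for a tab character):
#   Fields: date<TAB>time<TAB>status<TAB>bytes
[example_prefixed_tab_header]
INDEXED_EXTRACTIONS = CSV
# Match the "Fields:" prefix; the header fields start after the matched text
FIELD_HEADER_REGEX = ^Fields:\s*
# Special value "tab": because HEADER_FIELD_DELIMITER is not set, this
# delimiter applies to both the header line and the data rows
FIELD_DELIMITER = tab
```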

Edit configuration files to create and reference source types

To create new source types that extract fields from files with headers, and to reference them in your inputs:

1. In props.conf, in $SPLUNK_HOME/etc/system/local/ or $SPLUNK_HOME/etc/apps/<app_name>/local/, define a new source type by creating a stanza that tells Splunk Enterprise how to extract the file header and structured file data, using the attributes described above. You can define as many stanzas, and thus as many source types, as you like in the file.

2. Create an inputs.conf file in the same directory, if it does not already exist.

3. Open the file for editing.

4. Add a stanza for the file or files from which you want Splunk Enterprise to extract the file header and structured data, and set the source type in each stanza to one that you defined in props.conf. You can add as many stanzas as you need for the files or directories that contain header and structured data.

5. Optionally, if you need to transform this data in any way before indexing it, edit transforms.conf.

6. Restart the receiving instance.

7. Restart the monitoring instance.

8. On the receiving instance, use the Search app to confirm that the fields have been extracted from the structured data files and properly indexed.

Caveats to extracting fields from structured data files

Splunk software does not parse structured data that has been forwarded to an indexer

When you forward structured data to an indexer, it is not parsed when it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following pipelines on the indexer, which precludes any parsing of that data on the indexer:

parsing

merging

typing

The forwarded data must arrive at the indexer already parsed.

Field extraction settings for forwarded structured data must be configured on the forwarder

If you want to forward fields that you extract from structured data files to another Splunk instance, you must configure the props.conf settings that define the field extractions on the forwarder that sends the data. This includes configuration of INDEXED_EXTRACTIONS and any other parsing, filtering, anonymizing, and routing rules. Performing these actions on the instance that indexes the data will have no effect, as the forwarded data must arrive at the indexer already parsed.

When you use Splunk Web to modify event breaking and timestamp settings, it records all of the proposed changes as a props.conf stanza. You can find this stanza in the "Advanced" tab on the "Set Source type" page.

Use the "Copy to clipboard" link in the "Advanced" tab to copy the proposed props.conf stanza to the system clipboard. You can then use a text editor to paste this stanza into props.conf on Splunk instances that monitor and forward similar files.

Only header fields containing data are indexed

When Splunk software extracts header fields from structured data files, it only extracts those fields where data is present in at least one row. If a header field has no data in any row, it is skipped (that is, not indexed). For example, suppose a CSV file has a column named header4, and that column is empty in every row.

When Splunk software reads this file, it notes that every row in the header4 column is empty, and does not index that header field or any of the rows in it. This means that you cannot search the index for header4 or for any data in its column.

If, however, the header4 field contains rows with empty strings (for example, ""), the field and all the rows underneath it are indexed.

No support for mid-file renaming of header fields

Some software, such as Internet Information Services (IIS), supports renaming header fields in the middle of a file. Splunk software does not recognize such changes. If you index a file whose header fields are renamed within the file, the renamed header fields are not indexed.

Example configuration and data files

Use inputs.conf and props.conf together to apply the file header extraction attributes to your structured data files.

To extract the data locally, edit inputs.conf and props.conf to define inputs and sourcetypes for the structured data files, and use the attributes described above to specify how to deal with the files. To forward this data to another Splunk instance, edit inputs.conf and props.conf on the forwarding instance, and props.conf on the receiving instance.
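For example, the following minimal sketch monitors IIS web server logs in the W3C extended log format and extracts their header-based fields on the instance that reads the files. That instance is where these settings belong when you forward the data, because forwarded structured data must arrive at the indexer already parsed. The log path is the usual IIS default location and the source type name example_iis_w3c is hypothetical; adjust both for your environment.

```ini
# inputs.conf on the instance that reads the log files (the forwarder, if
# you forward the data); path and source type name are placeholders
[monitor://C:\inetpub\logs\LogFiles\W3SVC1\*.log]
sourcetype = example_iis_w3c

# props.conf on the same instance: parse the W3C header into indexed fields
# before the data leaves this instance
[example_iis_w3c]
INDEXED_EXTRACTIONS = W3C
```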
