Splunk is a powerful platform for searching, analyzing, monitoring, visualizing and reporting on your enterprise data. It collects machine data and converts it into operational intelligence, giving real-time insight into your data through alerts, dashboards, charts and so on.

Or: What is Splunk? Why is Splunk used for analyzing machine data? This will most likely be the first question you are asked in any Splunk interview. Start by saying that:

Splunk is a platform that gives people visibility into machine data generated by hardware devices, networks, servers, IoT devices and other sources.

Splunk is used for analyzing machine data because it can give insights into application management, IT operations, security, compliance, fraud detection, threat visibility, etc.

Explain the working of Splunk.

Ans: Splunk works in three phases –

First phase – It gathers the data needed to answer your query from as many sources as required.

Second phase – It converts that data into results that can answer your query.

Third phase – It displays the answers in a chart, report or graph that a wide audience can understand.

What are the components of Splunk?

Ans:

Splunk has four important components:

Indexer – Indexes the machine data.

Forwarder – A Splunk instance that forwards data to remote indexers.

Search Head – Provides the GUI for searching.

Deployment Server – Manages Splunk components such as indexers, forwarders and search heads in a computing environment.

What are the types of Splunk forwarder?

Ans:

Splunk has two types of forwarder:

Universal Forwarder – Forwards the raw data to the indexer without parsing it; only minimal processing is performed on the incoming data.

Heavy Forwarder – Parses the data before forwarding it to the indexer, and can also work as an intermediate forwarder or remote collector.

What are alerts in Splunk?

Ans:

An alert is an action that a saved search triggers, based on the results of the search, either in real time or at regular intervals over a set time range. When an alert is triggered, one or more actions follow, for instance sending an email to a predefined list of people. There are three types of alerts:

Per-result alerts: The most commonly used alert type. They run in real time over an all-time span and are triggered whenever the search returns a result.

Scheduled alerts: The second most common type. They evaluate the results of a historical search that runs over a set time range on a regular schedule. You can define the time range, the schedule and the trigger condition for the alert.

Rolling-window alerts: A hybrid of per-result and scheduled alerts. Like the former, they are based on a real-time search, but they do not trigger each time the search returns a matching result. Instead, they examine all events that fall within the rolling time window and trigger when a specific condition is met by the events in that window, much as a scheduled alert is triggered by a scheduled search.
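The rolling-window behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not how Splunk implements alerting; the function name rolling_window_alert, the window length and the threshold are assumptions made for the example.

```python
from collections import deque

def rolling_window_alert(event_times, window_seconds, threshold):
    """Return the times at which a rolling-window condition is met.

    event_times: ascending timestamps (in seconds) of matching events.
    The alert fires whenever at least `threshold` matching events
    fall inside the trailing window that ends at the current event.
    """
    window = deque()
    fired_at = []
    for t in event_times:
        window.append(t)
        # Drop events that have fallen out of the trailing window.
        while window and window[0] <= t - window_seconds:
            window.popleft()
        if len(window) >= threshold:
            fired_at.append(t)
    return fired_at
```

With a threshold of 1 the sketch behaves like a per-result alert, firing on every matching event; a higher threshold fires only when enough events cluster inside the window.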

What are the categories of SPL commands?

Ans: SPL commands are divided into five categories:

Sorting Results – Ordering results and (optionally) limiting the number of results.

Filtering Results – Taking a set of events or results and filtering them into a smaller set of results.

Grouping Results – Grouping events so that you can see patterns.

Filtering, Modifying and Adding Fields – Filtering out some fields to focus on the ones you need, or modifying or adding fields to enrich your results or events.

Reporting Results – Taking search results and generating a summary for reporting.

Hadoop vs. Splunk

Splunk collects, visualizes, and analyzes the data and passes it to Hadoop for ETL and other batch processing.

What are common port numbers used by Splunk?

Ans:

Common port numbers on which services run by default are:

Service – Port Number

Splunk Management Port – 8089

Splunk Index Replication Port – 8080

KV Store – 8191

Splunk Web Port – 8000

Splunk Indexing Port – 9997

Splunk Network Port – 514

What are Splunk buckets? Explain the bucket lifecycle.

Ans:

A Splunk bucket is a directory that contains indexed data, holding the events of a certain period. The bucket lifecycle includes the following stages:

Hot – Contains newly indexed data and is open for writing. Each index has one or more hot buckets.

Warm – Data rolled from hot.

Cold – Data rolled from warm.

Frozen – Data rolled from cold. The indexer deletes frozen data by default, but you can also archive it.

Thawed – Data restored from an archive. If you archive frozen data, you can later return it to the index by thawing (defrosting) it.
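The lifecycle above can be pictured as a small state machine. The following Python sketch is only a mental model under that assumption; the names NEXT_STAGE, roll and thaw are made up for the example and are not Splunk APIs.

```python
# Stage transitions described above. "thawed" is reached only by
# restoring frozen data that was archived rather than deleted.
NEXT_STAGE = {"hot": "warm", "warm": "cold", "cold": "frozen"}

def roll(stage):
    """Return the stage a bucket rolls to next."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"a {stage} bucket does not roll further")
    return NEXT_STAGE[stage]

def thaw(stage, archived):
    """Frozen data can be thawed only if it was archived, not deleted."""
    if stage != "frozen" or not archived:
        raise ValueError("only archived frozen data can be thawed")
    return "thawed"
```

For example, rolling a hot bucket three times takes it to frozen, and thaw succeeds only when the frozen data was archived.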

What command is used to enable and disable Splunk boot-start?

Ans:

To enable Splunk to start at boot, use the following command:

$SPLUNK_HOME/bin/splunk enable boot-start

To stop Splunk from starting at boot, use the following command:

$SPLUNK_HOME/bin/splunk disable boot-start

What is the eval command?

Ans:

It evaluates an expression and assigns the resulting value to a destination field. If the destination field matches an existing field name, the existing field is overwritten with the result of the eval expression. This command evaluates Boolean, mathematical and string expressions.

Uses of the eval command:

Convert values

Round values

Perform calculations

Use conditional statements

Format values

What is the lookup command and its use case?

Ans:

The lookup command takes the value of a field in an event, finds the matching rows in a lookup table, and adds the fields from those rows to the event.
Example:

… | lookup usertogroup user as local_user OUTPUT group as user_group
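What the example does can be mimicked in Python. This is a rough sketch of the matching logic only, assuming a lookup table with the fields user and group; the sample rows and the helper name lookup are invented for illustration.

```python
def lookup(events, table, lookup_field, event_field, output_field, output_as):
    """Add `output_as` to each event whose `event_field` matches a table row."""
    index = {row[lookup_field]: row for row in table}
    enriched = []
    for event in events:
        event = dict(event)  # leave the caller's event untouched
        row = index.get(event.get(event_field))
        if row is not None:
            event[output_as] = row[output_field]
        enriched.append(event)
    return enriched

# Hypothetical lookup table and events.
usertogroup = [{"user": "alice", "group": "admins"},
               {"user": "bob", "group": "ops"}]
events = [{"local_user": "alice"}, {"local_user": "carol"}]

# Mirrors: lookup usertogroup user as local_user OUTPUT group as user_group
result = lookup(events, usertogroup, "user", "local_user", "group", "user_group")
```

Events with no matching row pass through unchanged, so only the first event gains a user_group field here.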

What is the inputlookup command?

Ans:

The inputlookup command returns the whole lookup table as search results.
For example:
… | inputlookup intellipaatlookup returns a search result for every row in the table intellipaatlookup, which has two field values.

Explain the outputlookup command.

Ans:

This command writes the current search results to a lookup table on disk.
For example:

… | outputlookup intellipaattable.csv saves all the results into intellipaattable.csv.

What commands are included in the filtering results category?

Ans:

where – Evaluates an expression to filter results. If the expression evaluates to TRUE, the result is retained; otherwise, it is discarded.

dedup – Removes subsequent results that match specified criteria.

head – Returns the first count results. Using head lets a search stop retrieving events from disk once it has found the desired number of results.

tail – Unlike the head command, returns the last results.
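For readers who think in code, the four filtering commands can be approximated in Python as below. This is a simplified sketch, assuming every result is a dict that contains the field being tested; options and ordering details of the real commands are not modelled.

```python
def where(results, predicate):
    """Keep only the results for which the predicate is true."""
    return [r for r in results if predicate(r)]

def dedup(results, field):
    """Keep the first result for each distinct value of `field`."""
    seen = set()
    kept = []
    for r in results:
        if r[field] not in seen:
            seen.add(r[field])
            kept.append(r)
    return kept

def head(results, count=10):
    """Return the first `count` results."""
    return results[:count]

def tail(results, count=10):
    """Return the last `count` results."""
    return results[-count:] if count > 0 else []
```

For instance, dedup on a host field keeps one result per host, while where with a status check keeps only the results that satisfy it.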

What commands are included in the reporting results category?

Ans:

top – Finds the most frequent tuples of values of the fields in the field list, along with their count and percentage.

rare – Finds the least frequent tuples of values of the fields in the field list.

stats – Calculates aggregate statistics over a dataset.

chart – Creates tabular data output suitable for charting.

timechart – Creates a time series chart with a corresponding table of statistics.
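The idea behind top (count each tuple of field values, then report count and percentage) can be sketched in Python. The function below is an illustrative approximation, not the actual implementation, and its output field names count and percent are chosen for the example.

```python
from collections import Counter

def top(results, fields, limit=10):
    """Return the most frequent tuples of `fields` with count and percentage."""
    counts = Counter(tuple(r[f] for f in fields) for r in results)
    total = sum(counts.values())
    return [
        {**dict(zip(fields, values)), "count": n, "percent": 100.0 * n / total}
        for values, n in counts.most_common(limit)
    ]
```

A rare-style report would follow the same pattern but take the least frequent tuples instead.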

What commands are included in the grouping results category?

Ans:

transaction – Groups events that meet various constraints into transactions, where a transaction is a collection of events, possibly from multiple sources.

What is the use of the sort command?

Ans:

It sorts search results by the specified fields.
Syntax:

sort [<count>] <sort-by-clause>… [desc]

Example:

… | sort num(ip), -str(url)

It sorts results by ip value in ascending order and then by url value in descending order.
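A two-key sort like this, one key ascending and the other descending, can be reproduced in Python with two stable sorts. The sketch below assumes each result is a dict with ip and url fields, and treating the ip value as a numeric IPv4/IPv6 address is an assumption of the example rather than a statement about how num() parses it.

```python
import ipaddress

def sort_ip_then_url(results):
    """Sort by ip ascending (numerically), then by url descending (as text)."""
    # Python's sort is stable, so apply the secondary key first
    # and the primary key second.
    by_url = sorted(results, key=lambda r: r["url"], reverse=True)
    return sorted(by_url, key=lambda r: int(ipaddress.ip_address(r["ip"])))
```

Comparing the addresses numerically places 10.0.0.9 before 10.0.0.10, which a plain string comparison would not.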

Explain the difference between search head pooling and search head clustering?

Ans:

Search head pooling is a group of connected servers used to share load, configuration and user data, whereas search head clustering is a group of Splunk Enterprise search heads that serves as a central resource for searching. Because a search head cluster supports member interchangeability, the same searches and dashboards can be run and viewed from any member of the cluster.

Explain the function of Alert Manager.

Ans:

Alert manager displays the list of most recently fired alerts, i.e. alert instances. It provides a link to view the search results from that triggered alert. It also displays the alert’s name, app, type (scheduled, real-time, or rolling window), severity and mode.

What is SOS?

Ans:

SOS stands for Splunk on Splunk. It is a Splunk app that provides a graphical view of the performance and issues of your Splunk environment.
It has the following purposes:

Diagnostic tool to analyze and troubleshoot problems

Examine Splunk environment performance

Solve indexing performance issues

Observe scheduler activities and issues

See the details of scheduler and user driven search activity

Search, view and compare configuration files of Splunk

What is Splunk DB connect?

Ans:

It is a general SQL database plugin that permits you to easily combine database information with Splunk queries and reports. It provides reliable, scalable and real-time integration between Splunk Enterprise and relational databases.

What is the difference between Splunk App Framework and Splunk SDKs?

Splunk App Framework resides within Splunk’s web server and lets you customize the Splunk Web UI that comes with the product and develop Splunk apps using the Splunk web server. It is an integral part of the features and functionality of Splunk software, and its license does not permit users to modify anything in the Splunk software.
Splunk SDKs are designed to let you develop applications from the ground up without requiring Splunk Web or any components of the Splunk App Framework. They are licensed to you separately from the Splunk software and do not alter it.

What is the Splunk indexer? Explain its stages.

Ans:

The indexer is a Splunk Enterprise component that creates and manages indexes. The main functions of an indexer are:

Indexing incoming data

Searching indexed data

The Splunk indexer has the following stages:

Input: Splunk Enterprise acquires raw data from various input sources, breaks it into 64K blocks and assigns metadata keys to them. These keys include the host, source and source type of the data.

Parsing: Also known as event processing. During this stage, Splunk Enterprise analyzes and transforms the data: it breaks the data stream into individual events, identifies, parses and sets timestamps, and performs metadata annotation and transformation of the data.

Indexing: In this phase, the parsed events are written to the index on disk, including both the compressed data and the associated index files.

Searching: The search function handles all searching aspects (interactive and scheduled searches, reports, dashboards, alerts) on the indexed data and stores saved searches, events, field extractions and views.

What is the use of the replace command?

Ans:

The replace command performs a search-and-replace on specified field values, substituting replacement values. The values in a search and replace are case sensitive.
Syntax:

replace (<wc-string> WITH <wc-string>)… [IN <field-list>]

Example:
… | replace *localhost WITH localhost IN host
This changes any host value that ends with “localhost” to “localhost”.
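The wildcard, case-sensitive matching in this example can be approximated in Python with fnmatch. This is a sketch under the assumption that only the * wildcard is used (fnmatch also gives ? and [ ] special meaning), and the helper name replace_value is invented for the example.

```python
import fnmatch

def replace_value(results, pattern, replacement, field):
    """Replace values of `field` that match a case-sensitive wildcard pattern."""
    out = []
    for r in results:
        r = dict(r)  # do not modify the caller's result
        value = r.get(field)
        if isinstance(value, str) and fnmatch.fnmatchcase(value, pattern):
            r[field] = replacement
        out.append(r)
    return out
```

Calling replace_value(results, "*localhost", "localhost", "host") rewrites www.localhost to localhost but leaves LOCALHOST alone, because the match is case sensitive.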

List .conf files by priority.

Ans:

File precedence in Splunk is as follows:

System local directory: top priority

App local directories

App default directories

System default directory: lowest priority

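What is the use of the regex command?

Ans:

It removes results that do not match the specified regular expression.
Syntax:

regex (<field>=<regex-expression> | <field>!=<regex-expression> | <regex-expression>)

After a restart, you can log in using the default username admin and password changeme.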

How to list all the saved searches in Splunk?

Ans:

Using syntax:

rest /servicesNS/-/-/saved/searches splunk_server=loca

State the difference between the stats and eventstats commands.

Ans:

stats – This command produces summary statistics of all existing fields in your search results and stores them as values in new fields.

eventstats – It is the same as the stats command, except that the aggregation results are added inline to every event, and only if the aggregation is applicable to that event. It computes the requested statistics like stats does, but attaches them to the original raw data.
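The contrast is easy to see in a small Python sketch that computes an average per group both ways. The function names and the avg output field are assumptions made for the illustration, and the results are assumed to be dicts containing the grouping and value fields.

```python
from collections import defaultdict

def stats_avg(results, value_field, by):
    """stats-style: one summary row per group; the original events are gone."""
    groups = defaultdict(list)
    for r in results:
        groups[r[by]].append(r[value_field])
    return [{by: key, "avg": sum(vals) / len(vals)} for key, vals in groups.items()]

def eventstats_avg(results, value_field, by):
    """eventstats-style: every original event is kept and gains the aggregate."""
    averages = {row[by]: row["avg"] for row in stats_avg(results, value_field, by)}
    return [{**r, "avg": averages[r[by]]} for r in results]
```

Given three events from two hosts, the stats-style function returns two rows, while the eventstats-style function returns all three events with an extra avg field on each.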

Why use only Splunk? Why can’t I go for something that is open source?

Ans:

This kind of question is asked to understand the scope of your knowledge. You can answer by saying that Splunk has a lot of competition in the market for analyzing machine logs, doing business intelligence, performing IT operations and providing security. But no single tool other than Splunk can do all of these operations, and that is where Splunk stands out and makes a difference. With Splunk you can easily scale up your infrastructure and get professional support from a company backing the platform. Some of its competitors are Sumo Logic in the cloud space of log management and ELK in the open source category. You can refer to the below table to understand how Splunk fares against other popular tools feature-wise.

Which Splunk roles can share the same machine?

Ans:

This is another frequently asked Splunk interview question that tests the candidate’s hands-on knowledge. In small deployments, most of the roles can share the same machine, including the Indexer, Search Head and License Master. In larger deployments, however, the preferred practice is to host each role on a standalone host. Details about roles that can be shared even in larger deployments are given below:

Strategically, Indexers and Search Heads should have physically dedicated machines. Running the instances separately in virtual machines is not the solution, because there are guidelines to follow for using computer resources, and spinning up multiple virtual machines on the same physical hardware can cause performance degradation.

However, a License Master and a Deployment Server can be implemented on the same virtual box, in the same instance, by spinning up different virtual machines.

You can spin up another virtual machine on the same instance to host the Cluster Master, as long as the Deployment Server is not hosted on a parallel virtual machine on that same instance, because the number of connections coming to the Deployment Server will be very high.

This is because the Deployment Server caters not only to the requests coming from the Deployment master, but also to the requests coming from the Forwarders.

What are the unique benefits of getting data into a Splunk instance via Forwarders?

Ans:

You can say that the benefits of getting data into Splunk via forwarders are bandwidth throttling, a TCP connection and an encrypted SSL connection for transferring data from a forwarder to an indexer. The data forwarded to the indexers is also load balanced by default, and even if one indexer is down because of a network outage or maintenance, the data can be routed to another indexer instance in a very short time. Also, the forwarder caches events locally before forwarding them, thus creating a temporary backup of that data.

What is the use of License Master in Splunk?

Ans:

The license master in Splunk is responsible for making sure that the right amount of data gets indexed. A Splunk license is based on the volume of data that comes into the platform within a 24-hour window, so it is important to make sure that the environment stays within the limits of the purchased volume.

Consider a scenario where you get 300 GB of data on day one, 500 GB the next day, 1 terabyte on some other day, and then it suddenly drops to 100 GB on another day. In that case, you should ideally have a 1 terabyte/day licensing model. The license master thus makes sure that the indexers within the Splunk deployment have sufficient capacity and are licensing the right amount of data.
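The sizing logic in this scenario is simply "license for the peak day". A tiny Python sketch of that arithmetic follows; it takes 1 terabyte as 1000 GB for simplicity, and the helper days_over is a made-up name for the example.

```python
# Daily indexed volume from the scenario above, in GB
# (1 terabyte is taken as 1000 GB for this sketch).
daily_volume_gb = [300, 500, 1000, 100]

# The license must cover the busiest 24-hour window.
required_license_gb_per_day = max(daily_volume_gb)

def days_over(license_gb, volumes):
    """Return the daily volumes that would exceed a given license size."""
    return [v for v in volumes if v > license_gb]
```

With a 500 GB/day license, days_over reports the 1000 GB day as exceeding the limit, which is why the scenario calls for a 1 terabyte/day license.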

What happens if the License Master is unreachable?

Ans:

If the license master is unreachable, it is simply not possible to search the data. However, the data coming in to the Indexer is not affected. Data continues to flow into your Splunk deployment and the Indexers continue to index it as usual; however, you will get a warning message at the top of your Search Head or web UI saying that you have exceeded the indexing volume and that you need to either reduce the amount of data coming in or buy a higher-capacity license.

Basically, the candidate is expected to answer that the indexing does not stop; only searching is halted.

Explain ‘license violation’ from the Splunk perspective.

Ans:

If you exceed the data limit, you will be shown a ‘license violation’ error. The license warning that is raised persists for 14 days. With a commercial license you can have 5 warnings within a 30-day rolling window before your Indexer’s search results and reports stop triggering. With the free version, however, only 3 warnings are allowed.
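The rolling-window rule can be expressed as a short check in Python. This is a sketch based only on the numbers quoted above (5 warnings in 30 days for a commercial license, 3 for the free version); representing days as plain integers and the function name in_violation are assumptions of the example.

```python
def in_violation(warning_days, today, limit=5, window_days=30):
    """Return True once `limit` warnings fall inside the rolling window.

    warning_days: day numbers on which a license warning was raised.
    today: the current day number.
    """
    recent = [d for d in warning_days if today - window_days < d <= today]
    return len(recent) >= limit
```

Older warnings drop out of the window as the days pass, so a deployment that stays within its limit for long enough returns to a clean state.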

Application Monitoring: By using knowledge objects, you can monitor your applications in real time and configure alerts that notify you when your application crashes or any downtime occurs.

Network Security: You can increase security in your systems by blacklisting certain IPs from getting into your network. This can be done by using the knowledge object called lookups.

Employee Management: If you want to monitor the activity of people who are serving their notice period, you can create a list of those people and create a rule preventing them from copying data and using it outside.

Easier Searching Of Data: With knowledge objects, you can tag information, create event types and define search constraints right at the start, shortening them so that they are easy to remember, correlate and understand, rather than writing long search queries. Those shortened constraints in which you put your search conditions are called event types.

Explain Search Factor (SF) & Replication Factor (RF).

Ans:

Questions regarding Search Factor and Replication Factor are most likely asked when you are interviewing for the role of a Splunk Architect. SF and RF are terms related to clustering techniques (search head clustering and indexer clustering).

The search factor determines the number of searchable copies of data maintained by the indexer cluster. The default value of the search factor is 2. The replication factor, in the case of an indexer cluster, is the number of copies of data the cluster maintains; in the case of a search head cluster, it is the minimum number of copies of each search artifact the cluster maintains.

A search head cluster has only a Replication Factor, whereas an indexer cluster has both a Search Factor and a Replication Factor.

An important point to note is that the search factor must be less than or equal to the replication factor.
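The constraint between the two factors is simple enough to state as a validation function. The Python sketch below is illustrative only; the function name validate_factors is hypothetical and is not part of any Splunk tooling.

```python
def validate_factors(replication_factor, search_factor):
    """Check that an indexer cluster's SF and RF are consistent."""
    if replication_factor < 1 or search_factor < 1:
        raise ValueError("both factors must be at least 1")
    if search_factor > replication_factor:
        # Searchable copies are a subset of all copies the cluster keeps.
        raise ValueError("search factor must not exceed replication factor")
    return True
```

For example, a replication factor of 3 with a search factor of 2 passes, while a search factor of 3 with a replication factor of 2 is rejected.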

Which commands are included in the ‘filtering results’ category?

Ans:

A great many events come into Splunk in a short time, so searching and filtering data can be a complicated task. Thankfully, commands like ‘search’, ‘where’, ‘sort’ and ‘rex’ come to the rescue. That is why filtering commands are also among the most commonly asked Splunk interview questions.

Search: The ‘search’ command is used to retrieve events from indexes or filter the results of a previous search command in the pipeline. You can retrieve events from your indexes using keywords, quoted phrases, wildcards, and key/value expressions. The ‘search’ command is implied at the beginning of any and every search operation.

Where: The ‘where’ command, however, uses ‘eval’ expressions to filter search results. While the ‘search’ command keeps only the results that match its search terms, the ‘where’ command is used to drill down further into those results. For example, a ‘search’ can find all the nodes that are active, and a ‘where’ can then keep only those active nodes that are running a particular application.

Sort: The ‘sort’ command is used to sort the results by specified fields. It can sort the results in ascending, descending or reverse order. The sort command can also limit the number of results while sorting. For example, you can run a search that returns only the top 5 revenue-generating products in your business.

Rex: The ‘rex’ command lets you extract data or particular fields from your events. For example, if you want to identify certain fields in an email id such as abc@edureka.co, the ‘rex’ command lets you break it down into abc as the user id, edureka.co as the domain name and edureka as the company name. You can use rex to break down and slice your events, and parts of each event record, the way you want.
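The kind of extraction described for rex relies on named capture groups in a regular expression. The Python sketch below shows the same idea for an address such as abc@edureka.co; the pattern and the group names user, domain and company are assumptions for the example, not a recommended production email regex.

```python
import re

# Named groups play the role of the extracted fields.
EMAIL = re.compile(r"(?P<user>[^@\s]+)@(?P<domain>(?P<company>[^.\s]+)\.\S+)")

def extract_email_fields(text):
    """Return user, domain and company from the first email-like token."""
    match = EMAIL.search(text)
    return match.groupdict() if match else {}
```

Applied to a line containing abc@edureka.co, this yields abc as the user, edureka.co as the domain and edureka as the company.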

What is a lookup command? Differentiate between the inputlookup and outputlookup commands.

Ans:

The lookup command is a topic that many interview questions dive into, with questions like: Can you enrich the data? How do you enrich the raw data with an external lookup?
You will be given a use case scenario in which you have a CSV file and are asked to do lookups against certain product catalogs and to compare the raw data with structured CSV or JSON data. So you should be prepared to answer such questions confidently.

Lookup commands are used when you want to pull some fields from an external source (such as a CSV file or a Python-based script) based on some value in an event. They help narrow the search results by referencing fields in an external CSV file that match fields in your event data.

An inputlookup, as the name suggests, takes a lookup table as input and returns its rows as search results. For example, it would take the product price and product name as input, which can then be matched against an internal field like a product id or an item id. An outputlookup, on the other hand, is used to write the output of a search to a lookup table. Basically, inputlookup is used to read the information used for enrichment and outputlookup is used to build it.

What is the difference between the ‘eval’, ‘stats’, ‘chart’ and ‘timechart’ commands?

‘Eval’ and ‘stats’ are among the most common as well as the most important commands in SPL, and they are sometimes confused with each other in the same way as the ‘search’ and ‘where’ commands.

Although ‘eval’ and ‘stats’ are at times used interchangeably, there is a subtle difference between the two. While the ‘stats’ command is used for computing statistics on a set of events, the ‘eval’ command lets you create a new field altogether and then use that field in subsequent parts of the search.

Another frequently asked question is the difference between the ‘stats’, ‘chart’ and ‘timechart’ commands. The differences are summarized below.

Stats – Stats is a reporting command that is used to present data in a tabular format. In the stats command, you can use multiple fields to build a table.

Chart – Chart displays the data in the form of a bar, line or area graph. It can also generate a pie chart. Chart takes only 2 fields, one each for the X and Y axes.

Timechart – Timechart lets you look at bar and line graphs; pie charts, however, are not possible. Timechart takes only 1 field, since the X-axis is fixed as the time field.

What are the different types of Data Inputs in Splunk?

Ans:

This is the kind of question that only somebody who has worked as a Splunk administrator can answer. The answer is below.

The obvious and easiest way is to use files and directories as input.

Configuring network ports to receive inputs automatically, and writing scripts whose output is pushed into Splunk, are other common ways.

But a seasoned Splunk administrator would be expected to add another option: Windows inputs. These Windows inputs are of 4 types: registry inputs monitor, printer monitor, network monitor and Active Directory monitor.

What are the default fields for every event in Splunk?

Ans:

There are about 5 default fields, and they are attached to every event that comes into Splunk.
They are host, source, source type, index and timestamp.

Explain file precedence in Splunk.

Ans:

File precedence is an important aspect of troubleshooting in Splunk for an administrator, developer, as well as an architect. All of Splunk’s configurations are written within plain text .conf files. There can be multiple copies present for each of these files, and thus it is important to know the role these files play when a Splunk instance is running or restarted. File precedence is an important concept to understand for a number of reasons:

To be able to plan Splunk upgrades

To be able to plan app upgrades

To be able to provide different data inputs

To distribute the configurations to your Splunk deployments.

To determine the priority among copies of a configuration file, Splunk software first determines the directory scheme. The directory schemes are either a) Global or b) App/user.

When the context is global (that is, where there’s no app/user context), directory priority descends in this order:

System local directory — highest priority

App local directories

App default directories

System default directory — lowest priority

When the context is app/user, directory priority descends from user to app to system:

User directories for current user — highest priority

App directories for currently running app (local, followed by default)

App directories for all other apps (local, followed by default) — for exported settings only

System directories (local, followed by default) — lowest priority
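One way to picture the global-context order is as a layered merge in which higher-priority directories overwrite lower ones. The Python sketch below makes simplifying assumptions: each directory contributes a flat dict of settings (real .conf files are organized into stanzas), and the directory labels and the function name resolve are invented for the example.

```python
# Global-context order from the list above, highest priority first.
GLOBAL_ORDER = ["system/local", "app/local", "app/default", "system/default"]

def resolve(settings_by_dir, order=GLOBAL_ORDER):
    """Merge settings so that higher-priority directories win."""
    merged = {}
    # Walk from lowest to highest priority so that later updates
    # overwrite earlier ones.
    for directory in reversed(order):
        merged.update(settings_by_dir.get(directory, {}))
    return merged
```

A setting defined in both the system default and the system local directory therefore resolves to the system local value.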

How can we extract fields?

Ans:

You can extract fields from either event lists, sidebar or from the settings menu via the UI.
The other way is to write your own regular expressions in props.conf configuration file.

What is the difference between search time and index time field extractions?

Ans:

As the names suggest, search time field extraction refers to fields extracted while performing searches, whereas fields extracted when the data comes to the indexer are referred to as index time field extractions. You can set up index time field extraction either at the forwarder level or at the indexer level.

Another difference is that fields extracted at search time are not part of the metadata, so they do not consume disk space, whereas fields extracted at index time are part of the metadata and hence consume disk space.

What is summary index in Splunk?

Ans:

Summary index is another important Splunk interview question from an administrative perspective. You will be asked this question to find out if you know how to store your analytical data, reports and summaries. The answer to this question is below.

The biggest advantage of having a summary index is that you can retain the analytics and reports even after your data has aged out. For example:

Assume that your data retention policy covers only 6 months and the data you are interested in has already aged out. If you still want to do your own calculation or dig out some statistical value for that period, a summary index is useful.

For example, you can store the summary and statistics of the percentage growth of sale that took place in each of the last 6 months and you can pull the average revenue from that. That average value is stored inside summary index.

But the limitations with summary index are:

You cannot do a needle in the haystack kind of a search

You cannot drill down and find out which products contributed to the revenue

You cannot find out the top product from your statistics

You cannot drill down and nail which was the maximum contribution to that summary.

That is the use of Summary indexing and in an interview, you are expected to answer both these aspects of benefit and limitation.
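Both aspects can be seen in a small Python sketch that keeps monthly totals after the raw events are gone. The sample sales records and the function name summarize_monthly are invented for the illustration; this models the idea only, not how Splunk stores a summary index.

```python
from collections import defaultdict

def summarize_monthly(raw_sales):
    """Reduce raw sale events to one revenue total per month."""
    totals = defaultdict(float)
    for sale in raw_sales:
        totals[sale["month"]] += sale["amount"]
    return dict(totals)

# Hypothetical raw events.
raw_sales = [
    {"month": "2024-01", "product": "A", "amount": 100.0},
    {"month": "2024-01", "product": "B", "amount": 50.0},
    {"month": "2024-02", "product": "A", "amount": 250.0},
]

summary_index = summarize_monthly(raw_sales)
raw_sales = []  # the raw events age out under the retention policy

# Benefit: the statistic is still available from the summary.
average_monthly_revenue = sum(summary_index.values()) / len(summary_index)
```

The limitation is visible as well: the summary holds only month and total, so there is no way to tell which product contributed what once the raw events are gone.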

How to exclude some events from being indexed by Splunk?

Ans:

You might not want to index all your events in a Splunk instance. In that case, how do you keep those events out of Splunk?
An example of this is the debug messages in your application development cycle. You can exclude such debug messages by sending those events to the null queue. The null queue routing is defined in transforms.conf and referenced from props.conf, and it can be applied at the (heavy) forwarder level itself.
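Conceptually, null-queue routing is a regular-expression test applied to each event before indexing. The Python sketch below imitates that decision; the pattern, the queue labels and the function names are assumptions for the example and do not reproduce the transforms.conf syntax.

```python
import re

# Hypothetical rule: anything containing the word DEBUG is discarded.
DEBUG_PATTERN = re.compile(r"\bDEBUG\b")

def route(event):
    """Return the queue an event would be sent to."""
    return "nullQueue" if DEBUG_PATTERN.search(event) else "indexQueue"

def events_to_index(events):
    """Keep only the events that are not routed to the null queue."""
    return [e for e in events if route(e) == "indexQueue"]
```

Events routed to the null queue are dropped before indexing and never reach the index.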

A candidate who can answer this question is most likely to get hired.

What is the use of the Time Zone property in Splunk? When is it required the most?

Ans:

Time zone is extremely important when you are searching for events from a security or fraud perspective. If you search your events with the wrong time zone, you may not be able to find that particular event at all. Splunk picks up the default time zone from your browser settings, and the browser in turn picks up the current time zone from the machine you are using. Splunk also picks up the time zone when the data is input, and it is required the most when you are searching and correlating data coming from different sources. For example, you can search for events that came in at 4:00 PM IST in your London data center, your Singapore data center and so on. The time zone property is thus very important for correlating such events.

What is a Splunk App? What is the difference between a Splunk App and an Add-on?

Ans:

A Splunk App is the entire collection of reports, dashboards, alerts, field extractions and lookups.
A Splunk App minus the visual components, such as reports and dashboards, is a Splunk Add-on. Lookups, field extractions, etc. are examples of what a Splunk Add-on contains.

Any candidate who knows this answer is likely to be questioned further about the developer aspects of Splunk.

How to assign colors in a chart based on field names in Splunk UI?

Ans:

You need to assign colors to charts while creating reports and presenting results. Most of the time the colors are picked by default. But what if you want to assign your own colors? For example, if your sales numbers fall below a threshold, you might want the chart to display the graph in red. How, then, do you change the color in the Splunk Web UI?

You first edit the panels built on top of a dashboard and then modify the panel settings from the UI, where you can pick and choose the colors. You can also write commands to choose the colors from a palette by entering hexadecimal values, or by writing code. But the Splunk UI is the preferred way, because it gives you the flexibility to assign colors easily to different values based on their types in a bar chart or line chart. You can also apply different gradients and show your values in a radial gauge or water gauge.

What is sourcetype in Splunk?

Ans:

Now this question may feature at the bottom of the list, but that does not mean it is the least important among Splunk interview questions.

Sourcetype is a default field that identifies the data structure of an incoming event. It determines how Splunk Enterprise formats the data during the indexing process. The source type can be set at the forwarder level, for extraction at the indexer, to identify different data formats. Because the source type controls how Splunk software formats incoming data, it is important that you assign the correct source type to your data, so that the indexed version of the data (the event data) also looks the way you want, with appropriate timestamps and event breaks. This makes searching the data easier later.

For example, the data may be coming in as a CSV in which the first line is a header, the second line is blank and the actual data starts from the next line. Another example where you need to use sourcetype is if you want to break down a date field into 3 different columns of a CSV, one each for day, month and year, and then index it. Your answer to this question will be a decisive factor in getting recruited.