When working with data in the two apps, our developers draw on their knowledge
of Splunk® Simple XML, the Splunk search processing language (SPL™), and JavaScript.
All of these help in understanding how to make use of the data that Splunk Enterprise
retrieves from the various sources.

Getting data from the Auth0 service into Splunk Enterprise using a modular input

One of the first issues to address for the Auth0 app is how best to get the information
collected by the Auth0 service into the Splunk app where it can be visualized and
analyzed in real time. Both Auth0 and Splunk Enterprise are hosted services, so
to enable viewing of real-time data from Auth0 in a Splunk app we must have some
mechanism for transferring event data over the network. The initial approach we
explored was to push data whenever anything interesting happened in the Auth0 service
to an endpoint in Splunk Enterprise. Splunk Enterprise would then be able to index
the incoming data and make it available to a dashboard in the Auth0 app. However,
we identified the following potential issues with this solution:

It is not robust. If the Splunk instance is not listening, or there are
connectivity problems, then event data from Auth0 is lost.

It is not efficient. Because of the way that the Auth0 service works internally,
there is no opportunity for batching event data to send to the Splunk instance.
For every interesting event in Auth0, the Auth0 service must make an HTTP call
to the Splunk instance.

It is not complete. Again, because of the way the Auth0 service works internally,
it is only possible to push some event types to the Splunk instance. For example,
the Auth0 service can send event data relating to successful logins, but not
failures.

It has limited reporting capabilities as you cannot get historical information.
You can only pick up the new events that are sent to your Splunk instance.

It is awkward to configure. You may need to configure firewall rules to
allow the Auth0 service to push data to your Splunk instance.

There is no easy deployment model. You need to add code to the Auth0 service
to push data into your Splunk service.

However, the Auth0 service generates its own complete log files that contain
all the detailed event data that the Splunk app requires. Copying log files on a
schedule from the Auth0 environment to the Splunk Enterprise environment would not
meet the requirement for real-time information in the Splunk Enterprise dashboard.
If the Splunk instance could request data from the Auth0 service every couple of
seconds, the dashboard could display sufficiently up-to-date information
on all the events generated in the Auth0 service. This approach would be robust
because the Splunk instance could keep track of the most recent event it received
and resubmit a request if it failed to receive the data. It would also be more
efficient, because the Splunk instance could request data in batches, and easier
to configure, because the Splunk instance calls a public endpoint in the Auth0
service to request data.

The model we chose also makes it easy for customers to deploy our solution because
we can publish it as an app on the Splunkbase site (http://dev.splunk.com/goto/auth0app).
To implement this solution, we made changes to the Auth0 service and created a
modular input using the Splunk SDK for JavaScript in the Splunk app (for
more information, see "How
to work with modular inputs in the Splunk SDK for JavaScript"). Splunk
Enterprise also supports scripted inputs as an alternative to modular inputs. Scripted
inputs are easier to write; however, they are more complex to deploy and manage,
especially if you need to support multiple operating systems. Scripted inputs also
have limitations compared to modular inputs. For example, scripted inputs:

Do not support passing arguments in Splunk Web (which we require to pass
in the Auth0 credentials).

Do not provide validation feedback when you configure them.

Do not support multiple instances (you would need two copies of the script
if you had two Auth0 installations).

Are less integrated with respect to logging to Splunk Enterprise's own internal
logs.

Although Auth0 was new to Splunk Enterprise, it took the Auth0 team just
two weeks to get their basic Splunk app up and running, including making the
necessary changes to their API.

Modular input configuration parameters can be managed through
the Splunk REST API, which is really useful.

Changes to the Auth0 service

Our first challenge was how to retrieve data from the Auth0 service. We wanted
to enable a modular input to continuously poll for data, but the Auth0 service itself
did not have a suitable API. We determined that the best option was to create a
new REST API in the Auth0 service that enables a client (such as a Splunk instance) to
request the event data. This new API takes two parameters: take
specifies the maximum number of log entries to return, and from
specifies the log entry from which to start reading. This was a simple change to
make in the Auth0 service and did not have an impact on any other features.

Auth0 uses MongoDB to store log data from the Auth0 service.
Because MongoDB lets them assign incrementing IDs to log entries as they are
written, it's easy to implement an API that reads a sequential set of log entries
starting from a specified log entry.

Creating a modular input

A modular input enables us to add a new type of custom input to our Auth0 Splunk application
that behaves like one of the native input types. A user can interactively create
and update the custom inputs using Splunk Enterprise, just as they do for native
inputs (this would not be possible with a scripted input). The following screenshot
shows the new custom Auth0 input type in the list of available input types in the
Splunk Enterprise UI. This new Auth0 input type is defined in the server.js
script that is discussed later in this section:

The next screenshot shows the Auth0 input type UI requesting details of the Auth0
service to which to connect:

We chose to implement this modular input using the Splunk SDK for JavaScript
because the team working on this app is experienced with node.js. Node.js
is also a cross-platform development tool. It is just as easy to build a modular
input using one of the other Splunk SDKs in a language of your choice. When you
create a modular input using node.js, you define the input in a Node module that
exports a standard set of functions. The following code snippets come from the
server.js script in the bin\app folder in the app (there
are also scripts called auth0.cmd and auth0.sh in the
bin folder for launching the server.js script at startup
for both Windows and Linux environments).

Modular inputs are an alternative to scripted inputs. Where scripted
inputs are quick and easy to implement, they may not be easy for an end user
to use. Modular inputs require more upfront work by the developers, but are
much easier for end users to use.

To implement a modular input, you must define a Scheme instance,
which tells Splunk Enterprise about the arguments that a user configuring this input
must provide. You then provide any optional validation logic for those arguments,
as well as the logic for streaming the events back to Splunk Enterprise. The Auth0
input requires the user to provide credential information; it then connects
to the Auth0 service to validate the credentials, and then connects again to
begin retrieving the data that it streams into Splunk Enterprise. As
you can see from the require calls below, the modular input relies on the Splunk
JavaScript SDK for the modular input infrastructure as well as the Auth0 SDK for communicating
with the Auth0 service:
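The following is a minimal sketch of the kind of require calls at the top of
server.js. The package and variable names here are illustrative, based on the
public npm packages for the two SDKs, and may not match the app's actual code:

var fs       = require("fs");            // used later for the checkpoint file
var path     = require("path");
var splunkjs = require("splunk-sdk");    // Splunk SDK for JavaScript
var Auth0    = require("auth0");         // Auth0 SDK for node.js

var ModularInputs = splunkjs.ModularInputs;
var Logger   = ModularInputs.Logger;
var Event    = ModularInputs.Event;
var Scheme   = ModularInputs.Scheme;
var Argument = ModularInputs.Argument;
var Async    = splunkjs.Async;           // provides Async.whilst, used below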

The first section of the server.js script defines the Scheme
instance for the input that displays in the UI of Splunk Enterprise for configuring
the modular input. As you can see, we are exporting a getScheme
function. The Scheme instance describes the input and provides
its arguments. Notice how we set the property useSingleInstance
to false, which causes the UI to display an optional Interval
parameter to let a user specify how frequently the script should run. In this case,
the parameter determines the polling interval for checking with the Auth0 service
for new log data to request. For more information about creating modular inputs
using JavaScript, see "How
to work with modular inputs in the Splunk SDK for JavaScript."
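As a rough sketch of what this looks like (the argument names are illustrative;
the actual server.js defines the exact set of Auth0 credential arguments), the
exported getScheme function follows the pattern supported by the Splunk SDK for
JavaScript:

exports.getScheme = function() {
    var scheme = new Scheme("Auth0");

    scheme.description = "Streams log events from the Auth0 service into Splunk Enterprise.";
    scheme.useExternalValidation = true;  // run our validation logic against the Auth0 service
    scheme.useSingleInstance = false;     // exposes the optional Interval setting in the UI

    scheme.args = [
        new Argument({
            name: "domain",
            dataType: Argument.dataTypeString,
            description: "Auth0 account domain, for example yourcompany.auth0.com",
            requiredOnCreate: true
        }),
        new Argument({
            name: "clientId",
            dataType: Argument.dataTypeString,
            description: "Auth0 API client ID",
            requiredOnCreate: true
        }),
        new Argument({
            name: "clientSecret",
            dataType: Argument.dataTypeString,
            description: "Auth0 API client secret",
            requiredOnCreate: true
        })
    ];

    return scheme;
};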

The remainder of this script is a JavaScript function that has been broken into
chunks with commentary throughout so that you can follow along.

The first section validates that a checkpoint file can be created, and if one
does not exist, creates it. This file is used to store the current seek location
or last record seen during the polling of the Auth0 API. Splunk Enterprise provides
a checkpoint folder for each input to store its checkpoint data.
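The following sketch shows one way this checkpoint handling can be written. The
file and variable names are illustrative; the real logic lives in the
exports.streamEvents function in server.js:

exports.streamEvents = function(name, singleInput, eventWriter, done) {
    // Splunk Enterprise provides a per-input checkpoint folder in the input definition metadata
    var checkpointDir = this._inputDefinition.metadata["checkpoint_dir"];
    var checkpointFilePath = path.join(checkpointDir, "auth0-log-checkpoint.txt");

    var logCheckpoint = "";
    try {
        logCheckpoint = fs.readFileSync(checkpointFilePath, "utf8");
    } catch (e) {
        // No checkpoint file yet: create an empty one so later writes succeed
        fs.appendFileSync(checkpointFilePath, "");
    }
    // ... the polling loop described below goes here ...
};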

The main body of the script uses an asynchronous loop to poll the Auth0 service
for new log data. The Async.whilst method is from the Splunk SDK
for JavaScript. The loop continues until there are no more logs or an error is encountered.

var working = true;
Async.whilst(
    function() { return working; },

In the body of the loop, we first use the Auth0 API to retrieve up to 200 new
log entries, starting from the last checkpoint.
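Purely as an illustration of the take/from API described earlier: the actual
server.js uses the Auth0 SDK rather than a hand-rolled HTTP call, and the
endpoint path and accessToken credential shown here are hypothetical, but the
request could look something like this:

function getAuth0Logs(domain, accessToken, logCheckpoint, callback) {
    var https = require("https");
    var query = "?take=200" + (logCheckpoint ? "&from=" + encodeURIComponent(logCheckpoint) : "");
    https.get({
        hostname: domain,                                  // e.g. yourcompany.auth0.com
        path: "/api/logs" + query,                         // hypothetical path for the new logs API
        headers: { "Authorization": "Bearer " + accessToken }
    }, function(res) {
        var body = "";
        res.on("data", function(chunk) { body += chunk; });
        res.on("end", function() { callback(null, JSON.parse(body)); });
    }).on("error", callback);
}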

We check for errors and whether there are any remaining log entries to index. If there
are none, working is set to false, which exits the whilst
loop. We also use the Logger class from the SDK to record what happened.
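In outline, and reusing the variable names that appear in the later snippets
together with the hypothetical getAuth0Logs helper sketched above, that check
looks something like this (a sketch, not the verbatim server.js code):

        function(callback) {
            getAuth0Logs(domain, accessToken, logCheckpoint, function(err, logs) {
                if (err) {
                    errorFound = true;
                    working = false;                   // Stop streaming if we get an error
                    Logger.error(name, err.message, eventWriter._err);
                    return callback(err);
                }
                if (!logs || logs.length === 0) {
                    working = false;                   // Nothing left to index; exit the whilst loop
                    Logger.info(name, "No new Auth0 log entries to index.");
                    return callback();
                }
                // ... index the entries (next snippet) ...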

Next, we loop over the log entries we retrieved from the Auth0 service. The
Event and EventWriter classes from the JavaScript
SDK are used to send data to Splunk Enterprise. We then record the most recent ID
in the logCheckpoint variable.
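A sketch of that loop follows, again with illustrative field names; the Event
and EventWriter usage follows the pattern in the Splunk SDK for JavaScript:

                logs.forEach(function(entry) {
                    var event = new Event({
                        stanza: name,                     // the input name passed to streamEvents
                        sourcetype: "auth0_logs",         // illustrative sourcetype
                        data: JSON.stringify(entry),
                        time: Date.parse(entry.date) / 1000
                    });
                    eventWriter.writeEvent(event);
                    logCheckpoint = entry._id;            // remember the most recent ID we indexed
                });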

If there are any errors, we log the error and save the most recent log entry
id in the checkpoint file.

catch (e) {
    errorFound = true;
    working = false;  // Stop streaming if we get an error
    Logger.error(name, e.message, eventWriter._err);
    fs.writeFileSync(checkpointFilePath, logCheckpoint);  // Write to the checkpoint file
    // We had an error, die
    return done(e);
}
}

Finally, if everything worked, we save the id of the last log entry we indexed
into the checkpoint file.

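A minimal sketch of what that final step might look like, reusing the variable
names from the preceding snippets (the exact code is in server.js):

if (!errorFound) {
    fs.writeFileSync(checkpointFilePath, logCheckpoint);  // Save the ID of the last entry we indexed
    done();  // This polling pass completed successfully
}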

We designed this input based on the assumption that Splunk Enterprise will run
a single instance at a given time, and fetch events by continually polling. Each
time it fetches, it will pull down all available logs, send events back to the Splunk
instance, and then the process is killed. Because of the way intervals work in Splunk Enterprise,
if the input is still collecting data when the interval timer expires, the Splunk
instance does not launch a new instance of the input; the interval timer applies only
after the input finishes its work. You can also configure the location where Splunk
Enterprise stores the checkpoint file, but the default location is in the
%SPLUNK_HOME%/var/lib/splunk/modinputs folder. For more information, see "Data
checkpoints."

You can create modular inputs using other languages such as Python
and C#.

Refreshing an index with checkpoints

During our testing, we needed to be able to delete our indexed data and start over. To
do this, we followed this procedure:

Open Splunk Enterprise, click Settings, and then click
Data Inputs. Click Auth0 (the modular input
we defined) and then select Delete.

Delete the log checkpoint file (that tracks the most recent event that Splunk
retrieved from the Auth0 service) from the Splunk folder %SPLUNK_HOME%/var/lib/splunk/modinputs.

Delete the content of the index by running the command bin/splunk
clean eventdata -index INDEX_NAME in a shell or at a command prompt in Windows.

Getting data into Splunk Enterprise for the PAS app using data models and Splunk
Common Information Model extensions

The PAS app currently uses log data from three different sources: a database,
a document repository, and the file system. Each of these is defined as a separate
sourcetype: ri:pas:database, ri:pas:application,
and ri:pas:file. These three logs contain different types of event
data with different formats from each other. Each of these types has different field
names, which on first sight requires separate searches to pull the data. This is
not ideal and introduces a potential maintenance issue if we add a new data source
in the future (or if the format of one of the log files changes). We would like
to make the log data from these three sources (and any data sources we define in
the future) available in a normalized format to simplify the design of the searches
in the app. We would also like to make it available for other apps to consume in
a standardized format.

Fortunately, Splunk Enterprise offers a better solution. We can achieve the first
of these goals by using aliases and extracts to translate and map the content of
our log files into common field names, and by building data models based on the
extracts and aliases. We can achieve the second of these goals by building a special
model that maps and translates our log data into the structure defined in a Splunk
Common Information Model (CIM).

A data model is a semantic mapping from a set of events that can be used for
querying Splunk Enterprise. A data model specifies a set of fields with fixed data
types and an agreed interpretation with respect to the events Splunk Enterprise is
indexing that Splunk apps can use.

A Splunk CIM defines a core set of fields for a particular type of event that
might come from multiple log sources. For example, there is a
Change Analysis CIM data model with fields that describe Create,
Update, and Delete activities, and there is an
Authentication CIM data model with fields that describe login and
logout activities. For more information about these and other Splunk CIM data models,
see the section "Data Models" on the "Common
Information Model Add-on Manual" page. In addition to the documentation,
after you install the Splunk Common Information Model Add-on, you
can browse the structure of the models from the Pivot page in the
Search & Reporting app in Splunk Enterprise.

A CIM defines the lowest common denominator of the data associated
with the activity such as change analysis, authentication, or intrusion detection.
Browsing the model in Splunk Enterprise will give you more insight into the
structure of the model.

A CIM focuses on normalizing data and making it interoperable with other apps.
However, we also want to create a data model that is specific to our app, and that
will define all of the rich data that we need to build our pivot reports. You can
define multiple models for your data as CIM Extensions.

We also plan to
accelerate our CIM PAS Extension data model to improve query performance; this
will enable us to use commands such as tstats
on the fields in our data model in our searches.

Data model acceleration creates summaries for only those
specific fields you and your Pivot Editor users are interested in and want to
report on. To enable data model acceleration, follow these instructions: dev.splunk.com/goto/enabledatamodelacc.
While there, we highly recommend you review the restrictions on the kinds of
data model objects that can be accelerated.

You can manually generate accelerated namespaces and leverage
the power of indexed fields to perform statistical queries without having to
index fields. You do so by using the
tscollect command.

Mapping to a Splunk Common Information Model

For the PAS app, we determined that the Change Analysis CIM data model was the
most appropriate. After identifying the model to use, the next step is to map the
existing fields in our data sources to the set of standard field names defined in
the CIM to create a normalized view of the data. We begin by using a spreadsheet
to document the mappings from our three data sources to the CIM and then implement
the mappings using a combination of aliases, extracts,
and static evals in the props.conf file for each data
source. For example, we map the SQLTEXT field in the database log, use
the static value "updated" for the document repository log, and map an
extract field in the file log to the CIM field named action. Now
a search can refer to the action field, regardless of the particular
log file we are searching, and any other app that uses our data sources can expect
to find the standard field names from the CIM. If our app needs to support another
data source, we can perform a similar mapping operation and use search definitions
that are very similar to our existing ones. Furthermore, if the format of a log
file changes, we can accommodate those changes in our mappings without the need
to modify any searches that depend on specific field names. The following table
shows our initial set of mappings for our three data inputs:

| Database log original field | Document log original field | File log original field | CIM field |
| --- | --- | --- | --- |
| SQLTEXT | "updated" | Extract | action |
| NAME (enumerated values in log: Connect, Insert, Update, Select, Delete, Quit, Grant, Revoke) | event_name (enumerated values in log: login, download, edit, read, create, upload, share, permissions_changed, lock, unlock, delete) | Extract (enumerated values in log: getattr, read, open, write) | command |
| DOCUMENT | | Extract | object |
| CONNECTION_ID | pid | | object_id |
| IP | src_ip | | src |
| USER | user_id | Extract | user |
| USER_ID | empid | | user_id |
| | event_details | | object_attrs |
| | event_id | | event_id |
| | | Extract | event_target |
| | "success" | | status |
| | "application" | | change_type |

For an event to show up in the Change Analysis CIM, it must be
tagged with the value change, as defined in the constraint in
the Change Analysis data model. You must define a search that assigns this tag
value to events from your data.

Tagging our events

Tagging events lets us associate those events with a data model. This works with
both Splunk Common Information Model and with our custom data model. To tag an event,
we first define event types and then associate those event types with tags. The
following screenshot shows the event types for our database provider add-on: you
can view this page in the Settings section of Splunk Enterprise:

You should make sure that event type names are unique to each
app; otherwise, the definition in one app will overwrite the definition in another
one.

Each event type has a search string that identifies the events and a set of associated
tags. Notice how we reference the ri-pas-database event type in
the subsequent definitions, and how some event types have more than one associated
tag. You can also view the tags in the Settings section of Splunk
Enterprise:

Some of these tags (change_permissions, delete,
read, and update) are used to associate the events
with the Change Analysis CIM, and some of these tags (pas,
change, and audit) are used to associate events
with our custom data model. The pas tag is intended to be unique
to the PAS apps, while the other tags may be used by many other apps to identify
events generically. In the Google Drive add-on app, we also define
the tag cloudstorage that could be used in other similar apps such
as add-ons for OneDrive or Dropbox to indicate a category of data.

The files eventtypes.conf and tags.conf in each of
the provider add-ons store these definitions.

Using tags

Searches in the PAS app can now use the tags instead of specifying an index to
search. For example:

Not all our searches use tags; in some cases we search for events
using more detailed criteria, such as looking at the values in specific fields.

The only place where we mention the pas index is the authorize.conf
file in the main app (Indexes and Access Controls
in Settings). In the Access Controls settings
we specify pas as the default index for the users of the app. For
more information about authorizations and permissions in the PAS app, see the "Packaging
and deployment: reaching our destination" chapter in this guide. If you
decide to create another add-on app for the main PAS app and the add-on app has
an inputs.conf file, that file will also refer to the pas
index.

The following diagram summarizes the role of the Splunk knowledge objects related
to tagging in the PAS app:

When we ship the PAS app, it includes sample add-on provider apps that, together
with the Eventgen app, generate sample event data that is indexed in the
pas index. When a customer deploys the app, they can use their own event
data and indexes provided that:

The events are tagged with the tags recognized by our data model.

The pasuser and pasadmin roles are authorized
to use the customer's index.

Defining a custom data model

In addition to mapping our log data to the Change Analysis CIM, we also defined
our own custom data model within the app to support pivot-based searches on the app
dashboards. A custom data model defines a set of fields (possibly organized hierarchically)
and a constraint that identifies the events that the data model handles. This definition
is expressed in JSON, and in our app the file is named ri_pas_datamodel.json.
The app also contains a datamodels.conf file that contains metadata
about the model such as whether it is accelerated.

As a reminder, CIM is the least common denominator and not very
rich. It makes sense to use other models or techniques as well. The key is to
make sure that CIM is also covered when extracting data, so that the least common
denominator can be relied on.

An accelerated model is equivalent to an indexed view in a Relational
Database Management System (RDBMS). Searches will be faster, at the expense of
persisting and maintaining indexes.

The following screen shot from Splunk Enterprise shows the data model we defined
for the PAS app:

Notice how the constraint uses our tags to specify which events are included
in the model.

Mapping our data to a CIM or to a custom data model are both
examples of normalizing multiple data sources to a single model.

Defining our mappings in separate add-on apps

To make it easy to maintain these mappings and keep them all in a fixed location,
we package them as separate add-on apps. In the PAS app, we use these separate add-on
apps specifically because we want to let customers extend the PAS app by adding
their own data sources, which will require their own custom mappings. For information
about how the main PAS app recognizes these add-on apps, see the section "Using
the Splunk JavaScript SDK to interrogate other apps" in the chapter "Adding
code: using JavaScript and Search Processing Language." The following code
snippet shows the props.conf file from the RI document TA
app:

Other apps use the custom knowledge objects such as the field aliases and extracts
defined in our add-on apps; therefore, we give these objects Global
rather than App scope.

You can define the scope (individual, app, or global) of knowledge
objects in either a local.meta or default.meta file.
You should not ship an application that contains a local.meta file,
so you should move any scoping definitions to the default.meta
file.

A note about the props.conf file

For the PAS app, we are generating our own simulated events using the Eventgen app.
Therefore, we are confident that the format of the event data is optimized for consumption
by Splunk Enterprise. In practice, with real event data, you may be able to further
improve the performance of Splunk Enterprise when it parses the event data by providing
additional information in the props.conf file. Typically, you should
include the following attributes: TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD,
TIME_FORMAT, LINE_BREAKER, SHOULD_LINEMERGE,
TRUNCATE, KV_MODE. The following snippet shows an example of these
attributes in use:

For more information about these attributes, see "props.conf"
in the Admin manual.

Rebuilding our index after refactoring

As part of the effort to refactor our data inputs, create the data models, and
package them in add-on apps, we renamed our sourcetypes part way through our journey:
for example, we renamed the conducive:app
sourcetype to ri:pas:application.
We also renamed the app that contains our sample data. An unintended consequence
of this was that Splunk Enterprise could no longer find the sample data and was
no longer indexing our data. To fix this, we had to delete the content of the old
index named pas completely by using the following procedure:

Add the admin user to the can_delete role
in Access controls in Splunk Enterprise.

Stop Splunk Enterprise.

At an operating system command prompt, run the following command: bin/splunk clean eventdata -index pas

Using the data models

Earlier in this chapter, we describe our custom data model and how we map our
log data to the Change Analysis CIM. After building our custom data model, we can
refactor our existing dashboards to make use of the data model and use pivots in
the search criteria. For example, the Summary dashboard includes
several pivot searches that use the data model such as this one
that is based on the Root_Event in our data model:

These search definitions now use the fields defined in our custom data model
such as _time, user, command,
and object.

For more information about the pivot command, see "pivot"
in the Search Reference.

The following screenshot shows a search that uses the Change_Analysis
CIM data model to show some of the sample data from the PAS add-ons (in this example
the Document and File sample providers):

Modifying the data model to support additional queries

The following screenshot shows an example of a pivot based on the Root
Event in our original data model that shows counts of the different commands
executed by individual users. The existing dashboards in our app all use pivots
similar to this one to retrieve the data they display:

We plan to add visualizations to the summary screen that show the overall health
of the system we are monitoring. These visualizations will need Key Performance
Indicator (KPI) values to determine the overall health, and we need to modify our
data model to enable us to query for these KPIs. Our initial set of KPIs is: a
count of out of hours accesses to the system (Invalid Time Access),
a count of accesses by terminated employees (Terminated Access),
and a count of policy violations (Policy Violation).

The Policy Violation object gets removed later
in our journey.

The following screenshot shows a pivot based on the Terminated Access
event, with the count for each day. We can use the count of Terminated
Access events for the last day as part of the calculation of the overall
system health status:

We define additional events in the PAS Data Model, such as
Terminated Access events, as children of the root event. The following
screenshot shows the attributes and constraint the Terminated Access
event inherits from the root event along with the additional constraint that identifies
the specific event type.

We can now use these additional event definitions in the search managers on our
dashboards. For example, to search for Policy Violation events,
we use the following search definition:

Case study: Using data models to handle large volumes of data

One reason to use data models is to optimize the performance of Splunk Enterprise
when you have a large number of users who use a dashboard that runs searches across
high volumes of data. For example, suppose you have a requirement for a dashboard,
used by several hundred users, that displays information from the last thirty days,
and you index multiple terabytes of new data every day. This is considerably more
data than the PAS scenario expects, but an app such as PAS will still see performance
benefits from using accelerated data models.

Using simple, inline searches on the dashboard that search for the last thirty
days of data will be unusably slow in this scenario. Therefore, the first step might
be to replace the in-line searches with saved reports that you have accelerated
for the last thirty days. While this will speed up the searches, they will typically
stall at the end because Splunk Enterprise only updates the accelerated data every
ten minutes. A search over the last thirty days retrieves mostly accelerated data
but still has to search the raw data for the last few minutes' worth of nonaccelerated
data. To work around this problem, you can modify your dashboards to report on the
last thirty days of data using a time range that excludes the last ten minutes to
ensure that the searches only retrieve accelerated data.

You can further improve on this approach by using scheduled reports. This lets
the searches on the dashboard access cached data on the search head instead of accessing
the indexers for the accelerated data. You can manually schedule the searches you
need to run as reports every ten minutes, and then the searches on the dashboards
can load the results of the scheduled reports from the search head using the
loadjob command.

You can also accelerate a data model to improve the performance of pivot searches
based on the data model. This provides similar performance improvements to accelerated
reports, but in addition to enabling pivot searches, accelerated data models:

Update every five minutes instead of every ten minutes.

Let you manage the amount of disk space required to store the accelerated
data because you can choose which columns to add to your data model.

It's possible to further optimize a high data volume scenario
by using a custom solution instead of a data model. For example, you could run
a search, with a timespan of one minute, every minute that appends data to an
output lookup file, and then on the dashboard use input lookups to read this
summary data.

For more information about data model acceleration, see the "Accelerate
data models" section of the Knowledge Manager Manual.

For more information about using reports, see "About
reports" in the Reporting Manual.

For more information about scheduling reports, see "Schedule
reports" in the Reporting Manual.

For more information about the loadjob command, see "loadjob"
in Search Reference.

Integrating with a third-party system

On the User Activity dashboard we display information about
a user that we pull from a third-party system. In the sample PAS app this third-party
system is a REST endpoint we implemented using Python that simulates a directory
service such as LDAP or Active Directory.

Using a mock implementation like this let us develop the functionality
in the absence of the real directory service with real user data.

The following screenshot shows how we display this information on the dashboard:

To pull the data from our simulated directory service, we use a
custom search command. Splunk Enterprise lets you implement custom search commands
for extending the SPL. Custom search commands are authored in Python and are easy
to implement with the Splunk SDK for Python. Our custom search command is named pasgetuserinfo
as shown in the following code snippet from the user_activity.xml
file:

We implement this custom command in the PAS Get User Information app
(you can find this sample in the test repository). The commands.conf
file specifies the name of the custom command as shown in the following configuration
snippet:

This configuration file identifies the Python source file, pasgetuserinfo.py,
that implements the custom event generating command. The following code sample shows
the complete implementation of the pasgetuserinfo command:

Notice how this code imports the GeneratingCommand,
Configuration, and Option classes from the splunklib.searchcommands module
in the Splunk SDK for Python. We chose a GeneratingCommand because
we are manufacturing events. The generate method calls our mock
REST API endpoint passing the value of the user option of the custom
command. If the REST API recognizes the user, it returns a JSON string containing
the user data. The generate method then returns this data as a
dictionary instance. To use a real directory service, we can replace the code in
the generate method with code to query the real service and return
the data in a Python dictionary instance.

The JavaScript code behind the User Activity dashboard formats
the data from the custom search command to display in the panel.
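As a hypothetical sketch of that wiring (the actual user_activity.xml and its
JavaScript differ in the details), a dashboard extension could run the custom
command with a SearchManager and render the rows it returns:

require([
    "splunkjs/mvc",
    "splunkjs/mvc/searchmanager",
    "splunkjs/mvc/simplexml/ready!"
], function(mvc, SearchManager) {

    // Run our custom generating command; the $user$ token comes from the dashboard
    var userInfoSearch = new SearchManager({
        id: "user_info_search",
        search: mvc.tokenSafe("| pasgetuserinfo user=$user$"),
        preview: false
    });

    userInfoSearch.data("results").on("data", function(resultsModel) {
        var rows = resultsModel.data().rows;
        // Format the returned fields and display them in the panel
        // ...
    });
});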

Using stateful configuration data in the PAS app

In the PAS app, the Suspicious Activity panel and the donut
charts in the Policy Violations panel on the Summary
dashboard make use of configuration data that the user creates the first time they
use the app. The section "Sharing code between dashboards" in the chapter "Adding
code: using JavaScript and Search Processing Language" describes how we
direct the user to the Setup dashboard the first time they access
the PAS app for providing this data. The following screenshot shows the
Setup dashboard and the data the user must create:

On this dashboard, the user can select the departments for which they want to
see a donut chart, and provide definitions of the policy violations that should
appear in the list of suspicious activities. Each policy violation type has a name,
a color, and a weight that the calculations behind the visualizations use. The following
screenshot from the Summary dashboard shows the donut charts for
the Development and Management departments selected
on the Setup dashboard and in the Violation Types
from the Setup dashboard in the Suspicious Activity
panel:

We need a mechanism to persist the configuration data the user enters on the
Setup dashboard so it can be read by the code that renders the
donuts on the Summary dashboard. Historically,
custom REST endpoints have been the mechanism used to persist data in a scenario
such as this one, and a Splunk Enterprise API is available for accessing your custom
endpoint. The
App KV Store is a new feature in the version of Splunk Enterprise we are using that
provides more robust and easier-to-use data storage management than custom REST
endpoints. App KV Store can even interface with a database using convenient REST operations,
although this is not one of our requirements for the PAS app. Additionally, App
KV Store has built-in support for a distributed architecture of search head clusters: a
considerable amount of coding is needed to add this level of functionality to a
custom REST endpoint solution. All the functionality we need comes with Splunk Enterprise;
therefore, we decided to use App KV Store to persist our configuration data. No additional
coding is needed beyond defining your data collection and invoking the App KV Store
REST operations.

We use the KV Store to persist global configuration data shared by
all users of the PAS app. It is possible to use the KV Store to persist per-user
data.

The Setup dashboard uses the KV Store feature in Splunk Enterprise
to persist the setup data in a collection named ri_setup_coll that
we define in the collections.conf file as shown in the following configuration
snippet.

We use two different collections, one for departments and one for violation types,
to make it easier to access this data in a search. Notice how we use arrays to store
the list of departments and the policy violation types to accommodate a variable
number of entries in each case. We then use a transforms.conf file
to make the setup data in the KV store available to our searches:

Now we can use the setup data in the searches behind the visualizations on the
Summary dashboard. For example, the search policy_violations_search
in the summary.xml file which extracts the data for both the donut
visualizations and the Suspicious Activity panel includes the following
lookup clause to use the setup data:

| lookup violation_types id AS ViolationType OUTPUTNEW title AS ViolationTypeTitle, color AS ViolationColor, weight AS ViolationWeight,

The policy_violations_color_summary search that retrieves the
data for the donut visualizations uses the following join clause
to filter the data based on the departments the user selected on the Setup
dashboard:

We also added some utility code that replaces a complete collection
of data in the KV store. See the function setCollectionData
in the setup.js file for more details.
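As an illustration only (the real setCollectionData in setup.js may be implemented
differently), such a helper can be built on the documented storage/collections/data
REST endpoint. This sketch assumes jQuery is available on the dashboard, that splunkd
is reachable through the usual /splunkd/__raw proxy in Splunk Web, and uses an
illustrative app name in the path:

// Hypothetical helper: replace the entire contents of a KV store collection
function setCollectionData(collectionName, records) {
    var base = "/splunkd/__raw/servicesNS/nobody/pas_ref_app/storage/collections/data/" +
               encodeURIComponent(collectionName);

    // Delete all existing records in the collection, then save the new ones in one batch
    return $.ajax({ url: base, type: "DELETE" }).then(function() {
        return $.ajax({
            url: base + "/batch_save",
            type: "POST",
            contentType: "application/json",
            data: JSON.stringify(records)
        });
    });
}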

At a later stage in the project we add a new field to the KV store to let a user
toggle the display of the learning tip icons. To make this change, we add a new field
named learningTipsEnabled in the collections.conf
file, add a new checkbox in the setup.xml file, and add some additional
code in setup.js to initialize and save the field value. To use the
learningTipsEnabled configuration setting in the app, we read the
value using code in the dashboard.js file and control the visibility
of the learning tip icons using CSS. For more information about the dashboard.js
and dashboard.css files, see the chapter "UI
and visualizations: what the apps look like."

Search managers that don't return results

It's possible in some circumstances that the user_info_search
on the User Activity dashboard does not return any results. We
noticed that, in this case, the on("data", ... callback
function is not invoked. We modified the code in the user_info.js
file to work around this problem, as shown in the following code sample:

userInfoSearch.data("results", {
    // HACK: By default, no "data" event is fired when no results are
    // found. Override so that it does fire in this case.
    condition: function(manager, job) {
        return (job.properties() || {}).isDone;
    }
}).on("data", function(resultsModel) {
    var rows = resultsModel.data().rows;
    if (rows.length === 0) {
        view.html("No user information found.");
    } else {
        ...

What did we learn?

This section summarizes some of the key lessons learned while we were working
with the data our apps use.

You can use a modular input to pull in events from external systems.

You can author modular inputs in several languages including JavaScript
/ node.js.

You can store state in a modular input by writing to a file.

If you are using a modular input written in JavaScript, you can instrument
your code using methods such as error and info
of the ModularInputs.Logger class. You can search for these log
messages in the _internal index in Splunk Enterprise.

You can use the CIM to provide a standard way to search against disparate
sources of data. You can map existing and future sources to the CIM using aliases,
extractions, event types, and tags.

You can create your own data models to provide a richer mapping for querying
your data.

You can easily extend a data model to support additional search requirements.

You can use the KV store to persist data that can then be referenced in
searches.

You learned how to delete an index completely and how to delete all the
entries in an index. Both are useful when testing an app.

You need to know your data to design effective apps. Different users of
your app have different data and must be able to configure the app to make it
work for them.
