join

Description

Use the join command to combine the results of a subsearch with the results of a main search. One or more of the fields must be common to each result set. You can also combine a search result set to itself using the selfjoin command.

Alternative commands

For flexibility and performance, consider using one of the following commands if you do not require join semantics. These commands provide event grouping and correlations using time and geographic location, transactions, subsearches, field lookups, and joins.

Appends the fields of the subsearch results with the input search result fields. The first subsearch result is merged with the first main result, the second subsearch result is merged with the second main result, and so on.

To group events by using the eval command with a conditional expression, such as if, case, or match.

To group events by using a recycled field value, such as an ID or IP address.

To group events by using a pattern, such as a start or end time for the event.

To break up groups larger than a certain duration. For example, when a transaction does not explicitly end with a message and you want to specify a maximum span of time after the start of the transaction.

Syntax

join [join-options...] [field-list] subsearch

Required arguments

subsearch

Syntax: "[" subsearch "]"

Description: A secondary search where you specify the source of the events that you want to join to. The subsearch must be enclosed in square brackets. The results of the subsearch should not exceed available memory.

Limitations on the subsearch for the join command are specified in the limits.conf.spec file. The limitations include the maximum subsearch to join against, the maximum search time for the subsearch, and the maximum time to wait for subsearch to fully finish. See Subsearches in the Search Manual.

Optional arguments

field-list

Syntax: <field>, <field>, ...

Description: Specify the fields to use for the join. If no fields are specified, all of the fields that are common to both result sets are used.

Field names must match, not just in name but also in case. You cannot join product_id with product_ID. You must first change the case of the field in the subsearch to match the field in the main search.

Description: Options to the join command. Use either outer or left to specify a left outer join.

Descriptions for the join-options argument

type

Syntax: type=inner | outer | left

Description: Indicates the type of join to perform. The difference between an inner and a left (or outer) join is how the events are treated in the main search that do not match any of the events in the subsearch. In both inner and left joins, events that match are joined. The results of an inner join do not include events from the main search that have no matches in the subsearch. The results of a left (or outer) join includes all of the events in the main search and only those values in the subsearch have matching field values.

Default:inner

Examples of the data that are included in a join.

usetime

Syntax: usetime=<bool>

Description: A Boolean value that Indicates whether to use time to limit the matches in the subsearch results. Used with the earlier option to limit the subsearch results to matches that are earlier or later than the main search results.

Default:true

earlier

Syntax: earlier=<bool>

Description: If usetime=true and earlier=true, the main search results are matched only against earlier results from the subsearch. If earlier=false, the main search results are matched only against later results from the subsearch. Results that occur at the same time (second) are not eliminated by either value.

Default:true

overwrite

Syntax: overwrite=<bool>

Description: Indicates whether fields from the subresults overwrite the fields from the main results, if the fields have the same field name.

Default:true

max

Syntax: max=<int>

Description: Specifies the maximum number of subsearch results that each main search result can join with. If set to max=0, there is no limit.

Default:1

Usage

The join command is a centralized streaming command when there is a defined set of fields to join to.
See Command types.

Use the join command when the results of the subsearch are relatively small, for example 50,000 rows or less. To minimize the impact of this command on performance and resource consumption, Splunk software imposes some default limitations on the subsearch. See the subsearch section in the syntax for more information about these limitations.

One-to-many and many-to-many relationships

To return matches for one-to-many, many-to-one, or many-to-many relationships, include the max argument in your join syntax and set the value to 0. By default max=1, which means that the subsearch returns only the first result from the subsearch. Setting the value to a higher number or to 0, which is unlimited, returns multiple results from the subsearch.

Examples

Example 1

Combine the results from a main search with the results from a subsearch search vendors. The result sets are joined on the product_id field, which is common to both sources.

... | join product_id [search vendors]

Example 2

If the field names in the sources do not match, you can rename the field in the subsearch result set. The field in the main search is product_id. The field in the subsearch is pid.

Note: The field names must match in name and in case. You cannot join product_id with product_ID.

... | join product_id [search vendors | rename pid AS product_id]

Example 3

By default, only the first row of the subsearch that matches a row of the main search is returned. To return all of the matching subsearch rows, include the max=<int> argument and set the value to 0. This argument joins each matching subsearch row with the corresponding main search row.

... | join product_id max=0 [search vendors]

Example 4

The dashboards and alerts in the distributed management console shows you performance information about your Splunk deployment. The Resource Usage: Instance dashboard contains a table that shows the machine, number of cores, physical memory capacity, operating system, and CPU architecture.

To display the information in the table, use the following search. This search includes a join command. The search uses the information in the dmc_assets table to look up the instance name and machine name. The search then uses the serverName field to join the information with information from the /services/server/info REST endpoint. The /services/server/info is the URI path to the Splunk REST API endpoint that provides hardware and operating system information for the machine. The $splunk_server$ part of the search is a token variable.

Comments

I second @Samwatson45's question: "Is there any way to do a full join, so rather than type=left, type=both, where both sides fully join and null values on either side are included?"

I am assuming that by type=both he implies outer join.

However, I tried it out and outer join did not work as intended.
It worked only when the reference dataset was on the left side of the join and the dataset i was comparing was the subsearch on the right of the join.
When the reference dataset was on the left and I did type=outer, no null values were returned.

If this is how it is intended to work, there needs to be clear examples to illustrate the same.

Anirbandasdeb

September 6, 2018

Is there any way to do a full join, so rather than type=left, type=both, where both sides fully join and null values on either side are included?

One would expect to have a mv field called source added to each event joined but instead you get a flattened list concatenated with spaces.

Not sure if this is a bug or undocumented feature :)

Bsonposh, Splunker

November 22, 2017

Rjthibod - Yes, I do think it is worth mentioning the Union command. Thanks for the suggestion!

Lstewart splunk, Splunker

June 2, 2017

Is it worth adding the new `union` command to the Alternative Commands section starting for version 6.6?

Rjthibod

June 1, 2017

Lstewart -- Hm, then is the "usetime" default listed incorrectly? Is the default for usetime actually false? This is the behavior I'm seeing: https://gist.github.com/jordan-brough/b510997732b17046adf5

Jordananimoto

October 5, 2015

Yes, the default value for "earlier" is "true.

Lstewart splunk, Splunker

October 5, 2015

Is "true" really the default value for the "earlier" option? On my installation at least the default seems to be "false", which seems like the more expected behavior.

Jordananimoto

September 25, 2015

The documentation for the join command has been updated based on the feedback we received and the questions posted in Splunk Answers. Thank you!

Lstewart splunk, Splunker

June 11, 2015

@KerdonMike make sure your have this "max=0". for example in your code try

| join type=left max=0

i know that works for inner joins. The thing is Splunk will automatically do a "dedup" on the subquery dataset on the field that is being joined to the original dataset.

Tdiestel

May 3, 2015

And, once there will be functionality for editing my own posts, I will correct my first post from "join command is now working as it should" to "join command is NOT working as it should".

KerdonMike

March 24, 2015

In similar fashion, splunk's "inner join" is not entirely inner join, because it only allows comparing columns of same name, and for equality.
So it is (as described in the linked wiki page) somewhere between equi-join and natural join.

So the used comparison:
"SQL-like joining of results from the results from the main search with the results from a subsearch."

Should be more precise as:
"Nothing like SQL joins, and also bugged compared to our own definition."

I hope the annoyance of working with splunk's join is evident from my posts.

KerdonMike

March 24, 2015

Either the documentation is incorrect, or the join command is now working as it should. Specifically, for the "type=left".

In my example data, if I run query:
"... (some specific code) ..." it returns 1 result / row.

Enter your email address, and someone from the documentation team will respond to you:

Send me a copy of this feedback

Please provide your comments here. Ask a question or make a suggestion.

Feedback submitted, thanks!

You must be logged into splunk.com in order to post comments.
Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic.
If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk,
consider posting a question to Splunkbase Answers.

0
out of 1000 Characters

Your Comment Has Been Posted Above

We use our own and third-party cookies to provide you with a great online experience. We also use these cookies to improve our products and services, support our marketing campaigns, and advertise to you on our website and other websites. Some cookies may continue to collect information after you have left our website.
Learn more (including how to update your settings) here »