Sample Data

Apache Software Foundation Issue Tracker

The sample time-series dataset in apachejira comes from the public Apache Software Foundation instance of JIRA. It was converted to a TSV file by a Java program that uses the JIRA API, and it contains the following fields:

action

The action taken, one of “comment”, “create”, or “update”.

actor

The name of the user who took the action.

assignee

The name of the user to whom this issue is assigned.

category

The category to which the issue belongs. Multiple projects roll up to a single category.

fieldschanged

Which fields were modified in this action, stored as a single field containing a space-separated list. For example, if the action changed the assignee and the status, there is one result: “assignee status”. More appropriate for display.

fieldschangedtok

Which fields were modified in this action, stored as a multi-valued field with one result per modified field. For example, if the action changed the assignee and the status, there are two results: “assignee” and “status”. More appropriate for filtering.
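The two fields above carry the same information in different shapes. The following is a minimal sketch, assuming only the space-separated format described here, of how a fieldschanged value maps to the individual fieldschangedtok values. The function name is hypothetical and not part of any Imhotep tooling.

```python
from typing import List


def tokenize_fields_changed(fieldschanged: str) -> List[str]:
    """Split a fieldschanged value into one token per modified field,
    mirroring the relationship between fieldschanged and fieldschangedtok."""
    # str.split() with no argument splits on runs of whitespace and drops
    # empty strings, so a blank value yields an empty list.
    return fieldschanged.split()


print(tokenize_fields_changed("assignee status"))  # ['assignee', 'status']
```

Filtering on the tokenized field matches an action that changed the status regardless of what else changed in the same action, which is why the multi-valued form suits filtering better.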

fixversion

The fix versions this issue was set to when this action was taken, stored as a single field containing a pipe-separated (|) list. For example, if the issue is assigned to fix versions master and 7.0, there is one result: “master|7.0”. More appropriate for display.

fixversiontok

The fix versions this issue was set to when this action was taken, stored as a multi-valued field with one result per fix version. For example, if the issue is assigned to fix versions master and 7.0, there are two results: “master” and “7.0”. More appropriate for filtering.
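The fixversion and fixversiontok fields are related by the pipe separator. The sketch below converts in both directions. It assumes that an issue with no fix version has a blank fixversion value, which the description above does not state. Both helper names are hypothetical.

```python
from typing import List


def split_fixversions(fixversion: str) -> List[str]:
    """Turn a pipe-separated fixversion value into individual fix versions."""
    # Drop empty pieces so a blank value (assumed to mean "no fix version")
    # produces [] rather than [''].
    return [version for version in fixversion.split("|") if version]


def join_fixversions(versions: List[str]) -> str:
    """Rebuild the single display value from individual fix versions."""
    return "|".join(versions)


print(split_fixversions("master|7.0"))      # ['master', '7.0']
print(join_fixversions(["master", "7.0"]))  # master|7.0
```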

issueage

The number of seconds between when this action took place and when the issue was created.
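As a worked example of issueage, the sketch below computes the elapsed seconds between an issue's creation and a later action. The timestamps are hypothetical, and truncating to whole seconds is an assumption; the description does not say how fractional seconds are handled.

```python
from datetime import datetime


def issue_age_seconds(action_time: datetime, created_time: datetime) -> int:
    """Seconds elapsed between issue creation and this action."""
    # Assumption: fractional seconds are truncated toward zero.
    return int((action_time - created_time).total_seconds())


# Hypothetical timestamps: the action happens 1 day and 1.5 hours after creation.
created = datetime(2014, 7, 1, 9, 0, 0)
action = datetime(2014, 7, 2, 10, 30, 0)
print(issue_age_seconds(action, created))  # 91800 (86400 + 5400)
```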

issuekey

The key of the JIRA ticket.

issuetype

The type of this ticket.

prevstatus

The status of the ticket prior to this action. If the action did not change status, this will always be the same as the current status.
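The rule above can be sketched by walking one issue's actions in chronological order and carrying the previous status forward. This is an illustration only: the input shape (a list of dicts with a "status" key holding the status after each action) is an assumption, as is treating the first action's prevstatus as equal to its own status. The status names in the example are hypothetical.

```python
from typing import Dict, List, Optional


def add_prevstatus(actions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Annotate one issue's chronologically ordered actions with prevstatus."""
    result = []
    previous: Optional[str] = None
    for action in actions:
        row = dict(action)
        # Assumption: the first action has no earlier status, so reuse its own.
        # An action that does not change status gets prevstatus == status.
        row["prevstatus"] = previous if previous is not None else action["status"]
        previous = action["status"]
        result.append(row)
    return result


history = [{"status": "Open"}, {"status": "Open"},
           {"status": "In Progress"}, {"status": "Resolved"}]
for row in add_prevstatus(history):
    print(row["prevstatus"], "->", row["status"])
# Open -> Open
# Open -> Open
# Open -> In Progress
# In Progress -> Resolved
```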

project

The name of the project (not the abbreviation) to which this issue belongs.

reporter

The name of the user who created this issue.

resolution

If this issue is resolved, the string value of the Resolution field. Otherwise blank.

status

The status of the issue.

summary

The summary (i.e., short description or name) of the ticket.

timeinstate

The number of seconds between when this action took place and when the last action changed this issue’s status.
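One way to derive timeinstate is sketched below for a single issue, given its actions in chronological order as (Unix seconds, status after the action) pairs. Two interpretations are assumptions, not documented behavior: the creating action counts as the first status change (so its timeinstate is 0), and an action that itself changes the status is measured against the previous status change, so it reports how long the issue sat in its prior status.

```python
from typing import List, Optional, Tuple


def time_in_state(actions: List[Tuple[int, str]]) -> List[int]:
    """Return a timeinstate value (seconds) for each chronologically ordered action."""
    out: List[int] = []
    last_change_time: Optional[int] = None
    last_status: Optional[str] = None
    for timestamp, status in actions:
        if last_change_time is None:
            # Assumption: creation sets the initial status, so zero time in state.
            out.append(0)
            last_change_time, last_status = timestamp, status
            continue
        # Measure against the most recent earlier status change.
        out.append(timestamp - last_change_time)
        if status != last_status:
            last_change_time, last_status = timestamp, status
    return out


# Hypothetical history: created Open, commented, moved to In Progress, updated, resolved.
history = [(0, "Open"), (600, "Open"), (3600, "In Progress"),
           (5400, "In Progress"), (9000, "Resolved")]
print(time_in_state(history))  # [0, 600, 3600, 1800, 5400]
```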

Source: the public Apache Software Foundation instance of JIRA, exported through the JIRA API

World Cup 2014 Player Data

The dataset in worldcupplayerinfo_20140701.tsv includes information about players in the 2014 World Cup. Because this is not typical time-series Imhotep data, all documents are assigned the same timestamp: 2014-07-01 00:00:00.
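Because every document shares one timestamp, any time range used to select this data must contain that single instant. The sketch below interprets the timestamp as UTC, which is an assumption since no time zone is given, and checks a half-open range [start, end); the half-open convention and the helper name are likewise assumptions for illustration.

```python
from datetime import datetime, timezone

# Assumption: the fixed timestamp is interpreted as UTC.
DOCUMENT_TIME = datetime(2014, 7, 1, 0, 0, 0, tzinfo=timezone.utc)


def range_includes_documents(start: datetime, end: datetime) -> bool:
    """Hypothetical helper: True if [start, end) contains the shared timestamp."""
    return start <= DOCUMENT_TIME < end


print(int(DOCUMENT_TIME.timestamp()))  # 1404172800
print(range_includes_documents(datetime(2014, 7, 1, tzinfo=timezone.utc),
                               datetime(2014, 7, 2, tzinfo=timezone.utc)))  # True
print(range_includes_documents(datetime(2014, 6, 30, tzinfo=timezone.utc),
                               datetime(2014, 7, 1, tzinfo=timezone.utc)))  # False
```

A range that ends exactly at 2014-07-01 00:00:00 excludes every document under this convention, so the end of the range should fall after that instant.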

Each document in the dataset includes information about a single player:

Player

String

Player’s name.

Age

Int

Player’s age.

Captain

Int

1 if the player is a captain, 0 otherwise.

Club

String

The player’s club when not playing for the national team in the World Cup.

Country

String

The country the player represents in the World Cup.

Group

String

The World Cup group to which the player’s national team belongs.

Jersey

Int

The player’s jersey number.

Position

String

The player’s position.

Rank

Int

The ranking of the country the player represents.

Selections

Int

The number of World Cup appearances for this player.
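The String and Int types listed above can be applied when reading a row. The sketch below assumes the TSV has a tab-separated header row using exactly these field names, which the description does not confirm; the helper name and the example row values are hypothetical, not taken from the dataset.

```python
from typing import Dict, Union

# Field types as listed in the dataset description.
SCHEMA = {
    "Player": str, "Age": int, "Captain": int, "Club": str, "Country": str,
    "Group": str, "Jersey": int, "Position": str, "Rank": int, "Selections": int,
}


def parse_player(header: str, line: str) -> Dict[str, Union[int, str]]:
    """Convert one tab-separated player row into a dict with typed values."""
    names = header.rstrip("\n").split("\t")
    values = line.rstrip("\n").split("\t")
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} columns, got {len(values)}")
    # int("27") -> 27; str fields pass through unchanged.
    return {name: SCHEMA[name](value) for name, value in zip(names, values)}


header = "\t".join(SCHEMA)
row = "Example Player\t27\t1\tExample FC\tExampleland\tA\t10\tForward\t5\t40"
player = parse_player(header, row)
print(player["Jersey"], bool(player["Captain"]))  # 10 True
```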

TSV data size (raw, uncompressed): 45 KB

Imhotep data size: 15 KB

Source: Stack Exchange Network / Open Data
The data are distributed under the Creative Commons Attribution-ShareAlike 4.0 International license. The creator of the data is http://opendata.stackexchange.com/users/3061/bryan. In compliance with this license, the data are hereby attributed to the users and owners of StackOverflow, but not in such a way as to suggest that they endorse Indeed or Indeed’s use of the data.