Science, or how to analyse raw data yourself

As stated in the frequently asked questions, the dataset collected through this userscript should be great for analysing individual matches, maps, players and the game in general. We believe there are many more opportunities for analysis than are currently available on this website, and that offline analysis is needed for large-scale research into the dataset. We also think the community should be actively involved in this, so that this website primarily serves as a data warehouse for the community. This page will get you started with analysing the dataset.

JSON fields

Let us first describe the structure of match files, which are formatted as JSON (ignore the last column of this table):

| Field | Type | Description | Differences for old tagpro.me records |
| --- | --- | --- | --- |
| server | string | Domain name of the game server. | |
| port | unsigned | Port number assigned to the match by the game server. | |
| official | boolean | Indicator whether the game server is an official server. This field is only available when downloading data from this website! | |
| group | string | 8-letter group identifier assigned by the game server if this was a group match, or the empty string if this was a public match. When downloading data from this website, you will always see "redacted" for group matches instead of the actual identifier, for security reasons. | Not available. |
| date | unsigned | The date and time of the start of the match, as a UNIX timestamp. In the database of this website, this value is corrected for the clock difference between the client and this web service. | Date indicates end of match instead of start and may be approximate. |
| timeLimit | real | Time limit of the match, in minutes. Usually 12, but this can be customised for group matches. Is always a multiple of 0.25, i.e. 15 seconds. | Not available. |
| duration | unsigned | Actual duration of the match, in frames. Each second consists of 60 frames. Required as input for PlayerLogReader below. | Not available. Instead, a field named timeRemaining indicates the unused time on the clock at the end of the match, in frames. |
| finished | boolean | Indicator whether the match was completed until either the capture limit or time limit was reached. | |
| map | object | Sub-object describing the map of the match. | Map has been guessed using name and author and match date. Matches for which no reliable guess was possible have been removed from the database. |
| map.name | string | Name of the map, or "Untitled" if untitled. Note that matches on untitled maps are rejected from the database of this website. | |
| map.author | string | Author(s) of the map, or "Unknown" if unknown. | |
| map.type | string | Map type. "ctf" for public capture-the-flag maps, "nf" for public neutral-flag maps, "mb" for public marsball maps, "-" for group-only maps, "e14", "u14", "h14", "b15", "p16", "e16" or "s17" for event maps, or the empty string for unofficial maps. This field is only available when downloading data from this website! | Group-only potato maps and the 2015 April Fools event are misrepresented as normal public maps. |
| map.marsballs | unsigned | Number of marsballs on the map. | |
| map.width | unsigned | Width of the map, in number of tiles. Required as input for MapLogReader below. | |
| map.tiles | blob | Description of the map tile grid, to be decoded by the MapLogReader class below. Implicitly defines the map height when decoded using the correct width. | Potatoes, if any, are coded as flags. Gravity wells during the 2015 April Fools event are not indicated. |
| players | array | Array of objects describing the players that participated in the match. | Only players present at the end of the match are listed. |
| players[…].auth | boolean | Indicator whether the player name is the player's reserved name. | |
| players[…].name | string | Name of the player, or "••••••••••••" for anonymised records. | |
| players[…].flair | unsigned | Flair of the player. 0 if none. 1 for the top-left flair in flair.png, plus 16 for each row downward, plus 1 for each column to the right. | Not available. |
| players[…].degree | unsigned | Degree of the player, or 0 if stats are disabled, the player is not logged in or the player has no degree yet. | Score awarded to the player by the game server. The userscript resets positive scores to zero for early quitters. |
| players[…].points | unsigned | Rank points awarded to the player by the game server at the end of the match. Always 0 in group matches. | Not available. |
| players[…].team | unsigned | Team of the player at the start of the match. 1 for red, 2 for blue, or 0 if the player missed the match start and joined later. Required as input for PlayerLogReader below. | Value indicates team at end of the match instead of at start, never 0. |
| players[…].events | blob | Description of the player-specific timeline of the match, to be decoded by the PlayerLogReader class below. | Not available. Instead, unsigned integer fields named grabs, hold, captures, drops, pops, prevent, returns, tags and support contain the aggregate statistics for the match. hold and prevent are in seconds. |
| teams | array | Array of objects describing the teams that participated in the match, excluding player-related data. The two array elements correspond to the red and blue team respectively. | |
| teams[…].name | string | Name of the team. Usually "Red" for the red team or "Blue" for the blue team, but the group leader can customise this for group matches. | Not available. |
| teams[…].score | unsigned | Final score of the team at the end of the match, in number of captures. Usually equal to the number of captures derived from the decoded player data, but it can be larger if an initial score was set for a group match. | |
| teams[…].splats | blob | Description of the splat locations for the team, to be decoded by the SplatLogReader class below. Times of the splats can be linked to the pop times in the player-specific timelines, but the individual splat locations cannot be linked to individual pops if multiple team members pop simultaneously. | Not available. |
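Some of the fields above use units that are easy to misread. The following is a minimal Python sketch of the conversions, not part of the official readers: the helper names are hypothetical, the 60 frames per second and the flair numbering are taken from the table, and the date field is assumed to be in seconds since the UNIX epoch.

```python
from datetime import datetime, timezone

FRAMES_PER_SECOND = 60  # stated in the description of the duration field


def duration_seconds(match):
    """Convert the match duration from frames to seconds."""
    return match["duration"] / FRAMES_PER_SECOND


def start_time_utc(match):
    """Interpret the date field as seconds since the UNIX epoch (assumption)."""
    return datetime.fromtimestamp(match["date"], tz=timezone.utc)


def flair_position(flair):
    """Return (row, column) in flair.png, both counted from 0, or None if no flair.

    Inverts flair = 1 + 16 * row + column, assuming fewer than 16 columns.
    """
    if flair == 0:
        return None
    return divmod(flair - 1, 16)
```

For example, a full 12-minute match has a duration of 43200 frames, and a flair value of 18 refers to the second row and second column of flair.png.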

Be aware that the fields of type blob are represented by base64-encoded strings. You must first base64-decode them before feeding them to the appropriate LogReader.
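As a rough illustration of this decoding step, the Python sketch below collects the three blob fields of one match as raw bytes. It assumes the standard base64 alphabet; the function name decode_blobs and the tiny hand-made record are made up for the example and do not come from a real match file.

```python
import base64
import json


def decode_blobs(match):
    """Return the blob fields of one match as raw bytes."""
    return {
        "tiles": base64.b64decode(match["map"]["tiles"]),
        "events": [base64.b64decode(p["events"]) for p in match["players"]],
        "splats": [base64.b64decode(t["splats"]) for t in match["teams"]],
    }


# Hand-made record for illustration only; real files have many more fields.
example = json.loads(
    '{"map": {"tiles": "AQI="},'
    ' "players": [{"events": "/w=="}],'
    ' "teams": [{"splats": ""}, {"splats": ""}]}'
)
blobs = decode_blobs(example)
print(blobs["tiles"])  # raw bytes, ready for a bit-level reader
```

Old tagpro.me records have no events or splats fields, so this sketch only applies to matches recorded by the userscript.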

Blob format

Of course the really interesting fields are the blob fields. These contain binary data that has to be read bit-by-bit, as they are sequences of the following data types:

| Type | LogReader method | Description | Example usage |
| --- | --- | --- | --- |
| Bool | LogReader::readBool() | A boolean value (1 bit). | Whether a player popped in a time step. |
| Fixed | LogReader::readFixed($bits) | An unsigned integer, less than 2^$bits, stored in a fixed (or exogenously determined) number of bits $bits. | Tile type code on a map (6 bits). Splat coordinate relative to some reference point (number of bits and reference point position depend on map size). |
| Tally | LogReader::readTally() | An unsigned integer stored in a variable number of bits, namely as tally marks. Only efficient for numbers that are almost always very small. | Number of tags of a player in a time step. |
| Footer | LogReader::readFooter() | An unsigned integer stored in a variable number of bits, using a 2-bit header indicating the length, with the remaining bits indicating the number itself, similar to Fixed. The overall data type is called Footer because it always aligns its end to a byte boundary, i.e. a multiple of 8 bits. (The header part of the Footer is used to indicate which byte boundary.) This makes it useful to finalise a block of data with one last integer. Because of the byte-alignment dependency, the maximum possible integer varies, but all numbers less than 1 + 256 + 256^2 + 256^3 = 16843009 are guaranteed to fit. | Time difference in time steps between two positions on the timeline of a player, minus one. Number of times the same map tile is repeated. |
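To give an idea of what reading bit-by-bit involves, here is a minimal Python sketch of the three simplest types. It is not the reference implementation and rests on assumptions the table does not state: bits are taken most significant bit first, reading past the end of the data yields 0, and a Tally is a run of 1-bits closed by a 0-bit. Footer is left out because its exact layout cannot be derived from the description alone. Check any port against the reference LogReader code.

```python
class BitReader:
    """Reads bits from a byte string, most significant bit first (assumption)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bool(self):
        """Read one bit; reading past the end yields 0 (assumption)."""
        byte_index = self.pos >> 3
        bit = 0
        if byte_index < len(self.data):
            bit = (self.data[byte_index] >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_fixed(self, bits):
        """Read an unsigned integer stored in exactly `bits` bits."""
        result = 0
        for _ in range(bits):
            result = (result << 1) | self.read_bool()
        return result

    def read_tally(self):
        """Count 1-bits up to the first 0-bit (assumed tally encoding)."""
        result = 0
        while self.read_bool():
            result += 1
        return result
```

Under these assumptions, the single byte 10110100 read as a Bool, a 2-bit Fixed, a Tally and a 3-bit Fixed gives 1, 1, 1 and 4.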

Hence the key to decoding is to read the right variables of the right types in the right order. We have created three classes, namely PlayerLogReader, MapLogReader and SplatLogReader, to do this for you, and convert the binary data into a series of events. You attach event listeners by creating a child class and overriding the corresponding methods. The PHP code of all classes, which should be easy to translate to any other programming language, is given below.

<?php

// Copyright (c) 2017, Jeroen van der Gun
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list of
// conditions and the following disclaimer in the documentation and/or other materials
// provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
// OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
// TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Two events in the PlayerLogReader imply other events that are not fired separately: a return implies a tag, and a drop implies a pop. (The former is not strictly true for marsball returns, but because of the way the data is recorded, you cannot distinguish between marsball returns and normal returns.)

A PlayerLogReader::flaglessCaptureEvent means a marsball capture, i.e. a capture that does not affect your flag carrying status.

For PlayerLogReader events, the $powers/$newPowers/$oldPowers attribute may be a bitwise combination of multiple powers, so use the bitwise operators of your programming language there.
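As an illustration of such bitwise handling, the Python sketch below tests and compares power bit masks. The flag names and bit values are placeholders chosen for the example; take the real values from the constants in the PlayerLogReader code.

```python
from enum import IntFlag


class Power(IntFlag):
    # Placeholder names and bit values for illustration only; use the
    # constants defined by PlayerLogReader in real code.
    JUKE_JUICE = 1
    ROLLING_BOMB = 2
    TAGPRO = 4
    TOP_SPEED = 8


def has_power(powers, power):
    """True if the given power bit is set in the combined value."""
    return bool(powers & power)


def gained(old_powers, new_powers):
    """Powers that are set in new_powers but were not set in old_powers."""
    return Power(new_powers & ~old_powers)
```

With these placeholder values, a combined value of 5 contains both the first and the third power, and going from 2 to 6 means that only the third power was gained.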

Prior to running SplatLogReader, you must run MapLogReader. Listen to the MapLogReader::heightEvent, add one to the largest/last $newY you get, and you have the $height required by the SplatLogReader constructor.
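The height calculation can be sketched in Python as follows. HeightCollector is a hypothetical stand-in for a MapLogReader child class that overrides only the height event; the method name height_event and the zero-based row index are assumptions inferred from the description above.

```python
class HeightCollector:
    """Stand-in for a MapLogReader subclass that only listens to height events."""

    def __init__(self):
        self.max_y = -1  # no rows seen yet

    def height_event(self, new_y):
        # new_y is assumed to be the zero-based index of the row being started.
        self.max_y = max(self.max_y, new_y)

    @property
    def height(self):
        """Largest row index seen, plus one."""
        return self.max_y + 1


collector = HeightCollector()
for y in range(3):  # simulate the events for a map with three rows
    collector.height_event(y)
print(collector.height)
```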

The splat times provided by SplatLogReader are only indices. To find the actual in-game times, first run PlayerLogReader on all players. There, listen to PlayerLogReader::dropEvent and PlayerLogReader::popEvent and collect all times of deaths, per team. Sort these times and remove duplicates, still per team. The splat time index can now be looked up in this array of the team.
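The lookup procedure above can be sketched in Python like this. The function names are hypothetical, the input is assumed to be a list of (team, time) pairs that you collected yourself from the drop and pop events, using the team the player was on at the time of death, and the splat time index is assumed to count from 0.

```python
from collections import defaultdict


def death_times_by_team(deaths):
    """Build {team: sorted list of distinct death times}.

    deaths: iterable of (team, time) pairs collected from drop and pop events.
    """
    times = defaultdict(set)  # a set removes duplicate times per team
    for team, time in deaths:
        times[team].add(time)
    return {team: sorted(values) for team, values in times.items()}


def splat_time(times_by_team, team, index):
    """Translate a splat time index into the in-game time for that team."""
    return times_by_team[team][index]


deaths = [(1, 300), (2, 120), (1, 120), (1, 300), (2, 500)]
lookup = death_times_by_team(deaths)
print(splat_time(lookup, 1, 1))
```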

The SplatLogReader::splatsEvent receives an array of simultaneous splat locations. The order of these locations is unfortunately not always related to the order of players in the JSON. The coordinates are in pixels measured from the center of the top-left (possibly empty) tile of the map. Each tile is 40 pixels.
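If you want to know on which tile a splat landed, the pixel coordinates can be converted as in the Python sketch below. It relies only on the origin and the tile size given above; the function name is hypothetical, and assigning a point that lies exactly on a tile edge to the tile on its right or below is an arbitrary choice.

```python
TILE_SIZE = 40  # pixels per tile, as stated above


def splat_to_tile(x, y):
    """Map splat pixel coordinates to the (column, row) of the containing tile.

    The origin is the centre of the top-left tile, so that tile covers
    -20 <= x < 20 and -20 <= y < 20.
    """
    half = TILE_SIZE // 2
    return (x + half) // TILE_SIZE, (y + half) // TILE_SIZE
```

For example, the point (100, 60) falls in column 3, row 2, counting both from 0.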

Compare your results with what you see in the Timeline and Splats sections of match visualisations on this website, to ensure you are decoding the data correctly. If you make a mistake, you will not get an error, but the read data will be garbage from that point on.

Example

The following example PHP script, designed to be run from the command line, reads a JSON match file, and then outputs to the console the map in Unicode art, the match timeline, and the splats of each team along with the corresponding times. It should be simple to adapt this to measure any other metric from the match data.

#!/usr/bin/php
<?php

// Copyright (c) 2015, Jeroen van der Gun
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are
// permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice, this list of
// conditions and the following disclaimer in the documentation and/or other materials
// provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
// OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
// TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloading bulk data

Rather than downloading data from individual matches, you can also download the entire database in bulk. The bulk data is split into two JSON files, maps and matches:

Matches are downloaded by id range, from one match id to another, inclusive (beware: the full dataset is several gigabytes).

In both files, each match or map is stored under a key that indicates its id. Each match has a mapId property indicating the id of its map. Which files you need depends on the type of analysis you want to do. Note that the splat data is located in the matches file, but it cannot be decoded without the maps file.
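Joining the two files could look roughly like the Python sketch below. It assumes that both files parse to a JSON object keyed by id and that the mapId value matches a key of the maps object once converted to a string; the function name and the sample records are invented for the example.

```python
import json


def attach_maps(matches, maps):
    """Yield (match_id, match, map) triples, joining on the mapId property.

    The map is None when no entry with that id exists in the maps object.
    """
    for match_id, match in matches.items():
        yield match_id, match, maps.get(str(match["mapId"]))


# Hand-made stand-ins for the parsed bulk files.
maps = json.loads('{"7": {"name": "Example", "width": 3}}')
matches = json.loads('{"1001": {"mapId": 7, "duration": 43200}}')

joined = list(attach_maps(matches, maps))
for match_id, match, map_info in joined:
    print(match_id, map_info["name"], match["duration"])
```

Because the full matches file is several gigabytes, loading it in one go with json.load may need more memory than you have available; downloading smaller id ranges or using a streaming JSON parser are possible ways around that.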

C++ implementation

In addition to the PHP code above, we also have a C++ implementation of all decoders. This is especially handy for processing the large bulk data files efficiently. Download this zip file, which contains the above blob readers in tagpro.h, base64 functionality in base64.h and a JSON parser in json.h. The file test.cpp is an example program that processes the bulk data; you can use it as a starting point for your own.

Note that your compiler must support C++11 or later. For some compilers, e.g. GCC, you may need to set a command-line flag to enable C++11 support.

Old tagpro.me records

Prior to TagPro Analytics, there used to be the tagpro.me website run by bluesoul. We have merged his data into our database. Although these old matches cannot be downloaded individually, they are available in the following bulk JSON file:

This can be used with the same bulk maps file as above. Be aware that there are important differences in the data format; consequently, your analyses cannot be as advanced as with TagPro Analytics. The differences are outlined in the last column of the table at the top of this page. All match ids of tagpro.me matches start with the letter b. Special match #b1277368 did not originate from the tagpro.me dataset but follows the same format.
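When processing both bulk files together, you may want to separate the two formats first. The Python sketch below does so using the id prefix described above; the function names are hypothetical, and the ids are assumed to be the keys of the bulk JSON object, without a leading # sign.

```python
def is_tagpro_me_record(match_id):
    """True for ids that start with the letter b (old tagpro.me format)."""
    return str(match_id).startswith("b")


def split_by_format(matches):
    """Split a dict of bulk matches into (new_format, old_format) dicts."""
    new, old = {}, {}
    for match_id, match in matches.items():
        target = old if is_tagpro_me_record(match_id) else new
        target[match_id] = match
    return new, old
```

Records in the old group have no blob fields, so analyses of them are limited to the aggregate statistics such as grabs, hold and captures.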

If you want, you can download the original CSV file published by bluesoul as a torrent. However, it is probably more useful to use our transformation of this dataset, as the original publication contains many incomplete, duplicate and partly corrupted records, which we repaired or removed as appropriate. It may be helpful to know that the match id on TagPro Analytics is equal to the id of the first player of a match in the original dataset.

Map conversion tool

If you have the PNG and JSON files of a map in standard TagPro format, select them together in the form below in order to render the map and output a converted map description in TagPro Analytics format.

Good luck!

We understand that this requires quite some effort, but you should be able to produce awesome data analyses using this dataset. Please credit us as your data source when you publish any analysis, so that other people can find out about this service too.

Finally, whatever you do or want to do with this dataset, please check out and subscribe to /r/tagprostatistics. Please post there if you have anything to share, so that everybody is aware of what others are doing.