Overview

Usage data is aggressively normalized on the server and ultimately stored across multiple tables.

Tables

Profile

A record is created in the profile table, usagedata_profile, for each workspace we encounter. When the UDC uploads data, it provides two identifiers, both UUIDs generated by the UDC: one representing the user and one representing the workspace. These identifiers are used to correlate the data. The user identifier effectively represents an individual computer; the workspace identifier represents an individual workspace. The distinction is necessary to account for users who run multiple Eclipse workspaces. Note that the ids cannot, by themselves, be used to determine the identity of the user or any personal information.

A row is created in the profile table for each distinct userId/workspaceId pairing.
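The get-or-create behaviour described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the server's actual code: the column names userId and workspaceId follow the prose, but the column types, the UNIQUE constraint, and the helper name get_or_create_profile are assumptions.

```python
import sqlite3

# Hypothetical schema: the userId/workspaceId names follow the prose above;
# the column types and the UNIQUE constraint are assumptions.
SCHEMA = """
CREATE TABLE usagedata_profile (
    id          INTEGER PRIMARY KEY,
    userId      TEXT NOT NULL,
    workspaceId TEXT NOT NULL,
    UNIQUE (userId, workspaceId)
)
"""


def get_or_create_profile(conn: sqlite3.Connection, user_id: str, workspace_id: str) -> int:
    """Return the profile id for a userId/workspaceId pair, inserting a row on first sight."""
    row = conn.execute(
        "SELECT id FROM usagedata_profile WHERE userId = ? AND workspaceId = ?",
        (user_id, workspace_id),
    ).fetchone()
    if row is not None:
        return row[0]
    # First upload from this computer/workspace combination: create the profile.
    cur = conn.execute(
        "INSERT INTO usagedata_profile (userId, workspaceId) VALUES (?, ?)",
        (user_id, workspace_id),
    )
    return cur.lastrowid
```

Calling the helper twice with the same pair returns the same id, while a second workspace on the same computer (same userId, different workspaceId) produces a new profile row.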

Table Description

The table's id field is the primary key. It is the target of the foreign-key reference (profileId) from the Upload table.

Upload

Each entry in the upload table, usagedata_upload, represents an upload event: every time the UDC "calls home" with an upload, we add a row to this table. Each row records the profileId (looked up in the Profile table using the userId/workspaceId combination provided with the upload), the country code (ccode), and the time of the upload as recorded on the server.
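A hedged sketch of how an upload event might be recorded, again using sqlite3. It assumes the columns named in the prose (userId, workspaceId, profileId, ccode, time); the column types, the ISO 8601 timestamp format, and the function name record_upload are illustrative assumptions.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; column names follow the prose, types are assumptions.
SCHEMA = """
CREATE TABLE usagedata_profile (
    id          INTEGER PRIMARY KEY,
    userId      TEXT NOT NULL,
    workspaceId TEXT NOT NULL,
    UNIQUE (userId, workspaceId)
);
CREATE TABLE usagedata_upload (
    id        INTEGER PRIMARY KEY,
    profileId INTEGER NOT NULL REFERENCES usagedata_profile(id),
    ccode     TEXT,
    time      TEXT NOT NULL
);
"""


def record_upload(conn: sqlite3.Connection, user_id: str, workspace_id: str, ccode: str) -> int:
    """Add one usagedata_upload row for an incoming upload and return its id."""
    # Resolve (or create) the profile for this userId/workspaceId pair.
    conn.execute(
        "INSERT OR IGNORE INTO usagedata_profile (userId, workspaceId) VALUES (?, ?)",
        (user_id, workspace_id),
    )
    (profile_id,) = conn.execute(
        "SELECT id FROM usagedata_profile WHERE userId = ? AND workspaceId = ?",
        (user_id, workspace_id),
    ).fetchone()
    # The timestamp comes from the server clock, not from the client.
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO usagedata_upload (profileId, ccode, time) VALUES (?, ?, ?)",
        (profile_id, ccode, now),
    )
    return cur.lastrowid
```

Two uploads from the same computer and workspace yield two upload rows that share a single profileId.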

Table Description

The table's id field is the primary key. It is the target of the foreign-key reference (uploadId) from each Record table.

Record

Multiple "record" tables are created and maintained, one for each month of gathered usage data. The table usagedata_record_monthly_200901, for example, contains usage data uploaded in January 2009 (this includes data generated in the previous month). The data is split into monthly tables in anticipation of extremely large volumes: the January 2009 table alone contains 127,680,562 records.
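Because each record table covers one month of uploads, the target table can be derived from the server-side upload time. The sketch below assumes every monthly table follows the usagedata_record_monthly_YYYYMM pattern of the January 2009 example; the function name monthly_record_table is hypothetical.

```python
from datetime import date


def monthly_record_table(uploaded_on: date) -> str:
    """Name of the monthly record table for an upload received on `uploaded_on`.

    Assumes the usagedata_record_monthly_YYYYMM naming pattern. The key is the
    upload date on the server, not the date the events were generated.
    """
    return f"usagedata_record_monthly_{uploaded_on.year:04d}{uploaded_on.month:02d}"
```

An event generated on 31 December 2008 but uploaded on 2 January 2009 therefore lands in usagedata_record_monthly_200901.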

This table is highly normalized. Most of the content of a usage data event record is textual, and all of that text is stored in the "String" table (see below); the record table itself stores only the id of the corresponding String entry.
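The string normalization described above amounts to interning: each distinct string is stored once and referenced by id. The sketch below uses a hypothetical usagedata_string table with id and value columns (the real String table's name and layout are not given here) and a per-batch dictionary cache to avoid repeated lookups. Treating what, kind, bundleId, bundleVersion, and description as the textual fields, while leaving time as-is, is also an assumption.

```python
import sqlite3

# Hypothetical stand-in for the "String" table: one row per distinct string.
SCHEMA = """
CREATE TABLE usagedata_string (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE
)
"""

# Assumed set of textual event fields; time is kept as a raw value.
TEXT_FIELDS = ("what", "kind", "bundleId", "bundleVersion", "description")


def intern_string(conn: sqlite3.Connection, value: str, cache: dict) -> int:
    """Return the string-table id for `value`, inserting it on first use."""
    if value in cache:
        return cache[value]
    conn.execute("INSERT OR IGNORE INTO usagedata_string (value) VALUES (?)", (value,))
    (string_id,) = conn.execute(
        "SELECT id FROM usagedata_string WHERE value = ?", (value,)
    ).fetchone()
    cache[value] = string_id
    return string_id


def normalize_event(conn: sqlite3.Connection, event: dict, cache: dict) -> dict:
    """Copy `event`, replacing each textual field with its string-table id."""
    row = dict(event)
    for field in TEXT_FIELDS:
        row[field] = intern_string(conn, event[field], cache)
    return row
```

Since values such as bundle ids and versions repeat across millions of events, each one is stored once and every record carries only small integer ids.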

Each row of a record table stores the uploadId of the upload it came from, along with the what, kind, bundleId, bundleVersion, description, and time fields (described in What Gets Captured).

To determine which user an individual record came from, join the Record table to the Upload table (on uploadId), then join the Upload table to the Profile table (on profileId).
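The join chain can be written as a single SQL query. The sketch below runs it through sqlite3 against a trimmed-down, hypothetical schema; the column types and the helper name user_for_record are assumptions. Because SQL parameters cannot bind table names, the monthly table name is validated against the usagedata_record_monthly_YYYYMM pattern before being interpolated.

```python
import re
import sqlite3

# Hypothetical, trimmed-down schema; text fields in the record table hold String ids.
SCHEMA = """
CREATE TABLE usagedata_profile (id INTEGER PRIMARY KEY, userId TEXT, workspaceId TEXT);
CREATE TABLE usagedata_upload (id INTEGER PRIMARY KEY,
    profileId INTEGER REFERENCES usagedata_profile(id), ccode TEXT, time TEXT);
CREATE TABLE usagedata_record_monthly_200901 (id INTEGER PRIMARY KEY,
    uploadId INTEGER REFERENCES usagedata_upload(id),
    what INTEGER, kind INTEGER, bundleId INTEGER, bundleVersion INTEGER,
    description INTEGER, time INTEGER);
"""

_TABLE_RE = re.compile(r"usagedata_record_monthly_\d{6}")


def user_for_record(conn: sqlite3.Connection, record_table: str, record_id: int):
    """Return (userId, workspaceId) for one row of a monthly record table, or None."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not _TABLE_RE.fullmatch(record_table):
        raise ValueError(f"unexpected record table name: {record_table!r}")
    return conn.execute(
        f"""
        SELECT p.userId, p.workspaceId
        FROM {record_table} AS r
        JOIN usagedata_upload  AS u ON u.id = r.uploadId
        JOIN usagedata_profile AS p ON p.id = u.profileId
        WHERE r.id = ?
        """,
        (record_id,),
    ).fetchone()
```

To aggregate by user rather than look up a single record, the same two joins apply; only the WHERE clause and the selected columns change.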