Compute vector levels

We will start with the simplest function. During the development of comperes, the idea behind it really helped me reason more clearly about the package's functional API. I am talking about levels2(), which computes “levels” of any non-list vector.

It has the following logic: if x has a levels attribute, return levels(x); otherwise, return the character representation of the vector’s sorted unique values. Some notes about the design and implementation of this function:

I hesitated a lot about whether, when x has no levels, it should return a character vector or a vector of the same type as the input. In many practical cases the latter behavior is needed. However, in the end I decided that type-stable output (just as levels(x) always returns a character vector or NULL) is better.

Conversion to character is done after sorting, which is really important when dealing with numeric vectors: sorting their character representations would put "10" before "2".
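A quick base R illustration of why the order of these two steps matters (the vector x is only an example):

```r
x <- c(10, 2, 1)

# Converting to character first sorts the values as strings
sort(as.character(x))
# -> "1" "10" "2"

# Sorting first keeps the numeric order, then converts
as.character(sort(x))
# -> "1" "2" "10"
```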

This function is helpful when one needs to produce unique values in a standardized manner (for example, during pairwise distance computation). Some examples:
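The logic described above can be sketched in a few lines of base R. levels2_sketch() is a hypothetical re-implementation written for illustration, not the actual comperes source, and it assumes that dropping NA values (the default behavior of sort()) is acceptable:

```r
# Hypothetical re-implementation of the levels2() logic described above,
# written for illustration; it is not the comperes source code
levels2_sketch <- function(x) {
  x_levels <- levels(x)

  if (!is.null(x_levels)) {
    # `x` has a levels attribute (e.g. a factor): return it unchanged
    x_levels
  } else {
    # Sort unique values first and convert to character afterwards;
    # note that sort() drops NA values by default
    as.character(sort(unique(x)))
  }
}

levels2_sketch(factor(c("b", "a", "b"), levels = c("b", "a")))
# -> "b" "a"
levels2_sketch(c(10, 2, 1, 2))
# -> "1" "2" "10"
levels2_sketch(c(TRUE, FALSE, TRUE))
# -> "FALSE" "TRUE"
```

Whatever the input type, the result is always a character vector (or the existing levels), which is the type stability argued for above.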

Manage item summaries

Arguably, the most common task in data analysis is the computation of group summaries. It is conveniently done by consecutively applying dplyr’s group_by(), summarise() and ungroup() (the last one to return a regular data frame rather than a grouped one). comperes offers a wrapper, summarise_item(), for this task (it always returns a tibble instead of a data frame) with the additional feature of modifying column names by adding a prefix (which will be handy soon):
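For readers without dplyr at hand, here is a rough base R analogue of the same idea built on aggregate(). summarise_item_sketch() and its arguments (item, col, fun, name, prefix) are hypothetical and only mimic the grouping and prefixing behavior described above; summarise_item() itself works with dplyr-style summaries:

```r
# Hypothetical base R analogue of summarise_item(): group `df` by the `item`
# columns, summarise column `col` with `fun` and name the result
# paste0(prefix, name). Not the comperes implementation.
summarise_item_sketch <- function(df, item, col, fun, name, prefix = "") {
  res <- aggregate(df[col], by = df[item], FUN = fun)
  names(res)[names(res) == col] <- paste0(prefix, name)
  res
}

# Mean displacement per combination of `vs` and `am` in mtcars
summarise_item_sketch(mtcars, c("vs", "am"), "disp", mean, "mean_disp", prefix = "vs_am_")
```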

Sometimes there is also a need to compare actual values with their summaries across different groupings. For example, determine whether a car’s number of carburetors (carb) is bigger than the average value per different groupings: by number of cylinders (cyl) and by V/S engine type (vs).

To simplify this task, comperes offers the join_item_summary() function: it computes an item summary with summarise_item() and joins it (with dplyr::left_join()) to the input data frame:
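A base R sketch of the carburetor comparison, assuming group means are the summary of interest. join_mean_sketch() is a hypothetical helper that combines aggregate() with a left join via merge(all.x = TRUE); it is not the comperes implementation:

```r
# Hypothetical base R analogue of join_item_summary() for one summary:
# compute the mean of `col` per `item` group and left-join it back to `df`.
# Not the comperes implementation.
join_mean_sketch <- function(df, item, col, prefix = "") {
  summ <- aggregate(df[col], by = df[item], FUN = mean)
  names(summ)[names(summ) == col] <- paste0(prefix, "mean_", col)
  # all.x = TRUE keeps every row of `df`, like dplyr::left_join()
  merge(df, summ, by = item, all.x = TRUE, sort = FALSE)
}

carb_df <- mtcars[c("carb", "cyl", "vs")]
carb_df <- join_mean_sketch(carb_df, "cyl", "carb", prefix = "cyl_")
carb_df <- join_mean_sketch(carb_df, "vs", "carb", prefix = "vs_")

# Is a car's number of carburetors bigger than its group average?
carb_df$above_cyl_mean <- carb_df$carb > carb_df$cyl_mean_carb
carb_df$above_vs_mean <- carb_df$carb > carb_df$vs_mean_carb
head(carb_df)
```

Note that merge() moves the key columns to the front and does not guarantee the original row order, which is one practical reason to prefer a wrapper around dplyr::left_join().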

Adding different prefixes helps to navigate through columns with different summaries.

Convert pairwise data

One of the main features of comperes is the ability to compute Head-to-Head values of players in a competition. The functions h2h_long() and h2h_mat() produce output in “long” (a tibble in which each row describes one ordered pair) and “matrix” (a matrix in which each cell value describes the pair in the corresponding row and column) formats respectively.
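To make the two formats concrete, here is a toy example in base R with made-up players and values. xtabs() is used only to pivot the long data into a matrix and is not meant to reflect how h2h_mat() works:

```r
# Made-up head-to-head values in "long" format: one row per ordered pair
h2h_long_df <- data.frame(
  player1 = c("a", "a", "b", "b"),
  player2 = c("a", "b", "a", "b"),
  h2h = c(0, 3, 1, 0)
)

# The same information in "matrix" format: rows are `player1`, columns are
# `player2` and each cell holds the value for that ordered pair
h2h_matrix <- xtabs(h2h ~ player1 + player2, data = h2h_long_df)
h2h_matrix
```

Keep in mind that xtabs() sums duplicated pairs and fills missing ones with 0, so this shortcut only fits when every ordered pair appears exactly once.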

These formats of pairwise data are quite common: “long” is better for tidy computing and “matrix” is better for presenting results. Converting a distance matrix to a data frame of pairs is also the topic of several Stack Overflow questions (for example, this one and that one).

Package comperes has the functions as_h2h_long() and as_h2h_mat() for converting between these formats. They are powered by the “general usage” functions long_to_mat() and mat_to_long(). Here is an example of how they can be used to convert between different formats of pairwise distances:
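As a dependency-free sketch of the same round trip, here are two hypothetical base R helpers. Their names and arguments only loosely mirror long_to_mat() and mat_to_long() and are assumptions, not the package API:

```r
# Hypothetical base R helpers that loosely mirror the idea of mat_to_long()
# and long_to_mat(); names and arguments are not the comperes API
mat_to_long_sketch <- function(mat, row_key, col_key, value) {
  # expand.grid() varies its first argument fastest, which matches the
  # column-major order of as.vector(mat)
  res <- expand.grid(
    rownames(mat), colnames(mat),
    KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE
  )
  names(res) <- c(row_key, col_key)
  res[[value]] <- as.vector(mat)
  res
}

long_to_mat_sketch <- function(df, row_key, col_key, value) {
  # Keep keys in order of first appearance
  rows <- unique(df[[row_key]])
  cols <- unique(df[[col_key]])
  mat <- matrix(
    NA_real_, nrow = length(rows), ncol = length(cols),
    dimnames = list(rows, cols)
  )
  # Fill cells using (row index, column index) pairs; missing pairs stay NA
  mat[cbind(match(df[[row_key]], rows), match(df[[col_key]], cols))] <- df[[value]]
  mat
}

# Pairwise Euclidean distances between the first three cars in mtcars
dist_mat <- as.matrix(dist(mtcars[1:3, ]))

dist_long <- mat_to_long_sketch(dist_mat, "car1", "car2", "distance")
dist_long

dist_mat_back <- long_to_mat_sketch(dist_long, "car1", "car2", "distance")
all.equal(dist_mat_back, dist_mat)
```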