Modified lencat() — Increased Flexibility with dplyr

One of the first functions in the FSA package was lencat(), which has served me well over the years. However, two things have bothered me: a formula and data= had to be used just to identify a single column to be “transformed”, and an “automatic” determination of startcat= was never coded. Additionally, lencat() did not work well with dplyr, a package that I recently discovered (see my introduction). Thus, I have reworked lencat() in the latest FSA to handle these issues while maintaining the original functionality.

The modified lencat() behaves slightly differently depending on how the user supplies the fish lengths. If the user provides a formula and data=, then lencat() returns a data.frame with the new variable appended, which is the same behavior as the original lencat(). However, if the user supplies a vector as the first argument, then lencat() now returns a single vector of the length category values. Additionally, in both uses, the user can omit startcat= and a reasonable starting value (i.e., a value at or just below the minimum observed value that “makes sense” given w=) will be used.
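A minimal sketch of the two calling styles is below. The data.frame df and its len column are hypothetical values made up for illustration, not the data used in this post.

```r
library(FSA)

# Hypothetical lengths (mm), for illustration only
df <- data.frame(len = c(55, 62, 78, 91, 104))

# Formula interface: the data.frame comes back with a new column appended
df2 <- lencat(~len, data = df, w = 10)

# Vector interface: only the vector of length categories comes back
lcat <- lencat(df$len, w = 10)
lcat   # 50 60 70 90 100
```

The vector form is what makes lencat() convenient inside dplyr verbs, because those verbs expect a function that takes a column and returns a column.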

The new functionality of lencat() is demonstrated below. First, I loaded the FSA and dplyr packages.

library(FSA)
library(dplyr)

Smallmouth Bass length data from a lake in Minnesota are used in the examples. For the sake of simplicity, I removed all variables related to measurements on the scales of the fish (i.e., all variables containing “anu” and “radcap”), as well as the species and lake variables (because they were constant at “SMB” and “WB”).
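The removal step might look like the sketch below. It assumes that the data are the SMBassWB data.frame distributed with FSA (my guess from the “SMB” and “WB” codes) and that the scale measurements are in radcap and in columns with “anu” in their names. The object name smb1 is only illustrative.

```r
library(FSA)
library(dplyr)

# Assumption: SMBassWB (from FSA) is the Minnesota Smallmouth Bass data
# Drop the scale-measurement columns and the two constant columns
smb1 <- select(SMBassWB, -contains("anu"), -radcap, -species, -lake)

names(smb1)
```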

The advantage of using dplyr in this way is that you can string together multiple data manipulations. For example, one could create the variable as above but then order the rows of the data.frame in ascending length category values as follows.
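A sketch of such a chain is below. To keep it self-contained it uses a small hypothetical data.frame (smb, with made-up lencap values) in place of the Smallmouth Bass data, and the new variable name LCat10 is my own choice.

```r
library(FSA)
library(dplyr)

# Hypothetical capture lengths (mm), standing in for the real data
smb <- data.frame(fish = 1:6,
                  lencap = c(212, 55, 148, 97, 305, 63))

smb <- smb %>%
  mutate(LCat10 = lencat(lencap, w = 10)) %>%  # add 10-mm length categories
  arrange(LCat10)                              # sort rows by ascending category

smb
```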

Extended Example of New Functionality

In the examples above, the 10-mm length categories were created without the use of startcat=. The lencat() function found the largest multiple of 10 mm (50) that did not exceed the minimum observed value (55) and created length categories from that value. One can still set the value for the starting category with startcat= as follows.
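The sketch below contrasts the automatic starting category with one set by hand. The lengths are hypothetical, and the floor() line is only my shorthand for the rule described above, not code taken from inside lencat().

```r
library(FSA)

# Hypothetical lengths (mm) with a minimum of 55
len <- c(55, 62, 78, 91, 104)
w <- 10

# Shorthand for the automatic rule: largest multiple of w at or below the minimum
auto_start <- floor(min(len) / w) * w   # 50

# Automatic starting category
lc_auto <- lencat(len, w = w)                 # 50 60 70 90 100

# Starting category set by the user
lc_45 <- lencat(len, w = w, startcat = 45)    # 55 55 75 85 95
```

Note that startcat= shifts every category boundary, so the same fish can land in a differently labeled category.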

The default type returned by lencat() is numeric. This can result in “missing categories” in length frequency distributions. For example, the length frequency distribution for 25-mm length categories shown below is missing the 375- and 400-mm categories.
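The gap can be reproduced with a small hypothetical vector, as sketched below. These are made-up lengths chosen so that no fish falls between 375 and 424 mm; they are not the Smallmouth Bass data.

```r
library(FSA)

# Hypothetical lengths (mm) with no fish from 375 to 424 mm
len <- c(260, 280, 310, 340, 360, 430)

# Numeric categories: only categories that contain fish appear in the table
lc25 <- lencat(len, w = 25)
tbl_num <- table(lc25)
tbl_num   # the 375 and 400 categories do not appear at all
```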

The problem with missing length categories can be corrected by having the values returned as a factor rather than as numeric values. The returned values are forced to be a factor by including as.fact=TRUE in the call to lencat() as shown below.
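A sketch with the same kind of hypothetical vector (made-up lengths with no fish from 375 to 424 mm) shows the empty categories being retained as factor levels.

```r
library(FSA)

# Hypothetical lengths (mm) with no fish from 375 to 424 mm
len <- c(260, 280, 310, 340, 360, 430)

# Factor categories: empty categories are kept as levels
lc25f <- lencat(len, w = 25, as.fact = TRUE)
tbl_fact <- table(lc25f)
tbl_fact   # 375 and 400 now appear with counts of zero
```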

Finally, one can still use breaks= to set specific and potentially unequally spaced values for the length categories. The example below finds the Gabelhouse five-cell length categories for Smallmouth Bass and then creates two new variables from these values: one that shows the length values and one that shows the category names. To further exhibit the use of dplyr, I also removed (i.e., used filter()) all fish that were less than “stock” size (i.e., the zero category).
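A sketch of these steps is below. It uses psdVal() from FSA to get the Gabelhouse lengths and use.names=TRUE to return the category names. The data.frame smb, the made-up lencap values, and the variable names PSDlen and PSDname are all hypothetical. The sub-stock fish are dropped by their numeric category (0) rather than by name, because I am not certain that the name of that category is the same in every FSA version.

```r
library(FSA)
library(dplyr)

# Hypothetical capture lengths (mm), standing in for the real data
smb <- data.frame(lencap = c(120, 190, 260, 300, 365, 440))

# Gabelhouse lengths (mm) for Smallmouth Bass, as a named vector
gh <- psdVal("Smallmouth Bass")

smb_stock <- smb %>%
  mutate(PSDlen  = lencat(lencap, breaks = gh),                    # length values
         PSDname = lencat(lencap, breaks = gh, use.names = TRUE)) %>%  # names
  filter(PSDlen > 0)   # drop fish smaller than stock size

smb_stock
```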