This release brings a number of improvements. First, there are significant performance improvements, which make the package more practical for large datasets. I have also simplified the package so that it always returns data frames. This was the first R package I published on CRAN, and let’s just say that I have since found a lot of low-hanging fruit when it comes to performance and usability.

Second, I have added a dataset of names from the North Atlantic Population Project (NAPP) covering Canada, Great Britain, Germany, Iceland, Norway, and Sweden in the nineteenth century. This extends the package’s usefulness beyond its original focus on American history. (If you have suitable datasets for other times and places, I’d welcome contributions, since the gender package is easy to extend.)
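A minimal sketch of querying the new data, assuming it is selected with `method = "napp"` and that the companion genderdata package is installed. The two first names are a hypothetical sample, and the nineteenth-century range matches the period the NAPP data covers.

```r
library(gender)

# Hypothetical sample of names; "napp" selects the North Atlantic
# Population Project data rather than the U.S. sources
nordic <- gender(c("Ingrid", "Olaf"), years = c(1850, 1880), method = "napp")

# A data frame with one row per matched name
print(nordic)
```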

Third, I have added a new function, gender_df(), that makes it easier to apply gender to a common research problem. The gender() function is vectorized over names but not over years. In other words, it is easy to pass gender() many names, but not many years. Suppose, for example, that we have a list of names and wish to guess their genders for birth years in the 1930s. We can do that like this:
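A minimal sketch of that call, assuming a hypothetical sample of four first names. The `"ssa"` method uses U.S. Social Security data and needs the companion genderdata package installed. Because `years` is a two-element vector, it is treated as a range rather than a single year.

```r
library(gender)

# Hypothetical sample of first names
first_names <- c("Madison", "Hillary", "Jordan", "Monroe")

# gender() is vectorized over names, so one call covers all of them.
# The two-element `years` vector is a (min, max) range, so the
# estimates pool births from 1930 through 1939.
decade_1930s <- gender(first_names, years = c(1930, 1939), method = "ssa")

# A data frame with columns such as proportion_male, proportion_female,
# gender, year_min, and year_max
print(decade_1930s)
```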

While it has always been possible to use the gender() function with Map() or dplyr::do(), neither approach is easy to use, and most code that I have seen (especially my own at first) has taken a naive approach that calls gender() many more times than necessary. The gender_df() function instead lets you pass a data frame with a column of first names and a column of years (or two columns specifying the minimum and maximum of a range of years).
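A sketch of both forms of gender_df(), assuming the argument names `name_col` and `year_col`, with `year_col` taking either one column name or two (minimum and maximum). The data frame and its column names are hypothetical. It uses `method = "demo"`, which returns random values and is only for trying out the function; for real estimates you would use a method such as `"ssa"`.

```r
library(gender)

# Hypothetical input: some names repeat, and birth years differ by row
people <- data.frame(
  first_name = c("Hillary", "Hillary", "Hillary", "Madison", "Madison"),
  birth_year = c(1930, 2000, 1930, 1930, 2000),
  stringsAsFactors = FALSE
)
# A one-year window on either side of each birth year, for the range form
people$min_year <- people$birth_year - 1
people$max_year <- people$birth_year + 1

# Single year column: each row's name is matched against that year
by_year <- gender_df(people, name_col = "first_name",
                     year_col = "birth_year", method = "demo")

# Two year columns: treated as the minimum and maximum of a range
by_range <- gender_df(people, name_col = "first_name",
                      year_col = c("min_year", "max_year"), method = "demo")

print(by_year)
print(by_range)
```

The result can then be joined back onto the original data frame by name and year if you need one prediction per row.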