Dealing with identifier variables in data management and analysis

P. Wilner Jeanty
Kinder Institute for Urban Research
and
Hobby Center for the Study of Texas
Rice University
Houston, TX
pwjeanty@rice.edu

Abstract. Identifier variables are prominent in most data files and, more often
than not, are essential to fully use the information in a Stata
dataset. However, rendering them in the format and with the number of
digits required for data management and statistical analysis can
challenge inexperienced and even veteran Stata users.
To lessen these challenges, I provide
some useful tips and warn of some pitfalls by featuring two
official Stata routines: the string() function and its
more elaborate wrapper, the tostring command. I illustrate how to
use these two routines to address the difficulties caused
by identifier variables in managing and analyzing data from private
institutions and U.S. government agencies.