BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//104.27.130.195//NONSGML kigkonsult.se iCalcreator 2.20//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.vilhuber.com/lars
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20171105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20180311T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-1074@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://www.ssc.wisc.edu/naddi2015/
DESCRIPTION:The North American Data Documentation Initiative Conference (NA
DDI) is an opportunity for those using metadata standards and those intere
sted in learning more to come together and learn from each other. Modeled
on the successful European DDI User Conference\, NADDI 2015 will be a thre
e day conference (April 8-10) with invited and contributed presentations\,
and should be of interest to both researchers and data professionals in t
he social sciences and other disciplines.\nCornell’s Bill Block is on the
Program Committee.
DTSTART;VALUE=DATE:20150408
DTEND;VALUE=DATE:20150411
GEO:+43.076592;-89.412488
LOCATION:University of Wisconsin-Madison @ Madison\, WI\, USA
SEQUENCE:0
SUMMARY:NADDI 2015
URL:https://www.vilhuber.com/lars/event/naddi-2015/
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-1073@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://iassistdata.org/conferences/iassist-2015-call-papers
DESCRIPTION:Bridging The Data Divide: Data In The International Context\nTh
e theme of our 2015 conference is Bridging the Data Divide: Data in the In
ternational Context. Going hand in hand with the well-known digital divide
is a growing inequity in access to data. Increasing budget concerns have
placed strains on governments\, universities\, and other institutions in t
he provision of data services. From the cancellation of the Statistical Ab
stract of the United States\, to the controversy over the Canadian Census
long form\, to political barriers in the data collection process in some c
ountries\, access to data and the data divide presents organizational\, ec
onomic and educational challenges to the community of data professionals w
orldwide.\n
DTSTART;VALUE=DATE:20150602
DTEND;VALUE=DATE:20150606
GEO:+44.977753;-93.265011
LOCATION:Minneapolis\, MN\, USA
SEQUENCE:0
SUMMARY:IASSIST 2015
URL:https://www.vilhuber.com/lars/event/iassist-2015/
X-TAGS;LANGUAGE=en-US:Confidentiality\,NCRN\,Privacy\,SynLBD\,UNECE
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2855@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:https://sites.stanford.edu/researchdatacenter/conference-agenda
DESCRIPTION:“Earnings Inequality Trends in the United States: Nationally Re
presentative Estimates from Longitudinally Linked Employer-Employee Data”\
, John Abowd (Cornell University and U.S. Census Bureau)\, Kevin McKinney
(U.S. Census Bureau)\, Nellie Zhao (Cornell University)\nExtended Abstract
\nWe track sources of earnings inequality using the statistical technique
introduced to the labor economics literature in 1999 (Abowd\, Kramarz and
Margolis\, Econometrica 1999). When this technique has been used in Europe
(Card\, Heining and Kline QJE 2013 for Germany\, in particular)\, the big
gest contributor to the increase in earnings inequality appears to be incr
eased employer-level heterogeneity (called the firm effect in AKM). Using
the Census Bureau’s Longitudinal Employer-Household Dynamics Infrastructur
e data for 1990-2013\, we show that with respect to the U.S. data\, the CH
K result does not hold. There has been very little change in employer-leve
l earnings heterogeneity in the U.S. when one compares wage measures simil
ar to the ones used to analyze the European data. European administrative
databases allow one to construct something akin to a wage rate (usually\,
the amount that would be earned if an individual worked full-time full-yea
r). The American data does not directly allow that. We develop a statistic
al approximation to the full-year full-time wage rate\, using integrated C
urrent Population Survey\, Census 2000\, and American Community Survey dat
a. Using that measure\, the earnings inequality trends in the U.S. look mo
re similar to the European analyses.\nBut\, for the purposes of studying e
arnings inequality\, considering only the wage rate\, and not the amount o
f time a person actually works\, is seriously incomplete—especially in the
U.S. where there is very little statutory employment security except in t
he public sector. The most important determinant of increased earnings ine
quality in our analyses is changes in labor force attachment (weeks worked
in the year\, hours worked per week).\nIn attempting to estimate how impo
rtant the labor-force attachment component is\, we reconstruct the work-el
igible population (18-70) for each year from 1990-2013. The administrative
records database developed at the Census Bureau uses an encrypted SSN to
track individuals. The researcher can tell if the number that was encrypte
d is a valid SSN\, and can also access the demographic details and employm
ent history associated with the underlying SSN. In our model\, there are t
wo kinds of SSNs that are suspect: ones that are not valid (this means tha
t the employer reported earnings in a state’s UI system for an SSN that wa
s never issued) and ones associated with demographic characteristics that
mean it is unlikely that the owner of the SSN used it (leading case: the S
SN was issued to a person who was less than 10 years old in the year durin
g which the SSN was used to report UI eligible earnings). Our working hypo
theses are: (1) the use of an invalid SSN reflects the work of a single un
documented immigrant\, so we add that person to both the eligible populati
on and the working population and (2) the use of a valid SSN issued to som
eone who appears to be too young (or too old) to work legally represents o
ne person in the population (not working\, not immigrant\; i.e.\, eligible
to get an SSN by virtue of birth in the U.S.) and at least one other pers
on both working and in the work-eligible population\, who is an undocument
ed immigrant.\nGetting the non-working work-eligible population as accurat
e as possible is important because\, especially during the Great Recession
\, many persons had no income from work for a full calendar year. We have
no trouble finding these people for properly documented native-born and im
migrant subpopulations\, but we have to estimate how many work-eligible no
n-documented immigrants are still in the U.S. looking for work in any give
n year.\nWe also link data from the 1992-2012 Economic Censuses. These dat
a are used to construct a measure of surplus per worker (revenue minus fac
tor opportunity costs) for every private establishment in the censuses. Th
ese data show similar results for the population of working persons employ
ed in the private sector. In particular\, they show that there has not bee
n an increase in overall earnings variability for this population.
DTSTART;VALUE=DATE:20151113
DTEND;VALUE=DATE:20151115
GEO:+37.427475;-122.169719
LOCATION:Stanford University @ Li Ka Shing Conference Center 291 Campus Dr
ive\, Stanford\, CA 94305
SEQUENCE:0
SUMMARY:Abowd @ NBER Conference on Firm Heterogeneity and Income Inequality
: “Earnings Inequality Trends in the United States: Nationally Representat
ive Estimates from Longitudinally Linked Employer-Employee Data”
URL:https://www.vilhuber.com/lars/event/abowd-nber-conference-on-firm-heter
ogeneity-and-income-inequality-earnings-inequality-trends-in-the-united-st
ates-nationally-representative-estimates-from-longitudinally-linked-employ
er-e/
X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2825@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://www.eddi-conferences.eu/ocs/index.php/eddi/eddi15/paper/view
/192
DESCRIPTION:“Improving Access and Data Security to Confidential Labor Marke
t Data”\, Warren Brown (Cornell University)\, Stephanie Jacobs (Cornell Un
iversity)\, David Schiller (German Institute for Employment Research)\, Jö
rg Heining (German Institute for Employment Research)\nAbstract: The Corne
ll Institute for Social and Economic Research (CISER)\, Cornell University
and the Institute for Employment Research (IAB)\, German Federal Employme
nt Agency are collaborating to expand use of IAB’s confidential Sample of
Integrated Labour Market Biographies (SIAB). DDI 2.5 is used to enable res
earchers to discover the files by means of variable level searching in a r
epository of metadata on U.S. and German labor market related data files.
The repository is the Comprehensive Extensible Data Documentation and Acce
ss Repository (CED2AR) being developed by researchers at Cornell Universit
y with funding from the U.S. National Science Foundation. CED2AR provides
researchers access to machine-readable codebooks with variable characteris
tics thus enabling researchers to develop detailed proposals for access to
these data that are submitted to IAB. Researchers with approved projects
are able to access and analyze the data using the Cornell Restricted Acces
s Data Center (CRADC)\, a remote access virtual data enclave using remote
desktop protocol. In the initial testing phase several researchers located
in Europe and North America are successfully accessing and analyzing the
Scientific Use Files of the SIAB. The project is well on its way to realiz
ing the goal of wider access to researchers while improving secure managem
ent of confidential data.\nThe presentation can be found at http://hdl.han
dle.net/1813/44707
DTSTART;VALUE=DATE:20151202
DTEND;VALUE=DATE:20151204
GEO:+55.676097;+12.568337
LOCATION:Royal School of Library and Information Sciences @ Copenhagen\, De
nmark
SEQUENCE:0
SUMMARY:Brown presents @ EDDI 2015: Improving Access and Data Security to C
onfidential Labor Market Data
URL:https://www.vilhuber.com/lars/event/brown-presents-eddi-2015-improving-
access-and-data-security-to-confidential-labor-market-data/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-3007@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:
DESCRIPTION:John Abowd will be giving two talks at the University of Nebras
ka-Lincoln\, at the opening of the Central Plains Federal Statistical Res
earch Data Center. The first talk is titled “Social Science Research in th
e Era of Restricted-Access Data”
DTSTART;VALUE=DATE:20160422
DTEND;VALUE=DATE:20160423
GEO:+40.818253;-96.695225
LOCATION:University of Nebraska-Lincoln @ Lincoln\, NE 68588\, USA
SEQUENCE:0
SUMMARY:John Abowd: Social Science Research in the Era of Restricted-Access
Data
URL:https://www.vilhuber.com/lars/event/john-abowd-social-science-research-
in-the-era-of-restricted-access-data/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:John Abowd will be giving two talks at the U
niversity of Nebraska-Lincoln\, at the opening of the Central Plains Fede
ral Statistical Research Data Center. The first talk is titled “Social Sc
ience Research in the Era of Restricted-Access Data”\nInformation on room
 blocks has been made available to NCRN PIs. If you are coming from out o
f town and are not affiliated with an NCRN node\, please contact the NCRN
 Coordinating Office.
END:VEVENT
BEGIN:VEVENT
UID:calendar.1551.field_date_with_zone.0@www.ncrn.info
DTSTAMP:20171214T021012Z
CATEGORIES:
CONTACT:
DESCRIPTION:(all times Eastern Standard Time)\nSpeaker: Amy L. Griffin\, Un
iversity of New South Wales (UNSW) Canberra \nAbstract: Recent changes to
the US Census have led to more timely updates of demographic statistics th
at are used in the delivery and planning of many social and environmental
programs. However\, this timeliness has a tradeoff: increased uncertainty
in the estimates for small area geographies such as census blocks and trac
ts. Although the Census Bureau publishes information about the uncertainty
of the estimates\, few end users engage with and utilize this information
\, perhaps because it comes in a difficult to use form\; another column in
a table with many columns. Many techniques for visualising uncertainty in
attribute data have been proposed\, but few have been empirically tested\
, and fewer still with real end users using an ecologically valid task. He
re\, we report on a broader research program directed to studying the visu
alisation of attribute uncertainty for ACS data\, and report the results o
f an experiment undertaken with 55 urban planners in which they had to mak
e spatial decisions using uncertain demographic estimates. We compared vis
ualisation methods based on two metaphors for communicating uncertainty: t
he stoplight and sketchiness. The experimental task is one taken from a co
ntext of use study we conducted on urban planning. It required planners to
define an area of contiguous census tracts that meets a particular thresh
old with respect to the attribute in question: percentage of households in
poverty. We conclude with some thoughts about how to help urban planners
work with uncertainty in ACS data more effectively. (joint work with Jason
Jurjevich\, Portland State University\, Meg Merrick\, Portland State Univ
ersity\, Seth E Spielman\, Colorado University at Boulder\, Nicholas N Nag
le\, University of Tennessee-Knoxville\, David C Folch\, Florida State Uni
versity) (archived presentation)\nLocation:\nCarnegie Mellon: contact Will
iam Eddy (bill@cmu.edu)\nCensus Bureau headquarters: Room 1\, contact Nanc
y Bates (nancy.a.bates@census.gov)\nCornell University\, Ithaca campus: Iv
es 105\, contact Lars Vilhuber (lars.vilhuber@cornell.edu)\nDuke Universit
y: contact Jerry Reiter (jerry@stat.duke.edu)\nUniversity of Michigan: Roo
m 3443 ISR-Thompson\, contact Maggie Levenstein (maggiel@umich.edu)\nUnive
rsity of Missouri: contact Scott Holan (holans@missouri.edu)\nUniversity o
f Nebraska-Lincoln: Room TBD: contact: Allan McCutcheon (amccutcheon1@unl.
edu)\nNorthwestern University: contact Zach Seeskin (z-seeskin@u.northwest
ern.edu)\nStreaming video: [click here] (link active about 5 minutes after
start of seminar)\nNodes: \nUniversity of Colorado at Boulder / Universit
y of Tennessee\nNCRN Coordinating Office \nDate: \nFeb 04\, 2015\, 3:00
pm to 4:00pm EST \nAddress: \nCanberra ACT\, Australia \nVideo:\nAttach
ments: \n Presentation (PDF) \nLocation:
DTSTART;TZID=America/New_York:20150204T150000
DTEND;TZID=America/New_York:20150204T160000
SEQUENCE:0
SUMMARY:NCRN Virtual Seminar – Visualizing Attribute Uncertainty in the ACS
: An Empirical Study of Decision-Making with Urban Planners
URL:https://www.vilhuber.com/lars/event/ncrn-virtual-seminar-visualizing-at
tribute-uncertainty-in-the-acs-an-empirical-study-of-decision-making-with-
urban-planners-11/
X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:calendar.1552.field_date_with_zone.0@www.ncrn.info
DTSTAMP:20171214T021012Z
CATEGORIES:
CONTACT:
DESCRIPTION:Speaker: John M. Abowd (Cornell University)\nTitle: Revisiting
the Economics of Privacy: Population Statistics and Confidentiality Protec
tion as Public Goods (joint work with Ian Schmutte\, University of Georgia
)\nAbstract:\nWe consider the problem of the public release of statistical
information about a population–explicitly accounting for the public-good
properties of both data accuracy and privacy loss. We first consider the i
mplications of adding the public-good component to recently published mode
ls of private data publication under differential privacy guarantees using
a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that da
ta quality will be inefficiently under-supplied. Next\, we develop a stand
ard social planner’s problem using the technology set implied by (ε\, δ)-d
ifferential privacy with (α\, β)-accuracy for the Private Multiplicative W
eights query release mechanism to study the properties of optimal provisio
n of data accuracy and privacy loss when both are public goods. Using the
production possibilities frontier implied by this technology\, explicitly
parameterized interdependent preferences\, and the social welfare function
\, we display properties of the solution to the social planner’s problem.
Our results directly quantify the optimal choice of data accuracy and priv
acy loss as functions of the technology and preference parameters. Some of
these properties can be quantified using population statistics on margina
l preferences and correlations between income\, data accuracy preferences\
, and privacy loss preferences that are available from survey data. Our re
sults show that government data custodians should publish more accurate st
atistics with weaker privacy guarantees than would occur with purely priva
te data publishing. Our statistical results using the General Social Surve
y and the Cornell National Social Survey indicate that the welfare losses
from under-providing data accuracy while over-providing privacy protection
can be substantial.\nLocation:\nCarnegie Mellon: contact William Eddy (bi
ll@cmu.edu)\nCensus Bureau headquarters: Room 1\, contact Nancy Bates (nan
cy.a.bates@census.gov)\nCornell University\, Ithaca campus: Ives 105\, con
tact Lars Vilhuber (lars.vilhuber@cornell.edu)\nDuke University: contact J
erry Reiter (jerry@stat.duke.edu)\nUniversity of Michigan: Room 3443 ISR-T
hompson\, contact Maggie Levenstein (maggiel@umich.edu)\nUniversity of Mis
souri: contact Scott Holan (holans@missouri.edu)\nUniversity of Nebraska-L
incoln: Room TBD: contact: Allan McCutcheon (amccutcheon1@unl.edu)\nNorthw
estern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)\nSt
reaming video: [click here] (link active about 5 minutes after start of se
minar)\nNodes: \nCornell University\nNCRN Coordinating Office \nDate:
\nMar 04\, 2015\, 3:00pm to 4:30pm EST \nAddress: \nBerkeley\, CA\, United States
\nVideo:\nLocation:
DTSTART;TZID=America/New_York:20150304T150000
DTEND;TZID=America/New_York:20150304T163000
SEQUENCE:0
SUMMARY:NCRN Virtual Seminar – Revisiting the Economics of Privacy: Populat
ion Statistics and Confidentiality Protection as Public Goods
URL:https://www.vilhuber.com/lars/event/ncrn-virtual-seminar-revisiting-the
-economics-of-privacy-population-statistics-and-confidentiality-protection
-as-public-goods-11/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Speaker: John M. Abowd (Cornell University)
\n

Title: Revisiting the Economics of Privacy: Population Statistics a
nd Confidentiality Protection as Public Goods (joint work with Ian Schmutt
e\, University of Georgia)

\n

Abstract:

\n

We consider the probl
em of the public release of statistical information about a population–exp
licitly accounting for the public-good properties of both data accuracy an
d privacy loss. We first consider the implications of adding the public-go
od component to recently published models of private data publication unde
r differential privacy guarantees using a Vickrey-Clarke-Groves mechanism a
nd a Lindahl mechanism. We show that data quality will be inefficiently un
der-supplied. Next\, we develop a standard social planner’s problem using
the technology set implied by (ε\, δ)-differential privacy with (α\, β)-ac
curacy for the Private Multiplicative Weights query release mechanism to s
tudy the properties of optimal provision of data accuracy and privacy loss
when both are public goods. Using the production possibilities frontier i
mplied by this technology\, explicitly parameterized interdependent prefer
ences\, and the social welfare function\, we display properties of the sol
ution to the social planner’s problem. Our results directly quantify the o
ptimal choice of data accuracy and privacy loss as functions of the techno
logy and preference parameters. Some of these properties can be quantified
using population statistics on marginal preferences and correlations betw
een income\, data accuracy preferences\, and privacy loss preferences that
are available from survey data. Our results show that government data cus
todians should publish more accurate statistics with weaker privacy guaran
tees than would occur with purely private data publishing. Our statistical
results using the General Social Survey and the Cornell National Social S
urvey indicate that the welfare losses from under-providing data accuracy
while over-providing privacy protection can be substantial.

X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:calendar.1552.field_date.0@www.ncrn.info
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,NCRN Virtual Seminar\,Presentation\,
vilhuber
CONTACT:http://www.ncrn.info/event/ncrn-virtual-seminar-march-4-2015
DESCRIPTION:Speaker: John M. Abowd (Cornell University)\nTitle: Revisiting
the Economics of Privacy: Population Statistics and Confidentiality Protec
tion as Public Goods (joint work with Ian Schmutte\, University of Georgia
)\nAbstract:\nWe consider the problem of the public release of statistical
information about a population–explicitly accounting for the public-good
properties of both data accuracy and privacy loss. We first consider the i
mplications of adding the public-good component to recently published mode
ls of private data publication under differential privacy guarantees using
a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that da
ta quality will be inefficiently under-supplied. Next\, we develop a stand
ard social planner’s problem using the technology set implied by (ε\, δ)-d
ifferential privacy with (α\, β)-accuracy for the Private Multiplicative W
eights query release mechanism to study the properties of optimal provisio
n of data accuracy and privacy loss when both are public goods. Using the
production possibilities frontier implied by this technology\, explicitly
parameterized interdependent preferences\, and the social welfare function
\, we display properties of the solution to the social planner’s problem.
Our results directly quantify the optimal choice of data accuracy and priv
acy loss as functions of the technology and preference parameters. Some of
these properties can be quantified using population statistics on margina
l preferences and correlations between income\, data accuracy preferences\
, and privacy loss preferences that are available from survey data. Our re
sults show that government data custodians should publish more accurate st
atistics with weaker privacy guarantees than would occur with purely priva
te data publishing. Our statistical results using the General Social Surve
y and the Cornell National Social Survey indicate that the welfare losses
from under-providing data accuracy while over-providing privacy protection
can be substantial.\nLocation:\nCarnegie Mellon: contact William Eddy (bi
ll@cmu.edu)\nCensus Bureau headquarters: Room 1\, contact Nancy Bates (nan
cy.a.bates@census.gov)\nCornell University\, Ithaca campus: Ives 105\, con
tact Lars Vilhuber (lars.vilhuber@cornell.edu)\nDuke University: contact J
erry Reiter (jerry@stat.duke.edu)\nUniversity of Michigan: Room 3443 ISR-T
hompson\, contact Maggie Levenstein (maggiel@umich.edu)\nUniversity of Mis
souri: contact Scott Holan (holans@missouri.edu)\nUniversity of Nebraska-L
incoln: Room TBD: contact: Allan McCutcheon (amccutcheon1@unl.edu)\nNorthw
estern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)\nSt
reaming video: [click here] (link active about 5 minutes after start of se
minar)\nNodes:\nCornell University\nDate:\nMar 04\, 2015\, 3:00pm to 4:30p
m EST\nAddress:\nBerkeley\, CA\, United States\nLocation:
DTSTART;TZID=America/New_York:20150304T150000
DTEND;TZID=America/New_York:20150304T163000
SEQUENCE:0
SUMMARY:NCRN Virtual Seminar – Revisiting the Economics of Privacy: Populat
ion Statistics and Confidentiality Protection as Public Goods
URL:https://www.vilhuber.com/lars/event/ncrn-virtual-seminar-revisiting-the
-economics-of-privacy-population-statistics-and-confidentiality-protection
-as-public-goods-10/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Speaker: John M. Abowd (Cornell University)\nTitle: Revisiting the Economics of Privacy: Population Statistics an
d Confidentiality Protection as Public Goods (joint work with Ian Schmutte
\, University of Georgia)\nAbstract:\nWe consider the problem
of the public release of statistical information about a population–explic
itly accounting for the public-good properties of both data accuracy and p
rivacy loss. We first consider the implications of adding the public-good
component to recently published models of private data publication under d
ifferential privacy guarantees using a Vickrey-Clarke-Groves mechanism and
a Lindahl mechanism. We show that data quality will be inefficiently under
-supplied. Next\, we develop a standard social planner’s problem using the
technology set implied by (ε\, δ)-differential privacy with (α\, β)-accur
acy for the Private Multiplicative Weights query release mechanism to stud
y the properties of optimal provision of data accuracy and privacy loss wh
en both are public goods. Using the production possibilities frontier impl
ied by this technology\, explicitly parameterized interdependent preferenc
es\, and the social welfare function\, we display properties of the soluti
on to the social planner’s problem. Our results directly quantify the opti
mal choice of data accuracy and privacy loss as functions of the technolog
y and preference parameters. Some of these properties can be quantified us
ing population statistics on marginal preferences and correlations between
income\, data accuracy preferences\, and privacy loss preferences that ar
e available from survey data. Our results show that government data custod
ians should publish more accurate statistics with weaker privacy guarantee
s than would occur with purely private data publishing. Our statistical re
sults using the General Social Survey and the Cornell National Social Surv
ey indicate that the welfare losses from under-providing data accuracy whi
le over-providing privacy protection can be substantial.\nLocation:<
br />\nCarnegie Mellon: contact William Eddy (bill@cmu.edu)\nCensus
Bureau headquarters: Room 1\, contact Nancy Bates (nancy.a.bates@census.go
v)\nCornell University\, Ithaca campus: Ives 105\, contact Lars Vilh
uber (lars.vilhuber@cornell.edu)\nDuke University: contact Jerry Rei
ter (jerry@stat.duke.edu)\nUniversity of Michigan: Room 3443 ISR-Tho
mpson\, contact Maggie Levenstein (maggiel@umich.edu)\nUniversity of
Missouri: contact Scott Holan (holans@missouri.edu)\nUniversity of
Nebraska-Lincoln: Room TBD: contact: Allan McCutcheon (amccutcheon1@unl.ed
u)\nNorthwestern University: contact Zach Seeskin (z-seeskin@u.north
western.edu)\nStreaming video: [click here] (link active about 5 min
utes after start of seminar)\nNodes:\nCornell University
\nDate:\nMar 04\, 2015\, 3:00pm to 4:30pm EST\nAddress:
\nBerkeley\, CA\, United States\nLocation:

\n

X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:calendar.1553.field_date_with_zone.0@www.ncrn.info
DTSTAMP:20171214T021012Z
CATEGORIES:
CONTACT:
DESCRIPTION:Speakers: Marlow Lemons (U.S. Census Bureau) and Paul Massell (
U.S. Census Bureau)\nTitle: A Method to Improve Data Swapping at the U.S.
Census Bureau (M. Lemons)\nAbstract: Data swapping is one of several discl
osure avoidance methods that the Census Bureau implements to uphold confid
entiality mandated by law. The Center for Disclosure Avoidance Research (
CDAR) is currently studying the use of n-cycle swapping as a means to prot
ect respondent identity in large-scale data. N-cycle swapping\, a variant
of data swapping\, uses permutations of size ‘n’ to swap data records rat
her than swapping them in pairs. In this talk\, we will discuss the proce
sses surrounding n-cycle swapping\, CDAR’s current studies and challenges\
, and future projects and data products involving this disclosure avoidanc
e technique. (archived presentation)\nTitle: Cell Suppression as used for
Protecting Magnitude Data Tables (P. Massell)\nAbstract: The most common d
ata products released by the Economic Directorate of the Census Bureau are
magnitude data tables. Common magnitude variables in these tables are ‘s
ales’ (aka ‘receipts’)\, and ‘number of employees’. In this method\, an ag
ency uses the p% rule for determining which cells reveal too much informat
ion about particular establishment or company value contributions to the c
ell. Such a cell is declared sensitive and is suppressed. However\, since
Census tables are typically additive\, additional cells\, called ‘secondar
y’ suppressions\, must also be suppressed in addition to make it impossibl
e for a table user to recover the value of any sensitive cell. Using tec
hniques from operations research\, Census Bureau researchers developed met
hods for finding these secondary suppressions in a way that minimizes info
rmation loss from the table. Good software was developed around 1990 f
or implementing cell suppression. We will discuss improvements to the meth
od that have been implemented in the current version\, such as better prot
ection at the ‘company level’\, handling of negative values\, and improved
processing of linked tables. (archived presentation)\nLocation:\nCarnegie
Mellon: contact William Eddy (bill@cmu.edu)\nCensus Bureau headquarters:
Room 1\, contact Nancy Bates (nancy.a.bates@census.gov)\nCornell Universit
y\, Ithaca campus: Ives 105\, contact Lars Vilhuber (lars.vilhuber@cornell
.edu)\nDuke University: contact Jerry Reiter (jerry@stat.duke.edu)\nUniver
sity of Michigan: Room 3443 ISR-Thompson\, contact Maggie Levenstein (magg
iel@umich.edu)\nUniversity of Missouri: contact Scott Holan (holans@missou
ri.edu)\nUniversity of Nebraska-Lincoln: Room TBD: contact: Allan McCutche
on (amccutcheon1@unl.edu)\nNorthwestern University: contact Zach Seeskin (
z-seeskin@u.northwestern.edu)\nStreaming video: [click here] (link active
about 5 minutes after start of seminar)\nNodes: \nNCRN Coordinating Office
\nDate: \nApr 01\, 2015\, 3:00pm to 4:30pm EDT \nAddress: \n4600 Si
lver Hill Rd.\nSuitland\, MD\, United States \nVideo:\nAttachments: \n A M
ethod to Improve Data Swapping at the U.S. Census Bureau (PDF)\n Cell Supp
ression as used for Protecting Magnitude Data Tables \nLocation:
DTSTART;TZID=America/New_York:20150401T150000
DTEND;TZID=America/New_York:20150401T163000
SEQUENCE:0
SUMMARY:NCRN Virtual Seminar – Center for Disclosure Avoidance Research
URL:https://www.vilhuber.com/lars/event/ncrn-virtual-seminar-center-for-dis
closure-avoidance-research-11/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Title: A Method to Improve D
ata Swapping at the U.S. Census Bureau (M. Lemons)

\n

Abstract: Data
swapping is one of several disclosure avoidance methods that the Census Bu
reau implements to uphold confidentiality mandated by law. The Center for
Disclosure Avoidance Research (CDAR) is currently studying the use of n-c
ycle swapping as a means to protect respondent identity in large-scale dat
a. N-cycle swapping\, a variant of data swapping\, uses permutations of s
ize ‘n’ to swap data records rather than swapping them in pairs. In this
talk\, we will discuss the processes surrounding n-cycle swapping\, CDAR’s
current studies and challenges\, and future projects and data products in
volving this disclosure avoidance technique. (archived presentation)

Abstract: The most common data products released by the
Economic Directorate of the Census Bureau are magnitude data tables. Com
mon magnitude variables in these tables are ‘sales’ (aka ‘receipts’)\, and
‘number of employees’. In this method\, an agency uses the p% rule for de
termining which cells reveal too much information about particular establi
shment or company value contributions to the cell. Such a cell is declared
sensitive and is suppressed. However\, since Census tables are typically
additive\, additional cells\, called ‘secondary’ suppressions\, must also
be suppressed in addition to make it impossible for a table user to recove
r the value of any sensitive cell. Using techniques from operations rese
arch\, Census Bureau researchers developed methods for finding these secon
dary suppressions in a way that minimizes information loss from the table.
Good software was developed around 1990 for implementing cell suppres
sion. We will discuss improvements to the method that have been implemente
d in the current version\, such as better protection at the ‘company level
’\, handling of negative values\, and improved processing of linked tables
. (archived presentation)

A Method to Improve Data Swapping at the U.S. Cen
sus Bureau (PDF)\n Cell Suppression as used for Protecting Magnitude
Data Tables

\n

Location:

\n

X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2658@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://ecommons.library.cornell.edu/handle/1813/40172
DESCRIPTION:Ben Perry (Cornell/NCRN) presents joint work with Venkata Kambh
ampaty\, Kyle Brumsted\, Lars Vilhuber\, & William C. Block. \nAbstract: R
ecent years have shown the power of user-sourced information evidenced by
the success of Wikipedia and its many emulators. This sort of unstructured
discussion is currently not feasible as a part of the otherwise successfu
l metadata repositories. Creating and augmenting metadata is a labor-inten
sive endeavor. Harnessing collective knowledge from actual data users can
supplement officially generated metadata. As part of our Comprehensive Ext
ensible Data Documentation and Access Repository (CED2AR) infrastructure\,
we demonstrate a prototype of crowdsourced DDI\, using DDI-C and suppleme
ntal XML. The system allows for any number of network connected instances
(web or desktop deployments) of the CED2AR DDI editor to concurrently crea
te and modify metadata. The backend transparently handles changes\, and th
e frontend can separate official edits (by designated curators
of the data and the metadata) from crowd-sourced content. We briefly discu
ss offline edit contributions as well. CED2AR uses DDI-C and supplemental
XML together with Git for a very portable and lightweight implementation.
This distributed network implementation allows for large scale metadata cu
ration without the need for a hardware intensive computing environment\, a
nd can leverage existing cloud services\, such as Github or Bitbucket.
DTSTART;TZID=America/New_York:20150410T111500
DTEND;TZID=America/New_York:20150410T114500
GEO:+43.076145;-89.397711
LOCATION:Pyle Center @ University of Wisconsin-Madison\, 702 Langdon Street
\, Madison\, WI 53706\, USA
SEQUENCE:0
SUMMARY:Presentation @ NADDI 2015: Crowdsourcing DDI Development: New Featu
res from the CED2AR Project
URL:https://www.vilhuber.com/lars/event/presentation-naddi-2015-crowdsourci
ng-ddi-development-new-features-from-the-ced2ar-project/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Ben Perry (Cornell/NCRN) presents joint work
with Venkata Kambhampaty\, Kyle Brumsted\, Lars Vilhuber\, & William C. Bl
ock. \nAbstract: Recent years have shown the power of user-sourced i
nformation evidenced by the success of Wikipedia and its many emulators. T
his sort of unstructured discussion is currently not feasible as a part of
the otherwise successful metadata repositories. Creating and augmenting m
etadata is a labor-intensive endeavor. Harnessing collective knowledge fro
m actual data users can supplement officially generated metadata. As part
of our Comprehensive Extensible Data Documentation and Access Repository (
CED2AR) infrastructure\, we demonstrate a prototype of crowdsourced DDI\,
using DDI-C and supplemental XML. The system allows for any number of netw
ork connected instances (web or desktop deployments) of the CED2AR DDI edi
tor to concurrently create and modify metadata. The backend transparently
handles changes\, and the frontend can separate official edits
(by designated curators of the data and the metadata) from crowd-sourced c
ontent. We briefly discuss offline edit contributions as well. CED2AR uses
DDI-C and supplemental XML together with Git for a very portable and ligh
tweight implementation. This distributed network implementation allows for
large scale metadata curation without the need for a hardware intensive c
omputing environment\, and can leverage existing cloud services\, such as
Github or Bitbucket.

\n

X-TAGS;LANGUAGE=en-US:CED2AR\,DDI\,NADDI\,NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2659@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://www.ssc.wisc.edu/naddi2015/abstracts.html#ctd
DESCRIPTION:Michelle Edwards (Cornell/CISER) presents on using DDI and CED²
AR to “connect the dots”.\nAbstract: The Cornell Institute for Social and
Economic Research (CISER) data archive has been actively accepting Cornell
social science and economic research data since 1981. Holdings range from
US Census to New York centric studies to International demographic studie
s and many\, many more. Researchers currently search the archive using a b
asic search across a limited number of Study level and File level descript
or tags. To enhance discoverability\, CED2AR will be implemented to add Va
riable level and enhanced Study level metadata. CED2AR uses DDI 2.5 metada
ta standards for documenting the holdings\, along with schema.org for micr
odata markup to allow search engines to parse the semantic information fro
m the DDI metadata. New data deposits? Researcher data or new archive coll
ections will be added using an online data deposit form to create Study le
vel and File level metadata and provide upload capabilities for the data a
nd program files. An API will be used to pass metadata gathered from the d
ata deposit form to both the current archive structure as well as the CED2
AR database\, ensuring the integrity of both systems. Three processes: an
online data deposit form\, the archive holdings\, and CED2AR\, all linked
through DDI 2.5 will create a new workflow for the CISER data archive. By
connecting the dots with DDI\, we will enhance discoverability and usabili
ty of the CISER data holdings.
DTSTART;TZID=America/New_York:20150410T140000
DTEND;TZID=America/New_York:20150410T143000
GEO:+43.076287;-89.397794
LOCATION:Pyle Center @ The Pyle Center\, University of Wisconsin-Madison\,
702 Langdon Street\, Madison\, WI 53706\, USA
SEQUENCE:0
SUMMARY:Presentation @ NADDI 2015: Connecting the Dots with DDI (and CED²AR
)
URL:https://www.vilhuber.com/lars/event/presentation-naddi-2015-connecting-
the-dots-with-ddi-and-ced%c2%b2ar/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Michelle Edwards (Cornell/CISER) presents on
using DDI and CED²AR to “connect the dots”.\nAbstract: The Cornell I
nstitute for Social and Economic Research (CISER) data archive has been ac
tively accepting Cornell social science and economic research data since 1
981. Holdings range from US Census to New York-centric studies to internat
ional demographic studies and many\, many more. Researchers currently sear
ch the archive using a basic search across a limited number of Study level
and File level descriptor tags. To enhance discoverability\, CED2AR will
be implemented to add Variable level and enhanced Study level metadata. CE
D2AR uses DDI 2.5 metadata standards for documenting the holdings\, along
with schema.org for microdata markup to allow search engines to parse the
semantic information from the DDI metadata. New data deposits? Researcher
data or new archive collections will be added using an online data deposit
form to create Study level and File level metadata and provide upload cap
abilities for the data and program files. An API will be used to pass meta
data gathered from the data deposit form to both the current archive struc
ture as well as the CED2AR database\, ensuring the integrity of both syste
ms. Three processes: an online data deposit form\, the archive holdings\,
and CED2AR\, all linked through DDI 2.5 will create a new workflow for the
CISER data archive. By connecting the dots with DDI\, we will enhance dis
coverability and usability of the CISER data holdings.

\n

<
/HTML>
X-TAGS;LANGUAGE=en-US:NADDI\,NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2880@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://iassist2015.pop.umn.edu/program/posters#p23
DESCRIPTION:Poster Abstract: The Comprehensive Extensible Data Documentatio
n and Access Repository (CED2AR)\, is an online repository for metadata on
surveys\, administrative microdata\, and other statistical information. C
ED2AR runs directly from DDI 2.5 through a single\, non-relational databas
e. While the DDI schema is well developed for documentation purposes\, it
is not ideal for semantic web applications. Using the schema.org microdata
markup\, CED2AR allows search engines to parse semantic information from
DDI. The solution further enhances the discoverability of DDI metadata\, a
s the data are machine readable to several providers such as Google\, Yaho
o and Bing. The schema.org markup is not directly embedded within the DDI\
, so it doesn’t directly export when a user downloads a codebook. However\
, CED2AR can also run as a zero install desktop application. Users can sim
ply download their own copy of CED2AR\, quickly import codebooks\, and ins
tantly see the schema.org enhancements the system offers. The only prerequ
isites for the software are Java version 7 and an internet browser. This
presentation will demonstrate the advantages schema.org adds to DDI\, and
the ease of deployment CED2AR allows.
DTSTART;TZID=America/New_York:20150603T171500
DTEND;TZID=America/New_York:20150603T184500
GEO:+44.973081;-93.244254
LOCATION:University of Minnesota @ Willey Hall\, 229 19th Ave S\, Minnea
polis\, MN 55454\, USA
SEQUENCE:0
SUMMARY:IASSIST2015 Poster Presentation
URL:https://www.vilhuber.com/lars/event/iassist2015-poster-presentation/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Poster Abstract: The Comprehensive Extensible
Data Documentation and Access Repository (CED2AR) is an online reposito
ry for metadata on surveys\, administrative microdata\, and other statisti
cal information. CED2AR runs directly from DDI 2.5 through a single\, non-
relational database. While the DDI schema is well developed for documentat
ion purposes\, it is not ideal for semantic web applications. Using the sc
hema.org microdata markup\, CED2AR allows search engines to parse semantic
information from DDI. The solution further enhances the discoverability o
f DDI metadata\, as the data are machine readable to several providers suc
h as Google\, Yahoo and Bing. The schema.org markup is not directly embedd
ed within the DDI\, so it doesn’t directly export when a user downloads a
codebook. However\, CED2AR can also run as a zero install desktop applicat
ion. Users can simply download their own copy of CED2AR\, quickly import c
odebooks\, and instantly see the schema.org enhancements the system offers
. The only prerequisites for the software are Java version 7 and an inter
net browser. This presentation will demonstrate the advantages schema.org
adds to DDI\, and the ease of deployment CED2AR allows.

\n

<
/HTML>
X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2790@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://www.amstat.org/meetings/JSM/2015/onlineprogram/AbstractDetai
ls.cfm?abstractid=315820
DESCRIPTION:“Synthetic Longitudinal Business Databases for International Co
mparisons” — Joerg Drechsler\, Institute for Employment Research \; Lars V
ilhuber\, Cornell University\nInternational comparison studies on economic
activity are often hampered by the fact that access to business microdata
is very limited on an international level. A recently launched project tr
ies to overcome these limitations by improving access to Business Censuses
from multiple countries based on synthetic data. Starting from the synthe
tic version of the longitudinally edited version of the U.S. Business Regi
ster (the Longitudinal Business Database\, LBD)\, the idea is to create si
milar data products in other countries by applying the synthesis methodolo
gy developed for the LBD to generate synthetic replicates that could be di
stributed without confidentiality concerns. In this paper we present some
first results of this project based on German business data collected at t
he Institute for Employment Research.\nhttp://www.amstat.org/meetings/JSM/
2015/onlineprogram/AbstractDetails.cfm?abstractid=315820
DTSTART;TZID=America/New_York:20150811T140000
DTEND;TZID=America/New_York:20150811T155000
GEO:+47.611389;-122.33168
LOCATION:Joint Statistical Meetings (JSM) 2015 @ 800 Convention Pl\, Seattl
e\, WA 98101\, USA
SEQUENCE:0
SUMMARY:JSM 2015: Synthetic Longitudinal Business Databases for Internation
al Comparisons
URL:https://www.vilhuber.com/lars/event/jsm-2015-synthetic-longitudinal-bus
iness-databases-for-international-comparisons/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

“Synthetic Longitudinal Business Databases fo
r International Comparisons” — Joerg Drechsler\, Institute for Employment
Research \; Lars Vilhuber\, Cornell University\nInternational compar
ison studies on economic activity are often hampered by the fact that acce
ss to business microdata is very limited on an international level. A rece
ntly launched project tries to overcome these limitations by improving acc
ess to Business Censuses from multiple countries based on synthetic data.
Starting from the synthetic version of the longitudinally edited version o
f the U.S. Business Register (the Longitudinal Business Database\, LBD)\,
the idea is to create similar data products in other countries by applying
the synthesis methodology developed for the LBD to generate synthetic rep
licates that could be distributed without confidentiality concerns. In thi
s paper we present some first results of this project based on German busi
ness data collected at the Institute for Employment Research.\nhttp:
//www.amstat.org/meetings/JSM/2015/onlineprogram/AbstractDetails.cfm?abstr
actid=315820

“Assessing the Data Quality of Public Use Tab
ulations Produced from Synthetic Data: Synthetic Business Dynamics Statist
ics“\, Lars Vilhuber\, Cornell University\; Javier Miranda\, U.S. Census B
ureau\nDiscussant: John Abowd\, Cornell University\nWe describ
e and analyze a method that blends records from both observed and syntheti
c microdata into public-use tabulations on establishment statistics. The r
esulting tables use synthetic data only in potentially sensitive cells. We
describe different algorithms\, and present preliminary results when appl
ied to the Census Bureau’s Business Dynamics Statistics and Synthetic Long
itudinal Business Database\, highlighting accuracy and protection afforded
by the method when compared to existing public-use tabulations (with supp
ressions).\nhttp://www.amstat.org/meetings/jsm/2015/onlineprogram/Ab
stractDetails.cfm?abstractid=316288

\n

X-TAGS;LANGUAGE=en-US:ASA\,Joint Statistical Meetings\,JSM\,NCRN\,Seattle\,
SynLBD
END:VEVENT
BEGIN:VEVENT
UID:calendar.1870.field_date_with_zone.0@www.ncrn.info
DTSTAMP:20171214T021012Z
CATEGORIES:
CONTACT:
DESCRIPTION:Due to a schedule conflict\, the seminar has been cancelled. \n
Speaker: Bimal Sinha (University of Maryland\, Baltimore County)\nTitle: N
oise Multiplication for Statistical Disclosure Control of Extreme Values i
n Log-normal Regression Samples \nAbstract: In this article\, multiplicati
on of original data values by
random noise is suggested as a disclosure control strategy when only the
top part of the data is sensitive\, as is often the case with income data.
The proposed method can serve as an alternative to top coding which is a
standard method in this context. Because the log-normal distribution usual
ly fits income data well\, the present investigation focuses exclusively o
n the log-normal. It is assumed that the log-scale mean of the sensitive v
ariable is described by a linear regression on a set of non-sensitive cova
riates\, and we show how a data user can draw valid inference on the param
eters of the regression. An appealing feature of noise multiplication is t
he presence of an explicit tuning mechanism\, namely\, the noise generatin
g distribution. By appropriately choosing this distribution\, one can cont
rol the accuracy of inferences and the level of disclosure protection desi
red in the released data. Usually\, more information is retained on the to
p part of the data under noise multiplication than under top coding. Likel
ihood based analysis is developed when only the large values in the data s
et are noise multiplied\, under the assumption that the original data form
a sample from a log-normal distribution. In this scenario\, data analysis
methods are developed under two types of data releases: (I) each released
value includes an indicator of whether or not it has been noise multiplie
d\, and (II) no such indicator is provided. A simulation study is carried
out to assess the accuracy of inference for some parameters of interest. S
ince top coding and synthetic data methods are already available as disclo
sure control strategies for extreme values\, some comparisons with the pro
posed method are made through a simulation study. The results are illustra
ted with a data analysis example based on 2000 U.S. Current Population Sur
vey data. Furthermore\, a disclosure risk evaluation of the proposed metho
dology is presented in the context of the Current Population Survey data e
xample\, and the disclosure risk of the proposed noise multiplication meth
od is compared with the disclosure risk of synthetic data.\nLocation:\nCar
negie Mellon: contact William Eddy (bill@cmu.edu)\nCensus Bureau headquart
ers: Room T5\, contact Nancy Bates (nancy.a.bates@census.gov)\nCornell Uni
versity\, Ithaca campus: Ives 105\, contact Lars Vilhuber (lars.vilhuber@c
ornell.edu)\nDuke University: contact Jerry Reiter (jerry@stat.duke.edu)\n
University of Michigan: Room 3443 ISR-Thompson\, contact Maggie Levenstein
(maggiel@umich.edu)\nUniversity of Missouri: contact Scott Holan (holans@
missouri.edu)\nUniversity of Nebraska-Lincoln: Room TBD\, contact Kristen
Olson (kolson5@unl.edu)\nNorthwestern University: contact Zach Seeskin (z-
seeskin@u.northwestern.edu)\nStreaming video: [click here] (link active ab
out 5 minutes after start of seminar)\nDate: \nSep 02\, 2015\, 3:00pm to 4
:30pm EDT \nAddress: \nUniversity of Maryland\, Baltimore County\n1000
Hilltop Circle\nBaltimore\, MD 21250\, United States
DTSTART;TZID=America/New_York:20150902T150000
DTEND;TZID=America/New_York:20150902T163000
SEQUENCE:0
SUMMARY:NCRN Virtual Seminar – CANCELLED – Noise Multiplication for Statist
ical Disclosure Control of Extreme Values in Log-normal Regression Samples
URL:https://www.vilhuber.com/lars/event/ncrn-virtual-seminar-cancelled-nois
e-multiplication-for-statistical-disclosure-control-of-extreme-values-in-l
og-normal-regression-samples-11/
X-TAGS;LANGUAGE=en-US:NCRN\,RDC
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2849@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:https://sites.stanford.edu/researchdatacenter/about-conference
DESCRIPTION:“Total Variability Measures for Selected Quarterly Workforce In
dicators and LEHD Origin Destination Employment Statistics in OnTheMap”\,
Andrew Green (Cornell University)\, Kevin McKinney (U.S. Census Bureau)\,
Lars Vilhuber (Cornell University)\, John Abowd (Cornell University)\nAbst
ract\nWe report results from the first comprehensive total quality evaluat
ion of three major indicators in the U.S. Census Bureau’s Longitudinal Emp
loyer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QW
I): beginning-of-quarter employment\, full-quarter employment\, and averag
e monthly earnings of full-quarter employees. Beginning-of-quarter employm
ent is also the main tabulation variable in the LEHD Origin-Destination Em
ployment Statistics workplace reports as displayed in OnTheMap (OTM). The
evaluation is conducted using the multiple threads generated by the edit a
nd imputation models used in the LEHD Infrastructure File System. These th
reads conform to the Rubin (1987) multiple imputation model. Each implicat
e is the output of formal probability models that address coverage\, edit
and imputation errors. Design-based sampling variability and finite popula
tion corrections are also included in the evaluation. We derive special fo
rmulas for the Rubin total variability and its components that are consist
ent with the disclosure avoidance system used for QWI and LODES/OTM workpl
ace reports. These formulas allow us to publish the complete set of detail
ed total quality measures for QWI and LODES. The analysis reveals that the
three publication variables under study are estimated very accurately for
tabulations involving at least 10 jobs. Tabulations involving three to ni
ne jobs have acceptable quality. Tabulations involving one or two jobs\, w
hich are generally suppressed in the QWI\, have substantial total variabil
ity but their publication in LODES allows the formation of larger custom a
ggregations\, which will in general have the accuracy estimated for tabula
tions in the QWI of similar magnitude.
DTSTART;TZID=America/New_York:20150918T133000
DTEND;TZID=America/New_York:20150918T141500
GEO:+37.432005;-122.175774
LOCATION:Stanford University @ 291 Campus Drive\, Stanford\, CA 94305\, USA
SEQUENCE:0
SUMMARY:RDC 2015: “Total Variability Measures for Selected Quarterly Workfo
rce Indicators and LEHD Origin Destination Employment Statistics in OnTheM
ap”
URL:https://www.vilhuber.com/lars/event/rdc-2015-total-variability-measures
-for-selected-quarterly-workforce-indicators-and-lehd-origin-destination-e
mployment-statistics-in-onthemap/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

We report results from the first comp
rehensive total quality evaluation of three major indicators in the U.S. C
ensus Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Program Qua
rterly Workforce Indicators (QWI): beginning-of-quarter employment\, full-
quarter employment\, and average monthly earnings of full-quarter employee
s. Beginning-of-quarter employment is also the main tabulation variable in
the LEHD Origin-Destination Employment Statistics workplace reports as di
splayed in OnTheMap (OTM). The evaluation is conducted using the multiple
threads generated by the edit and imputation models used in the LEHD Infra
structure File System. These threads conform to the Rubin (1987) multiple
imputation model. Each implicate is the output of formal probability model
s that address coverage\, edit and imputation errors. Design-based samplin
g variability and finite population corrections are also included in the e
valuation. We derive special formulas for the Rubin total variability and
its components that are consistent with the disclosure avoidance system us
ed for QWI and LODES/OTM workplace reports. These formulas allow us to pub
lish the complete set of detailed total quality measures for QWI and LODES
. The analysis reveals that the three publication variables under study ar
e estimated very accurately for tabulations involving at least 10 jobs. Ta
bulations involving three to nine jobs have acceptable quality. Tabulation
s involving one or two jobs\, which are generally suppressed in the QWI\,
have substantial total variability but their publication in LODES allows t
he formation of larger custom aggregations\, which will in general have th
e accuracy estimated for tabulations in the QWI of similar magnitude.

\n

X-TAGS;LANGUAGE=en-US:NCRN\,RDC
END:VEVENT
BEGIN:VEVENT
UID:calendar.2063.field_date_with_zone.0@www.ncrn.info
DTSTAMP:20171214T021012Z
CATEGORIES:
CONTACT:
DESCRIPTION:Speaker: Maria De Yoreo (Duke University)\nTitle: Incorporating
Conditionally Representative Auxiliary Information in Data Fusion\nAbstra
ct: In data fusion analysts seek to combine information from two databases
comprised of disjoint sets of individuals\, in which some variables appea
r in both databases and other variables appear in only one database. Most
data fusion techniques rely on variants of conditional independence assump
tions\, which can lead to unreliable inferences if this assumption is not
satisfied. We propose a data fusion technique that allows analysts to easi
ly incorporate auxiliary information (glue) on the dependence structure of
variables not observed jointly. Using simulations\, we illustrate the ben
efits of leveraging the information in glue. We also perform a data fusion
experiment with the goal of fusing two surveys from the book publisher Harp
erCollins\, using glue obtained from the Internet polling company CivicSci
ence. Due to the convenience sampling nature of the auxiliary online surve
y\, we find that the glue is not representative of the population sampled
by HarperCollins. This is a scenario very likely to be encountered in prac
tice\, and points to the more general problem of combining information fro
m multiple data sources that are not all probability samples of the same p
opulation. We discuss current work in this direction. (archived presentati
on)\nPaper: http://arxiv.org/abs/1506.05886. \nLocation:\nCarnegie Mellon
: contact William Eddy (bill@cmu.edu)\nCensus Bureau headquarters: Confere
nce Room 1\, contact Nancy Bates (nancy.a.bates@census.gov)\nCornell Unive
rsity\, Ithaca campus: Ives 105\, contact Lars Vilhuber (lars.vilhuber@cor
nell.edu)\nDuke University: contact Jerry Reiter (jerry@stat.duke.edu)\nUn
iversity of Michigan: Room 3443 ISR-Thompson\, contact Maggie Levenstein (
maggiel@umich.edu)\nUniversity of Missouri: contact Scott Holan (holans@mi
ssouri.edu)\nUniversity of Nebraska-Lincoln: Room TBD\, contact Kristen Ol
son (kolson5@unl.edu)\nNorthwestern University: contact Zach Seeskin (z-se
eskin@u.northwestern.edu)\nStreaming video: [click here] (link active abou
t 5 minutes after start of seminar)\nNodes: \nDuke University / National I
nstitute of Statistical Sciences (NISS) \nDate: \nOct 07\, 2015\, 3:00p
m to 4:30pm EDT \nAddress: \nDurham\, NC 27708\, United States \nAttachme
nts: Presentation
DTSTART;TZID=America/New_York:20151007T150000
DTEND;TZID=America/New_York:20151007T163000
SEQUENCE:0
SUMMARY:NCRN Virtual Seminar – Incorporating Conditionally Representative A
uxiliary Information in Data Fusion
URL:https://www.vilhuber.com/lars/event/ncrn-virtual-seminar-incorporating-
conditionally-representative-auxiliary-information-in-data-fusion-11/
X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2852@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://economics.cornell.edu/seminars/joint-microeconomics-and-comp
uter-science-workshop-john-abowd
DESCRIPTION:“Revisiting the Economics of Privacy: Population Statistics and
Confidentiality Protection as Public Good”\, John Abowd (Cornell Universi
ty and U.S. Census Bureau)\, Ian Schmutte (University of Georgia)\nAbstrac
t\nWe consider the problem of the public release of statistical informatio
n about a population–explicitly accounting for the public-good properties
of both data accuracy and privacy loss. We first consider the implications
of adding the public-good component to recently published models of priva
te data publication under differential privacy guarantees using a Vickrey-
Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality
will be inefficiently under-supplied. Next\, we develop a standard social
planner’s problem using the technology set implied by (ε\, δ)-differential
privacy with (α\, β)-accuracy for the Private Multiplicative Weights quer
y release mechanism to study the properties of optimal provision of data a
ccuracy and privacy loss when both are public goods. Using the production
possibilities frontier implied by this technology\, explicitly parameteriz
ed interdependent preferences\, and the social welfare function\, we displ
ay properties of the solution to the social planner’s problem. Our results
directly quantify the optimal choice of data accuracy and privacy loss as
functions of the technology and preference parameters. Some of these prop
erties can be quantified using population statistics on marginal preferenc
es and correlations between income\, data accuracy preferences\, and priva
cy loss preferences that are available from survey data. Our results show
that government data custodians should publish more accurate statistics wi
th weaker privacy guarantees than would occur with purely private data pub
lishing. Our statistical results using the General Social Survey and the C
ornell National Social Survey indicate that the welfare losses from under-
providing data accuracy while over-providing privacy protection can be sub
stantial.
DTSTART;TZID=America/New_York:20151019T161500
GEO:+42.447255;-76.48225
LOCATION:Cornell University @ Uris Hall\, Ithaca\, NY 14853\, USA
SEQUENCE:0
SUMMARY:Abowd @ Cornell Microeconomic Theory and Computer Science Workshop:
“Revisiting the Economics of Privacy: Population Statistics and Confident
iality Protection as Public Good”
URL:https://www.vilhuber.com/lars/event/abowd-cornell-microeconomic-theory-
and-computer-science-workshop-revisiting-the-economics-of-privacy-populati
on-statistics-and-confidentiality-protection-as-public-good/
X-TAGS;LANGUAGE=en-US:NCRN
X-INSTANT-EVENT:1
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2878@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:John M. Abowd\; http://www.economics.cornell.edu/seminars/joint-mic
roeconomics-and-computer-science-workshop-john-abowd
DESCRIPTION:Joint Microeconomics & Computer Science Workshop: John M. Abowd
\n \nAbstract: We consider the problem of the public release of statistica
l information about a population–explicitly accounting for the public-good
properties of both data accuracy and privacy loss. We first consider the
implications of adding the public-good component to recently published mod
els of private data publication under differential privacy guarantees usin
g a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that d
ata quality will be inefficiently under-supplied. Next\, we develop a stan
dard social planner’s problem using the technology set implied by (ε\, δ)-
differential privacy with (α\, β)-accuracy for the Private Multiplicative
Weights query release mechanism to study the properties of optimal provisi
on of data accuracy and privacy loss when both are public goods. Using the
production possibilities frontier implied by this technology\, explicitly
parameterized interdependent preferences\, and the social welfare functio
n\, we display properties of the solution to the social planner’s problem.
Our results directly quantify the optimal choice of data accuracy and pri
vacy loss as functions of the technology and preference parameters. Some o
f these properties can be quantified using population statistics on margin
al preferences and correlations between income\, data accuracy preferences
\, and privacy loss preferences that are available from survey data. Our r
esults show that government data custodians should publish more accurate s
tatistics with weaker privacy guarantees than would occur with purely priv
ate data publishing. Our statistical results using the General Social Surv
ey and the Cornell National Social Survey indicate that the welfare losses
from under-providing data accuracy while over-providing privacy protectio
n can be substantial.\n \nPaper: https://ecommons.cornell.edu/handle/1813/
40581
DTSTART;TZID=America/New_York:20151019T161500
DTEND;TZID=America/New_York:20151019T174500
GEO:+42.447255;-76.48225
LOCATION:498 Uris Hall @ Uris Hall\, Ithaca\, NY 14853\, USA
SEQUENCE:0
SUMMARY:Abowd presents Revisiting the Economics of Privacy: Population Sta
tistics and Confidentiality Protection as Public Goods
URL:https://www.vilhuber.com/lars/event/abowd-presents-revisiting-the-econo
mics-of-privacy-population-statistics-and-confidentiality-protection-as-pu
blic-goods/
X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2829@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://caed2015.sabanciuniv.edu
DESCRIPTION:“Usage and outcomes of the Synthetic Data Server\,” Lars Vilhub
er (NCRN\, Cornell University) and John Abowd (NCRN\, Cornell University)
\nThe Synthetic Data Server (SDS) at Cornell University was set up to prov
ide early access to new synthetic data products by the U.S. Census Bureau.
These datasets are made available to interested researchers in a controll
ed environment\, prior to a more generalized release. Over the past 5 year
s\, 4 synthetic datasets were made available on the server\, and over 100
users have accessed the server over that time period. This paper reports o
n interim outcomes of the activity: results of validation requests from a
user perspective\, functioning of the feedback loop due to validation and
user input\, and the role of the SDS as an access gateway to and educationa
l tool for other mechanisms of accessing detailed person\, household\, est
ablishment\, and firm statistics.\nTickets: http://caed2015.sabanciuniv.ed
u/registration-form.
DTSTART;TZID=America/New_York:20151023T083000
DTEND;TZID=America/New_York:20151025T141500
GEO:+41.03714;+28.98099
LOCATION:Comparative Analysis of Enterprise Data (CAED) 2015 Conference @ Ş
ht. Muhtar\, taksim istanbul apart\, 34435 Beyoğlu/İstanbul\, Turkey
SEQUENCE:0
SUMMARY:Vilhuber @ CAED 2015: “Usage and outcomes of the Synthetic Data Ser
ver”
URL:https://www.vilhuber.com/lars/event/vilhuber-caed-2015-usage-and-outcom
es-of-the-synthetic-data-server/
X-COST-TYPE:external
X-TAGS;LANGUAGE=en-US:CAED\,NCRN\,SIPP Synthetic Beta\,SynLBD\,Synthetic\,S
ynthetic Data Server
X-TICKETS-URL:http://caed2015.sabanciuniv.edu/registration-form
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-2853@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://www.cla.temple.edu/economics/ai1ec_event/john-abowd-cornell/
?instance_id=52084
DESCRIPTION:“Revisiting the Economics of Privacy: Population Statistics and
Confidentiality Protection as Public Good”\, John Abowd (Cornell Universi
ty and U.S. Census Bureau)\, Ian Schmutte (University of Georgia)\nAbstrac
t\nWe consider the problem of the public release of statistical informatio
n about a population–explicitly accounting for the public-good properties
of both data accuracy and privacy loss. We first consider the implications
of adding the public-good component to recently published models of priva
te data publication under differential privacy guarantees using a Vickrey-
Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality
will be inefficiently under-supplied. Next\, we develop a standard social
planner’s problem using the technology set implied by (ε\, δ)-differential
privacy with (α\, β)-accuracy for the Private Multiplicative Weights quer
y release mechanism to study the properties of optimal provision of data a
ccuracy and privacy loss when both are public goods. Using the production
possibilities frontier implied by this technology\, explicitly parameteriz
ed interdependent preferences\, and the social welfare function\, we displ
ay properties of the solution to the social planner’s problem. Our results
directly quantify the optimal choice of data accuracy and privacy loss as
functions of the technology and preference parameters. Some of these prop
erties can be quantified using population statistics on marginal preferenc
es and correlations between income\, data accuracy preferences\, and priva
cy loss preferences that are available from survey data. Our results show
that government data custodians should publish more accurate statistics wi
th weaker privacy guarantees than would occur with purely private data pub
lishing. Our statistical results using the General Social Survey and the C
ornell National Social Survey indicate that the welfare losses from under-
providing data accuracy while over-providing privacy protection can be sub
stantial.
DTSTART;TZID=America/New_York:20151030T143000
DTEND;TZID=America/New_York:20151030T160000
GEO:+39.981437;-75.15507
LOCATION:Temple University RA580 @ Temple University\, 1801 N Broad St\, Ph
iladelphia\, PA 19122\, USA
SEQUENCE:0
SUMMARY:Abowd @ Temple University Economics Department Workshop: “Revisitin
g the Economics of Privacy: Population Statistics and Confidentiality Prot
ection as Public Good”
URL:https://www.vilhuber.com/lars/event/abowd-temple-university-economics-d
epartment-workshop-revisiting-the-economics-of-privacy-population-statisti
cs-and-confidentiality-protection-as-public-good/
Abstrac
t: The 2020 Census Operational Plan was baselined in October 2015. This
high-level review will highlight some major innovations documented in that
plan. Further discussion covers future research to refine the
design and testing planned in the upcoming years. (archived presentation)
\n

We report results from the first comprehensive total quality eval
uation of three major indicators in the U.S. Census Bureau’s Longitudinal
Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators
(QWI): beginning-of-quarter employment\, full-quarter employment\, and ave
rage monthly earnings of full-quarter employees. Beginning-of-quarter empl
oyment is also the main tabulation variable in the LEHD Origin-Destination
Employment Statistics workplace reports as displayed in OnTheMap (OTM). T
he evaluation is conducted using the multiple threads generated by the edi
t and imputation models used in the LEHD Infrastructure File System. These
threads conform to the Rubin (1987) multiple imputation model. Each impli
cate is the output of formal probability models that address coverage\, ed
it and imputation errors. Design-based sampling variability and finite pop
ulation corrections are also included in the evaluation. We derive special
formulas for the Rubin total variability and its components that are cons
istent with the disclosure avoidance system used for QWI and LODES/OTM wor
kplace reports. These formulas allow us to publish the complete set of det
ailed total quality measures for QWI and LODES. The analysis reveals that
the three publication variables under study are estimated very accurately
for tabulations involving at least 10 jobs. Tabulations involving three to
nine jobs have acceptable quality. Tabulations involving one or two jobs\
, which are generally suppressed in the QWI\, have substantial total varia
bility but their publication in LODES allows the formation of larger custo
m aggregations\, which will in general have the accuracy estimated for tab
ulations in the QWI of similar magnitude.

‘Assessing the Data Quality of Public Use Tabul
ations Produced from Synthetic Data: Synthetic Business Dynamics Statistic
s’ (Lars Vilhuber\, Cornell)

\n

We describe and analyze a method that
blends records from both observed and synthetic microdata into public-use
tabulations on establishment statistics. The resulting tables use synthet
ic data only in potentially sensitive cells. We describe different algorit
hms\, and present preliminary results when applied to the Census Bureau’s
Business Dynamics Statistics and Synthetic Longitudinal Business Database\
, highlighting accuracy and protection afforded by the method when compare
d to existing public-use tabulations (with suppressions). (archived presen
tation)

\n

‘Synthetic Data Generation for Firm Links’ (Saki Kinney\,
RTI)

\n

In most countries\, national statistical agencies do not rele
ase establishment-level business microdata\, because doing so represents t
oo large a risk to establishments’ confidentiality. Agencies potentially c
an manage these risks by releasing synthetic microdata\, i.e.\, individual
establishment records simulated from statistical models designed to mimic
the joint distribution of the underlying observed data. Previously\, we u
sed this approach to generate a public-use version\, now publicly availabl
e\, of the U.S. Census Bureau’s Longitudinal Business Database (LBD)\, a
longitudinal census of establishments dating back to 1976. While the synt
hetic LBD has proven to be a useful product\, we now seek to improve and e
xpand it by using new synthesis models and adding features. This paper des
cribes our efforts to create the second generation of the SynLBD\, includi
ng synthesis procedures that we believe could be replicated in other conte
xts. (archived presentation)

Cornell University\nDuke University / National In
stitute of Statistical Sciences (NISS)

\n

Date:

\n

Jan 06\,
2016\, 3:00pm to 4:30pm EST

\n

Address:

\n

Ithaca\, NY 14853\, United States

\n

Attachments:

\n

Presentation (Vilhuber)
\n Presentation (Kinney)

\n

X-TAGS;LANGUAGE=en-US:NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-39@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:We will introduce the teaching environment\, and present the cl
ass itself. An overview of the U.S. statistical system is given.\n \nLectu
re notes\n\nINFO7470-S1-2016-Course Introduction\nINFO7470-S1-2016-Technic
al points \nINFO7470-S1-2016-Overview of the U.S. Statistical System
DTSTART;TZID=America/New_York:20160201T132500
DTEND;TZID=America/New_York:20160201T161000
SEQUENCE:0
SUMMARY:Session 1: Course Introduction and Overview of the U.S. Statistical
System
URL:https://www.vilhuber.com/lars/event/session-1-course-introduction-and-o
verview-of-the-u-s-statistical-system/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

We will introduce the teaching environment\
, and present the class itself. An overview of the U.S. statistical system
is given.

Readings and other information
\n

Lectu
re Notes

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-107@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:This lecture is a “flipped” lecture. However\, we have the priv
ilege of discussing in class (LIVE) a variety of topics on the federal sta
tistical system with one of the foremost experts on it\, Connie Citro (CNS
TAT).\nLecture Notes\n\nINFO7470-S4-Household Surveys\n\nDiscussion Notes
(Connie Citro)\n\nCitro – 2016-NCRN – 44 years of CNSTAT (PDF).\nAlso see
National Research Council. 2013. “Principles and Practices for a Federal S
tatistical Agency: Fifth Edition.” Washington\, DC: The National Academies
Press. doi: 10.17226/18318. (referenced by Connie Citro).
DTSTART;TZID=America/New_York:20160222T132500
DTEND;TZID=America/New_York:20160222T161000
SEQUENCE:0
SUMMARY:Session 4: Measuring People and Households
URL:https://www.vilhuber.com/lars/event/session-4-measuring-people-and-hous
eholds/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

This lecture is a “flipped” lecture. Howeve
r\, we have the privilege of discussing in class (LIVE) a variety of topic
s on the federal statistical system with one of the foremost experts on it
\, Connie Citro (CNSTAT).

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-108@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:This lecture is a “flipped” lecture. Discussion of the material
s viewed by students will occur on February 29\, 2016.\nLecture Notes\n\nI
NFO7470-S5 Economic Statistics\nUpdates: INFO7470-S5 Updates\n\nLab\nThe l
ab is posted on edX\, was made available to registered students on Feb 22\
, 2016\, and is due on March 1\, 2016 at 5:01 UTC (12:01 AM EST).
DTSTART;TZID=America/New_York:20160229T132500
DTEND;TZID=America/New_York:20160229T161000
SEQUENCE:0
SUMMARY:Session 5: Measuring Business and Economic Activity
URL:https://www.vilhuber.com/lars/event/session-5-measuring-business-and-ec
onomic-activity/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

This lecture is a “flipped” lecture. Discus
sion of the materials viewed by students will occur on February 29\, 2016.

I
n this paper\, we evaluate the joint effects of question\, respondent and
interviewer characteristics on two proxy indicators of data quality – resp
onse time and question misreading – in a telephone survey. We include ques
tion features traditionally examined\, such as the length of the question
and format of response options\, and features that are related to the layo
ut and format of interviewer-administered questions. First\, we examine ho
w these question features affect the time to ask and answer survey questio
ns and how different interviewers vary in their administration of these qu
estions. Second\, we investigate how choices in visual design features in
particular (that is\, design features that require interviewer decisions)
contribute to interviewer question misreading. These two measures of quest
ion time and question misreading are both proxies for the risk of measurem
ent error in responses to survey questions.

\n

To examine these quest
ions\, we use paradata and behavior codes from the Work and Leisure Today
(n=450\, AAPOR RR3=6.3%) survey and use cross-classified random effects mo
dels. Overall\, more of the variation in both response time and question m
isreading is due to question characteristics compared to respondent or int
erviewer attributes. Additionally\, we find that question characteristics
related to necessary survey design features and respondent confusion are t
he primary predictors of response time\, with little effect of visual desi
gn features of the question. Our results for question misreading show a di
fferent pattern. Characteristics related to task complexity and visual des
ign significantly affect question misreading\, with little contribution of
necessary survey design features. We conclude with implications for surve
y practice. (archived presentation)

Recording

\n

A recording of the live session will be availa
ble shortly afterwards.

\n

\n

\n

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-111@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:Part 1 will be “flipped classroom” on Geographic Information Sy
stems (GIS) – basic geocoding\, geographic concepts\, and other topics. Th
e recordings are from the 2013 INFO7470 lecture given by Michael Ratcliffe
\, of the Geography Division at the U.S. Census Bureau.\nPart 2 will be ab
out access to restricted access data. Students will be introduced to the r
esearch proposal mechanism of the Federal Statistical Research Data Center
. This will also be “flipped”.\nPart 3 is a live presentation on two parti
cular aspects: how to access the RDC of the German Institute for Employme
nt Research (Matthias Umkehrer)\, and considerations on requesting access
to BLS data in the FSRDC (Kristen Monaco). For both topics\, guest present
ers from those institutions will present live in the videoconference class
room.\nLecture Notes\n\nGeography: INFO7470-S8-Census Geography Concepts\n
Restricted Access Data: INFO7470-S8-Proposals\, Kristen Monaco on BLS prop
osal review\, Matthias Umkehrer on IAB access\nUpdates and Flipped Class q
uestions: INFO7470-S8-Updates and flipped class questions\n\nAdditional li
nks\n\nIRS SOI Joint Statistical Research Program – with links to the 2014
Call for proposals (now closed)(local copy) and projects in 2012 and 2014
DTSTART;TZID=America/New_York:20160321T132500
DTEND;TZID=America/New_York:20160321T161000
SEQUENCE:0
SUMMARY:Session 8: Census Geography – Restricted Access Data
URL:https://www.vilhuber.com/lars/event/session-8-census-geography-restrict
ed-access-data/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

Part 1 will be “flipped classroom” on Geographic Information Systems (GIS) – basic geocoding\, geogr
aphic concepts\, and other topics. The recordings are from the 2013 INFO74
70 lecture given by Michael Ratcliffe\, of the Geography Division at the U
.S. Census Bureau.

\n

Part 2 will be about access to restrict
ed access data. Students will be introduced to the research propo
sal mechanism of the Federal Statistical Research Data Center. This will a
lso be “flipped”.

\n

Part 3 is a live presentation on two particular
aspects: how to access the RDC of the German Institute f
or Employment Research (Matthias Umkehrer)\, and considerations on requesting access to BLS data in the FSRDC (Kristen Monaco). For
both topics\, guest presenters from those institutions will present live
in the videoconference classroom.

Title: Microclustering: Wh
en the Cluster Sizes Grow Sublinearly with the Size of the Data Set

\n

Abstract:
Most generative models for clustering implicitly assume that the number of
data points in each cluster grows linearly with the total number of data
points. Finite mixture models\, Dirichlet process mixture models\, and Pit
man–Yor process mixture models make this assumption\, as do all other infi
nitely exchangeable clustering models. However\, for some tasks\, this ass
umption is undesirable. For example\, when performing entity resolution\,
the size of each cluster is often unrelated to the size of the data set. C
onsequently\, each cluster contains a negligible fraction of the total num
ber of data points. Such tasks therefore require models that yield cluster
s whose sizes grow sublinearly with the size of the data set. We address t
his requirement by defining the microclustering property and introducing a
new model that exhibits this property. We compare this model to several c
ommonly used clustering models by checking model fit using real and simula
ted data sets. (archived presentation)

Lecture Notes

Lab

\n

The lab (an edit and
imputation exercise) has been posted on the INFO7470x edX site. Your program needs to be uploaded by A
pril 22\, 2016 (this is clearly marked in the lab). You will then be asked
to peer-review two other programs and answers between April 23 and April
29.

\n

\n

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-115@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:Total quality evaluation – errors from coverage\, sampling\, ed
it\, and imputation.\nIntroduction to record linking\n\nWhat is record lin
king\, what is it not\, what is the theory?\nRecord linking: applications
and examples – How do you do it\, what do you need\, what are the possible
complications?\nExamples of record linking\n\n \nLecture Notes\n\nINFO747
0 S11 -Updates\nINFO7470 S11 -Statistical Tools Edit and Imputation Exampl
es\nINFO7470 S11-record-linking
DTSTART;TZID=America/New_York:20160418T132500
DTEND;TZID=America/New_York:20160418T161000
SEQUENCE:0
SUMMARY:Session 11: Statistical Tools – Record Linkage and Total Quality Ev
aluation
URL:https://www.vilhuber.com/lars/event/session-11-statistical-tools-record
-linkage-and-total-quality-evaluation/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

X-TAGS;LANGUAGE=en-US:DDI\,NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-113@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:Part A: Spatial Analysis: This part of the lecture is a flipped
class\, consisting of a 2013 lecture given by Prof. Nicholas Nagle of Uni
versity of Tennessee – Knoxville. You will find the video links on the edX
class website.\nPart B: Network Analysis: This part of the lecture is a l
ive class.\n\nUpdates\n\nINFO7470 S13 -Updates\n\n\nPart A: Spatial Analys
is\nTopics\n\nBasic Geocoding\nTools for Geocoding\nAnalysis Methods\nTool
s for Geographic Analysis\n\nLecture Notes\n\nINFO7470 S13 – SpatialAnalys
is – Nagle\n\nAbout the Guest Lecturer\nNicholas Nagle\, University of Ten
nessee – Knoxville\n\n\n\nNicholas Nagle is a GIScientist/geospatial analy
st whose research centers on combining spatial data in order to produce mo
re reliable geographic information. Prof. Nagle holds a joint faculty app
ointment with the Geographic Information Science and Technology group at O
ak Ridge National Laboratory. He is currently working on a number of proj
ects improving the availability and reliability of data from the US Census
Bureau\, developing methods to identify land cover change\, and is workin
g on a number of projects related to population and health\, both in Tenne
ssee and in developing countries.\n\n\n\nPart B: Network Analysis\nThis pa
rt of the lecture is a live class.\nLecture Notes\n\nINFO7470-S13-Statisti
cal Tools-Hierarchical Models and Network Analysis
DTSTART;TZID=America/New_York:20160502T132500
DTEND;TZID=America/New_York:20160502T161000
SEQUENCE:0
SUMMARY:Session 13: Geographic and Network Analysis Methods
URL:https://www.vilhuber.com/lars/event/session-13-geographic-and-network-a
nalysis-methods/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

\n

Part A: Spatial Analysis: This part
of the lecture is a flipp
ed class\, consisting of a 2013 lecture given by Prof. Nic
holas Nagle of University of Tennessee – Knoxville. You will find the vide
o links on the edX class website.

Part A: Spatial Analysis

Topics

Lecture Notes

About the Guest Lecturer

\n

Nicholas Nagle\, University of Tennes
see – Knoxville

\n

\n

\n

\n

Nicholas Na
gle is a GIScientist/geospatial analyst whose research centers on combinin
g spatial data in order to produce more reliable geographic information.
Prof. Nagle holds a joint faculty appointment with the Geographic Informat
ion Science and Technology group at Oak Ridge National Laboratory. He is
currently working on a number of projects improving the availability and r
eliability of data from the US Census Bureau\, developing methods to ident
ify land cover change\, and is working on a number of projects related to
population and health\, both in Tennessee and in developing countries.

X-TAGS;LANGUAGE=en-US:Lars Vilhuber\,NCRN
X-TICKETS-URL:https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-
events-tickets-22247855936?ref=ecount
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-4500@www.vilhuber.com/lars
DTSTAMP:20171214T021012Z
CATEGORIES:
CONTACT:http://www.ncrn.info/event/ncrn-spring-2016-meeting
DESCRIPTION:Benjamin Perry\, Venkata Kambhampaty\, Kyle Brumsted\, Lars Vil
huber\, & William C. Block: “Crowdsourcing Codebook Development and Enhanc
ements in CED²AR”\nAbstract: Recent years have shown the power of user-sou
rced information\, as evidenced by the success of Wikipedia and its many emulat
ors. This sort of unstructured discussion is currently not feasible as a p
art of the otherwise successful metadata repositories. Creating and augmen
ting metadata is a labor-intensive endeavor. Harnessing collective knowled
ge from actual data users can supplement officially generated metadata. As
part of our Comprehensive Extensible Data Documentation and Access Reposi
tory (CED²AR) infrastructure\, we demonstrate a prototype of crowdsourced
DDI on actual codebooks. While the system itself is more general\, the dem
onstrated implementation relies on a set of linked deployments of the basi
c software on web servers. The backend transparently handles changes\, and
the frontend has the ability to separate official edits (by designated curato
rs of the data and the metadata) from crowd-sourced content. The implement
ation allows a data curator\, such as a statistical agency\, to collect an
d incorporate improvements suggested by knowledgeable users in a structure
d way.\nTickets: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-pub
lic-events-tickets-22247855936?ref=ecount.
DTSTART;TZID=America/New_York:20160509T113000
DTEND;TZID=America/New_York:20160509T120000
GEO:+38.847071;-76.929454
LOCATION:U.S. Census Bureau @ 4600 Silver Hill Rd\, Suitland\, MD 20746\, U
SA
SEQUENCE:0
SUMMARY:Crowdsourcing Codebook Development and Enhancements in CED²AR
URL:https://www.vilhuber.com/lars/event/crowdsourcing-codebook-development-
and-enhancements-in-ced%c2%b2ar-2/
X-COST-TYPE:external
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Benjamin Perry\, Venkata Kambhampaty\, Kyle B
rumsted\, Lars Vilhuber\, & William C. Block: “Crowdsourcing Codebook Deve
lopment and Enhancements in CED²AR”\nAbstract: Recent years have sho
wn the power of user-sourced information\, as evidenced by the success of Wikip
edia and its many emulators. This sort of unstructured discussion is curre
ntly not feasible as a part of the otherwise successful metadata repositor
ies. Creating and augmenting metadata is a labor-intensive endeavor. Harne
ssing collective knowledge from actual data users can supplement officiall
y generated metadata. As part of our Comprehensive Extensible Data Documen
tation and Access Repository (CED²AR) infrastructure\, we demonstrate a pr
ototype of crowdsourced DDI on actual codebooks. While the system itself i
s more general\, the demonstrated implementation relies on a set of linked
deployments of the basic software on web servers. The backend transparent
ly handles changes\, and the frontend has the ability to separate official edi
ts (by designated curators of the data and the metadata) from crowd-source
d content. The implementation allows a data curator\, such as a statistica
l agency\, to collect and incorporate improvements suggested by knowledgea
ble users in a structured way.\nTickets: https://www.eventbrite.com/
e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount.
\n

Lecture notes

Links

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-3009@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,NCRN Meetings\,Presentation\,vilhube
r
CONTACT:http://www.ncrn.info/event/ncrn-spring-2016-meeting
DESCRIPTION:John M. Abowd and Ian M. Schmutte : “The Advantages And Disadva
ntages Of Statistical Disclosure Limitation For Program Evaluation”\nAbstr
act: This paper formalizes the manner in which statistical disclosure limi
tation (SDL) hinders empirical research in economics. We also highlight a
hitherto unappreciated advantage of SDL\, formal privacy models\, and synt
hetic data systems: they can serve as a defense against model overfitting
and false-discovery bias. More specifically\, a synthetic data validation
system can – and we argue should – be used in conjunction with systems in
which researchers register their research design ahead of analysis. The ke
y insight is that privacy-protected data can be used for model development
while minimizing the risk of model overfitting. To demonstrate these points\,
we develop a model in which the statistical agency collects data from a p
opulation\, but publishes a version in which the data have been inten
tionally distorted by some SDL process. We say the SDL process is ignorabl
e if inferences based on the published data are indistinguishable from inf
erences based on the unprotected data. SDL is rarely ignorable. If the res
earcher has knowledge of the SDL model\, she can conduct an SDL-aware anal
ysis that explicitly corrects for the effects of SDL. If\, as is often the
case\, the SDL model is unknown\, we describe circumstances under whic
h SDL can still be learned.\n[Presentation]\nTickets: https://www.eventbri
te.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ec
ount.
DTSTART;TZID=America/New_York:20160510T101500
DTEND;TZID=America/New_York:20160510T104500
GEO:+38.847071;-76.929454
LOCATION:U.S. Census Bureau @ 4600 Silver Hill Rd\, Suitland\, MD 20746\, U
SA
SEQUENCE:0
SUMMARY:Schmutte presents on The Advantages and Disadvantages of Statistica
l Disclosure Limitation for Program Evaluation
URL:https://www.vilhuber.com/lars/event/the-advantages-and-disadvantages-of
-statistical-disclosure-limitation-for-program-evaluation/
X-COST-TYPE:external
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

John M. Abowd and Ian M. Schmutte : “The Adva
ntages And Disadvantages Of Statistical Disclosure Limitation For Program
Evaluation”\nAbstract: This paper formalizes the manner in which sta
tistical disclosure limitation (SDL) hinders empirical research in economi
cs. We also highlight a hitherto unappreciated advantage of SDL\, formal p
rivacy models\, and synthetic data systems: they can serve as a defense ag
ainst model overfitting and false-discovery bias. More specifically\, a sy
nthetic data validation system can – and we argue should – be used in conj
unction with systems in which researchers register their research design a
head of analysis. The key insight is that privacy-protected data can be us
ed for model development while minimizing the risk of model overfitting. To de
monstrate these points\, we develop a model in which the statistical agenc
y collects data from a population\, but publishes a version in which the d
ata have been intentionally distorted by some SDL process. We say the
SDL process is ignorable if inferences based on the published data are in
distinguishable from inferences based on the unprotected data. SDL is rare
ly ignorable. If the researcher has knowledge of the SDL model\, she can c
onduct an SDL-aware analysis that explicitly corrects for the effects of S
DL. If\, as is often the case\, the SDL model is unknown\, we describe
circumstances under which SDL can still be learned.\n[Presentation]<
br />\nTickets: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-publ
ic-events-tickets-22247855936?ref=ecount.

X-TAGS;LANGUAGE=en-US:Ian Schmutte\,John Abowd\,NCRN
X-TICKETS-URL:https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-
events-tickets-22247855936?ref=ecount
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-3010@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,NCRN Meetings\,Presentation\,vilhube
r
CONTACT:http://www.ncrn.info/event/ncrn-spring-2016-meeting
DESCRIPTION:Benjamin Perry\, Venkata Kambhampaty\, Kyle Brumsted\, Lars Vil
huber\, & William C. Block: “Crowdsourcing Codebook Development and Enhanc
ements in CED²AR”\nAbstract: Recent years have shown the power of user-sou
rced information\, as evidenced by the success of Wikipedia and its many emulat
ors. This sort of unstructured discussion is currently not feasible as a p
art of the otherwise successful metadata repositories. Creating and augmen
ting metadata is a labor-intensive endeavor. Harnessing collective knowled
ge from actual data users can supplement officially generated metadata. As
part of our Comprehensive Extensible Data Documentation and Access Reposi
tory (CED²AR) infrastructure\, we demonstrate a prototype of crowdsourced
DDI on actual codebooks. While the system itself is more general\, the dem
onstrated implementation relies on a set of linked deployments of the basi
c software on web servers. The backend transparently handles changes\, and
the frontend has the ability to separate official edits (by designated curato
rs of the data and the metadata) from crowd-sourced content. The implement
ation allows a data curator\, such as a statistical agency\, to collect an
d incorporate improvements suggested by knowledgeable users in a structure
d way.\nAvailable: https://ecommons.cornell.edu/handle/1813/43887\nTickets
: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tick
ets-22247855936?ref=ecount.
DTSTART;TZID=America/New_York:20160510T113000
DTEND;TZID=America/New_York:20160510T120000
GEO:+38.847071;-76.929454
LOCATION:U.S. Census Bureau @ 4600 Silver Hill Rd\, Suitland\, MD 20746\, U
SA
SEQUENCE:0
SUMMARY:Vilhuber presents on Crowdsourcing Codebook Development and Enhance
ments in CED²AR
URL:https://www.vilhuber.com/lars/event/crowdsourcing-codebook-development-
and-enhancements-in-ced%c2%b2ar/
X-COST-TYPE:external
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Benjamin Perry\, Venkata Kambhampaty\, Kyle B
rumsted\, Lars Vilhuber\, & William C. Block: “Crowdsourcing Codebook Deve
lopment and Enhancements in CED²AR”\nAbstract: Recent years have sho
wn the power of user-sourced information\, as evidenced by the success of Wikip
edia and its many emulators. This sort of unstructured discussion is curre
ntly not feasible as a part of the otherwise successful metadata repositor
ies. Creating and augmenting metadata is a labor-intensive endeavor. Harne
ssing collective knowledge from actual data users can supplement officiall
y generated metadata. As part of our Comprehensive Extensible Data Documen
tation and Access Repository (CED²AR) infrastructure\, we demonstrate a pr
ototype of crowdsourced DDI on actual codebooks. While the system itself i
s more general\, the demonstrated implementation relies on a set of linked
deployments of the basic software on web servers. The backend transparent
ly handles changes\, and the frontend has the ability to separate official edi
ts (by designated curators of the data and the metadata) from crowd-source
d content. The implementation allows a data curator\, such as a statistica
l agency\, to collect and incorporate improvements suggested by knowledgea
ble users in a structured way.\nAvailable: https://ecommons.cornell.
edu/handle/1813/43887\nTickets: https://www.eventbrite.com/e/ncrn-me
eting-spring-2016-public-events-tickets-22247855936?ref=ecount.

2:30 PM\nThe Challenge of Reproducible Science and
Privacy Protection for Statistical Agencies — John M. Abowd\, U.S. Census
Bureau/Cornell University

\n

2:55 PM\nSpatio-Temporal Change o
f Support with Application to American Community Survey Multi-Year Period
Estimates — Scott H. Holan\, University of Missouri \; Jonathan R. Bradley
\, University of Missouri \; Christopher Wikle\, University of Missouri

X-TAGS;LANGUAGE=en-US:JSM\,NCRN\,Privacy
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-3067@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:
DESCRIPTION:John M. Abowd\, NCRN Cornell and now Associate Director for Res
earch and Methodology and Chief Scientist at the U.S. Census Bureau\, is the
2016 Recipient of the Julius Shiskin Memorial Award for Economic Statistics. H
e will speak on September 6\, 2016 at the WSS JULIUS SHISKIN MEMORIAL AWAR
D SEMINAR on “How Will Statistical Agencies Operate When All Data are Priv
ate?”\nTime: 1 – 3 p.m.\nLocation: Auditorium\, U.S. Census Bureau\, 4600
Silver Hill Road\, Suitland\, Maryland\, available through Webex.\nAbstra
ct: The dual problems of respecting citizen privacy and protecting the con
fidentiality of their data—Ken Prewitt’s famous “don’t ask/don’t tell” dic
tum—have become hopelessly conflated in the “Big Data” era. There are orde
rs of magnitude more data outside an agency’s firewall than inside it—comp
romising the integrity of traditional statistical disclosure limitation me
thods. And increasingly the information processed by the agency was “asked
” in a context wholly outside the agency’s operations—blurring the distinc
tion between what was asked and what is published. Already private busines
ses like Microsoft\, Google and Apple recognize that cybersecurity (safegu
arding the integrity and access controls for internal data) and privacy pr
otection (ensuring that what is published does not reveal too much about a
ny person or business) are two sides of the same coin. This is a paradigm-
shifting moment for statistical agencies. This talk will examine how stati
stical agencies can respond in a manner consistent with their missions.
DTSTART;TZID=America/New_York:20160906T130000
DTEND;TZID=America/New_York:20160906T150000
GEO:+38.847071;-76.929454
LOCATION:U.S. Census Bureau Auditorium @ 4600 Silver Hill Rd\, Suitland\, M
D 20746\, USA
SEQUENCE:0
SUMMARY:John M. Abowd\, WSS JULIUS SHISKIN MEMORIAL AWARD SEMINAR\, “How W
ill Statistical Agencies Operate When All Data are Private?”
URL:https://www.vilhuber.com/lars/event/john-m-abowd-wss-julius-shiskin-mem
orial-award-seminar-how-will-statistical-agencies-operate-when-all-data-ar
e-private/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

John M. Abowd\, NCRN Cornell and now Associat
e Director for Research and Methodology and Chief Scientist at the U.S. Ce
nsus Bureau is the 2016 Recipient of the Julius Shiskin Memorial Award for Econ
omic Statistics. He will speak on September 6\, 2016 at the WSS JULIUS SHI
SKIN MEMORIAL AWARD SEMINAR on “How Will Statistical Agencies Operate When
All Data are Private?”\nTime: 1 – 3 p.m.\nLocation: Auditori
um\, U.S. Census Bureau\, 4600 Silver Hill Road\, Suitland\, Maryland\, av
ailable through Webex.\nAbstract: The dual problems of respecting ci
tizen privacy and protecting the confidentiality of their data—Ken Prewitt
’s famous “don’t ask/don’t tell” dictum—have become hopelessly conflated i
n the “Big Data” era. There are orders of magnitude more data outside an a
gency’s firewall than inside it—compromising the integrity of traditional
statistical disclosure limitation methods. And increasingly the informatio
n processed by the agency was “asked” in a context wholly outside the agen
cy’s operations—blurring the distinction between what was asked and what i
s published. Already private businesses like Microsoft\, Google and Apple
recognize that cybersecurity (safeguarding the integrity and access contro
ls for internal data) and privacy protection (ensuring that what is publis
hed does not reveal too much about any person or business) are two sides o
f the same coin. This is a paradigm-shifting moment for statistical agenci
es. This talk will examine how statistical agencies can respond in a manner
consistent with their missions.

Lars Vilhuber speaks about “Disclosure Limita
tion and Confidentiality Protection in Linked Data” at the Center for Inte
runiversity Research and Analysis of Organizations’ conference on “Facili
tate the access to Quebec data: How and to what ends?” The conference is j
ointly organized with the Quebec Inter-University Centre for Social Statis
tics (QICSS). The presentation relies on joint work with John M. Abowd and
Ian M. Schmutte.\n[Presentation]

\n

X-TAGS;LANGUAGE=en-US:confidentiality protection\,Lars Vilhuber\,NCRN\,Sloa
n\,statistical disclosure limitation\,Synthetic Data Server\,TC-Large
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-377@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:Tentative: We will introduce the teaching environment\, and pre
sent the class itself. An overview of the U.S. statistical system is given
.
DTSTART;TZID=America/New_York:20170130T132500
DTEND;TZID=America/New_York:20170130T161000
SEQUENCE:0
SUMMARY:Session 1: Course Introduction and Overview of the U.S. Statistical
System
URL:https://www.vilhuber.com/lars/event/session-1-course-introduction-and-o
verview-of-the-u-s-statistical-system-2/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Tentative: We will introduce the teaching env
ironment\, and present the class itself. An overview of the U.S. statistic
al system is given.

Lars Vilhuber\, PhD\nAbstract:\nConf
identiality protection is a multi-layered concept\, involving statistical
(cryptographic) methods and physical safeguards. When providing access to
researchers (both internal to the agency and external academic)\, a tensio
n arises between the level of trust vis-à-vis the researcher\, the statist
ical disclosure limitation applied to the data visible to the researcher\;
and the physical access mechanisms used by the researcher. This presentat
ion will review systems used by national and private research organization
s around the world\, putting them into the relevant legal and societal con
text.

Testimony to the U.S. Commission on Evidence-
based Policymaking (including video)

\n

X-TAGS;LANGUAGE=en-US:CEP\,Commission on Evidence-based Policy\,NCRN\,Sloan
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-3278@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:https://docs.google.com/document/d/1on41QIJwt4yBebNBGG7XoYOHLEkAmlQ
NRCUSwElfc9I/edit?usp=sharing
DESCRIPTION:In this seminar\, we discuss with interested parties the condit
ions necessary to implement the SynLBD approach\, with the goal of providi
ng other statistical agencies a straightforward toolkit to implement the s
ame procedure on their own data. Our hope is that by implementing similar
procedures on comparable business microdata\, new research both within and
across countries can be enabled. The ideal end result is a series of coun
try-specific datasets on establishments and/or firms available within the
same computing environment. We discuss the data and software requirements
for the lowest-cost approach\, the disclosure protection statistics alread
y implemented that can be used to achieve release of the data in this way
\, the validation procedures that an agency should agree to\, and the like
ly cost of maintaining such procedures. The seminar brings together academ
ics working on cutting-edge methods for the protection of privacy in stati
stical databases\, and researchers and implementers at statistical agencie
s that have started or are interested in starting a similar project.\nFive
sessions will touch on the full lifecycle of a SynLBD development and imp
lementation\, and will follow the same pattern. We will first discuss exis
ting implementations and experiences\, and will then as a group discuss is
sues as they pertain to the broader community. Emphasis should be on discu
ssing open issues\, specific solutions to specific problems. Proceedings w
ill be published later.\nFor more details\, please see the full agenda.\nP
roceedings\nVilhuber\, Lars\; Kinney\, Saki\; Schmutte\, Ian M.\, 2017. “P
roceedings from the Synthetic LBD International Seminar”\, Labor Dynamics
Institute Document 44\, available at http://digitalcommons.ilr.cornell.edu
/ldi/44/ or http://hdl.handle.net/1813/52472\nDocuments\nOverview of the
SynLBD methodology\nLink to presentation. Contains excerpts from\n
S. Kinney\, “Presentation: Synthetic Data Generation for Firm Links\,” NSF
Census Research Network – NCRN-Cornell\, 1813:50054\, 2016. [A
bstract] [URL] [Bibtex]\nIn most countries\, national sta
tistical agencies do not release establishment-level business microdata\,
because doing so represents too large a risk to establishments’ confidenti
ality. Agencies potentially can manage these risks by releasing synthetic
microdata\, i.e.\, individual establishment records simulated from statist
ical models designed to mimic the joint distribution of the underlying obs
erved data. Previously\, we used this approach to generate a public-use ve
rsion—now available for public use—of the U.S. Census Bureau’s Longitudina
l Business Database (LBD)\, a longitudinal census of establishments dating
back to 1976. While the synthetic LBD has proven to be a useful product\,
we now seek to improve and expand it by using new synthesis models and ad
ding features. This paper describes our efforts to create the second gener
ation of the SynLBD\, including synthesis procedures that we believe could
be replicated in other contexts.\n@TechReport{kinney-2016-ecommons\,\ntit
le = {Presentation: Synthetic Data Generation for Firm Links}\,\nauthor =
{Kinney\, Saki}\,\ninstitution = {NSF Census Research Network – NCRN-Corne
ll }\,\nyear = {2016}\,\nnumber = {1813:50054}\,\nAbstract = {In most coun
tries\, national statistical agencies do not release establishment-level b
usiness microdata\, because doing so represents too large a risk to establ
ishments’ confidentiality. Agencies potentially can manage these risks by
releasing synthetic microdata\, i.e.\, individual establishment records si
mulated from statistical models designed to mimic the joint distribution o
f the underlying observed data. Previously\, we used this approach to gene
rate a public-use version—now available for public use—of the U.S. Census
Bureau’s Longitudinal Business Database (LBD)\, a longitudinal census of e
stablishments dating back to 1976. While the synthetic LBD has proven to b
e a useful product\, we now seek to improve and expand it by using new syn
thesis models and adding features. This paper describes our efforts to cre
ate the second generation of the SynLBD\, including synthesis procedures t
hat we believe could be replicated in other contexts.}\,\nkeywords = {conf
identiality\; US Longitudinal Business Database\; synthetic data}\,\nowner
= {vilhuber}\,\nURL = {http://hdl.handle.net/1813/50054}\n}\nInputs to th
e SynLBD process\nLink to presentation. Based on Drechsler and Vilhuber (2
014).\nConfidentiality of the SynLBD\nLink to presentation. Contains excer
pts from \n S. Kinney\, “Presentation: Synthetic Data Generation fo
r Firm Links\,” NSF Census Research Network – NCRN-Cornell\, 1813:50054\,
2016. [Abstract] [URL] [Bibtex]\nIn most count
ries\, national statistical agencies do not release establishment-level bu
siness microdata\, because doing so represents too large a risk to establi
shments’ confidentiality. Agencies potentially can manage these risks by r
eleasing synthetic microdata\, i.e.\, individual establishment records sim
ulated from statistical models designed to mimic the joint distribution of
the underlying observed data. Previously\, we used this approach to gener
ate a public-use version—now available for public use—of the U.S. Census B
ureau’s Longitudinal Business Database (LBD)\, a longitudinal census of es
tablishments dating back to 1976. While the synthetic LBD has proven to be
a useful product\, we now seek to improve and expand it by using new synt
hesis models and adding features. This paper describes our efforts to crea
te the second generation of the SynLBD\, including synthesis procedures th
at we believe could be replicated in other contexts.\n@TechReport{kinney-2
016-ecommons\,\ntitle = {Presentation: Synthetic Data Generation for Firm
Links}\,\nauthor = {Kinney\, Saki}\,\ninstitution = {NSF Census Research N
etwork – NCRN-Cornell }\,\nyear = {2016}\,\nnumber = {1813:50054}\,\nAbstr
act = {In most countries\, national statistical agencies do not release es
tablishment-level business microdata\, because doing so represents too lar
ge a risk to establishments’ confidentiality. Agencies potentially can man
age these risks by releasing synthetic microdata\, i.e.\, individual estab
lishment records simulated from statistical models designed to mimic the j
oint distribution of the underlying observed data. Previously\, we used th
is approach to generate a public-use version—now available for public use—
of the U.S. Census Bureau’s Longitudinal Business Database (LBD)\, a longi
tudinal census of establishments dating back to 1976. While the synthetic
LBD has proven to be a useful product\, we now seek to improve and expand
it by using new synthesis models and adding features. This paper describes
our efforts to create the second generation of the SynLBD\, including syn
thesis procedures that we believe could be replicated in other contexts.}\
,\nkeywords = {confidentiality\; US Longitudinal Business Database\; synth
etic data}\,\nowner = {vilhuber}\,\nURL = {http://hdl.handle.net/1813/5005
4}\n}\nValidation Servers\nLink to presentation. Contains excerpts from \n
L. Vilhuber and J. M. Abowd\, “Presentation: SOLE 2016: Usage and
outcomes of the Synthetic Data Server\,” NSF Census Research Network – NCR
N-Cornell\, 1813:43883\, 2016. [Abstract] [URL]
[Bibtex]\nThe Synthetic Data Server (SDS) at Cornell University was set
up to provide early access to new synthetic data products by the U.S. Cens
us Bureau. These datasets are made available to interested researchers in
a controlled environment\, prior to a more generalized release. Over the p
ast 5 years\, 4 synthetic datasets were made available on the server\, and
over 100 users have accessed the server over that time period. This paper
reports on interim outcomes of the activity: results of validation reques
ts from a user perspective\, functioning of the feedback loop due to valid
ation and user input\, and the role of the SDS as an access gateway to and
educational tool for other mechanisms of accessing detailed person\, hous
ehold\, establishment\, and firm statistics.\n@TechReport{Vilhuber2016-cy\
,\ntitle = ‘Presentation: {SOLE} 2016: Usage and outcomes of the Synthetic
\nData Server’\,\nauthor = ‘Vilhuber\, Lars and Abowd\, John M’\,\nabstrac
t = ‘The Synthetic Data Server (SDS) at Cornell University was set\nup to
provide early access to new synthetic data products by\nthe U.S. Census Bu
reau. These datasets are made available to\ninterested researchers in a co
ntrolled environment\, prior to a\nmore generalized release. Over the past
5 years\, 4 synthetic\ndatasets were made available on the server\, and o
ver 100 users\nhave accessed the server over that time period. This paper
\nreports on interim outcomes of the activity: results of\nvalidation requ
ests from a user perspective\, functioning of the\nfeedback loop due to va
lidation and user input\, and the role of\nthe SDS as an access gateway to
and educational tool for other\nmechanisms of accessing detailed person\,
household\,\nestablishment\, and firm statistics.’\,\nconference = ‘SOLE
2016’\,\ninstitution = {NSF Census Research Network – NCRN-Cornell }\,\nye
ar = {2016}\,\nnumber = {1813:43883}\,\nURL = {http://hdl.handle.net/1813/
43883}\n}\nOther recommended readings\n L. Vilhuber\, J. M. Abowd\,
and J. P. Reiter\, “Synthetic establishment microdata around the world\,”
Statistical Journal of the International Association for Official Statist
ics\, vol. 32\, iss. 1\, pp. 65-68\, 2016. [Abstract]
[DOI] [Bibtex]\nIn contrast to the many public-use microdata sample
s available for individual and household data from many statistical agenci
es around the world\, there are virtually no establishment or firm microda
ta available. In large part\, this difficulty in providing access to busin
ess micro data is due to the skewed and sparse distributions that characte
rize business data. Synthetic data are simulated data generated from stati
stical models. We organized sessions at the 2015 World Statistical Congres
s and the 2015 Joint Statistical Meetings\, highlighting work on synthetic
establishment microdata. This overview situates those papers\, published
in this issue\, within the broader literature.\n@article{VilhuberAbowdReit
er:Synthetic:SJIAOS:2016\,\ntitle = {Synthetic establishment microdata aro
und the world}\,\njournal = {Statistical Journal of the International Asso
ciation for Official Statistics}\,\nauthor = {Lars Vilhuber and John M. Ab
owd and Jerome P. Reiter}\,\nyear=2016\,\nvolume={32}\,\nnumber={1}\,\npag
es={65-68}\,\ndoi={10.3233/SJI-160964}\,\nabstract={In contrast to the man
y public-use microdata samples available for individual and household data
from many statistical agencies around the world\, there are virtually no
establishment or firm microdata available. In large part\, this difficulty
in providing access to business micro data is due to the skewed and spars
e distributions that characterize business data. Synthetic data are simula
ted data generated from statistical models. We organized sessions at the 2
015 World Statistical Congress and the 2015 Joint Statistical Meetings\, h
ighlighting work on synthetic establishment microdata. This overview situa
tes those papers\, published in this issue\, within the broader literature
.}\,\n}\n S. K. Kinney\, J. P. Reiter\, A. P. Reznek\, J. Miranda\,
R. S. Jarmin\, and J. M. Abowd\, “Towards Unrestricted Public Use Busines
s Microdata: The Synthetic Longitudinal Business Database\,” International
Statistical Review\, vol. 79\, iss. 3\, pp. 362-384\, 2011. [Ab
stract] [DOI] [URL] [Bibtex]\nIn most countries\, national
statistical agencies do not release establishment-level business microdat
a\, because doing so represents too large a risk to establishments’ confid
entiality. One approach with the potential for overcoming these risks is t
o release synthetic data\; that is\, the released establishment data are s
imulated from statistical models designed to mimic the distributions of th
e underlying real microdata. In this article\, we describe an application
of this strategy to create a public use file for the Longitudinal Business
Database\, an annual economic census of establishments in the United Stat
es comprising more than 20 million records dating back to 1976. The U.S. B
ureau of the Census and the Internal Revenue Service recently approved the
release of these synthetic microdata for public use\, making the syntheti
c Longitudinal Business Database the first-ever business microdata set pub
licly released in the United States. We describe how we created the synthe
tic data\, evaluated analytical validity\, and assessed disclosure risk.\n
@ARTICLE{Kinney2011-ic\,\ntitle = ‘Towards Unrestricted Public Use Busines
s Microdata: The\nSynthetic Longitudinal Business Database’\,\nauthor = ‘K
inney\, Satkartar K and Reiter\, Jerome P and Reznek\, Arnold P\nand Miran
da\, Javier and Jarmin\, Ron S and Abowd\, John M’\,\njournal = {Internati
onal Statistical Review}\,\nyear = {2011}\,\nvolume = {79}\,\npages = {362
–384}\,\nnumber = {3}\,\ndoi = {10.1111/j.1751-5823.2011.00153.x}\,\nissn
= {1751-5823}\,\nkeywords = {Economic census\, data confidentiality\, synt
hetic data\, disclosure\nlimitation}\,\nowner = {vilhuber}\,\npublisher =
{Blackwell Publishing Ltd}\,\ntimestamp = {2012.09.04}\,\nabstract = {In m
ost countries\, national statistical agencies do not release establishment
-level\nbusiness microdata\, because doing so represents too large a risk
\nto establishments’ confidentiality. One approach with the potential\nfor
overcoming these risks is to release synthetic data\; that is\,\nthe rele
ased establishment data are simulated from statistical models\ndesigned to
mimic the distributions of the underlying real microdata.\nIn this articl
e\, we describe an application of this strategy to create\na public use fi
le for the Longitudinal Business Database\, an annual\neconomic census of
establishments in the United States comprising\nmore than 20 million recor
ds dating back to 1976. The U.S. Bureau\nof the Census and the Internal Re
venue Service recently approved\nthe release of these synthetic microdata
for public use\, making the\nsynthetic Longitudinal Business Database the
first-ever business\nmicrodata set publicly released in the United States.
We describe\nhow we created the synthetic data\, evaluated analytical val
idity\,\nand assessed disclosure risk.}\,\nurl = {http://dx.doi.org/10.111
1/j.1751-5823.2011.00153.x}\n}\n J. Drechsler and L. Vilhuber\, “A
First Step Towards A German SynLBD: Constructing A German Longitudinal Bus
iness Database\,” Statistical Journal of the IAOS: Journal of the Internat
ional Association for Official Statistics\, vol. 30\, 2014. [Abs
tract] [DOI] [URL] [Bibtex]\nOne major criticism against t
he use of synthetic data has been that the efforts necessary to generate u
seful synthetic data are so intense that many statistical agencies canno
t afford them. We argue many lessons in this evolving field have been lear
ned in the early years of synthetic data generation\, and can be used in t
he development of new synthetic data products\, considerably reducing the
required investments. The final goal of the project described in this pa
per will be to evaluate whether synthetic data algorithms developed in the
U.S. to generate a synthetic version of the Longitudinal Business Databas
e (LBD) can easily be transferred to generate a similar data product for o
ther countries. We construct a German data product with information comp
arable to the LBD – the German Longitudinal Business Database (GLBD) – tha
t is generated from different administrative sources at the Institute for
Employment Research\, Germany. In a future step\, the algorithms develop
ed for the synthesis of the LBD will be applied to the GLBD. Extensive eva
luations will illustrate whether the algorithms provide useful synthetic d
ata without further adjustment. The ultimate goal of the project is to pro
vide access to multiple synthetic datasets similar to the SynLBD at Cornel
l to enable comparative studies between countries. The Synthetic GLBD is a
first step towards that goal.\n@Article{SJIAOS-2014b\,\nTitle = {{A First
Step Towards A {German} {SynLBD}: {C}onstructing A {G}erman {L}ongitudina
l {B}usiness {D}atabase}}\,\nAuthor = {J{\\"o}rg Drechsler and Lars Vilhuber
}\,\nJournal = {Statistical Journal of the IAOS: Journal of the Internatio
nal Association for Official Statistics}\,\nYear = {2014}\,\nVolume = {30}
\,\nAbstract = {One major criticism against the use of synthetic data has
been that the efforts necessary to generate useful synthetic data are so i
ntense that many statistical agencies cannot afford them. We argue many
lessons in this evolving field have been learned in the early years of syn
thetic data generation\, and can be used in the development of new synthet
ic data products\, considerably reducing the required investments. The f
inal goal of the project described in this paper will be to evaluate wheth
er synthetic data algorithms developed in the U.S. to generate a synthetic
version of the Longitudinal Business Database (LBD) can easily be transfe
rred to generate a similar data product for other countries. We construct
a German data product with information comparable to the LBD – the Germa
n Longitudinal Business Database (GLBD) – that is generated from different
administrative sources at the Institute for Employment Research\, Germany
. In a future step\, the algorithms developed for the synthesis of the L
BD will be applied to the GLBD. Extensive evaluations will illustrate whet
her the algorithms provide useful synthetic data without further adjustmen
t. The ultimate goal of the project is to provide access to multiple synth
etic datasets similar to the SynLBD at Cornell to enable comparative studi
es between countries. The Synthetic GLBD is a first step towards that goal
.}\,\nDOI = {10.3233/SJI-140812}\,\nKeywords = {confidentiality\; comparat
ive studies\; US Longitudinal Business Database\; synthetic data}\,\nOwner
= {vilhuber}\,\nTimestamp = {2014.03.24}\,\nURL = {http://iospress.metapr
ess.com/content/X415V18331Q33150}\n}\nFunding\nFunding for the workshop is
provided by the National Science Foundation (CNS-1012593\, SES-1131848) a
nd the Alfred P. Sloan Foundation. The organizers thank the National Acad
emies’ Committee on National Statistics for hosting the seminar.
DTSTART;TZID=America/New_York:20170509T090000
DTEND;TZID=America/New_York:20170509T140000
GEO:+38.896556;-77.019424
LOCATION:National Academy of Sciences @ 500 5th St NW\, Washington\, DC 200
01\, USA
SEQUENCE:0
SUMMARY:Synthetic Longitudinal Business Data International User Seminar
URL:https://www.vilhuber.com/lars/event/synthetic-longitudinal-business-dat
a-international-user-seminar/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

In this seminar\, we discuss with interested
parties the conditions necessary to implement the SynLBD approach\, with t
he goal of providing other statistical agencies a straightforward toolkit
to implement the same procedure on their own data. Our hope is that by imp
lementing similar procedures on comparable business microdata\, new resear
ch both within and across countries can be enabled. The ideal end result i
s a series of country-specific datasets on establishments and/or firms ava
ilable within the same computing environment. We discuss the data and soft
ware requirements for the lowest-cost approach\, the disclosure protection
statistics already implemented that can be used to achieve release of the
data in this way\, the validation procedures that an agency should agree
to\, and the likely cost of maintaining such procedures. The seminar brin
gs together academics working on cutting-edge methods for the protection o
f privacy in statistical databases\, and researchers and implementers at s
tatistical agencies that have started or are interested in starting a simi
lar project.\nFive sessions will touch on the full lifecycle of a Sy
nLBD development and implementation\, and will follow the same pattern. We
will first discuss existing implementations and experiences\, and will th
en as a group discuss issues as they pertain to the broader community. Emp
hasis should be on discussing open issues\, specific solutions to specific
problems. Proceedings will be published later.\nFor more details\,
please see the full agenda.\nProceedings\nVilhuber\, Lars\; Ki
nney\, Saki\; Schmutte\, Ian M.\, 2017. “Proceedings from the Synthetic LB
D International Seminar”\, Labor Dynamics Institute Document 44\, availabl
e at http://digitalcommons.ilr.cornell.edu/ldi/44/ or http://hdl.handle.n
et/1813/52472\nDocuments\nOverview of the SynLBD methodology\nLink to presentation. Contains excerpts from

\n

S. Kinn
ey\, “Presentation: Synthetic Data Generation for Firm Links\,” NSF Census
Research Network – NCRN-Cornell\, 1813:50054\, 2016. [Abstract
] [URL] [Bibtex]\nIn most countries\, national stat
istical agencies do not release establishment-level business microdata\, b
ecause doing so represents too large a risk to establishments’ confidentia
lity. Agencies potentially can manage these risks by releasing synthetic m
icrodata\, i.e.\, individual establishment records simulated from statisti
cal models designed to mimic the joint distribution of the underlying obse
rved data. Previously\, we used this approach to generate a public-use ver
sion—now available for public use—of the U.S. Census Bureau’s Longitudinal
Business Database (LBD)\, a longitudinal census of establishments dating
back to 1976. While the synthetic LBD has proven to be a useful product\,
we now seek to improve and expand it by using new synthesis models and add
ing features. This paper describes our efforts to create the second genera
tion of the SynLBD\, including synthesis procedures that we believe could
be replicated in other contexts.

\n

@TechReport{kinney-2016-ecommons\
,\ntitle = {Presentation: Synthetic Data Generation for Firm Links}\
,\nauthor = {Kinney\, Saki}\,\ninstitution = {NSF Census Resea
rch Network – NCRN-Cornell }\,\nyear = {2016}\,\nnumber = {181
3:50054}\,\nAbstract = {In most countries\, national statistical age
ncies do not release establishment-level business microdata\, because doin
g so represents too large a risk to establishments’ confidentiality. Agenc
ies potentially can manage these risks by releasing synthetic microdata\,
i.e.\, individual establishment records simulated from statistical models
designed to mimic the joint distribution of the underlying observed data.
Previously\, we used this approach to generate a public-use version—now av
ailable for public use—of the U.S. Census Bureau’s Longitudinal Business D
atabase (LBD)\, a longitudinal census of establishments dating back to 197
6. While the synthetic LBD has proven to be a useful product\, we now seek
to improve and expand it by using new synthesis models and adding feature
s. This paper describes our efforts to create the second generation of the
SynLBD\, including synthesis procedures that we believe could be replicat
ed in other contexts.}\,\nkeywords = {confidentiality\; US Longitudi
nal Business Database\; synthetic data}\,\nowner = {vilhuber}\,\nURL = {http://hdl.handle.net/1813/50054}\n}

\n

Inputs to the
SynLBD process\nLink to presentation. Based on Drechsler and Vilhub
er (2014).\nConfidentiality of the SynLBD\nLink to presentatio
n. Contains excerpts from

\n

S. Kinney\, “Presentation: Synt
hetic Data Generation for Firm Links\,” NSF Census Research Network – NCRN
-Cornell\, 1813:50054\, 2016. [Abstract] [URL]
[Bibtex]\nIn most countries\, national statistical agencies do not
release establishment-level business microdata\, because doing so represen
ts too large a risk to establishments’ confidentiality. Agencies potential
ly can manage these risks by releasing synthetic microdata\, i.e.\, indivi
dual establishment records simulated from statistical models designed to m
imic the joint distribution of the underlying observed data. Previously\,
we used this approach to generate a public-use version—now available for p
ublic use—of the U.S. Census Bureau’s Longitudinal Business Database (LBD)
\, a longitudinal census of establishments dating back to 1976. While the
synthetic LBD has proven to be a useful product\, we now seek to improve a
nd expand it by using new synthesis models and adding features. This paper
describes our efforts to create the second generation of the SynLBD\, inc
luding synthesis procedures that we believe could be replicated in other c
ontexts.

\n

@TechReport{kinney-2016-ecommons\,\ntitle = {Presen
tation: Synthetic Data Generation for Firm Links}\,\nauthor = {Kinne
y\, Saki}\,\ninstitution = {NSF Census Research Network – NCRN-Corne
ll }\,\nyear = {2016}\,\nnumber = {1813:50054}\,\nAbstra
ct = {In most countries\, national statistical agencies do not release est
ablishment-level business microdata\, because doing so represents too larg
e a risk to establishments’ confidentiality. Agencies potentially can mana
ge these risks by releasing synthetic microdata\, i.e.\, individual establ
ishment records simulated from statistical models designed to mimic the jo
int distribution of the underlying observed data. Previously\, we used thi
s approach to generate a public-use version—now available for public use—o
f the U.S. Census Bureau’s Longitudinal Business Database (LBD)\, a longit
udinal census of establishments dating back to 1976. While the synthetic L
BD has proven to be a useful product\, we now seek to improve and expand i
t by using new synthesis models and adding features. This paper describes
our efforts to create the second generation of the SynLBD\, including synt
hesis procedures that we believe could be replicated in other contexts.}\,
\nkeywords = {confidentiality\; US Longitudinal Business Database\;
synthetic data}\,\nowner = {vilhuber}\,\nURL = {http://hdl.han
dle.net/1813/50054}\n}

\n

Validation Servers\nLink to pre
sentation. Contains excerpts from

\n

L. Vilhuber and J. M. A
bowd\, “Presentation: SOLE 2016: Usage and outcomes of the Synthetic Data
Server\,” NSF Census Research Network – NCRN-Cornell\, 1813:43883\, 2016.
[Abstract] [URL] [Bibtex]\nThe Synthetic
Data Server (SDS) at Cornell University was set up to provide early acces
s to new synthetic data products by the U.S. Census Bureau. These datasets
are made available to interested researchers in a controlled environment\
, prior to a more generalized release. Over the past 5 years\, 4 synthetic
datasets were made available on the server\, and over 100 users have acce
ssed the server over that time period. This paper reports on interim outco
mes of the activity: results of validation requests from a user perspectiv
e\, functioning of the feedback loop due to validation and user input\, an
d the role of the SDS as an access gateway to and educational tool for oth
er mechanisms of accessing detailed person\, household\, establishment\, a
nd firm statistics.

\n

@TechReport{Vilhuber2016-cy\,\ntitle = {Presentation: {SOLE} 2016: Usage and outcomes of the Synthetic Data Server}\,\nauthor = {Vilhuber\, Lars and Abowd\, John M}\,\nabstract = {The Synthetic Data Server (SDS) at Cornell University was set up to provide early access to new synthetic data products by the U.S. Census Bureau. These datasets are made available to interested researchers in a controlled environment\, prior to a more generalized release. Over the past 5 years\, 4 synthetic datasets were made available on the server\, and over 100 users have accessed the server over that time period. This paper reports on interim outcomes of the activity: results of validation requests from a user perspective\, functioning of the feedback loop due to validation and user input\, and the role of the SDS as an access gateway to and educational tool for other mechanisms of accessing detailed person\, household\, establishment\, and firm statistics.}\,\nconference = {SOLE 2016}\,\ninstitution = {NSF Census Research Network – NCRN-Cornell}\,\nyear = {2016}\,\nnumber = {1813:43883}\,\nURL = {http://hdl.handle.net/1813/43883}\n}

\n

Other recommended
readings

\n

L. Vilhuber\, J. M. Abowd\, and J. P. Reiter\, “
Synthetic establishment microdata around the world\,” Statistical Journal
of the International Association for Official Statistics\, vol. 32\, iss.
1\, pp. 65-68\, 2016. [Abstract] [DOI] [Bibtex]
\nIn contrast to the many public-use microdata samples available for
individual and household data from many statistical agencies around the w
orld\, there are virtually no establishment or firm microdata available. I
n large part\, this difficulty in providing access to business micro data
is due to the skewed and sparse distributions that characterize business d
ata. Synthetic data are simulated data generated from statistical models.
We organized sessions at the 2015 World Statistical Congress and the 2015
Joint Statistical Meetings\, highlighting work on synthetic establishment
microdata. This overview situates those papers\, published in this issue\,
within the broader literature.

\n

@article{VilhuberAbowdReiter:Synth
etic:SJIAOS:2016\,\ntitle = {Synthetic establishment microdata aroun
d the world}\,\njournal = {Statistical Journal of the International
Association for Official Statistics}\,\nauthor = {Lars Vilhuber and
John M. Abowd and Jerome P. Reiter}\,\nyear=2016\,\nvolume={32
}\,\nnumber={1}\,\npages={65-68}\,\ndoi={10.3233/SJI-160
964}\,\nabstract={In contrast to the many public-use microdata sampl
es available for individual and household data from many statistical agenc
ies around the world\, there are virtually no establishment or firm microd
ata available. In large part\, this difficulty in providing access to busi
ness micro data is due to the skewed and sparse distributions that charact
erize business data. Synthetic data are simulated data generated from stat
istical models. We organized sessions at the 2015 World Statistical Congre
ss and the 2015 Joint Statistical Meetings\, highlighting work on syntheti
c establishment microdata. This overview situates those papers\, published
in this issue\, within the broader literature.}\,\n}

\n

S. K. Kinney\, J. P. Reiter\, A. P. Reznek\, J. Miranda\, R. S. Jarmin\,
and J. M. Abowd\, “Towards Unrestricted Public Use Business Microdata: Th
e Synthetic Longitudinal Business Database\,” International Statistical Re
view\, vol. 79\, iss. 3\, pp. 362-384\, 2011. [Abstract]
[DOI] [URL] [Bibtex]\nIn most countries\, national statisti
cal agencies do not release establishment-level business microdata\, becau
se doing so represents too large a risk to establishments’ confidentiality
. One approach with the potential for overcoming these risks is to release
synthetic data\; that is\, the released establishment data are simulated
from statistical models designed to mimic the distributions of the underly
ing real microdata. In this article\, we describe an application of this s
trategy to create a public use file for the Longitudinal Business Database
\, an annual economic census of establishments in the United States compri
sing more than 20 million records dating back to 1976. The U.S. Bureau of
the Census and the Internal Revenue Service recently approved the release
of these synthetic microdata for public use\, making the synthetic Longitu
dinal Business Database the first-ever business microdata set publicly rel
eased in the United States. We describe how we created the synthetic data\
, evaluated analytical validity\, and assessed disclosure risk.

\n

@ARTICLE{Kinney2011-ic\,\ntitle = {Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database}\,\nauthor = {Kinney\, Satkartar K and Reiter\, Jerome P and Reznek\, Arnold P and Miranda\, Javier and Jarmin\, Ron S and Abowd\, John M}\,\njournal = {International Statistical Review}\,\nyear = {2011}\,\nvolume = {79}\,\npages = {362–384}\,\nnumber = {3}\,\ndoi = {10.1111/j.1751-5823.2011.00153.x}\,\nissn = {1751-5823}\,\nkeywords = {Economic census\, data confidentiality\, synthetic data\, disclosure limitation}\,\nowner = {vilhuber}\,\npublisher = {Blackwell Publishing Ltd}\,\ntimestamp = {2012.09.04}\,\nabstract = {In most countries\, national statistical agencies do not release establishment-level business microdata\, because doing so represents too large a risk to establishments’ confidentiality. One approach with the potential for overcoming these risks is to release synthetic data\; that is\, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article\, we describe an application of this strategy to create a public use file for the Longitudinal Business Database\, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use\, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data\, evaluated analytical validity\, and assessed disclosure risk.}\,\nurl = {http://dx.doi.org/10.1111/j.1751-5823.2011.00153.x}\n}

\n

J. Drechsler and L. Vilhuber\, “A First Step Tow
ards A German SynLBD: Constructing A German Longitudinal Business Database
\,” Statistical Journal of the IAOS: Journal of the International Associat
ion for Official Statistics\, vol. 30\, 2014. [Abstract]
[DOI] [URL] [Bibtex]\nOne major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation\, and can be used in the development of new synthetic data products\, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD – the German Longitudinal Business Database (GLBD) – that is generated from different administrative sources at the Institute for Employment Research\, Germany. In a future step\, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.

\n

@Article{SJIAOS-2014b\,\nTitle = {{A First Step Towards A {German} {SynLBD}: {C}onstructing A {G}erman {L}ongitudinal {B}usiness {D}atabase}}\,\nAuthor = {J{\"o}rg Drechsler and Lars Vilhuber}\,\nJournal = {Statistical Journal of the IAOS: Journal of the International Association for Official Statistics}\,\nYear = {2014}\,\nVolume = {30}\,\nAbstract = {One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation\, and can be used in the development of new synthetic data products\, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD – the German Longitudinal Business Database (GLBD) – that is generated from different administrative sources at the Institute for Employment Research\, Germany. In a future step\, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.}\,\nDOI = {10.3233/SJI-140812}\,\nKeywords = {confidentiality\; comparative studies\; US Longitudinal Business Database\; synthetic data}\,\nOwner = {vilhuber}\,\nTimestamp = {2014.03.24}\,\nURL = {http://iospress.metapress.com/content/X415V18331Q33150}\n}

\n

Funding\nFunding for the w
orkshop is provided by the National Science Foundation (CNS-1012593\, SES-
1131848) and the Alfred P. Sloan Foundation. The organizers thank the Nat
ional Academies’ Committee on National Statistics for hosting the seminar.

\n

X-TAGS;LANGUAGE=en-US:NCRN\,Sloan\,SynLBD\,TC-Large
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-3241@www.ncrn.cornell.edu
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:http://www.isi2017.org/
DESCRIPTION:Together with a few others from around the world\, Lars Vilhube
r will be presenting on results from a synthetic data validation cycle at
the International Statistical Institute’s World Statistical Congress.
DTSTART;TZID=America/New_York:20170718T103000
DTEND;TZID=America/New_York:20170718T123000
GEO:+31.629472;-7.981084
LOCATION:Palais des Congrès - Mansouri Eddahbi @ Marrakesh\, Morocco
SEQUENCE:0
SUMMARY:Synthetic Datasets for Statistical Disclosure Control – Research an
d Applications Around the World
URL:https://www.vilhuber.com/lars/event/synthetic-datasets-for-statistical-
disclosure-control-research-and-applications-around-the-world/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Together with a few others from around the wo
rld\, Lars Vilhuber will be presenting on results from a synthetic data va
lidation cycle at the International Statistical Institute’s World Statisti
cal Congress.

Lecture notes

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-413@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:An overview of the U.S. statistical system is given.\n \nLectur
e notes\n\nINFO7470-S1-2016-Overview of the U.S. Statistical System
DTSTART;TZID=America/New_York:20170831T162500
DTEND;TZID=America/New_York:20170831T180000
SEQUENCE:0
SUMMARY:Session 1: Overview of the U.S. Statistical System
URL:https://www.vilhuber.com/lars/event/session-1-overview-of-the-u-s-stati
stical-system/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

An overview of the U.S. statistical system
is given.

\n

\n

Lecture notes

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-420@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:Margo Anderson (University of Wisconsin – Milwaukee) presents o
n the history of the federal statistical system (flipped classroom). She w
ill be present to discuss the lecture.\nReadings and other information\n\n
Anderson\, Margo. The American Census: A Social History\, Second Edition.
Yale University Press\, 2015.\nAnderson\, Margo J.\, and Seltzer\, William
. “Federal Statistical Confidentiality and Business Data: Twentieth Century Challenges and Continuing Issues.” Journal of Privacy and Confidentiality 1.1 (2009): 7-52\, 55-58.\n\nLecture Notes\n“Historical Perspectives on
the U.S. Federal Statistical System”\nAbout the Guest Lecturer\nMargo And
erson\, University of Wisconsin – Milwaukee\n\n\n\nMargo Anderson is Distinguished Professor of History & Urban Studies at the University of Wisconsin – Milwaukee. She specializes in American social\, urban and women’s his
tory and has research interests in both urban history and the history of t
he social sciences and the development of statistical data systems\, parti
cularly the census. Her publications include Who Counts? The Politics of C
ensus Taking in Contemporary America (2001)\, coauthored with Stephen E. F
ienberg\, and a coedited volume with Victor Greene\, Perspectives on Milwa
ukee’s Past (University of Illinois Press\, 2009). Her most recent publica
tion\, of particular relevance to this class\, is The American Census: A S
ocial History\, Second Edition. Yale University Press\, 2015. More informa
tion about Margo can be found at her University of Wisconsin-Milwaukee web
site and her personal website.
DTSTART;TZID=America/New_York:20170907T162500
DTEND;TZID=America/New_York:20170907T180000
SEQUENCE:0
SUMMARY:Session 2: History of the Federal Statistical Infrastructure
URL:https://www.vilhuber.com/lars/event/session-2-history-of-the-federal-st
atistical-infrastructure-2/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

Margo Anderson (University of Wisconsin
– Milwaukee) presents on the history of the federal statistical system (fl
ipped classroom). She will be present to discuss the lecture.

L
ecture Notes

About the Guest Lecturer

\n

Margo Anderson\, University of Wis
consin – Milwaukee

\n

Margo Anderson is Distinguished Professor of History & Urban Studies at the University of Wisconsin – Milwaukee. She specializes in Amer
ican social\, urban and women’s history and has research interests in both
urban history and the history of the social sciences and the development
of statistical data systems\, particularly the census. Her publications in
clude Who Counts? The Politics of Census Taking in Contemporary America
(2001)\, coauthored with Stephen E. Fienberg\, and a coedited volume
with Victor Greene\, Perspectives on Milwaukee’s Past (University o
f Illinois Press\, 2009). Her most recent publication\, of particular rele
vance to this class\, is The American Census: A Social History\, S
econd Edition. Yale University Press\, 2015. More information about Ma
rgo can be found at her University of Wisconsin-Milwaukee website and her personal website.

\n\n

\n

\n

\n

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-422@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:This class coincides with the FSRDC system’s annual conference. The
re will be no in-classroom activity at most sites on this day (please chec
k with local coordinator). The content of this section will be discussed o
n Sept 21\, 2017\, so students should take the time to view the materials
on edX during this week.\nLecture notes\n\nINFO7470-S3 PopulationsFramesSa
mples
DTSTART;TZID=America/New_York:20170914T162500
DTEND;TZID=America/New_York:20170914T163000
SEQUENCE:0
SUMMARY:Session 3: [No class] Universes\, Populations\, Frames\, and Sampli
ng
URL:https://www.vilhuber.com/lars/event/session-3-universes-populations-fra
mes-and-sampling-2/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

This class coincides with the FSRDC system’s an
nual conference. There will be no in-classroom activity a
t most sites on this day (please check with local coordinator). The conten
t of this section will be discussed on Sept 21\, 2017\, so students should take the time to view the materials on edX during this week.

\n

Lecture notes

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-424@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:This lecture is a “flipped” lecture.\nDiscussion lead\nWarren B
rown\, Cornell University\n\n\n\nWarren A. Brown is Senior Research Associ
ate at Cornell University where he directs the Program on Applied Demograp
hics and is the Research Director of the Cornell site of the New York Fede
ral Statistical Research Data Center\, a consortium of research institutio
ns in the New York metropolitan area and upstate New York. He is also the
2015-2016 President of the Association of Public Data Users (APDU) and is serving on the National Academy of Sciences’ Standing Committee on Reengineer
ing Census Operations. His teaching\, research and outreach efforts involv
e him with the application of demographic information to areas such as str
ategic planning for workforce and economic development\, consumer behavior
and market analysis\, households and housing market analysis\, regional t
ransportation planning\, hospitality and recreation industries\, health se
rvices for the elderly\, and environmental protection. He is an expert on
the American Community Survey.\n\n\n\nLecture Notes\n\nINFO7470-S4-Househo
ld Surveys\n\n\nDiscussion Notes\n\nAlso see National Research Council. 20
13. ‘Principles and Practices for a Federal Statistical Agency: Fifth Edit
ion.‘ Washington\, DC: The National Academies Press. doi: 10.17226/18318.
DTSTART;TZID=America/New_York:20170921T162500
DTEND;TZID=America/New_York:20170921T180000
SEQUENCE:0
SUMMARY:Session 4: Measuring People and Households
URL:https://www.vilhuber.com/lars/event/session-4-measuring-people-and-hous
eholds-2/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

This lecture is a “flipped” lecture.

\n<
h4 id='Discussion-lead'>Discussion lead\n

Warren Brown\, Cornell Un
iversity

\n

Warren A. Brown is Senior Research Associate at Cor
nell University where he directs the Program on Applied Demographics and is the Researc
h Director of the Cornell site of the New York Federal Statistical Research Data Center
\, a consortium of research institutions in the New York metropolitan area
and upstate New York. He is also the 2015-2016 President of the Association of Public Data Users (APDU) and is serving on the National Academy of Sciences’ Standing Committee on Reengineering C
ensus Operations. His teaching\, research and outreach efforts involve him
with the application of demographic information to areas such as strategi
c planning for workforce and economic development\, consumer behavior and
market analysis\, households and housing market analysis\, regional transp
ortation planning\, hospitality and recreation industries\, health service
s for the elderly\, and environmental protection. He is an expert on the A
merican Community Survey.

Lecture Notes

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-441@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:Health statistics\, energy statistics\, agricultural statistics
\, others. Registered-based statistics\, organic data.\nDiscussion leads\n
Erica Groshen\, Cornell University\, will take part in the discussion.\n\n
\n\nErica L. Groshen is currently a visiting scholar at the ILR School of
Cornell University. She is the former Commissioner of the U.S. Bureau of L
abor Statistics (BLS)\, which is the principal federal statistical agency
responsible for measuring U.S. labor market activity\, working conditions a
nd inflation. Her term ended on January 27\, 2017. Previously\, Groshen se
rved as a vice president of the Federal Reserve Bank of New York. Dr. Gros
hen’s research focuses on jobless recoveries\, regional labor markets\, wa
ge rigidity and dispersion\, the male-female wage differential\, service-s
ector employment\, and the role of employers in labor market outcomes.\nSh
e has served as a member of the BLS Data Users’ Advisory Committee and the
Census Bureau’s 2010 Census Advisory Committee and also as an American Ec
onomic Association representative to the Census Advisory Committee of Prof
essional Associations. On behalf of the New York Fed\, she initiated the
effort to form the consortium of thirteen research institutions that creat
ed the New York Census Research Data Center at Baruch College in 2006. Gro
shen received a bachelor’s degree in economics and mathematics from the Un
iversity of Wisconsin-Madison and a Ph.D. in economics from Harvard Univer
sity.\n\n\n\nBrent Hueth\, University of Wisconsin-Madison\, will be discu
ssing topics related to agricultural statistics.\n\n\n\nBrent Hueth is Dir
ector of the University of Wisconsin Center for Cooperatives\, with an app
ointment as associate professor in the Department of Agricultural and Appl
ied Economics. Brent has published in top economics journals\, including the
American Journal of Agricultural Economics\, the Journal of Regulatory Eco
nomics\, the Journal of Economic Behavior and Organization\, and the Journ
al of Economics and Management Strategy. Brent is a Research Fellow at the
Institute for Exceptional Growth Companies\, and Executive Director of th
e Census Bureau’s Research Data Center at the University of Wisconsin—Madi
son. Brent’s research and teaching focus on agricultural markets\, coopera
tive enterprise\, and economic development. (More info)\n\n\n\nLecture Not
es\n\n\n\nHealth statistics (Lecture Notes: INFO7470-S7-Parker\, Jennifer
Parker (NCHS))\nAgricultural statistics (Lecture Notes: INFO7470-S7-DunnHu
eth\, additional materials\, INFO7470-S7-Migrant Farm Labor in the Census
of Agriculture\, Richard Dunn (University of Connecticut) and Brent Hueth
(University of Wisconsin-Madison))\nEIA presentation: INFO7470-S9-EIA-Back
ground-2016 (Jacob Bournazian (EIA))\nRegister-based statistics: INFO7470-
S9-Register-data\nAlternate data sources: INFO7470-S9-Organic-data\nUpdate
s by Erica Groshen on working with BLS data: INFO7470 2017 Groshen BLS\n\n
\n\n\nUpdates on the above: INFO7470-S9-Updates\n–>
DTSTART;TZID=America/New_York:20171012T162500
DTEND;TZID=America/New_York:20171012T180000
SEQUENCE:0
SUMMARY:Session 7: Data from Other Statistical Agencies and Other Sources
URL:https://www.vilhuber.com/lars/event/session-7-data-from-other-statistic
al-agencies-2/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Discussion leads

\n

Eri
ca Groshen\, Cornell University\, will take part in the discussion.
\n

Erica L. Groshen is currently a
visiting scholar at the ILR School of Cornell University. She is the forme
r Commissioner of the U.S. Bureau of Labor Statistics (BLS)\, which is the
principal federal statistical agency responsible for measuring U.S. labor
market activity\, working conditions and inflation. Her term ended on Janu
ary 27\, 2017. Previously\, Groshen served as a vice president of the Fede
ral Reserve Bank of New York. Dr. Groshen’s research focuses on jobless re
coveries\, regional labor markets\, wage rigidity and dispersion\, the mal
e-female wage differential\, service-sector employment\, and the role of e
mployers in labor market outcomes.\nShe has served as a member of th
e BLS Data Users’ Advisory Committee and the Census Bureau’s 2010 Census A
dvisory Committee and also as an American Economic Association representat
ive to the Census Advisory Committee of Professional Associations. On beh
alf of the New York Fed\, she initiated the effort to form the consortium
of thirteen research institutions that created the New York Census Researc
h Data Center at Baruch College in 2006. Groshen received a bachelor’s deg
ree in economics and mathematics from the University of Wisconsin-Madison
and a Ph.D. in economics from Harvard University.

\n

Brent Hueth\, University of Wisconsin-Madison\, will be discussing t
opics related to agricultural statistics.\n

\n
END:VEVENT
BEGIN:VEVENT
UID:41rg443ri65i1dv8g4eqncckjt@google.com
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:
DESCRIPTION:Michael Ratcliffe will be presenting “Maintaining an Accurate A
ddress List: Reengineering Address Canvassing through the Use of Multiple
Sources and Methods” and discussing topics related to the definitions\, in
the past\, now\, and in the future\, of geography for census data collect
ion purposes. This presentation is part of INFO7470 (https://www.vrdc.cor
nell.edu/info747x/) but all are welcome.
DTSTART;TZID=America/New_York:20171019T162500
DTEND;TZID=America/New_York:20171019T180000
LOCATION:Ives 109
SEQUENCE:0
SUMMARY:INFO7470: Michael Ratcliffe: Maintaining an Accurate Address List:
Reengineering Address Canvassing through the Use of Multiple Sources and M
ethods
URL:https://www.vilhuber.com/lars/event/info7470-census-geography-and-maf-r
edesign-michael-ratcliffe/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Michael Ratcliffe will be presenting “Maintai
ning an Accurate Address List: Reengineering Address Canvassing through th
e Use of Multiple Sources and Methods” and discussing topics related to th
e definitions\, in the past\, now\, and in the future\, of geography for c
ensus data collection purposes. This presentation is part of INFO7470 (ht
tps://www.vrdc.cornell.edu/info747x/) but all are welcome.

Lecture Notes\n

\n
END:VEVENT
BEGIN:VEVENT
UID:710vv7t2mqp0fq39alskkd8gf1@google.com
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:Conferences\,Presentation\,vilhuber
CONTACT:
DESCRIPTION:Barbara Downs (U.S. Census Bureau) will be discussing how best
to access data in the FSRDC system. This presentation is part of INFO7470
(https://www.vrdc.cornell.edu/info747x/) but all are welcome.
DTSTART;TZID=America/New_York:20171026T162500
DTEND;TZID=America/New_York:20171026T180000
LOCATION:Ives 109
SEQUENCE:0
SUMMARY:INFO7470: Restricted Access Data in the FSRDC system
URL:https://www.vilhuber.com/lars/event/info7470-restricted-access-data-in-
the-fsrdc-system/
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

Barbara Downs (U.S. Census Bureau) will be di
scussing how best to access data in the FSRDC system. This presentation is
part of INFO7470 (https://www.vrdc.cornell.edu/info747x/) but all are wel
come.

\n

X-TAGS;LANGUAGE=en-US:Census@Cornell\,NCRN
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-527@www.vrdc.cornell.edu/info747x
DTSTAMP:20171214T021012Z
CATEGORIES;LANGUAGE=en-US:INFO7470
CONTACT:
DESCRIPTION:Flipped classroom about access to restricted access data. Students will be introduced to the research proposal mechanism of the Federal Statistical Research Data Center\, including data from the Census Bureau\, NCHS\, and BLS.\nDiscussion will focus on how to access various restricted access data sets. Guest presenters may be present live in the videoconference classroom.\nThe presentation on replicable science is moved to a later date.\n\nLecture Notes\n\nRestricted Access Data: INFO7470-S8-Proposals\, Kristen Monaco on BLS proposal review\, Matthias Umkehrer on IAB access\n\nReplicable Science: INFO7470-S9-Replicable Science\n\nUpdates and Flipped Class questions: INFO7470-S8-Updates and flipped class questions\nAdditional links\n\nIRS SOI Joint Statistical Research Program – with links to the 2014 Call for proposals (now closed) (local copy) and projects in 2012 and 2014
DTSTART;TZID=America/New_York:20171026T162500
DTEND;TZID=America/New_York:20171026T180000
SEQUENCE:0
SUMMARY:Session 9: Restricted Access Data
URL:https://www.vilhuber.com/lars/event/session-9-restricted-access-data/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\\n\\n

\n

Flipped classroom about access to re
stricted access data. Students will be introduced to the research
proposal mechanism of the Federal Statistical Research Data Center\, incl
uding data from the Census Bureau\, NCHS\, and BLS.

\n

Discussion wil
l focus on how to access various restricted access data sets. Guest presen
ters may be present live in the videoconference classroom.

\n

The presentation on replicable science is moved to a later date.

\n

\n

Lecture Notes

\n

Discussion lead

\n

John M. Abowd\, U.S. Census Bureau a
nd Cornell University\, will lead the discussion.\n

\n\n

\n

John M. Abowd i
s currently the Associate Director for Research and Methodology and Chief
Scientist\, United States Census Bureau\, on leave from Cornell University
. At Cornell\, he is the Edmund Ezra Day Professor of Economics\, Professo
r of Statistics and Information Science at Cornell University\, and the D
irector of the Labor Dynamics Institute (LDI) at Cornell. He previously s
erved as a Distinguished Senior Research Fellow at the United States Censu
s Bureau (1998-2015). He is also a Research Associate at the National Bur
eau of Economic Research (NBER\, Cambridge\, MA)\, Research Affiliate at t
he Centre de Recherche en Economie et Statistique (CREST\, Paris\, France)
\, Research Fellow at the Institute for Labor Economics (IZA\, Bonn\, Germ
any)\, and Research Fellow at IAB (Institut für Arbeitsmarkt-und Berufsfo
rschung\, Nürnberg\, Germany). He is the outgoing President (2014-2015) a
nd Fellow of the Society of Labor Economists\, a past Chair (2013) of the
Business and Economic Statistics Section and a Fellow of the American Stat
istical Association. He is an Elected Member of the International Statisti
cal Institute and a Fellow of the Econometric Society. He previously serve
d on the National Academies’ Committee on National Statistics (2010-2016)
and on the American Economic Association’s Committee on Economic Statisti
cs. He served as Director of the Cornell Institute for Social and Economic
Research (CISER) from 1999 to 2007.

\n

Extra lecture

\n\n

\n\n

Lab

\n

The lab (an edit and imputation exercise) is posted on the INFO7470x edX site. You will need to create a program (language of your choice) and upload it to edX. A toy example is illustrated in a video on the edX site\; you can download the spreadsheet toy-example-imputation.xlsx here.