https://www.vilhuber.com/larsAmerica/New_YorkAmerica/New_YorkAmerica/New_York20171105T020000-0400-0500EST20180311T020000-0500-0400EDTai1ec-1074@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.ssc.wisc.edu/naddi2015/The North American Data Documentation Initiative Conference (NADDI) is an opportunity for those using metadata standards and those interested in learning more to come together and learn from each other. Modeled on the successful European DDI User Conference, NADDI 2015 will be a three-day conference (April 8-10) with invited and contributed presentations, and should be of interest to both researchers and data professionals in the social sciences and other disciplines.
Cornell’s Bill Block is on the Program Committee.
2015040820150411+43.076592;-89.412488University of Wisconsin-Madison @ Madison, WI, USA0NADDI 2015DDI,Metadata,NCRNcalendar.1602.field_date_with_zone.0@www.ncrn.info20171214T020841ZNCRN MeetingsLocation:
May 7: U.S. Census Bureau, Suitland, MD (Metro: Suitland) Registration (Registration closes on 5/1/15 at noon)
May 8: National Academy of Sciences Main Building, 2101 Constitution Avenue N.W., Washington, D.C. Registration
Hotel Accommodations: Please contact the NCRN Coordinating Office at info@ncrn.info for further information
Program
Detailed information on the program, with links to available presentations, is below.
May 7, 2015 (@ Census Bureau)
1:30-2:30 PI-only meeting (by invitation only)
1:00-2:30 Meetings between NCRN and Census collaborators
2:30-3:00 Coffee break
3:00-4:30 Research presentations
● Room 1:
  ○ Rapid Cycle Evaluation for Field Operations (Benjamin Reist, CAD)
  ○ The SIPP Adaptive Workload Project (Gina Walejko, CAD)
  ○ Usability testing of the ACS on Smartphones & Tablets: Why we must optimize for mobile and what happens when we don’t. (Erica L Olmsted Hawala, CSM)
  ○ Enhancing Operational Efficiency: Using paradata to improve the data collection process (Rachael Walsh, CSM)
● Room 2:
  ○ Simulating tax liabilities in PSID (Luke Shaefer, U Michigan node) [poverty measures]
  ○ SWELL (Summer Workgroup for Employer List Linking) presentation/discussion (Mark Kutzbach, CES)
● Room 3:
  ○ CED²AR presentation: DDI-based tools and processes (metadata generation, collaborative editing, other uses) (Lars Vilhuber and Ben Perry, Cornell node)
3:00-4:30 Meetings between NCRN and Census collaborators (self-organized)
4:30-5:30 PI Meeting with Director Thompson and Bureau staff (by invitation only)
May 8, 2015 (@ National Academies)
8:30 Registration for Workshops: http://sgiz.mobi/s3/b2a05e12c2de
9:00-10:20 Concurrent Mini-Workshops on Technical Work of NCRN Nodes – Round 1
9:00-10:20 Session 1: NCRN and the Training of the Next Generation of Methodologists.
Rebecca Nugent (CMU), ‘Building and Training the Next Generation of Survey Methodologists and Researchers’
Noel Cressie, Scott Holan, and Christopher K. Wikle (Missouri Node): ‘Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics’
Allan McCutcheon (Nebraska), ‘Survey Informatics: the Future of Survey Methodology and Survey Statistics Training in the Academy?’
Discussant: (Stephanie Shipp, Virginia Tech)
9:00-10:20 Session 2: Uses and Benefits from Government Statistics
Bruce Spencer (Northwestern), ‘Research on Data Use, or Measuring the Value of Data Requires Knowing How Data Get Used’
Ian Schmutte (Cornell node/UGA), ‘Economics of Privacy’
Seth Spielman (Colorado), ‘Geographic cost-benefit analysis of federal statistics: Assessing criteria for “usable” statistical geographies’
Discussant (Mark Doms, Department of Commerce)
10:20-10:40 Coffee break
10:40-12:00 Concurrent Mini-Workshops on Technical Work of NCRN Nodes – Round 2
10:40-12:00 Session 3: Geographic Aspects of Statistics
Christopher K. Wikle (Missouri Node): Regionalization of Multiscale Spatial Processes using a Criterion for Spatial Aggregation Error
Scott H. Holan (Missouri Node): Models for Multiscale Spatially-Referenced Count Data
Nicholas Nagle (Colorado/Tennessee Node): Geographic aspects of direct and indirect estimators for small areas
Discussant: Thomas Louis (U.S. Census Bureau) [20 min]
10:40-11:35 Session 4a: Confidentiality
Jerry Reiter (Duke), A Vision for the future of data access
Lars Vilhuber (Cornell), Expanding the use of synthetic data
Discussant: Laura McKenna (U.S. Census Bureau) [15 min]
11:35-12:30 Session 4b: Statistics and unstructured data
Michael Cafarella (Michigan), Using Social Media to Measure Labor Market Flows
Beka Steorts (CMU)/Shrivastava (Cornell), Quantifying populations when we don’t know who is being counted: A real-life application
Discussant: Amy O’Hara (U.S. Census Bureau) [15 min]
12:15 Boxed lunches
1:45-2:15 Light Refreshments – First Floor East Court
2:15 Welcome to the Seminar — Lawrence Brown, CNSTAT Chair and the University of Pennsylvania
2:20 Developments at the OMB Statistical and Science Policy Office — Katherine Wallman, Chief Statistician of the U.S.
2:35 Featured Topic:
‘Can Government-Academic Partnerships Help Secure the Future of the Federal Statistical System? Examples from the NSF-Census Research Network,’
John Abowd, Cornell University [presenter] and Stephen Fienberg, Carnegie Mellon University
Robert Groves, Georgetown University [facilitator and discussant]
Erica Groshen, Bureau of Labor Statistics [discussant] ‘Comment on: Can Government-Academic Partnerships Help Secure the Future of the Federal Statistical System? Examples from the NSF-Census Research Network’
4:00 Floor Discussion
Reception East Court
For additional information, contact the NCRN Coordinating Office
Nodes:
NCRN Coordinating Office
Carnegie-Mellon University
Cornell University
Duke University / National Institute of Statistical Sciences (NISS)
Northwestern University
University of Colorado at Boulder / University of Tennessee
University of Michigan
University of Missouri
University of Nebraska
Date:
May 07, 2015 to May 08, 2015
Address:
Washington, DC, United States
Attachments:
Agenda for NCRN Meeting @ Census Bureau May 7
CNSTAT NCRN Agenda May 8.pdf
Location:
20150507201505090NCRN Meeting Spring 2015ai1ec-1073@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://iassistdata.org/conferences/iassist-2015-call-papersBridging The Data Divide: Data In The International Context
The theme of our 2015 conference is Bridging the Data Divide: Data in the International Context. Going hand in hand with the well-known digital divide is a growing inequity in access to data. Increasing budget concerns have placed strains on governments, universities, and other institutions in the provision of data services. From the cancellation of the Statistical Abstract of the United States, to the controversy over the Canadian Census long form, to political barriers in the data collection process in some countries, access to data and the data divide presents organizational, economic and educational challenges to the community of data professionals worldwide.
2015060220150606+44.977753;-93.265011Minneapolis, MN, USA0IASSIST 2015DDI,Metadata,NCRNai1ec-4177@www.vilhuber.com/lars20171214T020841ZPresentationhttp://cerium.umontreal.ca/etudes/ecoles-dete-2015/la-statistique-publique-a-lere-du-big-data/
Presented during CERIUM 2015: Public Statistics and Big Data, the presentation is at http://www.vilhuber.com/lars/cerium2015
2015062020150621Université de Montréal0CERIUM 2015: Challenges for Official Statisticsai1ec-2828@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberjsessionid=96E461383BD863435CE6213B550DE662; http://www1.unece.org/stat/platform/display/SDCWS15/Statistical Data Confidentiality Work Session Oct 2015 Home“Using partially synthetic microdata to protect sensitive cells in business statistics,” Lars Vilhuber (NCRN, Cornell University), Javier Miranda (U.S. Census Bureau). This is an updated version of the presentation made at JSM 2015.
2015100620151007+60.185526;+24.979765UNECE Statistical Data Confidentiality Work Session @ Kalasatama, Helsinki, Finland0Vilhuber @ UNECE 2015: Using partially synthetic micro data to protect sensitive cells in business statisticsfreeConfidentiality,NCRN,Privacy,SynLBD,UNECEai1ec-2855@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://sites.stanford.edu/researchdatacenter/conference-agenda“Earnings Inequality Trends in the United States: Nationally Representative Estimates from Longitudinally Linked Employer-Employee Data”, John Abowd (Cornell University and U.S. Census Bureau), Kevin McKinney (U.S. Census Bureau), Nellie Zhao (Cornell University)
Extended Abstract
We track sources of earnings inequality using the statistical technique introduced to the labor economics literature in 1999 (Abowd, Kramarz and Margolis, Econometrica 1999). When this technique has been used in Europe (Card, Heining and Kline QJE 2013 for Germany, in particular), the biggest contributor to the increase in earnings inequality appears to be increased employer-level heterogeneity (called the firm effect in AKM). Using the Census Bureau’s Longitudinal Employer-Household Dynamics Infrastructure data for 1990-2013, we show that with respect to the U.S. data, the CHK result does not hold. There has been very little change in employer-level earnings heterogeneity in the U.S. when one compares wage measures similar to the ones used to analyze the European data. European administrative databases allow one to construct something akin to a wage rate (usually, the amount that would be earned if an individual worked full-time full-year). The American data does not directly allow that. We develop a statistical approximation to the full-year full-time wage rate, using integrated Current Population Survey, Census 2000, and American Community Survey data. Using that measure, the earnings inequality trends in the U.S. look more similar to the European analyses.
But, for the purposes of studying earnings inequality, considering only the wage rate, and not the amount of time a person actually works, is seriously incomplete—especially in the U.S. where there is very little statutory employment security except in the public sector. The most important determinant of increased earnings inequality in our analyses is changes in labor force attachment (weeks worked in the year, hours worked per week).
In attempting to estimate how important the labor-force attachment component is, we reconstruct the work-eligible population (18-70) for each year from 1990-2013. The administrative records database developed at the Census Bureau uses an encrypted SSN to track individuals. The researcher can tell if the number that was encrypted is a valid SSN, and can also access the demographic details and employment history associated with the underlying SSN. In our model, there are two kinds of SSNs that are suspect: ones that are not valid (this means that the employer reported earnings in a state’s UI system for an SSN that was never issued) and ones associated with demographic characteristics that mean it is unlikely that the owner of the SSN used it (leading case: the SSN was issued to a person who was less than 10 years old in the year during which the SSN was used to report UI eligible earnings). Our working hypotheses are: (1) the use of an invalid SSN reflects the work of a single undocumented immigrant, so we add that person to both the eligible population and the working population and (2) the use of a valid SSN issued to someone who appears to be too young (or too old) to work legally represents one person in the population (not working, not immigrant; i.e., eligible to get an SSN by virtue of birth in the U.S.) and at least one other person both working and in the work-eligible population, who is an undocumented immigrant.
Getting the non-working work-eligible population as accurate as possible is important because, especially during the Great Recession, many persons had no income from work for a full calendar year. We have no trouble finding these people for properly documented native-born and immigrant subpopulations, but we have to estimate how many work-eligible non-documented immigrants are still in the U.S. looking for work in any given year.
We also link data from the 1992-2012 Economic Censuses. These data are used to construct a measure of surplus per worker (revenue minus factor opportunity costs) for every private establishment in the censuses. These data show similar results for the population of working persons employed in the private sector. In particular, they show that there has not been an increase in overall earnings variability for this population.
2015111320151115+37.427475;-122.169719Stanford University @ Li Ka Shing Conference Center 291 Campus Drive, Stanford, CA 943050Abowd @ NBER Conference on Firm Heterogeneity and Income Inequality: “Earnings Inequality Trends in the United States: Nationally Representative Estimates from Longitudinally Linked Employer-Employee Data”NBER,NCRNai1ec-2862@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://projects.informatics.mit.edu/bigdataworkshops/book/second-workshop-location-confidentiality-and-official-surveys#overlay-context=bigdataworkshops/book/second-workshop-location-confidentiality-and-official-surveysSecond Workshop: Location Confidentiality and Official Surveys
2015113020151201+42.361623;-71.086625MIT/Census Big Data Meeting @ 45 Carleton St, Cambridge, MA 02142, USA0Abowd @ MIT/Census Big Data Meeting: Invited SpeakerBig Data,NCRNai1ec-3169@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.eddi-conferences.eu/ocs/index.php/eddi/eddi15Michelle Edwards and William Block, Presentation at EDDI 2015
20151202201512030Block and Edwards present on “What comes first? Metadata or Data Access?” at EDDI 2015freeNCRNai1ec-2825@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.eddi-conferences.eu/ocs/index.php/eddi/eddi15/paper/view/192“Improving Access and Data Security to Confidential Labor Market Data”, Warren Brown (Cornell University), Stephanie Jacobs (Cornell University), David Schiller (German Institute for Employment Research), Jörg Heining (German Institute for Employment Research)
Abstract: The Cornell Institute for Social and Economic Research (CISER), Cornell University and the Institute for Employment Research (IAB), German Federal Employment Agency are collaborating to expand use of IAB’s confidential Sample of Integrated Labour Market Biographies (SIAB). DDI 2.5 is used to enable researchers to discover the files by means of variable level searching in a repository of metadata on U.S. and German labor market related data files. The repository is the Comprehensive Extensible Data Documentation and Access Repository (CED2AR) being developed by researchers at Cornell University with funding from the U.S. National Science Foundation. CED2AR provides researchers access to machine-readable codebooks with variable characteristics thus enabling researchers to develop detailed proposals for access to these data that are submitted to IAB. Researchers with approved projects are able to access and analyze the data using the Cornell Restricted Access Data Center (CRADC), a remote access virtual data enclave using remote desktop protocol. In the initial testing phase several researchers located in Europe and North America are successfully accessing and analyzing the Scientific Use Files of the SIAB. The project is well on its way to realizing the goal of wider access to researchers while improving secure management of confidential data.
The presentation can be found at http://hdl.handle.net/1813/44707
2015120220151204+55.676097;+12.568337Royal School of Library and Information Sciences @ Copenhagen, Denmark0Brown presents @ EDDI 2015: Improving Access and Data Security to Confidential Labor Market DatafreeEDDI,NCRNcalendar.2026.field_date_with_zone.0@www.ncrn.info20171214T020841ZNCRN MeetingsLocation:
Dec 14: BLS Conference Center (open to the public, registration required) Registration is closed.
Dec 15: U.S. Census Bureau HQ (NCRN nodes only)
Registration:
Registration is closed.
Program:
This version last updated 2015-12-09.
Monday, December 14, 2015
Location: BLS Conference Center, 2 Massachusetts Avenue, N.E., Washington, D.C. 20212. Attendees are requested to bring identification to the BLS front entrance.
9:00-9:10 Opening Remarks – NCRN Coordinating Office – Lars Vilhuber
9:10-9:20 Opening Remarks – BLS Commissioner Erica Groshen
9:30-12:00 Research Session I: Confidentiality (Organizer: Jerry Reiter, Duke; Chair: Warren Brown, Cornell University and President, APDU) [Conference rooms 1-3]
9:30 ‘Formal Privacy Protection for Data Products Combining Individual and Employer Frames’ (John Abowd, Cornell University, with Samuel Haney, Ashwin Machanavajjhala (Duke University), Mark Kutzbach, Matthew Graham (US Census Bureau), Lars Vilhuber (Cornell University))
10:00 ‘The effect of data swapping on contingency table analyses’ (Nicolas Kim, CMU)
10:30 ‘Simultaneous Edit-Imputation and Synthetic Data Generation for Establishment Microdata’ (Jerry Reiter, Duke University)
11:00 “Exact Analysis of Singly and Multiply Imputed Synthetic Data Generated Under Plug-in Sampling From a Multiple Linear Regression Normal Model” (Bimal Sinha, UMBC/U.S. Census Bureau)
11:30 Discussion: John Eltinge (BLS) and questions
12:00-1:30 Lunch (on your own, BLS cafeteria)
1:30-4:00 Research Session II: Small domain estimation and visualization of uncertainty in small area data (Organizer: Scott Holan (U Missouri), Nicholas Nagle (UTK), David Folch (Florida State U); Chair: Matthew Simpson, U Missouri) [Conference rooms 1-3]
1:30 ‘Census Data on Mean Usual Weekly Income: Regression and Simultaneous Autoregression when the Dataset is Large’, Noel Cressie (University of Wollongong and University of Missouri)
2:00 ‘Spatio-temporal change of support with application to American Community Survey multi-year period estimates’, Scott Holan (University of Missouri)
2:30 ‘Navigating ACS Data Uncertainty: Insights from Mapping Experiments with Urban Planners’, Jason Jurjevich (Assistant Professor, Nohad A. Toulan School of Urban Studies and Planning, Portland State University and Assistant Director, Population Research Center)
3:00 ‘Exploring a scalable framework for spatially-local regression analysis’, Carson Farmer (Assistant Professor, Geography Department, University of Colorado Boulder)
3:30 Discussion (Connie Citro, Committee on National Statistics) and questions
4:00-5:00 NCRN PI-only meeting [by invitation only]
7:00-9:00 Group dinner [by invitation only]
Tuesday, December 15, 2015
Location: U.S. Census Bureau
8:00-9:00 Coffee and breakfast
9:00-10:00 Meeting with Decennial (3 parallel sessions)
10:00-11:00 Meet with Director Thompson, Deputy Director Potok, others [by invitation only, Conference Room T3]
11:00-12:00 Meeting with Decennial (3 parallel sessions)
Conference ends. [Lunch in Census Cafeteria]
For additional information, contact the NCRN Coordinating Office
Nodes:
NCRN Coordinating Office
Carnegie-Mellon University
Cornell University
Duke University / National Institute of Statistical Sciences (NISS)
Northwestern University
University of Colorado at Boulder / University of Tennessee
University of Michigan
University of Missouri
University of Nebraska
Date:
Dec 14, 2015 to Dec 15, 2015
Address:
U.S. Bureau of Labor Statistics
2 Massachusetts Avenue, NE
Washington, DC 20212, United States
Location:
20151214201512160NCRN Meeting Fall 2015ai1ec-2860@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberInvited Seminar, John Abowd (Cornell University and U.S. Census Bureau)
2015121820151219+31.281688;+121.479984School of Economics, Shanghai University of Finance and Economics @ America Webster University, China Shanghai University of Finance and Economics, 369 Zhong Shan Bei Yi Lu, Hongkou Qu, Shanghai Shi, China, 2000800Abowd at Shanghai University of Finance and Economics: Invited SeminarNCRNai1ec-2861@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.wise.xmu.edu.cn/meetings/LABOR2015/2015121920151221+24.437348;+118.0978552015 WISE International Symposium on Labor Economics @ Xiamen University, 422 Si Ming Nan Lu, Siming Qu, Xiamen Shi, Fujian Sheng, China, 3610060Abowd presents at 2015 WISE International Symposium on Labor Economics: Keynote speakerNCRN,WISEai1ec-106@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
This class coincides with President’s Day. There will be no in-classroom activity on this day. The content of this section will be discussed on Feb 22, 2016.
Lecture notes
INFO7470-S3 PopulationsFramesSamples
20160215201602160Session 3: Universes, Populations, Frames, and Samplingfreeai1ec-2971@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberLars Vilhuber presents at Statistics Canada Socio-Economic Workshop “Using Business Microdata for Economic Research” on the topic of “Synthetic Establishment Microdata”.
2016021920160220+45.407067;-75.734605Statistics Canada @ Jean Talon Building, 170 Tunney's Pasture Driveway, Ottawa, ON X9X 9X9, Canada0Lars Vilhuber presents at Statistics Canada Socio-Economic WorkshopfreeNCRNai1ec-144@www.vrdc.cornell.edu/info747x20171214T020841ZINFO747020160328201603290No class (Cornell Spring Break)ai1ec-3000@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://casd.eu/documents/Programme-CASD-Event-2016.pdfTitle of the presentation: “Quelques développements en cours aux USA et au Canada”
Tickets: https://casd.eu/fr/event2016.2016040620160407+48.84392;+2.356635Muséum national d'histoire naturelle @ 57 Rue Cuvier, 75005 Paris, France0Lars Vilhuber presents at “CASD Conference: Vos données au coeur de la datascience”externalLars Vilhuber,NCRNhttps://casd.eu/fr/event2016ai1ec-3163@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberPoster presentation at North American DDI (NADDI) Conference, held in Edmonton, Alberta, CA on April 7, 2016. Download the poster from http://hdl.handle.net/1813/44704.
20160407201604080William Block presents on “Incorporating W3C’s DQV and PROV in CISER’s Data Quality Review and Reproduction of Results Service” at NADDIfreeNCRNai1ec-3007@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberJohn Abowd will be giving two talks at the University of Nebraska-Lincoln, at the opening of the Central Plains Federal Statistical Research Data Center. The first talk is titled “Social Science Research in the Era of Restricted-Access Data”
2016042220160423+40.818253;-96.695225University of Nebraska-Lincoln @ Lincoln, NE 68588, USA0John Abowd: Social Science Research in the Era of Restricted-Access DatafreeJohn Abowd,NCRNcalendar.2214.field_date_with_zone.0@www.ncrn.info20171214T020841ZNCRN MeetingsProgram
A printable version of the program can be downloaded here.
Monday, May 9, 2016
10:00-10:15 Opening Remarks – NCRN Coordinating Office – Lars Vilhuber
10:15-10:30 Opening Remarks – John H. Thompson, Director, Census Bureau
10:30-11:30 Research Session I [Conference rooms 1-2] (Organizer: Lars Vilhuber)
Duke: ‘Itemwise missing at random modeling for incomplete multivariate data’ (Mauricio Sadinle and Jerry Reiter) (30 minutes)
CMU: “Assessing Respondent Attitudes Towards Geolocation in Online Surveys” (Laura Brandimarte) (30 minutes)
Nebraska: ‘The ATUS and SIPP-EHC: Recent Developments’ (Robert F. Belli) (30 minutes)
12:00-1:00 PI-only Meeting, by invitation only, working lunch at Census Bureau (separate room, catered lunch)
Parallel:
1:00-4:00 Independent meetings with Census Bureau staff
1:25-4:00 INFO7470 final (live) session on Synthetic Data [T10]
4:00-4:30 Meeting with Census Bureau Director, Deputy Director, Associate Director R&M, staff (by invitation only)
6:30- NCRN Dinner (Lebanese Taverna) [registration required]
Tuesday, May 10, 2016
9:00-10:00 Research Session II [Conference rooms 1-2] (Organizer: Lars Vilhuber)
Northwestern: “A 2016 View of 2020 Census Quality, Costs, Benefits” (Bruce Spencer)
Nebraska: ‘Data Quality in Time Diary Surveys’ (Ana Lucía Córdova Cazar)
10:00 Break
10:15-11:15 Research Session III [Conference rooms 1-2] (Organizer: Lars Vilhuber)
Cornell: ‘The Advantages and Disadvantages of Statistical Disclosure Limitation for Program Evaluation.’ (Ian Schmutte)
Michigan: “Developing job linkages for the Health and Retirement Study” (Maggie Levenstein)
11:15 Break
11:30-12:00 Research Session IV [Conference rooms 1-2] (Organizer: Lars Vilhuber)
Cornell: ‘Crowdsourcing Codebook Development and Enhancements in CED²AR – Progress on metadata’ (Lars Vilhuber, Bill Block) (Permanent link: http://hdl.handle.net/1813/43887)
End of meetings.
Registration
Registration is closed.
Lodging
Information on room blocks has been made available to NCRN PIs. If you are coming from out of town and are not affiliated with an NCRN node, please contact the NCRN Coordinating Office.
Nodes:
NCRN Coordinating Office
University of Nebraska
Cornell University
Northwestern University
Duke University / National Institute of Statistical Sciences (NISS)
University of Colorado at Boulder / University of Tennessee
University of Michigan
University of Missouri
Carnegie-Mellon University
Date:
May 09, 2016 to May 10, 2016
Address:
4600 Silver Hill Rd.
Suitland, MD, United States
Attachments:
NCRNMeetingSpring2016Agenda.pdf
Location:
20160509201605110NCRN Spring 2016 Meetingai1ec-3138@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberJohn M. Abowd (Cornell University and U.S. Census Bureau) presents at the Society of Government Economists on “The Fate of Empirical Economics When All Data are Private”.
20160513201605140John Abowd presents on “The Fate of Empirical Economics When All Data are Private”freeNCRNai1ec-3137@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberJohn M. Abowd (Cornell University and U.S. Census Bureau) presents at the BLS Commissioner’s Invited Seminar on “Four Challenges for Statistical Agencies”.
The presentation can be found on the NCRN Presentation archive at http://hdl.handle.net/1813/44639.
2016051620160517+38.898062;-77.00808Bureau of Labor Statistics @ 2 Massachusetts Ave NE, Washington, DC 20212, USA0Abowd presents on Four Challenges for Statistical AgenciesfreeNCRNai1ec-3134@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://www.ncrn.info/event/spatial-and-spatio-temporal-design-and-analysis-official-statisticsJohn Abowd (Cornell University and U.S. Census Bureau) presents “An Integrated Approach to Statistical Agency Modernization” at the Missouri-hosted “Workshop on Spatial and Spatio-Temporal Design and Analysis for Official Statistics”.
20160520201605210John Abowd on “An Integrated Approach to Statistical Agency Modernization”freeNCRNai1ec-3164@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://iassist2016.org/For the presentation, see http://hdl.handle.net/1813/44703
20160602201606030Florio Arguillas presents on “Towards Fully Replicable Data Analysis in an Increasingly Connected World”freeNCRNai1ec-3167@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://iassist2016.org/Develop tools and training modules for online access enabling researchers to work more effectively with official restricted access statistical files. For the presentation, see http://hdl.handle.net/1813/44705
20160603201606040Warren Brown presents on “Online Tools and Training for Access and Analysis of Restricted Government Data Files”freeNCRNai1ec-4542@www.vilhuber.com/lars20171214T020841ZConferencesI am chairing the 2016 session of the CASD Scientific Council (Conseil scientifique du CASD).
2016061520160616+48.823906;+2.302602INSEE @ 6 Rue Legrand, 92240 Malakoff, France0Conseil scientifique du CASDfreethumbnail;https://i2.wp.com/www.vilhuber.com/lars/wp-content/uploads/2016/06/IMAG1338.jpg?resize=150,150&ssl=1;150;150;1,medium;https://i2.wp.com/www.vilhuber.com/lars/wp-content/uploads/2016/06/IMAG1338.jpg?fit=300,170&ssl=1;300;170;1,large;https://i2.wp.com/www.vilhuber.com/lars/wp-content/uploads/2016/06/IMAG1338.jpg?fit=1024,579&ssl=1;1024;579;1,full;https://i2.wp.com/www.vilhuber.com/lars/wp-content/uploads/2016/06/IMAG1338.jpg?fit=2688,1520&ssl=1;2688;1520;CASDai1ec-4556@www.vilhuber.com/lars20171214T020841ZvilhuberThe Cost of Provable Privacy: A Case Study on Linked Employer-Employee Data,
Samuel Haney, Ashwin Machanavajjhala, John Abowd, Matthew Graham, Mark Kutzbach and Lars Vilhuber.
20160623201606240Our work on “The Cost of Provable Privacy” at TPDP 2016freeTPDPai1ec-3173@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberThe presentation can be found at http://hdl.handle.net/1813/44708
20160706201607070Lagoze presents on “Reproducible research at Census” at NISTfreeNCRN,Replicable,Reproducibleai1ec-4555@www.vilhuber.com/lars20171214T020841Zvilhuberhttps://www.vrdc.cornell.edu/computing-for-economists/Running the annual “(HP) Computing for Economists” workshop at Cornell again.
2016081520160818+42.443961;-76.501881Cornell University @ Ithaca, NY, USA0Computing for Economists 2016freeai1ec-3175@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberJoint work with Kurt Lavetti. Presentation can be found at http://hdl.handle.net/1813/44709
2016091720160918+51.054342;+3.717424Ghent, Belgium0Schmutte presents “Estimating Compensating Wage Differentials with Endogenous Job Mobility”freeNCRNai1ec-3199@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberBy invitation only. The goal of the workshop is to:
Contemplate practical implementations of privacy preserving statistical methods by drawing together expertise of academic and governmental researchers
Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas.
The workshop is organized by the Labor Dynamics Institute and Cornell NCRN node. Funding for the workshop is provided by the National Science Foundation (CNS-1012593) and the Alfred P. Sloan Foundation.
Proceedings were published as
Vilhuber, Lars, and Ian Schmutte. 2017. “Proceedings from the 2016 NSF-Sloan Workshop on Practical Privacy.” Labor Dynamics Institute, Cornell University. http://digitalcommons.ilr.cornell.edu/ldi/33/ or http://hdl.handle.net/1813/46197
2016101420161015+38.847071;-76.929454U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA0NSF–Sloan Workshop On Practical Privacy 2016freeNCRN,Sloan,TC-Largecalendar.2373.field_date_with_zone.0@www.ncrn.info20171214T020841ZNCRN MeetingsThe NCRN Meeting Fall 2016 is the opportunity to learn about the research done within the NSF-Census Research Network. Presentations by network researchers are open to the public.
Location: U.S. Census Bureau HQ
Note: Anyone who does not have a federal government badge will need to check in at the main gatehouse (across from the metro). If you are not a US Citizen, please register on Eventbrite using the ‘Foreign National’ option at least 2 weeks before the conference – by close of business Wednesday, October 19th.
Program
A printable version of the program can be downloaded here.
Monday, October 24, 2016
9:00-9:05 Opening Remarks [Census auditorium] – Lars Vilhuber, PI, NCRN Coordinating Office
9:05-9:15 Opening Remarks [Census auditorium] – John Thompson, Director, Census Bureau
9:15-11:15 Research Session I [Census auditorium] The Survey of Income and Program Participation and NCRN (Organizer: Lars Vilhuber)
“Audit trails, parallel retrieval, and the SIPP”, Jinyoung Lee (Nebraska) [Powerpoint] (45 minutes)
“Adaptive Survey Design, with Application to the SIPP” Kirstin Early, Steve Fienberg, and Jen Mankoff (Carnegie Mellon) (45 minutes)
Discussant: Jason Fields, U.S. Census Bureau (15 minutes)
General discussion (15 minutes)
11:15-12:00 Break and discussions
11:30-12:15 New to Census? Join the ‘Student tour’ by Renee Ellis. Meet outside of the auditorium
12:00-1:15 PI-only Meeting (invitation only) [Conference room 4] working lunch at Census Bureau (separate room, catered lunch)
12:15-1:15 Non-PI Lunch (Cafeteria, Host: Renee Ellis)
1:30-3:30 Research Session II [Census auditorium] Using private data for economic measurement (Organizer: Maggie Levenstein, Michigan)
“Using Account Data to Measure Spending and Income”, Matthew Shapiro (Michigan) (45 minutes)
“Scanner Data and Economic Statistics: A Unified Approach”, David Weinstein (Columbia University) [pdf] (45 minutes)
General discussion (30 minutes)
4:00-5:00 Meeting with Census Bureau Director, Deputy Director, Associate Director R&M, staff (invitation only) [8H008 – Director’s conference room]
6:30- NCRN Dinner (invitation only) [registration required]
Registration
Registration is closed; the event has ended.
Lodging
If you are coming from out of town and are not affiliated with an NCRN node, please contact the NCRN Coordinating Office.
Agenda has been modified from the original version: links have been modified.
Nodes:
NCRN Coordinating Office
Carnegie Mellon University
Cornell University
Duke University / National Institute of Statistical Sciences (NISS)
Northwestern University
University of Colorado at Boulder / University of Tennessee
University of Michigan
University of Missouri
University of Nebraska
Date:
Oct 24, 2016
Address:
4600 Silver Hill Rd
Suitland, MD 20746, United States
Attachments:
NCRNMeetingFall2016Agenda.pdf
Location:
20161024201610250NCRN Meeting Fall 2016ai1ec-3177@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://www.newton.ac.uk/event/dlaw04Workshop on “Privacy: Recent Developments at the Interface Between Economics and Computer Science” at the Isaac Newton Institute, Cambridge University. Joint work with John M. Abowd.
2016102820161029+52.20978;+0.100774Isaac Newton Institute, Cambridge University @ 20 Clarkson Rd, Cambridge CB3, UK0Schmutte presents “Revisiting the Economics of Privacy: Population Statistics and Privacy as Public Goods”freeNCRN,Sloanai1ec-3277@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://naddiconf.org/2017/The Cornell NCRN node presents results from the past 5 years on CED²AR, their metadata editor and presentation tool.
Conference website: http://naddiconf.org/2017/
2017040620170408+42.446997;-76.480085ILR Conference Center @ 14853, 140 Garden Ave, Ithaca, NY 14853, United States0Presentation at NADDI 2017freeCED2AR,DDI,Lars Vilhuber,NADDI,NCRN,Warren Brown,William Blockcalendar.2438.field_date_with_zone.0@www.ncrn.info20171214T020841ZNCRN MeetingsThe NCRN Meeting Spring 2017 is the opportunity to learn about the research done within the NSF-Census Research Network.
Location: U.S. Census Bureau HQ
Note: Anyone who does not have a federal government badge will need to check in at the main gatehouse (across from the metro). If you are not a US Citizen, please register on Eventbrite using the ‘Foreign National’ option at least 2 weeks before the conference – by close of business Wednesday, April 5th.
You will need a security pass if you bring in a laptop – you can get those at the desk in the main entrance.
Program
A printable version of the program can be downloaded here.
Monday, April 24, 2017
8:30-8:35 Opening Remarks – NCRN Coordinating Office – Lars Vilhuber
8:35-8:45 Opening Remarks – John Eltinge, Ass. Dir., Research and Methodology Directorate, Census Bureau
8:45-10:15 Research Session I [Census auditorium] Linkage and Geography
Jared Murray, ‘Probabilistic Record Linkage after Indexing, Blocking and Filtering’ (CMU). (25 minutes)
David Folch, “Neighbors: The MAF provides new insights into the spatial organization of the American population” (Colorado/Tennessee) (25 minutes)
Matthew Simpson, “A Multiscale Spatial Approach to Change of Statistics” (Missouri) (25 minutes)
Discussion (TBD) 15 minutes
10:15-10:45 Break
10:45-11:45 Research Session II: [Census auditorium] Looking Forward
Carol Caldwell, “2017 Economic Census: Towards Synthetic Data Set” (US Census Bureau) (25 minutes)
Chris Clifton, ‘Practical Issues in Anonymity’, (Purdue) (25 minutes)
Discussion (15 minutes)
12:15-1:15 PI-only Meeting (by invitation only) [Conference Room 3] working lunch at Census Bureau (separate room, catered lunch)
1:30-2:30 Research Session III [Census auditorium] Privacy and Confidentiality
Jerry Reiter, ‘Differentially Private Verification of regression model results’ (Duke) (25 minutes)
Kobbi Nissim, ‘Formal Privacy Models and Title 13’ (Georgetown) (25 minutes)
Discussion (15 minutes)
4:00 – 5:00 Meeting with Census Bureau (by invitation only) TO BE CONFIRMED Director, Deputy Director, Associate Director R&M, staff [8H008 – Director’s conference room]
6:30 – NCRN Dinner (Lebanese Taverna) [registration required]
Registration
Registration is closed.
Nodes:
NCRN Coordinating Office
Carnegie Mellon University
Cornell University
Duke University / National Institute of Statistical Sciences (NISS)
Northwestern University
University of Colorado at Boulder / University of Tennessee
University of Michigan
University of Missouri
University of Nebraska
Date:
Apr 24, 2017
Address:
4600 Silver Hill Rd
Suitland, MD 20746, United States
Attachments:
FinalNCRNMeetingSpring2017Agenda.pdf
Location:
20170424201704250NCRN Meeting Spring 2017ai1ec-3240@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberBy invitation onlyThe goal of the workshop is to
Contemplate practical implementations of privacy preserving statistical methods by drawing together expertise of academic and governmental researchers
Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas.
The workshop is organized by the Labor Dynamics Institute and Cornell NCRN node. Funding for the workshop is provided by the National Science Foundation (CNS-1012593) and the Alfred P. Sloan Foundation.
This is a follow-up to the NSF–Sloan Workshop On Practical Privacy 2016.
Conference proceedings: Vilhuber, Lars, and Ian Schmutte. 2017. “Proceedings from the 2017 Cornell-Census-NSF-Sloan Workshop on Practical Privacy”, Labor Dynamics Institute Document 43, http://digitalcommons.ilr.cornell.edu/ldi/43 or http://hdl.handle.net/1813/52473.
Agenda
Start  Duration  Topic
9:00  (0h30)  Welcome (Lars Vilhuber), housekeeping plan, Associate Director’s remarks (John M. Abowd)
9:30  (1h15)  2020 Census: Implementation issues using the redistricting data
    (20min)  Presentation of current work and issues (Phil Leclerc)
    (45min)  Discussion
    (10min)  Summary
10:15  (0h10)  Coffee break
10:20  (1h15)  ACS and 2020 Census: Privacy for households or persons?
    (20min)  Presentation of current work and issues (Jerry Reiter)
    (30min)  Discussion
    (10min)  Summary
11:35  (0h55)  Lunch (please choose lunch options here)
12:30  (1h15)  Demand for Privacy
    (20min)  Presentation of current work and issues (Jenny Childs, Ian Schmutte)
    (45min)  Discussion
    (10min)  Summary
1:45  (0h15)  Coffee break
2:00  (1h10)  Economic Census 2017
    (30min)  Presentation of current work and issues (Jenny Thompson)
    (30min)  Discussion
    (10min)  Summary
3:10  Workshop ends
The detailed program can be found here.
2017050820170509+38.847071;-76.929454U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA0Cornell-Census-NSF–Sloan Workshop On Practical Privacy 2017freeNCRN,Sloan,TC-Largeai1ec-3272@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,Privacy,vilhuberhttp://sigmod2017.org/Our paper “Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics” (Samuel Haney, Ashwin Machanavajjhala, John Abowd, Matthew Graham, Mark Kutzbach and Lars Vilhuber) will be presented at SIGMOD 2017.
(link to preprint forthcoming)
The conference: The annual ACM SIGMOD/PODS conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. The conference includes a fascinating technical program with research and industrial talks, tutorials, demos, and focused workshops. It also hosts a poster session to learn about innovative technology, an industrial exhibition to meet companies and publishers, and a careers-in-industry panel with representatives from leading companies.
Tickets: http://sigmod2017.org/.2017051420170517+41.872509;-87.624713Hilton Chicago @ 720 S Michigan Ave, Chicago, IL 60605, USA0Presentation of “Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics” at SIGMOD 2017externalconfidentiality protection,differential privacy,NCRNhttp://sigmod2017.org/ai1ec-3344@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberVilhuber presents on “Confidentiality Protection and Physical Safeguards“. Presentation file is available at http://hdl.handle.net/1813/51487
2017060720170608-34.607465;-58.371252Banco de la Nación Argentina @ Av. Rivadavia 325, C1002AAB CABA, Argentina0Vilhuber presents at Seminário DATAFIRM LatAm – Datos administrativos para la investigación sobre productividadfreeconfidentiality protection,NCRN,Privacy,RDCai1ec-3346@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberVilhuber participates in a workshop by Latin American government and research analysts and data providers, regarding the potential secure use of firm microdata.
2017060820170609-34.597168;-58.369993Offices of CAF in Argentina @ Av. Eduardo Madero 900, 1617ACV CABA, Argentina0Workshop: Herramientas prácticas para la apertura y uso seguro de microdatos de firmasfreeconfidentiality protection,firm data,NCRN,Privacy,RDC,Sloan,SynLBDai1ec-3347@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberVilhuber presents on “Making confidential data part of reproducible research.” Presentation file to come.
2017062120170622+38.896319;-76.999304The National Academies of Sciences, Engineering, and Medicine @ 500 E St NE, Washington, DC 20002, USA0Vilhuber presents at Workshop on Transparency and Reproducibility in Federal StatisticsfreeConfidentiality,NCRN,Replicable,Reproducible25r1mj2k3k2kblpir5aegfstoo@google.com20171214T020841ZConferences,Presentation,vilhuberTime: TBD
Measures and Content for Studying Family Living Arrangements and Child Well-Being from SIPP 2014
2017110120171102TBD0Joint LDI-CISER-CPC Seminar: Jason Fields (U.S. Census Bureau)freeCensus@Cornell,NCRN6c2i6dhe6kkmhla87hu74d8kcu@google.com20171214T020841ZConferences,Presentation,vilhuberTime TBD
Joint LDI-CISER-CPC Seminar: Old Housing, New Needs: Are US Homes Ready for an Aging Population
2017110820171109TBD0Joint LDI-CISER-CPC Seminar: Jonathan Vespa (U.S. Census Bureau)freeCensus@Cornell,NCRNai1ec-497@www.vrdc.cornell.edu/info747x20171214T020841ZINFO747020171123201711240No class (Cornell Thanksgiving Recess)freeai1ec-533@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
The class does not have a final exam. The last class at Cornell is on November 30. Check with your local coordinator about any local arrangements.
20171207201712080No final examfreecalendar.1551.field_date_with_zone.0@www.ncrn.info20171214T020841Z(all times Eastern Standard Time)
Speaker: Amy L. Griffin, University of New South Wales (UNSW) Canberra
Abstract: Recent changes to the US Census have led to more timely updates of demographic statistics that are used in the delivery and planning of many social and environmental programs. However, this timeliness has a tradeoff: increased uncertainty in the estimates for small area geographies such as census blocks and tracts. Although the Census Bureau publishes information about the uncertainty of the estimates, few end users engage with and utilize this information, perhaps because it comes in a difficult to use form; another column in a table with many columns. Many techniques for visualising uncertainty in attribute data have been proposed, but few have been empirically tested, and fewer still with real end users using an ecologically valid task. Here, we report on a broader research program directed to studying the visualisation of attribute uncertainty for ACS data, and report the results of an experiment undertaken with 55 urban planners in which they had to make spatial decisions using uncertain demographic estimates. We compared visualisation methods based on two metaphors for communicating uncertainty: the stoplight and sketchiness. The experimental task is one taken from a context of use study we conducted on urban planning. It required planners to define an area of contiguous census tracts that meets a particular threshold with respect to the attribute in question: percentage of households in poverty. We conclude with some thoughts about how to help urban planners work with uncertainty in ACS data more effectively. (joint work with Jason Jurjevich, Portland State University, Meg Merrick, Portland State University, Seth E Spielman, Colorado University at Boulder, Nicholas N Nagle, University of Tennessee-Knoxville, David C Folch, Florida State University) (archived presentation)
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room 1, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Allan McCutcheon (amccutcheon1@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
University of Colorado at Boulder / University of Tennessee
NCRN Coordinating Office
Date:
Feb 04, 2015, 3:00pm to 4:00pm EST
Address:
Canberra ACT, Australia
Video:
Attachments:
Presentation (PDF)
Location:
20150204T15000020150204T1600000NCRN Virtual Seminar – Visualizing Attribute Uncertainty in the ACS: An Empirical Study of Decision-Making with Urban PlannersNCRNcalendar.1552.field_date_with_zone.0@www.ncrn.info20171214T020841ZSpeaker: John M. Abowd (Cornell University)
Title: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods (joint work with Ian Schmutte, University of Georgia)
Abstract:
We consider the problem of the public release of statistical information about a population–explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room 1, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Allan McCutcheon (amccutcheon1@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
Cornell University
NCRN Coordinating Office
Date:
Mar 04, 2015, 3:00pm to 4:30pm EST
Address:
Berkeley, CA, United States
Video:
Location:
20150304T15000020150304T1630000NCRN Virtual Seminar – Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public GoodsNCRNcalendar.1552.field_date.0@www.ncrn.info20171214T020841ZConferences,NCRN Virtual Seminar,Presentation,vilhuberhttp://www.ncrn.info/event/ncrn-virtual-seminar-march-4-2015Speaker: John M. Abowd (Cornell University)
Title: Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods (joint work with Ian Schmutte, University of Georgia)
Abstract:
We consider the problem of the public release of statistical information about a population–explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room 1, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Allan McCutcheon (amccutcheon1@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
Cornell University
Date:
Mar 04, 2015, 3:00pm to 4:30pm EST
Address:
Berkeley, CA, United States
Location:
20150304T15000020150304T1630000NCRN Virtual Seminar – Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public GoodsNCRNcalendar.1553.field_date_with_zone.0@www.ncrn.info20171214T020841ZSpeakers: Marlow Lemons (U.S. Census Bureau) and Paul Massell (U.S. Census Bureau)
Title: A Method to Improve Data Swapping at the U.S. Census Bureau (M. Lemons)
Abstract: Data swapping is one of several disclosure avoidance methods that the Census Bureau implements to uphold confidentiality mandated by law. The Center for Disclosure Avoidance Research (CDAR) is currently studying the use of n-cycle swapping as a means to protect respondent identity in large-scale data. N-cycle swapping, a variant of data swapping, uses permutations of size ‘n’ to swap data records rather than swapping them in pairs. In this talk, we will discuss the processes surrounding n-cycle swapping, CDAR’s current studies and challenges, and future projects and data products involving this disclosure avoidance technique. (archived presentation)
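The permutation idea in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical helper, `n_cycle_swap`, that picks records at random, groups them into cycles of size n, and passes each record's value to the next record in its cycle. How CDAR actually selects and matches records is not described in the abstract, so the random grouping here is only a placeholder.

```python
import random


def n_cycle_swap(values, n, seed=None):
    """Illustrative n-cycle swap (hypothetical helper, not the Census Bureau's code).

    Records are shuffled into groups of size n. Within each group, every
    record receives the value held by the previous record in the cycle,
    so n = 2 reduces to ordinary pairwise swapping.
    Records left over after the last full group keep their own value.
    """
    if n < 2:
        raise ValueError("cycle size n must be at least 2")
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # assumption: records are grouped at random
    swapped = list(values)
    for start in range(0, len(order) - n + 1, n):
        cycle = order[start:start + n]
        for k, target in enumerate(cycle):
            source = cycle[k - 1]  # k = 0 wraps around to the last record
            swapped[target] = values[source]
    return swapped


# Example: nine distinct values swapped in cycles of three.
original = list(range(9))
result = n_cycle_swap(original, 3, seed=1)
```

The swap only rearranges values, so marginal totals of the swapped variable are unchanged; when the values are distinct, every record in a full cycle ends up with a value that came from a different record.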
Title: Cell Suppression as used for Protecting Magnitude Data Tables (P. Massell)
Abstract: The most common data products released by the Economic Directorate of the Census Bureau are magnitude data tables. Common magnitude variables in these tables are ‘sales’ (aka ‘receipts’), and ‘number of employees’. Under cell suppression, an agency uses the p% rule to determine which cells reveal too much information about the value contributed by a particular establishment or company. Such a cell is declared sensitive and is suppressed. However, since Census tables are typically additive, additional cells, called ‘secondary’ suppressions, must also be suppressed in addition to make it impossible for a table user to recover the value of any sensitive cell. Using techniques from operations research, Census Bureau researchers developed methods for finding these secondary suppressions in a way that minimizes information loss from the table. Good software for implementing cell suppression was developed around 1990. We will discuss improvements to the method that have been implemented in the current version, such as better protection at the ‘company level’, handling of negative values, and improved processing of linked tables. (archived presentation)
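As a rough illustration of the two ideas in this abstract, the sketch below uses the common textbook form of the p% rule: a cell is sensitive when the contributions other than the two largest sum to less than p percent of the largest one. The function names, the default p = 10, and the assumption of non-negative contributions are illustrative choices, not details taken from the talk or from the Census Bureau's software.

```python
def is_sensitive(contributions, p=10.0):
    """p% rule sketch (assumes non-negative contributions; names are hypothetical).

    The second-largest contributor can estimate the largest one by
    subtracting its own value from the cell total. The error of that
    estimate equals the sum of all remaining contributions, so the cell
    is flagged when that remainder is below p percent of the largest value.
    """
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False
    largest = ordered[0]
    remainder = sum(ordered[2:])  # everything except the two largest
    return remainder < (p / 100.0) * largest


def recover_suppressed(row_total, published_cells):
    """Show why secondary suppression is needed in an additive row.

    published_cells uses None for a suppressed cell. If exactly one cell
    is suppressed, its value follows from the row total by subtraction;
    otherwise this sketch returns None.
    """
    missing = [v for v in published_cells if v is None]
    if len(missing) != 1:
        return None
    return row_total - sum(v for v in published_cells if v is not None)


# A cell dominated by one company is flagged; a balanced cell is not.
dominated = is_sensitive([100, 50, 3])
balanced = is_sensitive([100, 50, 30, 20])
```

With only the sensitive cell withheld, `recover_suppressed(153, [None, 50, 3])` gives back the hidden value, which is why at least one further cell in the same row has to be suppressed as well.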
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room 1, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Allan McCutcheon (amccutcheon1@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
NCRN Coordinating Office
Date:
Apr 01, 2015, 3:00pm to 4:30pm EDT
Address:
4600 Silver Hill Rd.
Suitland, MD, United States
Video:
Attachments:
A Method to Improve Data Swapping at the U.S. Census Bureau (PDF)
Cell Suppression as used for Protecting Magnitude Data Tables
Location:
20150401T15000020150401T1630000NCRN Virtual Seminar – Center for Disclosure Avoidance ResearchNCRNai1ec-2658@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://ecommons.library.cornell.edu/handle/1813/40172Ben Perry (Cornell/NCRN) presents joint work with Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, & William C. Block.
Abstract: Recent years have shown the power of user-sourced information evidenced by the success of Wikipedia and its many emulators. This sort of unstructured discussion is currently not feasible as a part of the otherwise successful metadata repositories. Creating and augmenting metadata is a labor-intensive endeavor. Harnessing collective knowledge from actual data users can supplement officially generated metadata. As part of our Comprehensive Extensible Data Documentation and Access Repository (CED2AR) infrastructure, we demonstrate a prototype of crowdsourced DDI, using DDI-C and supplemental XML. The system allows for any number of network connected instances (web or desktop deployments) of the CED2AR DDI editor to concurrently create and modify metadata. The backend transparently handles changes, and frontend has the ability to separate official edits (by designated curators of the data and the metadata) from crowd-sourced content. We briefly discuss offline edit contributions as well. CED2AR uses DDI-C and supplemental XML together with Git for a very portable and lightweight implementation. This distributed network implementation allows for large scale metadata curation without the need for a hardware intensive computing environment, and can leverage existing cloud services, such as Github or Bitbucket.
20150410T11150020150410T114500+43.076145;-89.397711Pyle Center @ University of Wisconsin-Madison, 702 Langdon Street, Madison, WI 53706, USA0Presentation @ NADDI 2015: Crowdsourcing DDI Development: New Features from the CED2AR ProjectCED2AR,DDI,NADDI,NCRNai1ec-2659@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.ssc.wisc.edu/naddi2015/abstracts.html#ctdMichelle Edwards (Cornell/CISER) presents on using DDI and CED²AR to “connect the dots”.
Abstract: The Cornell Institute for Social and Economic Research (CISER) data archive has been actively accepting Cornell social science and economic research data since 1981. Holdings range from US Census to New York centric studies to International demographic studies and many, many more. Researchers currently search the archive using a basic search across a limited number of Study level and File level descriptor tags. To enhance discoverability, CED2AR will be implemented to add Variable level and enhanced Study level metadata. CED2AR uses DDI 2.5 metadata standards for documenting the holdings, along with schema.org for microdata markup to allow search engines to parse the semantic information from the DDI metadata. New data deposits? Researcher data or new archive collections will be added using an online data deposit form to create Study level and File level metadata and provide upload capabilities for the data and program files. An API will be used to pass metadata gathered from the data deposit form to both the current archive structure as well as the CED2AR database, ensuring the integrity of both systems. Three processes: an online data deposit form, the archive holdings, and CED2AR, all linked through DDI 2.5 will create a new workflow for the CISER data archive. By connecting the dots with DDI, we will enhance discoverability and usability of the CISER data holdings.
20150410T14000020150410T143000+43.076287;-89.397794Pyle Center @ The Pyle Center, University of Wisconsin-Madison, 702 Langdon Street, Madison, WI 53706, USA0Presentation @ NADDI 2015: Connecting the Dots with DDI (and CED²AR)NADDI,NCRNai1ec-2880@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://iassist2015.pop.umn.edu/program/posters#p23Poster Abstract: The Comprehensive Extensible Data Documentation and Access Repository (CED2AR) is an online repository for metadata on surveys, administrative microdata, and other statistical information. CED2AR runs directly from DDI 2.5 through a single, non-relational database. While the DDI schema is well developed for documentation purposes, it is not ideal for semantic web applications. Using the schema.org microdata markup, CED2AR allows search engines to parse semantic information from DDI. The solution further enhances the discoverability of DDI metadata, as the data are machine readable to several providers such as Google, Yahoo and Bing. The schema.org markup is not directly embedded within the DDI, so it doesn’t directly export when a user downloads a codebook. However, CED2AR can also run as a zero install desktop application. Users can simply download their own copy of CED2AR, quickly import codebooks, and instantly see the schema.org enhancements the system offers. The only prerequisites for the software are Java version 7 and an internet browser. This presentation will demonstrate the advantages schema.org adds to DDI, and the ease of deployment CED2AR allows.
20150603T17150020150603T184500+44.973081;-93.244254University of Minnesota @ Willey Hall, 229 19th Ave S, Minneapolis, MN 55454, USA0IASSIST2015 Poster PresentationNCRNai1ec-2790@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.amstat.org/meetings/JSM/2015/onlineprogram/AbstractDetails.cfm?abstractid=315820“Synthetic Longitudinal Business Databases for International Comparisons”, Joerg Drechsler, Institute for Employment Research; Lars Vilhuber, Cornell University
International comparison studies on economic activity are often hampered by the fact that access to business microdata is very limited on an international level. A recently launched project tries to overcome these limitations by improving access to Business Censuses from multiple countries based on synthetic data. Starting from the synthetic version of the longitudinally edited version of the U.S. Business Register (the Longitudinal Business Database, LBD), the idea is to create similar data products in other countries by applying the synthesis methodology developed for the LBD to generate synthetic replicates that could be distributed without confidentiality concerns. In this paper we present some first results of this project based on German business data collected at the Institute for Employment Research.
http://www.amstat.org/meetings/JSM/2015/onlineprogram/AbstractDetails.cfm?abstractid=315820
20150811T14000020150811T155000+47.611389;-122.33168Joint Statistical Meetings (JSM) 2015 @ 800 Convention Pl, Seattle, WA 98101, USA0JSM 2015: Synthetic Longitudinal Business Databases for International ComparisonsCensus Bureau,Germany,IAB,LBD,NCRN,SynLBD,United Statesai1ec-2874@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberSession: “Privacy preservation and the use of synthetic data for public use statistics – Contributed Papers“,
Chair: Jerry Reiter, Duke University;
Organizer: Lars Vilhuber, Cornell University
Discussant: John Abowd, Cornell University
8:35 AM
Synthetic Data Generation for Firm Links — Satkartar Kinney, NISS ; Jerry Reiter, Duke University
8:55 AM
Assessing the Data Quality of Public Use Tabulations Produced from Synthetic Data: Synthetic Business Dynamics Statistics— Lars Vilhuber, Cornell University ; Javier Miranda, U.S. Census Bureau
9:15 AM
Editing, Imputation, and Synthesis: A Public Use File for the Census of Manufactures — Hang Kim, NISS/Duke University ; Jerry Reiter, Duke University
9:35 AM
Differential Privacy and Verification of Results — David McClure ; Jerry Reiter, Duke University ; Ashwin Machanavajjhala, Duke University
9:55 AM
Discussant: John Abowd, Cornell University
10:15 AM
Floor Discussion
20150813T08300020150813T102000+47.611389;-122.33168Joint Statistical Meetings (JSM) 2015 @ 800 Convention Pl, Seattle, WA 98101, USA0JSM 2015 Session: Privacy Preservation and the Use of Synthetic Data for Public Use Statistics — Topic Contributed PapersNCRNai1ec-2788@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.amstat.org/meetings/jsm/2015/onlineprogram/AbstractDetails.cfm?abstractid=316288“Assessing the Data Quality of Public Use Tabulations Produced from Synthetic Data: Synthetic Business Dynamics Statistics”, Lars Vilhuber, Cornell University; Javier Miranda, U.S. Census Bureau
Discussant: John Abowd, Cornell University
We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau’s Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).
http://www.amstat.org/meetings/jsm/2015/onlineprogram/AbstractDetails.cfm?abstractid=316288
20150813T08300020150813T102000+47.611389;-122.33168Joint Statistical Meetings (JSM) 2015 @ 800 Convention Pl, Seattle, WA 98101, USA0JSM 2015: Assessing the Data Quality of Public Use Tabulations Produced from Synthetic Data: Synthetic Business Dynamics StatisticsfreeASA,Joint Statistical Meetings,JSM,NCRN,Seattle,SynLBDcalendar.1870.field_date_with_zone.0@www.ncrn.info20171214T020841ZDue to a schedule conflict, the seminar has been cancelled.
Speaker: Bimal Sinha (University of Maryland, Baltimore County)
Title: Noise Multiplication for Statistical Disclosure Control of Extreme Values in Log-normal Regression Samples
Abstract: Noise Multiplication for Statistical Disclosure Control of Extreme Values in Log-normal Regression Samples (Bimal Sinha) In this article multiplication of original data values by random noise is suggested as a disclosure control strategy when only the top part of the data is sensitive, as is often the case with income data. The proposed method can serve as an alternative to top coding which is a standard method in this context. Because the log-normal distribution usually fits income data well, the present investigation focuses exclusively on the log-normal. It is assumed that the log-scale mean of the sensitive variable is described by a linear regression on a set of non-sensitive covariates, and we show how a data user can draw valid inference on the parameters of the regression. An appealing feature of noise multiplication is the presence of an explicit tuning mechanism, namely, the noise generating distribution. By appropriately choosing this distribution, one can control the accuracy of inferences and the level of disclosure protection desired in the released data. Usually, more information is retained on the top part of the data under noise multiplication than under top coding. Likelihood based analysis is developed when only the large values in the data set are noise multiplied, under the assumption that the original data form a sample from a log-normal distribution. In this scenario, data analysis methods are developed under two types of data releases: (I) each released value includes an indicator of whether or not it has been noise multiplied, and (II) no such indicator is provided. A simulation study is carried out to assess the accuracy of inference for some parameters of interest. Since top coding and synthetic data methods are already available as disclosure control strategies for extreme values, some comparisons with the proposed method are made through a simulation study. 
The results are illustrated with a data analysis example based on 2000 U.S. Current Population Survey data. Furthermore, a disclosure risk evaluation of the proposed methodology is presented in the context of the Current Population Survey data example, and the disclosure risk of the proposed noise multiplication method is compared with the disclosure risk of synthetic data.
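The noise multiplication idea in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name `noise_multiply_top`, the uniform noise distribution on [0.9, 1.1], the 95th-percentile cutoff, and the simulated log-normal incomes are all assumptions chosen for illustration. The returned indicator corresponds to release type (I); dropping it gives release type (II).

```python
import numpy as np

def noise_multiply_top(values, threshold, rng, low=0.9, high=1.1):
    """Multiply only the values above `threshold` by random noise.

    Hypothetical helper: the uniform(low, high) noise distribution is an
    assumption, standing in for the "noise generating distribution" that
    the abstract describes as the tuning mechanism.
    """
    values = np.asarray(values, dtype=float)
    perturbed = values > threshold            # only the top part is sensitive
    noise = rng.uniform(low, high, size=values.shape)
    released = np.where(perturbed, values * noise, values)
    return released, perturbed

rng = np.random.default_rng(0)
# Simulated log-normal "income" data (illustrative parameters only).
income = rng.lognormal(mean=10.0, sigma=1.0, size=1000)
cutoff = np.quantile(income, 0.95)            # assumed sensitivity cutoff
released, flag = noise_multiply_top(income, cutoff, rng)
```

A wider noise interval would give more protection at the cost of less accurate inference, which is the trade-off the abstract attributes to the choice of noise distribution.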
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room T5, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Kristen Olson (kolson5@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Date:
Sep 02, 2015, 3:00pm to 4:30pm EDT
Address:
University of Maryland, Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250, United States
Location:
20150902T15000020150902T1630000NCRN Virtual Seminar – CANCELLED – Noise Multiplication for Statistical Disclosure Control of Extreme Values in Log-normal Regression SamplesNCRNai1ec-2848@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://sites.stanford.edu/researchdatacenter/about-conference“Analyzing Earnings Inequality in the United States: Trends from Longitudinally Linked Employer-Employee Data”, John Abowd (NCRN, Cornell University), Kevin McKinney (U.S. Census Bureau), Nellie Zhao (Cornell University)
20150918T13300020150918T141500+37.432005;-122.175774Stanford University @ 291 Campus Drive, Stanford, CA 94305, USA0RDC 2015: “Analyzing Earnings Inequality in the United States: Trends from Longitudinally Linked Employer-Employee Data”NCRN,RDCai1ec-2849@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://sites.stanford.edu/researchdatacenter/about-conference“Total Variability Measures for Selected Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap”, Andrew Green (Cornell University), Kevin McKinney (U.S. Census Bureau), Lars Vilhuber (Cornell University), John Abowd (Cornell University)
Abstract
We report results from the first comprehensive total quality evaluation of three major indicators in the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): beginning-of-quarter employment, full-quarter employment, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted using the multiple threads generated by the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model. Each implicate is the output of formal probability models that address coverage, edit and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the three publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have acceptable quality. Tabulations involving one or two jobs, which are generally suppressed in the QWI, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI of similar magnitude.
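For readers unfamiliar with the Rubin (1987) framework mentioned in the abstract, the standard multiple-imputation combining rules can be sketched as follows. These are the textbook rules only, not the special formulas the authors derive for the QWI and LODES disclosure avoidance system; the function name `rubin_total_variance` and the input numbers are hypothetical.

```python
import numpy as np

def rubin_total_variance(estimates, within_variances):
    """Combine M implicate-level estimates with Rubin's standard rules.

    estimates[i]        : point estimate from implicate i
    within_variances[i] : estimated variance of that estimate
    Returns (combined estimate, within variance, between variance, total).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(within_variances, dtype=float)
    m = q.size
    q_bar = q.mean()                    # combined point estimate
    u_bar = u.mean()                    # average within-implicate variance
    b = q.var(ddof=1)                   # between-implicate variance
    total = u_bar + (1.0 + 1.0 / m) * b # total variability
    return q_bar, u_bar, b, total

# Hypothetical employment counts from four implicates of one tabulation cell.
q_bar, u_bar, b, total = rubin_total_variance(
    [100.0, 104.0, 98.0, 102.0],
    [4.0, 5.0, 3.0, 4.0],
)
```

In this made-up example the between-implicate component dominates the total, which is the kind of pattern one would expect to matter most for small cells.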
20150918T13300020150918T141500+37.432005;-122.175774Stanford University @ 291 Campus Drive, Stanford, CA 94305, USA0RDC 2015: “Total Variability Measures for Selected Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap”NCRN,RDCcalendar.2063.field_date_with_zone.0@www.ncrn.info20171214T020841ZSpeaker: Maria De Yoreo (Duke University)
Title: Incorporating Conditionally Representative Auxiliary Information in Data Fusion
Abstract: In data fusion analysts seek to combine information from two databases comprised of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions, which can lead to unreliable inferences if this assumption is not satisfied. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information (glue) on the dependence structure of variables not observed jointly. Using simulations, we illustrate the benefits of leveraging the information in glue. We also perform a data fusion experiment with the goal to fuse two surveys from the book publisher HarperCollins, using glue obtained from the Internet polling company CivicScience. Due to the convenience sampling nature of the auxiliary online survey, we find that the glue is not representative of the population sampled by HarperCollins. This is a scenario very likely to be encountered in practice, and points to the more general problem of combining information from multiple data sources that are not all probability samples of the same population. We discuss current work in this direction. (archived presentation)
Paper: http://arxiv.org/abs/1506.05886.
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Conference Room 1, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Kristen Olson (kolson5@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
Duke University / National Institute of Statistical Sciences (NISS)
Date:
Oct 07, 2015, 3:00pm to 4:30pm EDT
Address:
Durham, NC 27708, United States
Video:
Attachments:
Presentation
Location:
20151007T15000020151007T1630000NCRN Virtual Seminar – Incorporating Conditionally Representative Auxiliary Information in Data FusionNCRNai1ec-2852@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://economics.cornell.edu/seminars/joint-microeconomics-and-computer-science-workshop-john-abowd“Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Good”, John Abowd (Cornell University and U.S. Census Bureau), Ian Schmutte (University of Georgia)
Abstract
We consider the problem of the public release of statistical information about a population–explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
20151019T161500+42.447255;-76.48225Cornell University @ Uris Hall, Ithaca, NY 14853, USA0Abowd @ Cornell Microeconomic Theory and Computer Science Workshop: “Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Good”NCRN1ai1ec-2878@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberJohn M. Abowd; http://www.economics.cornell.edu/seminars/joint-microeconomics-and-computer-science-workshop-john-abowdJoint Microeconomics & Computer Science Workshop: John M. Abowd
Abstract: We consider the problem of the public release of statistical information about a population–explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
Paper: https://ecommons.cornell.edu/handle/1813/40581
20151019T16150020151019T174500+42.447255;-76.48225498 Uris Hall @ Uris Hall, Ithaca, NY 14853, USA0Abowd presents Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public GoodsNCRNai1ec-2829@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://caed2015.sabanciuniv.edu“Usage and outcomes of the Synthetic Data Server,” Lars Vilhuber (NCRN, Cornell University) and John Abowd (NCRN, Cornell University)
The Synthetic Data Server (SDS) at Cornell University was set up to provide early access to new synthetic data products by the U.S. Census Bureau. These datasets are made available to interested researchers in a controlled environment, prior to a more generalized release. Over the past 5 years, 4 synthetic datasets were made available on the server, and over 100 users have accessed the server over that time period. This paper reports on interim outcomes of the activity: results of validation requests from a user perspective, functioning of the feedback loop due to validation and user input, and the role of the SDS as an access gateway to, and educational tool for, other mechanisms of accessing detailed person, household, establishment, and firm statistics.
Tickets: http://caed2015.sabanciuniv.edu/registration-form.20151023T08300020151025T141500+41.03714;+28.98099Comparative Analysis of Enterprise Data (CAED) 2015 Conference @ Şht. Muhtar, taksim istanbul apart, 34435 Beyoğlu/İstanbul, Turkey0Vilhuber @ CAED 2015: “Usage and outcomes of the Synthetic Data Server”externalCAED,NCRN,SIPP Synthetic Beta,SynLBD,Synthetic,Synthetic Data Serverhttp://caed2015.sabanciuniv.edu/registration-formai1ec-2853@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.cla.temple.edu/economics/ai1ec_event/john-abowd-cornell/?instance_id=52084“Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Good”, John Abowd (Cornell University and U.S. Census Bureau), Ian Schmutte (University of Georgia)
Abstract
We consider the problem of the public release of statistical information about a population–explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
20151030T14300020151030T160000+39.981437;-75.15507Temple University RA580 @ Temple University, 1801 N Broad St, Philadelphia, PA 19122, USA0Abowd @ Temple University Economics Department Workshop: “Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Good”NCRNcalendar.2064.field_date_with_zone.0@www.ncrn.info20171214T020841ZSpeaker: Robert Colosi (U.S. Census Bureau)
Title: Presentation of 2020 Census Operational Plan
Abstract: The 2020 Census Operational Plan was baselined in October 2015. This high-level review will highlight some major innovations documented in that plan. Further discussion includes future research to further refine the design and testing planned in the upcoming years. (archived presentation)
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room T5, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Kristen Olson (kolson5@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
NCRN Coordinating Office
Date:
Nov 04, 2015, 3:00pm to 4:30pm EST
Address:
4600 Silver Hill Rd.
Suitland, MD, United States
Video:
Attachments:
Slides – Overview of 2020 Design
Location:
20151104T15000020151104T1630000NCRN Virtual Seminar – Presentation of 2020 Census Operational PlanNCRNai1ec-4170@www.vilhuber.com/lars20171214T020841ZPresentationPresentation is attached.
Additional info:
Metadata at Cornell: CED²AR-related activities, software, and presentations are available at http://www.ncrn.cornell.edu, in particular the CED²AR page, and the CED²AR demo site.
Self-deposit repositories:
Harvard Dataverse – at the time of this writing, creating a personal dataverse is free. Here’s mine.
ICPSR and openICPSR (for openICPSR: “Self-Deposit Package – $600 – Immediate access & DOI; long term storage; metadata review. Self-deposit package is free to current ICPSR members.” ICPSR accepts individual deposits for replication data).
Papers on differential privacy and the economics of privacy by my colleagues: “Revisiting the Economics of Privacy” and “Economic Analysis and Statistical Disclosure Limitation”
Data
Quarterly Workforce Indicators (QWI) and paper
LEHD Origin Destination Employment Statistics (LODES) and paper
Other papers can be found in my bibliography
20151111T15000020151111T163000+45.512749;-73.560832UQAM DS-5650 @ 320 Rue Sainte-Catherine E, Montréal, QC H2X 1L7, Canada0Economics and the economics of privacy: new methods of accessing new datafreeai1ec-2918@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://economie.esg.uqam.ca/seminaires-departementaux/2015.html?lang=fr#novembre20151111T15000020151111T163000+45.512749;-73.560832Université du Québec à Montréal @ 320 Rue Sainte-Catherine E, Montréal, QC H2X 1L7, Canada0Vilhuber @ Université du Québec à Montréal (UQAM): Economics and the economics of privacy: new methods of accessing new data (in French)NCRNai1ec-2835@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://fcsm.sites.usa.gov/files/2014/11/Advance_Program_8262015.pdf“Total Variability Measures for Selected Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap”, Kevin McKinney (U.S. Census Bureau), Lars Vilhuber (Cornell University and U.S. Census Bureau), John Abowd (Cornell University and U.S. Census Bureau), Andrew Green (Cornell University)
Abstract
We report results from the first comprehensive total quality evaluation of three major indicators in the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): beginning-of-quarter employment, full-quarter employment, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted using the multiple threads generated by the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model. Each implicate is the output of formal probability models that address coverage, edit and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the three publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have acceptable quality. Tabulations involving one or two jobs, which are generally suppressed in the QWI, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI of similar magnitude.
20151201T13150020151201T150000+38.903656;-77.022947FCSM 2015 Research Conference @ 801 Mt Vernon Pl NW, Washington, DC 20001, USA0FCSM 2015: Total Variability Measures for Selected Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMapfreeFCSM,NCRNai1ec-2832@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://fcsm.sites.usa.gov/files/2014/11/Advance_Program_8262015.pdf“Two Perspectives on Commuting and Workplace: A Microdata Comparison of Home to Work Flows Across Linked Survey and Administrative Files,” Andrew Green (U.S. Census Bureau, Cornell University), Mark Kutzbach (U.S. Census Bureau), Lars Vilhuber (U.S. Census Bureau, Cornell University)
20151201T15150020151201T170000+38.903656;-77.022947Federal Committee on Statistical Methodology (FCSM) 2015 Research Conference @ 801 Mt Vernon Pl NW, Washington, DC 20001, USA0FCSM 2015: Two Perspectives on Commuting and Workplace: A Microdata Comparison of Home to Work Flows Across Linked Survey and Administrative FilesFCSM,NCRNai1ec-2833@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://fcsm.sites.usa.gov/files/2014/11/Advance_Program_8262015.pdf“Crowdsourcing Codebook Enhancements: A DDI-based Approach”
Benjamin Perry (Cornell University), Venkata Kambhampaty (Cornell University), Kyle Brumsted (McGill University), Lars Vilhuber (Cornell University), William Block (Cornell University)
20151202T08300020151202T101500+38.903656;-77.022947Federal Committee on Statistical Methodology (FCSM) 2015 Research Conference @ 801 Mt Vernon Pl NW, Washington, DC 20001, USA0Vilhuber presents at FCSM 2015: Crowdsourcing Codebook Enhancements: A DDI-based ApproachfreeFCSM,NCRNai1ec-2859@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://fcsm.sites.usa.gov/reports/research/2015-research-conference/“Formal Privacy Protection for Data Products Combining Individual and Employer Frames”, Ashwin Machanavajjhala (Duke University), Samuel Haney (Duke University), Matthew Graham (U.S. Census Bureau), Mark Kutzbach (U.S. Census Bureau), Lars Vilhuber (Cornell University and U.S. Census Bureau), John Abowd (Cornell University and U.S. Census Bureau)
20151203T10300020151203T121500+38.903656;-77.022947Federal Committee on Statistical Methodology (FCSM) 2015 Research Conference @ 801 Mt Vernon Pl NW, Washington, DC 20001, USA0FCSM 2015: “Formal Privacy Protection for Data Products Combining Individual and Employer Frames”FCSM,NCRNai1ec-4240@www.vilhuber.com/lars20171214T020841ZPresentation,vilhuberSpeakers: Saki Kinney (RTI) and Lars Vilhuber (Cornell University)
Title: Synthetic Establishment and Firm Data
Abstract:
‘Assessing the Data Quality of Public Use Tabulations Produced from Synthetic Data: Synthetic Business Dynamics Statistics’ (Lars Vilhuber, Cornell)
We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau’s Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).
‘Synthetic Data Generation for Firm Links’ (Saki Kinney, RTI)
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments’ confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version—now available for public use—of the U.S. Census Bureau’s Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This paper describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room T5, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Kristen Olson (kolson5@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
Cornell University
Duke University / National Institute of Statistical Sciences (NISS)
Date:
Jan 06, 2016, 3:00pm to 4:30pm EST
Address:
Ithaca, NY 14853, United States
Location:
20160106T15000020160106T1630000NCRN Virtual Seminar – Synthetic establishment and firm datacalendar.2130.field_date_with_zone.0@www.ncrn.info20171214T020841ZSpeakers: Saki Kinney (RTI) and Lars Vilhuber (Cornell University)
Title: Synthetic Establishment and Firm Data
Abstract:
‘Assessing the Data Quality of Public Use Tabulations Produced from Synthetic Data: Synthetic Business Dynamics Statistics’ (Lars Vilhuber, Cornell)
We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau’s Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions). (archived presentation)
‘Synthetic Data Generation for Firm Links’ (Saki Kinney, RTI)
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments’ confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version—now available for public use—of the U.S. Census Bureau’s Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This paper describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts. (archived presentation)
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room T5, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Kristen Olson (kolson5@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
Cornell University
Duke University / National Institute of Statistical Sciences (NISS)
Date:
Jan 06, 2016, 3:00pm to 4:30pm EST
Address:
Ithaca, NY 14853
United States
Attachments:
Presentation (Vilhuber)
Presentation (Kinney)
20160106T150000
20160106T163000
0
NCRN Virtual Seminar – Synthetic establishment and firm data
NCRN
ai1ec-39@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
We will introduce the teaching environment, and present the class itself. An overview of the U.S. statistical system is given.
Lecture notes
INFO7470-S1-2016-Course Introduction
INFO7470-S1-2016-Technical points
INFO7470-S1-2016-Overview of the U.S. Statistical System
20160201T132500
20160201T161000
0
Session 1: Course Introduction and Overview of the U.S. Statistical System
free
calendar.2065.field_date_with_zone.0@www.ncrn.info
20171214T020841Z
The February 3 NCRN Virtual Seminar was cancelled because of an unscheduled medical procedure. The presentation has been moved to the April 6 NCRN Virtual Seminar.
Nodes:
NCRN Coordinating Office
Date:
Feb 03, 2016, 3:00pm to 4:30pm EST
Address:
United States
20160203T150000
20160203T163000
0
CANCELLED – NCRN Virtual Seminar: Microclustering
NCRN
ai1ec-40@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
Margo Anderson (University of Wisconsin – Milwaukee) presents on the history of the federal statistical system
Readings and other information
See the edX session page.
Lecture Notes
“Historical Perspectives on the U.S. Federal Statistical System”
20160208T132500
20160208T161000
0
Session 2: History of the Federal Statistical Infrastructure
free
ai1ec-107@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
This lecture is a “flipped” lecture. However, we have the privilege of discussing in class (LIVE) a variety of topics on the federal statistical system with one of the foremost experts on it, Connie Citro (CNSTAT).
Lecture Notes
INFO7470-S4-Household Surveys
Discussion Notes (Connie Citro)
Citro – 2016-NCRN – 44 years of CNSTAT (PDF).
Also see National Research Council. 2013. “Principles and Practices for a Federal Statistical Agency: Fifth Edition.” Washington, DC: The National Academies Press. doi: 10.17226/18318. (referenced by Connie Citro).
20160222T132500
20160222T161000
0
Session 4: Measuring People and Households
free
ai1ec-108@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
This lecture is a “flipped” lecture. Discussion of the materials viewed by students will occur on February 29, 2016.
Lecture Notes
INFO7470-S5 Economic Statistics
Updates: INFO7470-S5 Updates
Lab
The lab is posted on edX, was made available to registered students on Feb 22, 2016, and is due on March 1, 2016 at 5:01 UTC (12:01 AM EST).
20160229T132500
20160229T161000
0
Session 5: Measuring Business and Economic Activity
free
calendar.2066.field_date_with_zone.0@www.ncrn.info
20171214T020841Z
Speaker: Kristen Olson (Nebraska)
Title: ‘The effect of question and questionnaire characteristics on interviewer and respondent behaviors in CATI surveys.’
Abstract:
In this paper, we evaluate the joint effects of question, respondent and interviewer characteristics on two proxy indicators of data quality – response time and question misreading – in a telephone survey. We include question features traditionally examined, such as the length of the question and format of response options, and features that are related to the layout and format of interviewer-administered questions. First, we examine how these question features affect the time to ask and answer survey questions and how different interviewers vary in their administration of these questions. Second, we investigate how choices in visual design features in particular, that is, design features that require interviewer decisions, contribute to interviewer question misreading. These two measures of question time and question misreading are both proxies for the risk of measurement error in responses to survey questions.
To examine these questions, we use paradata and behavior codes from the Work and Leisure Today (n=450, AAPOR RR3=6.3%) survey and use cross-classified random effects models. Overall, more of the variation in both response time and question misreading is due to question characteristics compared to respondent or interviewer attributes. Additionally, we find that question characteristics related to necessary survey design features and respondent confusion are the primary predictors of response time, with little effect of visual design features of the question. Our results for question misreading show a different pattern. Characteristics related to task complexity and visual design significantly affect question misreading, with little contribution of necessary survey design features. We conclude with implications for survey practice. (archived presentation)
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room T5, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Kristen Olson (kolson5@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
University of Nebraska
Date:
Mar 02, 2016, 3:00pm EST to Mar 03, 2016, 4:30pm EST
Address:
University of Nebraska
Lincoln, NE 68588
United States
Attachments:
Presentation slides
20160302T150000
20160303T163000
0
NCRN Virtual Seminar: The effect of question and questionnaire characteristics on interviewer and respondent behaviors in CATI surveys
NCRN
ai1ec-109@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
This lecture is a “flipped” lecture. Discussion of the materials viewed by students will occur on March 7, 2016.
Flipped Classroom
The recorded lectures on edX will be available to registered participants starting Feb 29, 2016.
Lab
The lab is posted on edX, will be available to registered participants starting Feb 29, 2016, and is due on March 14, 2016 at 21:00 UTC.
Lecture Notes
INFO7470-S6-JobStatistics – Part 1
INFO7470-S6-JobStatistics – Public-use QWI
INFO7470-S6-JobStatistics – LEHD sources
Updates: INFO7470-S6-JobStatistics – Updates
20160307T132500
20160307T161000
0
Session 6: Measuring Jobs
free
ai1ec-110@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
This session is a live session, turned over to three guest presenters. Health statistics, energy statistics, agricultural statistics, others.
Jennifer Parker (NCHS) will present on health statistics (Lecture Notes: INFO7470-S7-Parker)
Richard Dunn (University of Connecticut) and Brent Hueth (University of Wisconsin-Madison) will present on agricultural statistics (Lecture Notes: INFO7470-S7-DunnHueth, additional materials, INFO7470-S7-Migrant Farm Labor in the Census of Agriculture)
Kristen Monaco and Nicole Nestoriak (BLS) will present on BLS data in the FSRDC (Lecture Notes: Session 7 – Monaco – BLS Data in the RDC)
Updates:
Recording
A recording of the live session will be available shortly afterwards.
20160314T132500
20160314T161000
0
Session 7: Data from Other Statistical Agencies
free
ai1ec-111@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
Part 1 will be “flipped classroom” on Geographic Information Systems (GIS) – basic geocoding, geographic concepts, and other topics. The recordings are from the 2013 INFO7470 lecture given by Michael Ratcliffe, of the Geography Division at the U.S. Census Bureau.
Part 2 will be about access to restricted access data. Students will be introduced to the research proposal mechanism of the Federal Statistical Research Data Center. This will also be “flipped”.
Part 3 is a live presentation on two particular aspects: how to access the RDC of the German Institute for Employment Research (Matthias Umkehrer), and considerations on requesting access to BLS data in the FSRDC (Kristen Monaco). For both topics, guest presenters from those institutions will present live in the videoconference classroom.
Lecture Notes
Geography: INFO7470-S8-Census Geography Concepts
Restricted Access Data: INFO7470-S8-Proposals, Kristen Monaco on BLS proposal review, Matthias Umkehrer on IAB access
Updates and Flipped Class questions: INFO7470-S8-Updates and flipped class questions
Additional links
IRS SOI Joint Statistical Research Program – with links to the 2014 Call for proposals (now closed) (local copy) and projects in 2012 and 2014
20160321T132500
20160321T161000
0
Session 8: Census Geography – Restricted Access Data
free
ai1ec-223@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
This class has both “flipped” elements and live presentations.
The first (flipped) part considers alternate sources of data, such as registers, or “organic” data.
Part 2 is a live presentation on energy statistics, in particular those of the Energy Information Administration (EIA). Jacob Bournazian is our guest lecturer.
Part 3 switches gears, and discusses the need for and the requirements of replicable science. This part is a live lecture by Lars Vilhuber.
Lecture Notes
EIA presentation: INFO7470-S9-EIA-Background-2016
Register-based statistics: INFO7470-S9-Register-data
Alternate data sources: INFO7470-S9-Organic-data
Updates on the above: INFO7470-S9-Updates
Replicable Science: INFO7470-S9-Replicable Science
20160404T132500
20160404T161000
0
Session 9: Alternate Sources of Data – EIA – Replicability of Research
free
calendar.2067.field_date_with_zone.0@www.ncrn.info
20171214T020841Z
Speaker: Beka Steorts (Assistant Professor of Statistical Science, Duke University)
Title: Microclustering: When the Cluster Sizes Grow Sublinearly with the Data Set
Abstract: Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some tasks, this assumption is undesirable. For example, when performing entity resolution, the size of each cluster is often unrelated to the size of the data set. Consequently, each cluster contains a negligible fraction of the total number of data points. Such tasks therefore require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new model that exhibits this property. We compare this model to several commonly used clustering models by checking model fit using real and simulated data sets. (archived presentation)
Location:
Carnegie Mellon: contact William Eddy (bill@cmu.edu)
Census Bureau headquarters: Room T5, contact Nancy Bates (nancy.a.bates@census.gov)
Cornell University, Ithaca campus: Ives 105, contact Lars Vilhuber (lars.vilhuber@cornell.edu)
Duke University: contact Jerry Reiter (jerry@stat.duke.edu)
University of Michigan: Room 3443 ISR-Thompson, contact Maggie Levenstein (maggiel@umich.edu)
University of Missouri: contact Scott Holan (holans@missouri.edu)
University of Nebraska-Lincoln: Room TBD, contact Kristen Olson (kolson5@unl.edu)
Northwestern University: contact Zach Seeskin (z-seeskin@u.northwestern.edu)
Streaming video: [click here] (link active about 5 minutes after start of seminar)
Nodes:
Duke University / National Institute of Statistical Sciences (NISS)
NCRN Coordinating Office
Date:
Apr 06, 2016, 3:00pm to 4:30pm EDT
Address:
2099 Duke University Rd.
Durham, NC 27708
United States
Attachments:
steortsMicroClusteringv2.pdf
20160406T150000
20160406T163000
0
NCRN Virtual Seminar: Microclustering
NCRN
ai1ec-112@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
Formal models of edits and imputations
Missing data overview
Missing records – Frame or census – Survey
Missing items
Overview of different products
Overview of methods
Formal multiple imputation methods
Lecture Notes
INFO7470 S10 -Statistical Tools Edit and Imputation
Lab
The lab (an edit and imputation exercise) has been posted on the INFO7470x edX site. Your program needs to be uploaded by April 22, 2016 (this is clearly marked in the lab). You will then be asked to peer-review two other programs and answers between April 23 and April 29.
20160411T132500
20160411T161000
0
Session 10: Statistical Tools – Edit and Imputation
free
ai1ec-115@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
Total quality evaluation – errors from coverage, sampling, edit, and imputation.
Introduction to record linking
What is record linking, what is it not, what is the theory?
Record linking: applications and examples – How do you do it, what do you need, what are the possible complications?
Examples of record linking
Lecture Notes
INFO7470 S11 -Updates
INFO7470 S11 -Statistical Tools Edit and Imputation Examples
INFO7470 S11-record-linking
20160418T132500
20160418T161000
0
Session 11: Statistical Tools – Record Linkage and Total Quality Evaluation
free
ai1ec-114@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
Why must users of restricted-access data learn about confidentiality protection?
What is statistical disclosure limitation?
What are privacy-preserving data mining and differential privacy?
Basic methods for disclosure avoidance (SDL)
Rules and methods for model-based SDL
SDL-based noise methods
Differential privacy methods
Lecture Notes
INFO7470 S12 -Updates
INFO7470-S12-Statistical Disclosure Limitation
Supplementary Materials
For Updates: toy-example-imputation.xlsx
For SDL lecture: Randomized Response.xlsx
20160425T132500
20160425T161000
0
Session 12: Statistical Tools – Disclosure Limitation Methods
free
ai1ec-2993@www.ncrn.cornell.edu
20171214T020841Z
Conferences,Presentation,vilhuber
http://www.bib.umontreal.ca/SS/atelier-idd/programme.htm
Lars Vilhuber will present on “Crowdsourcing et métadonnées : défis et perspectives” (Crowdsourcing and metadata: challenges and perspectives; in French). Presentation is available at http://hdl.handle.net/1813/43875
20160429T110000
20160429T120000
+45.503101;-73.576617
McLennan Library, McGill University @ McTavish St, Montreal, QC H3A, Canada
0
Vilhuber gives Presentation at Annual Workshop of the Canadian Data Liberation Initiative
free
DDI,NCRN
ai1ec-113@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
Part A: Spatial Analysis: This part of the lecture is a flipped class, consisting of a 2013 lecture given by Prof. Nicholas Nagle of University of Tennessee – Knoxville. You will find the video links on the edX class website.
Part B: Network Analysis: This part of the lecture is a live class.
Updates
INFO7470 S13 -Updates
Part A: Spatial Analysis
Topics
Basic Geocoding
Tools for Geocoding
Analysis Methods
Tools for Geographic Analysis
Lecture Notes
INFO7470 S13 – SpatialAnalysis – Nagle
About the Guest Lecturer
Nicholas Nagle, University of Tennessee – Knoxville
Nicholas Nagle is a GIScientist/geospatial analyst whose research centers on combining spatial data in order to produce more reliable geographic information. Prof. Nagle holds a joint faculty appointment with the Geographic Information Science and Technology group at Oak Ridge National Laboratory. He is currently working on a number of projects improving the availability and reliability of data from the US Census Bureau, developing methods to identify land cover change, and is working on a number of projects related to population and health, both in Tennessee and in developing countries.
Part B: Network Analysis
This part of the lecture is a live class.
Lecture Notes
INFO7470-S13-Statistical Tools-Hierarchical Models and Network Analysis
20160502T132500
20160502T161000
0
Session 13: Geographic and Network Analysis Methods
free
ai1ec-2994@www.ncrn.cornell.edu
20171214T020841Z
Conferences,Presentation,vilhuber
Society of Labor Economists; http://www.sole-jole.org/2016.html
Session organized by John M. Abowd at the SOLE Conference 2016
Preliminary program:
A6: Issues in Data Privacy
Kobi Nissim: “TBD”
John M. Abowd and Ian M. Schmutte: “The Advantages and Disadvantages of Statistical Disclosure Limitation for Program Evaluation”
Lars Vilhuber and John M. Abowd: “Usage and Outcomes of the Synthetic Data Server”
20160506T080000
20160506T093000
+47.613684;-122.338241
The Westin Seattle @ 1900 5th Ave, Seattle, WA 98101, USA
0
Abowd organizes Session at Society of Labor Economists Annual Meetings: A6: Issues in Data Privacy
free
NCRN
ai1ec-4498@www.vilhuber.com/lars
20171214T020841Z
Conferences,Presentation
Society of Labor Economists; http://www.sole-jole.org/2016.html
Session organized by John M. Abowd at the SOLE Conference 2016
Preliminary program:
A6: Issues in Data Privacy
Kobi Nissim: “TBD”
John M. Abowd and Ian M. Schmutte: “The Advantages and Disadvantages of Statistical Disclosure Limitation for Program Evaluation”
Lars Vilhuber and John M. Abowd: “Usage and Outcomes of the Synthetic Data Server”
20160506T080000
20160506T093000
+47.613684;-122.338241
The Westin Seattle @ 1900 5th Ave, Seattle, WA 98101, USA
0
Society of Labor Economists Annual Meetings: A6: Issues in Data Privacy
free
ai1ec-4499@www.vilhuber.com/lars
20171214T020841Z
http://www.ncrn.info/event/ncrn-spring-2016-meeting
Opening Remarks on behalf of the NCRN Coordinating Office at the NCRN Spring 2016 meetings – Lars Vilhuber
Tickets: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount.
20160509T100000
20160509T101500
+38.847071;-76.929454
U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA
0
Opening remarks at NCRN Spring 2016 Meeting
external
https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount
ai1ec-3008@www.ncrn.cornell.edu
20171214T020841Z
Conferences,NCRN Meetings,Presentation,vilhuber
http://www.ncrn.info/event/ncrn-spring-2016-meeting
Opening Remarks on behalf of the NCRN Coordinating Office at the NCRN Spring 2016 meetings – Lars Vilhuber
Tickets: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount.
20160509T100000
20160509T101500
+38.847071;-76.929454
U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA
0
Vilhuber gives Opening remarks at NCRN Spring 2016 Meeting
external
Lars Vilhuber,NCRN
https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount
ai1ec-4500@www.vilhuber.com/lars
20171214T020841Z
http://www.ncrn.info/event/ncrn-spring-2016-meeting
Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, & William C. Block: “Crowdsourcing Codebook Development and Enhancements in CED²AR”
Abstract: Recent years have shown the power of user-sourced information, as evidenced by the success of Wikipedia and its many emulators. This sort of unstructured discussion is currently not feasible as a part of the otherwise successful metadata repositories. Creating and augmenting metadata is a labor-intensive endeavor. Harnessing collective knowledge from actual data users can supplement officially generated metadata. As part of our Comprehensive Extensible Data Documentation and Access Repository (CED²AR) infrastructure, we demonstrate a prototype of crowdsourced DDI on actual codebooks. While the system itself is more general, the demonstrated implementation relies on a set of linked deployments of the basic software on web servers. The backend transparently handles changes, and the frontend has the ability to separate official edits (by designated curators of the data and the metadata) from crowd-sourced content. The implementation allows a data curator, such as a statistical agency, to collect and incorporate improvements suggested by knowledgeable users in a structured way.
Tickets: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount.
20160509T113000
20160509T120000
+38.847071;-76.929454
U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA
0
Crowdsourcing Codebook Development and Enhancements in CED²AR
external
https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount
ai1ec-146@www.vrdc.cornell.edu/info747x
20171214T020841Z
INFO7470
Lecture notes
INFO7470-S14-Synthetic Data
INFO7470 S14 SDS
Links
RFA for training on SSB
Codebooks for
SSB
SynLBD
20160509T132500
20160509T161000
0
Session 14: Synthetic Data
free
ai1ec-3009@www.ncrn.cornell.edu
20171214T020841Z
Conferences,NCRN Meetings,Presentation,vilhuber
http://www.ncrn.info/event/ncrn-spring-2016-meeting
John M. Abowd and Ian M. Schmutte: “The Advantages And Disadvantages Of Statistical Disclosure Limitation For Program Evaluation”
Abstract: This paper formalizes the manner in which statistical disclosure limitation (SDL) hinders empirical research in economics. We also highlight a hitherto unappreciated advantage of SDL, formal privacy models, and synthetic data systems: they can serve as a defense against model overfitting and false-discovery bias. More specifically, a synthetic data validation system can – and we argue should – be used in conjunction with systems in which researchers register their research design ahead of analysis. The key insight is that privacy-protected data can be used for model development while minimizing risk of model overfitting. To demonstrate these points, we develop a model in which the statistical agency collects data from a population, but publishes a version in which the data have been intentionally distorted by some SDL process. We say the SDL process is ignorable if inferences based on the published data are indistinguishable from inferences based on the unprotected data. SDL is rarely ignorable. If the researcher has knowledge of the SDL model, she can conduct an SDL-aware analysis that explicitly corrects for the effects of SDL. If, as is often the case, the SDL model is unknown, we describe circumstances under which SDL can still be learned.
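The notions of a non-ignorable SDL process and an SDL-aware analysis can be illustrated with a small simulation. The abstract does not name a specific SDL mechanism, so this sketch assumes one for illustration: independent Gaussian noise with a known standard deviation added to a regressor before publication. The correction shown is the classical errors-in-variables adjustment, not necessarily the approach taken in the paper, and all names and parameter values are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n, beta, noise_sd = 20000, 2.0, 1.0  # hypothetical values

# Confidential data held by the agency.
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [beta * a + rng.gauss(0.0, 0.5) for a in x_true]

# Published data: the regressor is distorted by the assumed SDL noise.
x_pub = [a + rng.gauss(0.0, noise_sd) for a in x_true]

# Naive analysis treats the published data as if unprotected. In
# expectation the slope is attenuated by var(x) / (var(x) + noise_sd**2),
# so SDL is not ignorable here.
naive = ols_slope(x_pub, y)

# SDL-aware analysis: with noise_sd known, undo the attenuation.
var_pub = statistics.pvariance(x_pub)
reliability = (var_pub - noise_sd ** 2) / var_pub
corrected = naive / reliability

print(f"true slope {beta}, naive {naive:.3f}, SDL-aware {corrected:.3f}")
```

The correction is only possible because the noise variance is treated as known, which corresponds to the case in the abstract where the researcher has knowledge of the SDL model.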
[Presentation]
Tickets: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount.
20160510T101500
20160510T104500
+38.847071;-76.929454
U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA
0
Schmutte presents on The Advantages and Disadvantages of Statistical Disclosure Limitation for Program Evaluation
external
Ian Schmutte,John Abowd,NCRN
https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount
ai1ec-3010@www.ncrn.cornell.edu
20171214T020841Z
Conferences,NCRN Meetings,Presentation,vilhuber
http://www.ncrn.info/event/ncrn-spring-2016-meeting
Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, & William C. Block: “Crowdsourcing Codebook Development and Enhancements in CED²AR”
Abstract: Recent years have shown the power of user-sourced information, as evidenced by the success of Wikipedia and its many emulators. This sort of unstructured discussion is currently not feasible as a part of the otherwise successful metadata repositories. Creating and augmenting metadata is a labor-intensive endeavor. Harnessing collective knowledge from actual data users can supplement officially generated metadata. As part of our Comprehensive Extensible Data Documentation and Access Repository (CED²AR) infrastructure, we demonstrate a prototype of crowdsourced DDI on actual codebooks. While the system itself is more general, the demonstrated implementation relies on a set of linked deployments of the basic software on web servers. The backend transparently handles changes, and the frontend has the ability to separate official edits (by designated curators of the data and the metadata) from crowd-sourced content. The implementation allows a data curator, such as a statistical agency, to collect and incorporate improvements suggested by knowledgeable users in a structured way.
Available: https://ecommons.cornell.edu/handle/1813/43887
Tickets: https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount.
20160510T113000
20160510T120000
+38.847071;-76.929454
U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA
0
Vilhuber presents on Crowdsourcing Codebook Development and Enhancements in CED²AR
external
CED2AR,DDI,Lars Vilhuber,NCRN,William Block
https://www.eventbrite.com/e/ncrn-meeting-spring-2016-public-events-tickets-22247855936?ref=ecount
ai1ec-4501@www.vilhuber.com/lars
20171214T020841Z
Conferences,Presentation
http://economistesquebecois.ca/files/documents/1v/e3/v-29-mars-b.pdf
Lars Vilhuber will present at the Association of Québécois Economists (Association des économistes québécois) in a session on “Les données massives (big data), un outil supplémentaire pour une décision publique éclairée” (Big data, a supplementary tool for informed public decision making)
20160518T083000
20160518T110000
+46.81005;-71.215879
Centre de Congrès de Québec @ 1000 Boulevard René-Lévesque E, Ville de Québec, QC G1A 1B4, Canada
0
Vilhuber on “Big data, a supplementary tool for informed public decision making”
free
ai1ec-3003@www.ncrn.cornell.edu
20171214T020841Z
Conferences,Presentation,vilhuber
http://economistesquebecois.ca/files/documents/1v/e3/v-29-mars-b.pdf
Lars Vilhuber will present at the Association of Québécois Economists (Association des économistes québécois) in a session on “Les données massives (big data), un outil supplémentaire pour une décision publique éclairée” (Big data, a supplementary tool for informed public decision making)
20160518T083000
20160518T110000
+46.81005;-71.215879
Centre de Congrès de Québec @ 1000 Boulevard René-Lévesque E, Ville de Québec, QC G1A 1B4, Canada
0
Vilhuber on “Big data, a supplementary tool for informed public decision making”
free
NCRN
ai1ec-3058@www.ncrn.cornell.edu
20171214T020841Z
Conferences,Presentation,vilhuber
https://www.amstat.org/meetings/jsm/2016/onlineprogram/ActivityDetails.cfm?SessionID=212828
Lars Vilhuber chairs a session at JSM that includes multiple papers with NCRN contributions:
2:05 PM
Robustness of Employer List Linking to Methodological Variation — Mark J. Kutzbach, U.S. Census Bureau ; Graton Gathright, U.S. Census Bureau ; Andrew Green, U.S. Census Bureau/Cornell University ; Kristin McCue, U.S. Census Bureau ; Holly Monti, U.S. Census Bureau ; Ann Rodgers, University of Michigan ; Lars Vilhuber, Cornell University ; Nada Wasi, University of Michigan ; Christopher Wignall, Amazon.com
2:25 PM
Two Perspectives on Commuting and Workplace: A Microdata Comparison of Home-to-Work Flows Across Linked Survey and Administrative Files— Andrew Green, Cornell University/U.S. Census Bureau ; Mark J. Kutzbach, U.S. Census Bureau ; Lars Vilhuber, Cornell University
2:45 PM
Developing Job Linkages for the Health and Retirement Study — Kristin McCue, U.S. Census Bureau ; John M. Abowd, U.S. Census Bureau/Cornell University ; Margaret Levenstein, University of Michigan ; Matthew Shapiro, University of Michigan ; Ann Rodgers, University of Michigan ; Nada Wasi, University of Michigan ; Dhiren Patki, University of Michigan
20160731T140000
20160731T153000
McCormick Conference Center @ CC-W181b
0
JSM Session: Employer List Linking: Methods, Implementation, and Usage of Probabilistic Matches for Enhancing Workforce Statistics
free
DEV10,John Abowd,JSM,Lars Vilhuber,Michigan,NCRN
ai1ec-3059@www.ncrn.cornell.edu
20171214T020841Z
Conferences,Presentation,vilhuber
https://www.amstat.org/meetings/jsm/2016/onlineprogram/ActivityDetails.cfm?SessionID=212311
John M. Abowd (former NCRN-Cornell PI) and other NCRN PIs present:
2:05 PM
An Integrated Approach to Providing Access to Confidential Social Science Data — Jerome Reiter, Duke University
2:30 PM
The Challenge of Reproducible Science and Privacy Protection for Statistical Agencies — John M. Abowd, U.S. Census Bureau/Cornell University
2:55 PM
Spatio-Temporal Change of Support with Application to American Community Survey Multi-Year Period Estimates — Scott H. Holan, University of Missouri ; Jonathan R. Bradley, University of Missouri ; Christopher Wikle, University of Missouri
20160801T140000
20160801T155000
McCormick Conference Center @ CC-W185bc
0
JSM Session: Advances in Statistical Methods for Dissemination and Analysis of Official Statistics
free
JSM,NCRN
ai1ec-3060@www.ncrn.cornell.edu
20171214T020841Z
Conferences,Presentation,vilhuber
https://www.amstat.org/meetings/jsm/2016/onlineprogram/ActivityDetails.cfm?SessionID=212561
Lars Vilhuber chairs a session organized by the Committee on Privacy and Confidentiality and Aleksandra Slavkovic, Penn State University:
2:05 PM
Connections Between Privacy Definitions and Arbitrage-Free Pricing Functions — Daniel Kifer, Penn State University
2:25 PM
Differentially Private Statistical Inference and Hypothesis Testing — Vishesh Karwa, Carnegie Mellon University
2:45 PM
Learning with Differential Privacy: Stability, Learnability, and the Sufficiency and Necessity of ERM Principle — Yu-Xiang Wang, Carnegie Mellon University ; Jing Lei, Carnegie Mellon University ; Stephen E. Fienberg, Carnegie Mellon University
3:05 PM
Performance Bounds for Graphical Record Linkage: Can record linkage bounds provide guidance for private synthetic data release? — Rebecca Steorts, Duke University ; Matt Barnes, Carnegie Mellon University ; Willie Neisweigner, Carnegie Mellon University
3:25 PM
Discussant: Adam Smith, Penn State University
20160801T140000
20160801T155000
+32.320532;-90.180583
McCormick Conference Center
0
JSM Session: Statistical Foundations of Data Privacy
free
JSM,NCRN,Privacy
ai1ec-3067@www.ncrn.cornell.edu
20171214T020841Z
Conferences,Presentation,vilhuber
John M. Abowd, NCRN Cornell and now Associate Director for Research and Methodology and Chief Scientist at the U.S. Census Bureau, is the 2016 recipient of the Julius Shiskin Memorial Award for Economic Statistics. He will speak on September 6, 2016 at the WSS JULIUS SHISKIN MEMORIAL AWARD SEMINAR on “How Will Statistical Agencies Operate When All Data are Private?”
Time: 1 – 3 p.m.
Location: Auditorium, U.S. Census Bureau, 4600 Silver Hill Road, Suitland, Maryland, available through Webex.
Abstract: The dual problems of respecting citizen privacy and protecting the confidentiality of their data (Ken Prewitt’s famous “don’t ask/don’t tell” dictum) have become hopelessly conflated in the “Big Data” era. There are orders of magnitude more data outside an agency’s firewall than inside it, compromising the integrity of traditional statistical disclosure limitation methods. And increasingly the information processed by the agency was “asked” in a context wholly outside the agency’s operations, blurring the distinction between what was asked and what is published. Already private businesses like Microsoft, Google and Apple recognize that cybersecurity (safeguarding the integrity and access controls for internal data) and privacy protection (ensuring that what is published does not reveal too much about any person or business) are two sides of the same coin. This is a paradigm-shifting moment for statistical agencies. This talk will examine how statistical agencies can respond in a manner consistent with their missions.
20160906T13000020160906T150000+38.847071;-76.929454U.S. Census Bureau Auditorium @ 4600 Silver Hill Rd, Suitland, MD 20746, USA0John M. Abowd, WSS JULIUS SHISKIN MEMORIAL AWARD SEMINAR, “How Will Statistical Agencies Operate When All Data are Private?”freeNCRNai1ec-3225@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://cirano.qc.ca/en/events/688Lars Vilhuber speaks about “Disclosure Limitation and Confidentiality Protection in Linked Data” at the Center for Interuniversity Research and Analysis of Organizations’ conference on “Facilitate the access to Quebec data: How and to what ends?” The conference is jointly organized with the Quebec inter-University Centre for Social Statistics (QICSS). The presentation relies on joint work with John M. Abowd and Ian M. Schmutte.
[Presentation]
20161130T08300020161130T140000+45.501102;-73.576669Centre interuniversitaire de recherche en analyse des organisations @ 1130 Rue Sherbrooke O #1400, Montréal, QC H3A, Canada0Lars Vilhuber: “Disclosure Limitation and Confidentiality Protection in Linked Data”freeConfidentiality,Ian Schmutte,John Abowd,Lars Vilhuber,NCRN,Privacy,Sloanai1ec-3285@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://cirano.qc.ca/en/events/688Lars Vilhuber speaks about “Disclosure Limitation and Confidentiality Protection in Linked Data” at the Center for Interuniversity Research and Analysis of Organizations’ conference on “Facilitate the access to Quebec data: How and to what ends?” The conference is jointly organized with the Quebec inter-University Centre for Social Statistics (QICSS). The presentation relies on joint work with John M. Abowd and Ian M. Schmutte.
[Presentation]
20161130T08300020161130T140000+45.501102;-73.576669Centre interuniversitaire de recherche en analyse des organisations @ 1130 Rue Sherbrooke Ouest #1400, Montréal, QC H3A, Canada0Lars Vilhuber: “Disclosure Limitation and Confidentiality Protection in Linked Data”freeconfidentiality protection,Lars Vilhuber,NCRN,Sloan,statistical disclosure limitation,Synthetic Data Server,TC-Largeai1ec-377@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470Tentative: We will introduce the teaching environment, and present the class itself. An overview of the U.S. statistical system is given.
20170130T13250020170130T1610000Session 1: Course Introduction and Overview of the U.S. Statistical Systemfreeai1ec-3263@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberUS Census Bureau, CENTER FOR ECONOMIC STUDIES and CENTER FOR DISCLOSURE AVOIDANCE RESEARCH: “Confidentiality Protection and Physical Safeguards: A Review”
Lars Vilhuber, PhD
Abstract:
Confidentiality protection is a multi-layered concept, involving statistical (cryptographic) methods and physical safeguards. When providing access to researchers (both internal to the agency and external academics), a tension arises between the level of trust vis-à-vis the researcher, the statistical disclosure limitation applied to the data visible to the researcher, and the physical access mechanisms used by the researcher. This presentation will review systems used by national and private research organizations around the world, putting them into the relevant legal and societal context.
20170209T10300020170209T120000+38.847071;-76.929454U.S. Census Bureau @ 4600 Silver Hill Rd, Suitland, MD 20746, USA0Vilhuber presents “Confidentiality Protection and Physical Safeguards: A Review”freeConfidentiality,confidentiality protection,NCRN,Sloanai1ec-3269@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://cep.gov/meetings/2017-02-24.htmlTestimony to the U.S. Commission on Evidence-based Policymaking (including video)
20170224T10000020170224T120000+38.89295;-77.04776The National Academy of Sciences (NAS) Building, Lecture Room @ 2101 Constitution Ave NW, Washington, DC 20418, USA0Vilhuber provides Testimony to the Commission on Evidence-based PolicymakingfreeCEP,Commission on Evidence-based Policy,NCRN,Sloanai1ec-3278@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttps://docs.google.com/document/d/1on41QIJwt4yBebNBGG7XoYOHLEkAmlQNRCUSwElfc9I/edit?usp=sharingIn this seminar, we discuss with interested parties the conditions necessary to implement the SynLBD approach, with the goal of providing other statistical agencies a straightforward toolkit to implement the same procedure on their own data. Our hope is that by implementing similar procedures on comparable business microdata, new research both within and across countries can be enabled. The ideal end result is a series of country-specific datasets on establishments and/or firms available within the same computing environment. We discuss the data and software requirements for the lowest-cost approach, the disclosure protection statistics already implemented that can be used to achieve release of the data in this way, the validation procedures that an agency should agree to, and the likely cost of maintaining such procedures. The seminar brings together academics working on cutting-edge methods for the protection of privacy in statistical databases, and researchers and implementers at statistical agencies that have started or are interested in starting a similar project.
Five sessions will touch on the full lifecycle of a SynLBD development and implementation, and will follow the same pattern. We will first discuss existing implementations and experiences, and will then as a group discuss issues as they pertain to the broader community. Emphasis should be on discussing open issues, specific solutions to specific problems. Proceedings will be published later.
For more details, please see the full agenda.
Proceedings
Vilhuber, Lars; Kinney, Saki; Schmutte, Ian M., 2017. “Proceedings from the Synthetic LBD International Seminar”, Labor Dynamics Institute Document 44, available at http://digitalcommons.ilr.cornell.edu/ldi/44/ or http://hdl.handle.net/1813/52472
Documents
Overview of the SynLBD methodology
Link to presentation. Contains excerpts from
S. Kinney, “Presentation: Synthetic Data Generation for Firm Links,” NSF Census Research Network – NCRN-Cornell, 1813:50054, 2016. [Abstract] [URL] [Bibtex]
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments’ confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version—now available for public use—of the U.S. Census Bureau’s Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This paper describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
@TechReport{kinney-2016-ecommons,
title = {Presentation: Synthetic Data Generation for Firm Links},
author = {Kinney, Saki},
institution = {NSF Census Research Network – NCRN-Cornell },
year = {2016},
number = {1813:50054},
Abstract = {In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments’ confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version—now available for public use—of the U.S. Census Bureau’s Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This paper describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.},
keywords = {confidentiality; US Longitudinal Business Database; synthetic data},
owner = {vilhuber},
URL = {http://hdl.handle.net/1813/50054}
}
Inputs to the SynLBD process
Link to presentation. Based on Drechsler and Vilhuber (2014).
Confidentiality of the SynLBD
Link to presentation. Contains excerpts from
S. Kinney, “Presentation: Synthetic Data Generation for Firm Links,” NSF Census Research Network – NCRN-Cornell, 1813:50054, 2016. [Abstract] [URL] [Bibtex]
Validation Servers
Link to presentation. Contains excerpts from
L. Vilhuber and J. M. Abowd, “Presentation: SOLE 2016: Usage and outcomes of the Synthetic Data Server,” NSF Census Research Network – NCRN-Cornell, 1813:43883, 2016. [Abstract] [URL] [Bibtex]
The Synthetic Data Server (SDS) at Cornell University was set up to provide early access to new synthetic data products by the U.S. Census Bureau. These datasets are made available to interested researchers in a controlled environment, prior to a more generalized release. Over the past 5 years, 4 synthetic datasets were made available on the server, and over 100 users have accessed the server over that time period. This paper reports on interim outcomes of the activity: results of validation requests from a user perspective, functioning of the feedback loop due to validation and user input, and the role of the SDS as an access gateway to and educational tool for other mechanisms of accessing detailed person, household, establishment, and firm statistics.
@TechReport{Vilhuber2016-cy,
title = {Presentation: {SOLE} 2016: Usage and outcomes of the Synthetic
Data Server},
author = {Vilhuber, Lars and Abowd, John M},
abstract = {The Synthetic Data Server (SDS) at Cornell University was set
up to provide early access to new synthetic data products by
the U.S. Census Bureau. These datasets are made available to
interested researchers in a controlled environment, prior to a
more generalized release. Over the past 5 years, 4 synthetic
datasets were made available on the server, and over 100 users
have accessed the server over that time period. This paper
reports on interim outcomes of the activity: results of
validation requests from a user perspective, functioning of the
feedback loop due to validation and user input, and the role of
the SDS as an access gateway to and educational tool for other
mechanisms of accessing detailed person, household,
establishment, and firm statistics.},
conference = {SOLE 2016},
institution = {NSF Census Research Network – NCRN-Cornell },
year = {2016},
number = {1813:43883},
URL = {http://hdl.handle.net/1813/43883}
}
Other recommended readings
L. Vilhuber, J. M. Abowd, and J. P. Reiter, “Synthetic establishment microdata around the world,” Statistical Journal of the International Association for Official Statistics, vol. 32, iss. 1, pp. 65-68, 2016. [Abstract] [DOI] [Bibtex]
In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business micro data is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature.
@article{VilhuberAbowdReiter:Synthetic:SJIAOS:2016,
title = {Synthetic establishment microdata around the world},
journal = {Statistical Journal of the International Association for Official Statistics},
author = {Lars Vilhuber and John M. Abowd and Jerome P. Reiter},
year=2016,
volume={32},
number={1},
pages={65-68},
doi={10.3233/SJI-160964},
abstract={In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business micro data is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature.},
}
S. K. Kinney, J. P. Reiter, A. P. Reznek, J. Miranda, R. S. Jarmin, and J. M. Abowd, “Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database,” International Statistical Review, vol. 79, iss. 3, pp. 362-384, 2011. [Abstract] [DOI] [URL] [Bibtex]
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments’ confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.
@ARTICLE{Kinney2011-ic,
title = {Towards Unrestricted Public Use Business Microdata: The
Synthetic Longitudinal Business Database},
author = {Kinney, Satkartar K and Reiter, Jerome P and Reznek, Arnold P
and Miranda, Javier and Jarmin, Ron S and Abowd, John M},
journal = {International Statistical Review},
year = {2011},
volume = {79},
pages = {362--384},
number = {3},
doi = {10.1111/j.1751-5823.2011.00153.x},
issn = {1751-5823},
keywords = {Economic census, data confidentiality, synthetic data, disclosure
limitation},
owner = {vilhuber},
publisher = {Blackwell Publishing Ltd},
timestamp = {2012.09.04},
abstract = {In most countries, national statistical agencies do not release establishment-level
business microdata, because doing so represents too large a risk
to establishments’ confidentiality. One approach with the potential
for overcoming these risks is to release synthetic data; that is,
the released establishment data are simulated from statistical models
designed to mimic the distributions of the underlying real microdata.
In this article, we describe an application of this strategy to create
a public use file for the Longitudinal Business Database, an annual
economic census of establishments in the United States comprising
more than 20 million records dating back to 1976. The U.S. Bureau
of the Census and the Internal Revenue Service recently approved
the release of these synthetic microdata for public use, making the
synthetic Longitudinal Business Database the first-ever business
microdata set publicly released in the United States. We describe
how we created the synthetic data, evaluated analytical validity,
and assessed disclosure risk.},
url = {http://dx.doi.org/10.1111/j.1751-5823.2011.00153.x}
}
J. Drechsler and L. Vilhuber, “A First Step Towards A German SynLBD: Constructing A German Longitudinal Business Database,” Statistical Journal of the IAOS: Journal of the International Association for Official Statistics, vol. 30, 2014. [Abstract] [DOI] [URL] [Bibtex]
One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD – the German Longitudinal Business Database (GLBD) – that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
@Article{SJIAOS-2014b,
Title = {{A First Step Towards A {German} {SynLBD}: {C}onstructing A {G}erman {L}ongitudinal {B}usiness {D}atabase}},
Author = {J{\"o}rg Drechsler and Lars Vilhuber},
Journal = {Statistical Journal of the IAOS: Journal of the International Association for Official Statistics},
Year = {2014},
Volume = {30},
Abstract = {One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD – the German Longitudinal Business Database (GLBD) – that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.},
DOI = {10.3233/SJI-140812},
Keywords = {confidentiality; comparative studies; US Longitudinal Business Database; synthetic data},
Owner = {vilhuber},
Timestamp = {2014.03.24},
URL = {http://iospress.metapress.com/content/X415V18331Q33150}
}
Funding
Funding for the workshop is provided by the National Science Foundation (CNS-1012593, SES-1131848) and the Alfred P. Sloan Foundation. The organizers thank the National Academies’ Committee on National Statistics for hosting the seminar.
20170509T09000020170509T140000+38.896556;-77.019424National Academy of Sciences @ 500 5th St NW, Washington, DC 20001, USA0Synthetic Longitudinal Business Data International User SeminarfreeNCRN,Sloan,SynLBD,TC-Largeai1ec-3241@www.ncrn.cornell.edu20171214T020841ZConferences,Presentation,vilhuberhttp://www.isi2017.org/Together with a few others from around the world, Lars Vilhuber will be presenting on results from a synthetic data validation cycle at the International Statistical Institute’s World Statistical Congress.
20170718T10300020170718T123000+31.629472;-7.981084Palais des Congrès - Mansouri Eddahbi @ Marrakesh, Morocco0Synthetic Datasets for Statistical Disclosure Control – Research and Applications Around the WorldfreeNCRN,SynLBD,TC-Largeai1ec-503@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
We will introduce the teaching environment (technically and organizationally), and present the class itself.
Lecture notes
INFO7470 2017 Course Introduction (PPTX, PDF)
INFO7470-S1-2016-Course Introduction
INFO7470-S1-2016-Technical points
INFO7470-S1-2016-Overview of the U.S. Statistical System
20170824T16250020170824T1800000Session 0: Course Introductionfreeai1ec-413@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
An overview of the U.S. statistical system is given.
Lecture notes
INFO7470-S1-2016-Overview of the U.S. Statistical System
20170831T16250020170831T1800000Session 1: Overview of the U.S. Statistical Systemfreeai1ec-420@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
Margo Anderson (University of Wisconsin – Milwaukee) presents on the history of the federal statistical system (flipped classroom). She will be present to discuss the lecture.
Readings and other information
Anderson, Margo. The American Census: A Social History, Second Edition. Yale University Press, 2015.
Anderson, Margo J., and Seltzer, William. “Federal Statistical Confidentiality and Business Data: Twentieth Century Challenges and Continuing Issues.” Journal of Privacy and Confidentiality 1.1 (2009): 7-52, 55-58.
Lecture Notes
“Historical Perspectives on the U.S. Federal Statistical System”
About the Guest Lecturer
Margo Anderson, University of Wisconsin – Milwaukee
Margo Anderson is Distinguished Professor of History & Urban Studies at the University of Wisconsin – Milwaukee. She specializes in American social, urban and women’s history and has research interests in both urban history and the history of the social sciences and the development of statistical data systems, particularly the census. Her publications include Who Counts? The Politics of Census Taking in Contemporary America (2001), coauthored with Stephen E. Fienberg, and a coedited volume with Victor Greene, Perspectives on Milwaukee’s Past (University of Illinois Press, 2009). Her most recent publication, of particular relevance to this class, is The American Census: A Social History, Second Edition. Yale University Press, 2015. More information about Margo can be found at her University of Wisconsin-Milwaukee website and her personal website.
20170907T16250020170907T1800000Session 2: History of the Federal Statistical Infrastructurefreeai1ec-422@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
This class coincides with the FSRDC system’s annual conference. There will be no in-classroom activity at most sites on this day (please check with your local coordinator). The content of this section will be discussed on Sept 21, 2017, so students should take the time to view the materials on edX during this week.
Lecture notes
INFO7470-S3 PopulationsFramesSamples
20170914T16250020170914T1630000Session 3: [No class] Universes, Populations, Frames, and Samplingfreeai1ec-424@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
This lecture is a “flipped” lecture.
Discussion lead
Warren Brown, Cornell University
Warren A. Brown is Senior Research Associate at Cornell University where he directs the Program on Applied Demographics and is the Research Director of the Cornell site of the New York Federal Statistical Research Data Center, a consortium of research institutions in the New York metropolitan area and upstate New York. He is also the 2015-2016 President of the Association of Public Data Users (APDU) and serves on the National Academy of Sciences’ Standing Committee on Reengineering Census Operations. His teaching, research and outreach efforts involve the application of demographic information to areas such as strategic planning for workforce and economic development, consumer behavior and market analysis, households and housing market analysis, regional transportation planning, hospitality and recreation industries, health services for the elderly, and environmental protection. He is an expert on the American Community Survey.
Lecture Notes
INFO7470-S4-Household Surveys
Discussion Notes
Also see National Research Council. 2013. “Principles and Practices for a Federal Statistical Agency: Fifth Edition.” Washington, DC: The National Academies Press. doi: 10.17226/18318.
20170921T16250020170921T1800000Session 4: Measuring People and Householdsfreeai1ec-428@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
This lecture is a “flipped” lecture.
Lecture Notes
INFO7470-S5 Economic Statistics
Updates: INFO7470-S5 Updates
Lab
The lab will be posted on edX.
20170928T16250020170928T1800000Session 5: Measuring Business and Economic Activityfreeai1ec-432@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
This lecture is a “flipped” lecture.
Flipped Classroom
The recorded lectures on edX will be available to registered participants starting Feb 29, 2016.
Lab
The lab will be posted on edX.
Lecture Notes
INFO7470-S6-JobStatistics – Part 1
INFO7470-S6-JobStatistics – Public-use QWI
INFO7470-S6-JobStatistics – LEHD sources
Updates: INFO7470-S6-JobStatistics – Updates
20171005T16250020171005T1800000Session 6: Measuring Jobsfreeai1ec-441@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
Health statistics, energy statistics, agricultural statistics, others. Register-based statistics, organic data.
Discussion leads
Erica Groshen, Cornell University, will take part in the discussion.
Erica L. Groshen is currently a visiting scholar at the ILR School of Cornell University. She is the former Commissioner of the U.S. Bureau of Labor Statistics (BLS), which is the principal federal statistical agency responsible for measuring U.S. labor market activity, working conditions and inflation. Her term ended on January 27, 2017. Previously, Groshen served as a vice president of the Federal Reserve Bank of New York. Dr. Groshen’s research focuses on jobless recoveries, regional labor markets, wage rigidity and dispersion, the male-female wage differential, service-sector employment, and the role of employers in labor market outcomes.
She has served as a member of the BLS Data Users’ Advisory Committee and the Census Bureau’s 2010 Census Advisory Committee and also as an American Economic Association representative to the Census Advisory Committee of Professional Associations. On behalf of the New York Fed, she initiated the effort to form the consortium of thirteen research institutions that created the New York Census Research Data Center at Baruch College in 2006. Groshen received a bachelor’s degree in economics and mathematics from the University of Wisconsin-Madison and a Ph.D. in economics from Harvard University.
Brent Hueth, University of Wisconsin-Madison, will be discussing topics related to agricultural statistics.
Brent Hueth is Director of the University of Wisconsin Center for Cooperatives, with an appointment as associate professor in the Department of Agricultural and Applied Economics. Brent has published in top economics journals, including the American Journal of Agricultural Economics, the Journal of Regulatory Economics, the Journal of Economic Behavior and Organization, and the Journal of Economics and Management Strategy. Brent is a Research Fellow at the Institute for Exceptional Growth Companies, and Executive Director of the Census Bureau’s Research Data Center at the University of Wisconsin-Madison. Brent’s research and teaching focus on agricultural markets, cooperative enterprise, and economic development. (More info)
Lecture Notes
Health statistics (Lecture Notes: INFO7470-S7-Parker, Jennifer Parker (NCHS))
Agricultural statistics (Lecture Notes: INFO7470-S7-DunnHueth, additional materials, INFO7470-S7-Migrant Farm Labor in the Census of Agriculture, Richard Dunn (University of Connecticut) and Brent Hueth (University of Wisconsin-Madison))
EIA presentation: INFO7470-S9-EIA-Background-2016 (Jacob Bournazian (EIA))
Register-based statistics: INFO7470-S9-Register-data
Alternate data sources: INFO7470-S9-Organic-data
Updates by Erica Groshen on working with BLS data: INFO7470 2017 Groshen BLS
Updates on the above: INFO7470-S9-Updates
20171012T16250020171012T1800000Session 7: Data from Other Statistical Agencies and Other Sourcesfree41rg443ri65i1dv8g4eqncckjt@google.com20171214T020841ZConferences,Presentation,vilhuberMichael Ratcliffe will be presenting “Maintaining an Accurate Address List: Reengineering Address Canvassing through the Use of Multiple Sources and Methods” and discussing topics related to the definitions, in the past, now, and in the future, of geography for census data collection purposes. This presentation is part of INFO7470 (https://www.vrdc.cornell.edu/info747x/) but all are welcome.
20171019T16250020171019T180000Ives 1090INFO7470: Michael Ratcliffe: Maintaining an Accurate Address List: Reengineering Address Canvassing through the Use of Multiple Sources and MethodsCensus@Cornell,NCRNai1ec-449@www.vrdc.cornell.edu/info747x20171214T020841ZINFO7470
This will be a “flipped classroom” on Geographic Information Systems (GIS) – basic geocoding, geographic concepts, and other topics.
Discussion lead
Michael Ratcliffe, U.S. Census Bureau
Assistant Division Chief for Geocartographic Products and Criteria, Geography Division, U.S. Census Bureau.
Lecture Notes
Geography: INFO7470-S8-Census Geography Concepts
20171019T162500 | 20171019T180000 | 0 | Session 8: Census Geography | free | 710vv7t2mqp0fq39alskkd8gf1@google.com | 20171214T020841Z | Conferences,Presentation,vilhuber
Barbara Downs (U.S. Census Bureau) will discuss how best to access data in the FSRDC system. This presentation is part of INFO7470 (https://www.vrdc.cornell.edu/info747x/), but all are welcome.
20171026T162500 | 20171026T180000 | Ives 109 | 0 | INFO7470: Restricted Access Data in the FSRDC system | Census@Cornell,NCRN | ai1ec-527@www.vrdc.cornell.edu/info747x | 20171214T020841Z | INFO7470
Flipped classroom about access to restricted-access data. Students will be introduced to the research proposal mechanism of the Federal Statistical Research Data Centers (FSRDC), including data from the Census Bureau, NCHS, and BLS.
Discussion will focus on how to access various restricted access data sets. Guest presenters may be present live in the videoconference classroom.
The presentation on replicable science has been moved to a later date.
Lecture Notes
Restricted Access Data: INFO7470-S8-Proposals, Kristen Monaco on BLS proposal review, Matthias Umkehrer on IAB access
Replicable Science: INFO7470-S9-Replicable Science
Updates and Flipped Class questions: INFO7470-S8-Updates and flipped class questions
Additional links
IRS SOI Joint Statistical Research Program – with links to the 2014 Call for proposals (now closed; local copy) and projects in 2012 and 2014
20171026T162500 | 20171026T180000 | 0 | Session 9: Restricted Access Data | free | 71avfhg2318ulqfrobab79rbg0@google.com | 20171214T020842Z | Conferences,Presentation,vilhuber
Measures and Content for Studying Family Living Arrangements and Child Well-Being from SIPP 2014
20171101T131500 | 20171101T144500 | MVR G87 | 0 | CANCELLED: Jason Fields (U.S. Census Bureau) | free | Census@Cornell,NCRN | ai1ec-470@www.vrdc.cornell.edu/info747x | 20171214T020842Z | INFO7470
The class combines a flipped classroom with a live presentation.
Presentation
We discuss the need for and the requirements of replicable science (in general, and in restricted-access environments). This part is a live lecture by Lars Vilhuber.
Introduction to record linking
What is record linking, what is it not, what is the theory?
Record linking: applications and examples – How do you do it, what do you need, what are the possible complications?
Examples of record linking
Lecture Notes
INFO7470-S10-Primer_for_Programs (PDF) or (Powerpoint)
Large-scale Data Linkage from Multiple Sources: Methodology and Research Challenges
Discussion lead
John M. Abowd, U.S. Census Bureau and Cornell University, will lead the discussion.
John M. Abowd is currently the Associate Director for Research and Methodology and Chief Scientist, United States Census Bureau, on leave from Cornell University. At Cornell, he is the Edmund Ezra Day Professor of Economics, Professor of Statistics and Information Science, and the Director of the Labor Dynamics Institute (LDI). He previously served as a Distinguished Senior Research Fellow at the United States Census Bureau (1998-2015). He is also a Research Associate at the National Bureau of Economic Research (NBER, Cambridge, MA), Research Affiliate at the Centre de Recherche en Economie et Statistique (CREST, Paris, France), Research Fellow at the Institute for Labor Economics (IZA, Bonn, Germany), and Research Fellow at IAB (Institut für Arbeitsmarkt- und Berufsforschung, Nürnberg, Germany). He is the outgoing President (2014-2015) and Fellow of the Society of Labor Economists, a past Chair (2013) of the Business and Economic Statistics Section and a Fellow of the American Statistical Association. He is an Elected Member of the International Statistical Institute and a Fellow of the Econometric Society. He previously served on the National Academies’ Committee on National Statistics (2010-2016) and on the American Economic Association’s Committee on Economic Statistics. He served as Director of the Cornell Institute for Social and Economic Research (CISER) from 1999 to 2007.
20171102T162500 | 20171102T180000 | 0 | Session 10: Replication and Statistical Tools – Record Linkage | free | 0qq31173uvmqa53karqfabocho@google.com | 20171214T020842Z | Conferences,Presentation,vilhuber
Joint LDI-CISER-CPC Seminar: Old Housing, New Needs: Are US Homes Ready for an Aging Population?
20171108T131500 | 20171108T144500 | MVR G87 | 0 | LDI-CISER-CPC Seminar: Jonathan Vespa (U.S. Census Bureau) | Census@Cornell,NCRN | ai1ec-469@www.vrdc.cornell.edu/info747x | 20171214T020842Z | INFO7470
Formal models of edits and imputations
Missing data overview
Missing records – Frame or census – Survey
Missing items
Overview of different products
Overview of methods
Formal multiple imputation methods
Lecture Notes
INFO7470 S11 -Statistical Tools Edit and Imputation (Powerpoint)
INFO7470 S11 -Statistical Tools Edit and Imputation Examples
Extra lecture
INFO7470-2017-S11-Replication-in-RDC (Powerpoint)
Lab
The lab (an edit and imputation exercise) is posted on the INFO7470x edX site. You will need to write a program (in the language of your choice) and upload it to edX. A toy example is illustrated in a video on the edX site; the accompanying spreadsheet, toy-example-imputation.xlsx, is available for download.
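For orientation only, a minimal sketch of what a single-imputation program might look like. It is not the lab's actual toy example: the field names (`industry`, `payroll`), the data values, and the choice of group-mean imputation are all assumptions made for illustration.

```python
from statistics import mean


def impute_group_mean(records, group_key, value_key):
    """Fill missing values (None) with the mean of observed values in the same group.

    Hypothetical helper for illustration; falls back to the overall observed
    mean when a group has no observed values.
    """
    observed = {}
    for rec in records:
        if rec[value_key] is not None:
            observed.setdefault(rec[group_key], []).append(rec[value_key])

    group_means = {group: mean(values) for group, values in observed.items()}
    overall_mean = mean(v for values in observed.values() for v in values)

    completed = []
    for rec in records:
        new_rec = dict(rec)
        # Flag imputed items so downstream users can tell them from reported data.
        new_rec["imputed"] = rec[value_key] is None
        if new_rec["imputed"]:
            new_rec[value_key] = group_means.get(rec[group_key], overall_mean)
        completed.append(new_rec)
    return completed


# Made-up records: two items are missing (None).
toy = [
    {"id": 1, "industry": "A", "payroll": 100.0},
    {"id": 2, "industry": "A", "payroll": None},
    {"id": 3, "industry": "A", "payroll": 140.0},
    {"id": 4, "industry": "B", "payroll": 60.0},
    {"id": 5, "industry": "B", "payroll": None},
]

completed = impute_group_mean(toy, "industry", "payroll")
for rec in completed:
    print(rec)
```

Group-mean imputation is deterministic and understates variability; the formal multiple imputation methods listed in this session draw repeatedly from a predictive distribution instead.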
20171109T162500 | 20171109T180000 | 0 | Session 11: Statistical Tools – Edit and Imputation | free | ai1ec-484@www.vrdc.cornell.edu/info747x | 20171214T020842Z | INFO7470
Why must users of restricted-access data learn about confidentiality protection?
What is statistical disclosure limitation?
What are privacy-preserving data mining and differential privacy?
Basic methods for disclosure avoidance (SDL)
Rules and methods for model-based SDL
SDL-based noise methods
Synthetic data
Differential privacy methods
Lecture Notes
INFO7470 S12 -Updates
INFO7470-S12-Statistical Disclosure Limitation
INFO7470-S14-Synthetic Data
INFO7470 S14 SDS
Supplementary Materials
For SDL lecture: Randomized Response.xlsx
Codebooks for
SSB
SynLBD
20171116T162500 | 20171116T180000 | 0 | Session 12: Statistical Tools – Disclosure Limitation Methods – Synthetic Data | free | 41nop0mqdsa7u9m53aipqj654l@google.com | 20171214T020842Z | Conferences,Presentation,vilhuber
Part Time Employment and Firm-level Labor Demand over the Business Cycle.
20171129T114500 | 20171129T131500 | Ives 115 | 0 | Joint LDI-CISER-Macroeconomics Seminar: Larry Warren | free | Census@Cornell,NCRN | ai1ec-491@www.vrdc.cornell.edu/info747x | 20171214T020842Z | INFO7470
Flipped class
Part A: Spatial Analysis (Nicholas Nagle of University of Tennessee – Knoxville)
Part B: Network Analysis (John Abowd, Cornell University)
INFO7470 S13 -Updates
Part A: Spatial Analysis
Topics
Basic Geocoding
Tools for Geocoding
Analysis Methods
Tools for Geographic Analysis
Lecture Notes
INFO7470 S13 – SpatialAnalysis – Nagle
About the Guest Lecturer
Nicholas Nagle, University of Tennessee – Knoxville
Part B: Network Analysis
This part of the lecture is a live class.
Lecture Notes
INFO7470-S13-Statistical Tools-Hierarchical Models and Network Analysis
About the Guest Lecturer
John Abowd, Cornell University and now U.S. Census Bureau
20171130T162500 | 20171130T180000 | 0 | Session 13: Statistical Tools – Geographic and Network Analysis Methods | free | 0o9avlnf69140odumi485gqoee@google.com | 20171214T020842Z | Conferences,Presentation,vilhuber
Cyclical Labor Market Sorting
20171206T130000 | 20171206T143000 | Ives 115 | 0 | Joint LDI-CISER-Macroeconomics Seminar: Henry Hyatt | Census@Cornell,NCRN | 7thsdd1813n8u8jgb0hpnligrv@google.com | 20171214T020842Z | Conferences,Presentation,vilhuber
Upstream, Downstream: Diffusion and Impact of the Universal Product Code
20171213T130000 | 20171213T143000 | Ives 115 | 0 | Joint LDI-CISER-Industrial Organization Workshop: Emek Basker | Census@Cornell,NCRN