Bad Data: A pandemic and privacy fears threaten the very purpose of the 2020 census
By Qian Cai, Research Director of Demographics Research Group, University of Virginia
For the Census Bureau, the timing of national shutdowns due to the pandemic could not have been much worse.
Stay-at-home orders in March coincided with the period when millions of Americans received their census questionnaires in the mail. But large numbers of Americans moved from where they normally live to somewhere else – in with relatives with spare rooms, back home from college or even released from prisons. These highly unusual circumstances are likely to leave portions of the population uncounted, double-counted or counted in the wrong place.
Disruption from the pandemic adds to existing worries around the accuracy of this year’s census data, including the introduction of a technique to protect residents’ privacy and a potentially low response rate stemming from distrust in the government. I am a demographer working with local governments, businesses and nonprofits, and this combination of factors makes me deeply concerned about how accurate census data will be when it’s released in 2021.
Communities rely on accurate data for a range of essential services, whether it’s determining the needs for hospital beds and vaccine doses, social programs for seniors or the unemployed, or evaluating wide-ranging health, economic and social impacts of the pandemic.
Good data in
People who work with statistics know that there needs to be “good data in” in order to get “good data out.” In the context of the census, good data in means “counting everyone once, only once, and in the right place.” The decennial census gathers data from every household in the nation to accomplish this enormous undertaking.
People are supposed to report where they were living on April 1. Yet many left their usual residences to move in with parents, adult children, other relatives or friends; some fled to second homes; nearly 20 million college students vacated dorms or apartments; and tens of thousands of inmates were granted early release. Nursing homes also experienced high death rates from COVID-19, leaving no one to respond for residents who have since died but should have been counted on April 1.
The pandemic led the U.S. Census Bureau to extend the deadline for gathering data from July to October. Prolonging the census-taking period may generate confusion about where and how people should be counted. It may also increase recall errors, because people are asked months later to report where they were living on April 1, which would diminish data accuracy.
Further, Census Bureau field operations were suspended in late March and have only recently begun to resume gradually. In August, census takers will begin to knock on the doors of about one-third of the households nationwide that have not answered the census. But it may be harder to get complete and accurate information this year if people are reluctant to speak with census takers in person because of health and safety concerns around the pandemic.
Finally, the Trump administration’s positions on immigration may further depress participation or distort results. Nearly 14% of the U.S. population are foreign-born, and more than 80% of the foreign-born are racial/ethnic minorities from Latin America, Asia and Africa, according to my calculations from the Census Bureau’s latest American Community Survey data.
The administration’s proposed citizenship question was eventually scrapped from the 2020 census, but in its place Trump signed an executive order to collect information about citizenship status through other means. Fear remains, not only among immigrants and their families, but also among naturalized citizens and U.S.-born citizens with immigrant parents. This, together with the late-April announcement of a plan to close U.S. borders because of the pandemic, sent unsettling signals and may further diminish census participation.
In short, both pandemic and policy-related forces threaten the goal of getting good data in.
Good data out
“Good data out” means that the data collected by the census is carefully processed and truthfully reported. Census results are the benchmark for federal, state and local data and the gold standard for what we can know about the country’s residents.
The Census Bureau is obligated to prioritize both data accuracy and individual privacy protection. In order to achieve near-absolute privacy protection, the bureau is implementing a new data processing measure called “differential privacy,” which distorts community data including age, gender, race/ethnicity, relationship, family type, homeownership, household size and vacancy rate. By reporting numbers that are distorted, the technique is designed to make it harder to identify specific individuals, particularly by combining census data with other sources of information.
National and state totals will be reported accurately, which is critical for congressional apportionment. But the process of distorting data to protect privacy at county, city and town levels as well as among different age or racial groups means the data will be incoherent or even erroneous.
Bad data will have bad consequences. For example, next year when health officials use the fresh census data to determine COVID-19 death rates among the African American population, they need to divide the total number of deaths of African Americans from COVID-19 in a given jurisdiction by the total African American population there. Because of differential privacy, the denominator with the local African American population from the census will not be accurate, and as a result, there could be wildly inconsistent or even implausible results.
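To make the arithmetic concrete, here is a minimal sketch of how a distorted denominator changes a death rate. It is an illustration only, not the Census Bureau’s actual procedure: it uses a textbook Laplace noise mechanism, and the privacy parameter (epsilon), the death counts and the population sizes are all hypothetical numbers chosen for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return the count plus Laplace noise of scale 1/epsilon, floored at 0."""
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    return max(0, round(noisy))


def death_rate_per_100k(deaths, population):
    """Deaths per 100,000 residents, or None if the denominator is zero."""
    if population <= 0:
        return None
    return 100_000 * deaths / population


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is repeatable
    epsilon = 0.1              # hypothetical privacy setting (smaller = noisier)

    # Hypothetical (deaths, true population) pairs for three jurisdictions.
    for deaths, true_pop in [(2, 40), (20, 4_000), (200, 400_000)]:
        reported_pop = noisy_count(true_pop, epsilon, rng)
        print(
            f"true pop {true_pop:>7}, reported pop {reported_pop:>7}, "
            f"true rate {death_rate_per_100k(deaths, true_pop)}, "
            f"rate from reported pop {death_rate_per_100k(deaths, reported_pop)}"
        )
```

In a sketch like this, the same amount of noise barely moves the rate for a population of 400,000 but can shift it substantially for a population of 40, which is why small communities and small demographic groups are the main concern.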
Census Bureau officials have said that injecting “noise” into the data is needed to ensure privacy, and that the technique gives data scientists a good understanding of the level of uncertainty in the data. But other researchers have shown differential privacy to be ill-suited, harmful, untested and unproven.
Similar to an athletic team’s record bearing an asterisk marking a sullied season, the 2020 census will bear the unfortunate impact of the pandemic. Much is beyond the Census Bureau’s control, but this decennial census will also carry a second asterisk, due to Census Bureau decisions to trade data accuracy for privacy.
Originally published on The Conversation as Pandemic, privacy rules add to worries over 2020 census accuracy