WASHINGTON — The Census Bureau says it has improved its ability to give accurate data while protecting the privacy of its 2020 questionnaire responses, but experts worry they won’t be able to test the agency’s strategy before it is finalized.
The tweaks to the new method are critical to an accurate population count, one that will affect legislative mapmaking and the distribution of $1.5 trillion in federal funds.
The bureau made changes to an algorithm that adds “noise” to census data, a policy referred to as differential privacy, after researchers argued last year that a public test showed the policy made census data unusable. They found large errors, such as graveyards populated with living residents, and small ones, such as age distributions that skewed older or younger at small geographies.
The Census Bureau last month released a “report card” showing it had gotten more accurate while still preserving the privacy of census responses. But without another full public test and more technical details, researchers fear they won’t know whether the report card represents conscious policy tradeoffs or glitches in the system.
David Van Riper, spatial analysis director for the Minnesota Population Center at the University of Minnesota, said metrics show the Census Bureau has made progress on things such as the accuracy of age distributions but the results are off in other areas.
“It’s hard to deal with metrics without seeing what’s driving the changes,” Van Riper said. “Having the new metrics is a good step but is not sufficient for assessing how good this new algorithm is.”
The Census Bureau adopted its differential privacy policy after research showed existing methods, such as randomly swapping members of households, failed to do enough to protect the identity of individual participants. Privacy researchers at a conference last year also said they feared census responses could be cross-referenced with other datasets to identify individuals.
The differential privacy algorithm changes the data based on what’s called a privacy loss budget: the lower the budget, the noisier the data, and the higher the budget, the more accurate the data. Currently, only state-level population totals and a few other measures are kept constant, according to Census Bureau officials.
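The article does not describe the bureau’s algorithm in detail, so the following is only a minimal sketch of the general budget-versus-noise relationship, using the textbook Laplace mechanism rather than the agency’s actual method. The function names, the budget values and the assumption that one person changes a count by at most 1 are all illustrative.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Assumes (hypothetically) that adding or removing one person changes
    the count by at most 1, so the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    """Average absolute error of the noisy count over many simulated releases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Illustrative budgets only; these are not the bureau's values.
    for epsilon in (0.1, 1.0, 10.0):
        print(epsilon, round(mean_abs_error(100, epsilon), 2))
```

In this sketch the typical error is roughly 1 divided by the budget, so a small block with a few dozen residents is distorted far more, in relative terms, than a state total would be.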
The Census Bureau said it has made several changes since publishing the test dataset last year, meant to address problems that surfaced in that test run, which used 2010 data. However, the agency doesn’t plan to produce another public trial run.
“Unfortunately, the tabulation, documentation and quality control processes required for public releases of data products are enormously time and labor intensive,” Michael Hawes, the agency’s senior adviser for data access and policy, said in a statement. “With the 2020 Census now underway, we are unable to support the release of another full demonstration product.”
Hawes said the agency may do an “alternative file release” to provide researchers more information before finalizing the rules. He said the agency intends in September to decide which population levels will not be altered by the algorithm, then finalize the privacy budget and other specifications by March 2021.
Without another release, researchers will have to trust the agency’s report of its progress on hammering out glitches and its balance between privacy and accuracy. That means they won’t know whether they can use the data for thousands of decisions, ranging from legislative mapmaking to the distribution of federal funds, until after census data has been released next year.
“This is one of the most important datasets, if not the most important data set for the nation,” said Alexis Santos, a Pennsylvania State University professor and demographer. “We need to do it in a way that the data are still usable so that we can use it to draw maps and study the population.”
Santos said last year’s public test dataset, applied to a batch of 2010 census results, proved to be too inaccurate. The public test data distorted race and ethnicity data in particular, potentially hiding disparities that researchers study in areas such as health impacts and policing by race.
“A lot more people are beginning to understand the structural differences that various people have faced for years — really since the founding of the country. And census data is the way we look at that,” Santos said.
Organizations such as the National Conference of State Legislatures have called for another test run, raising concerns that differential privacy will complicate legislative mapmaking.
NCSL’s director for elections and redistricting, Wendy Underhill, said the current iterations of differential privacy may make it difficult for states to meet their constitutional requirements when drawing up new congressional districts.
“If differential privacy does not have accurate population totals at block level, it is hard for districts to be built that we are sure are of equal population,” she said.
Underhill said it may be possible to show that an area’s actual population is different from what census data claims, potentially opening up new avenues for litigation. The addition of noise to data may also complicate drawing districts under the Voting Rights Act, which prohibits racial discrimination at the polls.
A key part of figuring out whether an area needs voting rights protections is analyzing racially polarized voting trends at the precinct level, said Loyola University Law School professor Justin Levitt.
“That is a really small unit, and differential privacy makes a big difference in really small units,” Levitt said. “It’s going to make it hard to show voting is as polarized as it actually is on the ground.”
Levitt said some in the civil rights community have pushed the Census Bureau to merge some very rare combinations of data — like people over 100, or people of five or more races — to cut down on the amount of data released.
The executive order that President Donald Trump issued last year after dropping a citizenship question from the questionnaire only adds to the uncertainty about how accurate the census data will be. That order required the Census Bureau to compile citizenship data for the entire country at the most detailed geographic level, which Levitt said will only draw down the privacy loss budget and make the entire census less accurate.
“(The executive order) is adding to the concern that at the most local geographies, the data will be noisy enough to be problematic,” Levitt said. “Every additional reduction in the precise data available helps with accuracy.”
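Levitt’s point about the budget can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that a fixed total budget is split evenly across every published table and that each table gets Laplace-style noise with scale equal to 1 divided by its share. The bureau’s actual allocation is not described here, and the function name and numbers are hypothetical.

```python
def per_table_noise_scale(total_epsilon, n_tables):
    """Noise scale per table when a fixed budget is split evenly.

    Assumes an even split and a count that changes by at most 1 per
    person; both are simplifying assumptions for illustration.
    Each table's share is total_epsilon / n_tables, and its noise
    scale is the reciprocal of that share.
    """
    if total_epsilon <= 0 or n_tables < 1:
        raise ValueError("need a positive budget and at least one table")
    return n_tables / total_epsilon


if __name__ == "__main__":
    # Hypothetical: a total budget of 1.0, before and after adding
    # one more table (for example, block-level citizenship counts).
    print(per_table_noise_scale(1.0, 10))
    print(per_table_noise_scale(1.0, 11))
```

Under these assumptions, each extra table published from the same budget makes every other table slightly noisier, and each table dropped makes the rest slightly more accurate.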