Differential Privacy and the Upcoming Process of Redistricting
A Commentary By Teresa A. Sullivan and Qian Cai
New method of protecting privacy in census may cause problems for drawing districts.
KEY POINTS FROM THIS ARTICLE
— The U.S. Census Bureau is required by law to protect the confidentiality of census respondents.
— The bureau is using a new method called “differential privacy” as part of the 2020 census to fuzz up the data in order to prevent individual respondents from being potentially identified.
— However, the use of differential privacy may cause problems in the upcoming redistricting process by injecting inaccurate information into the granular census data required to draw districts of equal sizes and to ensure fair racial representation.
Differential privacy creates challenge for redistricting
The U.S. Census Bureau is charged under Title 13 of the U.S. Code to protect the confidentiality of census respondents and to ensure that their data remain private for 72 years. Since 1850, the Census Bureau has implemented privacy measures in every decade, but the 2020 census provides new challenges in ensuring privacy. In particular, new efforts to secure individual privacy in the released census data create a tradeoff between privacy and accuracy that is problematic for redistricting.
Because of the large number of commercial databases, social media sites, and other sources of digital information, analysts have access to many sources of individual data besides the census. Even some government records subject to Freedom of Information Act requests, such as driver's license records and voting records, may contain such information. This greater availability of data, along with high-powered computing, raises the possibility of reverse identification: that is, a person armed with publicly available data may be able to identify a unique individual respondent in the census data. The Census Bureau considers this possibility an unacceptable risk given its responsibilities under Title 13.
The solution the bureau intends to use is called differential privacy, a technique developed by data scientists and described in the literature. The bureau is applying differential privacy in such a way that only state population totals remain intact; any populations below the state level (say, for a town, city, or county), and the characteristics of the individuals within the jurisdiction, are altered by injecting statistical "noise" to fuzz up the actual data. A parameter called ε controls how much noise is injected: a higher value of ε means less noise and less loss of accuracy, while a lower value means more noise and more loss of accuracy. The noise injection, even at the highest ε value the bureau has applied, in many cases produces not only less accurate data, but inconsistent or even illogical data.
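The Census Bureau's actual disclosure-avoidance system is considerably more elaborate than any short example, but the core tradeoff ε governs can be illustrated with the classic Laplace mechanism from the differential-privacy literature: noise drawn from a Laplace distribution with scale 1/ε is added to each count. The function name, the block population of 37, and the ε values below are illustrative assumptions, not the bureau's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1):
    add noise with scale 1/epsilon, then round to a whole person."""
    return round(true_count + rng.laplace(0.0, 1.0 / epsilon))

true_pop = 37  # hypothetical census-block population
mae = {}
for eps in (0.1, 1.0, 10.0):
    draws = [noisy_count(true_pop, eps, rng) for _ in range(20000)]
    mae[eps] = float(np.mean([abs(d - true_pop) for d in draws]))
    print(f"epsilon = {eps:>4}: mean absolute error ~ {mae[eps]:.2f}")
```

Running the sketch shows the average error shrinking as ε grows; it also shows why small-area data can turn illogical, since at low ε the noise can easily exceed a block's true population and push a count to zero or below.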
The state populations for the apportionment of Congress, which have already been released, are not subjected to the differential privacy process. Differential privacy is not needed for these data, whose use is to document “the whole number of persons in each state” as required by the 14th Amendment to the U.S. Constitution. Neither voting-age population nor race/ethnicity information is required for reapportionment.
The data for redistricting are a different story. The Census Bureau plans to apply differential privacy when these data, called Public Law 94-171 redistricting data, are released. The release date is currently Aug. 16; the statutory deadline of March 31 was missed because of the effects the pandemic had on completing the original count and quality checks of the data. Those involved in redistricting are typically concerned with total population, voting-age population (18+), and the race and ethnicity of these populations in small geographic areas, such as census blocks or block groups. Race and Hispanic origin are known to be associated with party affiliation and voting behavior. All these variables will be injected with statistical noise under conditions of differential privacy.
Just how much accuracy will be lost in redistricting? On April 26, 2021, the Census Bureau issued a test dataset for analysts. This test dataset consisted of the 2010 census redistricting data as originally issued, and then, for comparison, the same data, only using differential privacy. Because the census block, which is roughly the size of a city block, is the basic geographic unit for redistricting, the comparisons typically focus on the census block level.
The Demographics Research Group of UVA’s Weldon Cooper Center analyzed the test data for Virginia and found significant inaccuracy at the census block level. Some findings are listed below:
— Nearly a quarter of the census blocks had a population change of more than 10%.
— Nearly 2,500 census blocks had only children (ages 0-17) but no adults (ages 18+).
— Populations in 1,255 census blocks were completely erased to 0.
In addition, beyond census blocks, we identified significant outliers at larger geographies as well. For example:
— Among Virginia’s towns, counts of Black or African Americans were inflated by as much as nine times, American Indians four times, Asians four times, Hispanics six times, and people of More Than Two Races 16 times. On the flip side, these various racial/ethnic groups were completely erased to zero in various towns, representing a 100% reduction.
— Among Virginia’s counties, Black or African Americans were reduced by as much as 100% (meaning entirely removed), American Indians by 62%, Asians by 56%, and Hispanics by 33%.
Is differential privacy a fatal flaw for redistricting? Some analysts are not too concerned because the very commercial databases that prompted the use of differential privacy are also available to redistricting commissions and committees. Voting records, for example, may provide the type of information that redistricters need. But redistricting still relies on the census counts to ensure equal size of the districts and fair racial representation.
Others have sounded the alarm, most notably Alabama, which has filed suit in federal court to prevent the use of differential privacy in the data released for redistricting. Sixteen state attorneys general of both red and blue persuasions filed an amicus brief in support of Alabama. The fundamental argument Alabama makes is that the Census Bureau will “provide the States purposefully flawed population tabulations…. [the Bureau] will force Alabama to redistrict using results that purposefully count people in the wrong place.” The filing alleges that the decision to use differential privacy was arbitrary and capricious, and a violation of the Administrative Procedure Act as well as a violation of the Census Act and the due process and equal protection rights of the plaintiffs.
Whatever decision is made in the Alabama case, however, social scientists who analyze census data, especially for small geographic areas, will find the issue of differential privacy a recurring concern in their analysis.
Jaewoo Lee and Chris Clifton (2011). “How Much is Enough? Choosing ε for Differential Privacy.” In: Lai, X., Zhou, J., and Li, H. (eds.), Information Security. ISC 2011. Lecture Notes in Computer Science, vol. 7001. Springer. https://doi.org/10.1007/978-3-642-24861-0_22.
 Complaint in Alabama v. U.S. Dep’t of Commerce, No. 3:21-cv-211 (M.D. Ala.), available at https://www.brennancenter.org/sites/default/files/2021-03/Complaint_%202021-03-11_0.pdf. The Court has approved the motion for a three-judge panel. As a result of the three-judge panel, the eventual decision could be appealed directly to the U.S. Supreme Court.
Teresa A. Sullivan is President Emerita and University Professor of Sociology at the University of Virginia. Qian Cai is the Director of the Demographics Research Group at the Weldon Cooper Center of the University of Virginia.
This article is reprinted from Sabato's Crystal Ball.
Views expressed in this column are those of the author, not those of Rasmussen Reports. Comments about this content should be directed to the author or syndicate.