1 Introduction

Context of the problem. Every ten years, the US Census Bureau conducts and publishes a national census of population, which is used to apportion congressional seats among the states. Each seat is won by the party with the most votes in its district (first-past-the-post system). In each state, all districts must have the same number of constituents, and this number is roughly the same across states. Each state also draws district lines for its own legislature (Footnote 1). This system aims to represent every voter fairly; however, 37 of the 50 states leave the redrawing of district boundaries to politicians [20]. This results in gerrymandering, a portmanteau of "Gerry" and "salamander" coined in 1812, when Elbridge Thomas Gerry, Governor of Massachusetts, redistricted the state in such a way that one district looked like a mythological salamander [23], see Fig. 1.

According to Black’s Law Dictionary [8], gerrymandering is “the process of dividing a state with such a geographical arrangement as to accomplish an ulterior or unlawful purpose, as, for instance, to secure a majority for a given political party in districts where the result would be otherwise if they were divided according to obvious natural lines”. Indeed, the legislator in charge of redistricting a state will try to maximize the number of majorities won by his or her party, which distorts the representative weight of voters. This problem is not specific to the US, as described in [7]. The two basic strategies of gerrymandering are called cracking and packing: cracking spreads favorable voters across as many districts as possible to obtain tight majorities in each, and packing concentrates unfavorable voters into homogeneous districts, so that many of their votes are wasted. Whereas racial gerrymandering (redistricting in order to decrease or increase the political representation of racial minorities) tends to be struck down (Shaw v. Reno, 1993) as it infringes on the Voting Rights Act of 1965 [29] (Footnote 2), partisan gerrymandering (redistricting based on the political orientation of voters) is not justiciable in federal courts, as recently confirmed by the Supreme Court in Lamone v. Benisek (2019) (Footnote 3).

Fig. 1: The original gerrymander

Contributions. A problem of optimal partisan gerrymandering consists in designing district boundaries in order to get the best political advantage for one party. The present paper considers a model where the gerrymanderer chooses a division of the state into N equal-sized districts under the constraint that the average proportion of favorable and unfavorable voters over districts is equal to the overall proportion of such voters in the state. Results of elections are subject to uncertainty: the outcome of each district is a normally distributed random variable centered around the proportion of favorable voters. The aim of the gerrymanderer is to maximize the expected number of districts won.

The contributions of this paper are the following. Our first result is a characterization of the value of this optimization problem given the number N of districts and the variance \(\sigma ^2\) of the uncertainty. This value is related to the concavification (least concave majorant) of the payoff function under uniform districting. The value of the gerrymandering problem with N districts is not equal to this concave closure, but it converges to it as the number of districts goes to infinity. It is interesting to note that concavifying a function is also the way to solve Bayesian persuasion games. This is no coincidence: simple binary games of Bayesian persuasion such as the “prosecutor-judge” of [17] can be seen as gerrymandering problems, as noticed by [19]. The optimal signal for the prosecutor boils down to “packing” negative signals and “cracking” positive ones in order to have the best chances to convince the judge just enough. More complex Bayesian persuasion games could be seen as gerrymandering problems as well if we complexify the model (uncertainty, more than two parties etc.).

Second, we introduce concepts of representation, influence and fairness, which measure the relative weight of a vote under a districting method. We consider two methods: uniform districting, or full cracking, and community (Footnote 4) districting, or full packing. We show that uniform districting is a very unfair method (and even the most unfair when \(\sigma = 0\)), while community districting is very fair (and even the fairest method when \(\sigma = 0\) and \(N = +\infty \)). Surprisingly, this implies that optimal partisan gerrymandering is fairer than drawing districts so as to maximize political heterogeneity within districts, whereas minimizing political heterogeneity tends to be very fair. Obviously, under our definition of fairness, the fairest method is full proportionality at the state level. We also evaluate the fairness of optimal gerrymandering and propose a measure according to which optimal gerrymandering is quite unfair and closer to uniform districting than to community districting.

Related literature. Our paper relates to the literature on optimal gerrymandering. Our model differs from the seminal model of [25] in several ways. First, they use only one common random parameter instead of several independent ones. Second, they attribute a continuous value from \(-1\) (extreme left) to 1 (extreme right) to each voter, instead of our binary setting. Third, they consider two objectives: maximizing the expected number of seats and maximizing the probability of getting a majority of seats. Fourth, they discuss the feasibility of designing fair districts but do not introduce a measure of fairness. [27] adds a geographical constraint to the previous model and provides practical formulas for optimal gerrymandering, with a concrete example. [14] consider a bias towards one party or another and conclude that optimal gerrymandering is based on the opinion of the median voter. Unlike us, they assume the number of voters to be discrete. The approach of [13] is quite similar: voters are uniformly distributed on a spectrum from extreme left to extreme right, and the result is affected by a noisy signal. They also propose numerous extensions to their model, such as risk aversion and specific goals for each district. Unlike most papers in the literature, they conclude that cracking is never an optimal tactic. [26] show that finding an optimal districting with geographical constraints is an NP-hard problem. [12] reach the same conclusion and provide a polynomial-time algorithm for finding the optimal districting in a simpler version of the model. [30] take population instability into account and show that the gerrymanderer can benefit from it. The model of [19] considers general voter densities, aggregate shocks and utility functions. This setup is flexible enough to encompass several of the models cited above. Their optimal solution, called “segregate-pair”, is a generalization of cracking and packing.
Independently of our work, [19] also note a parallel between gerrymandering and Bayesian persuasion, with the simplest version of gerrymandering (binary voters and no uncertainty) being equivalent to the “prosecutor-judge” game of [17]. Notice that [19] consider a model with a continuum of districts and use recent results from Bayesian persuasion [10, 18]. One contribution of our work is to completely solve the problem with a finite number of districts.

This paper is also related to the literature studying the impact of gerrymandering on the fairness of voting systems. The notion of fairness is common in the literature, with a relative consensus on defining fairness as proportionality. [9] define a metric called the gerrymandering power of a party and show that it decreases when voters are increasingly segregated, while the outcome becomes more representative. This is close to our conclusion that community districting is a very fair method. [15] study gerrymandering as a specific problem of public choice where a policy bias emerges because the median choice of the districts (“the median of the median”) does not necessarily correspond to the median choice statewide. [16] discuss the applicability of measuring partisan bias to help the Court decide on the constitutionality of a redistricting proposal. [28] introduce a measure of fairness called the “efficiency gap” that captures the wasted votes of each party. By computing it for redistrictings between 1972 and 2012, they observe that the gap has increased significantly in recent years in favor of the Republicans. [22] offer a methodology for deciding whether a districting map favors one party and apply it to the case of Moldavia (the literature rarely focuses on gerrymandering outside the US). Some papers suggest new voting mechanisms to correct the unfairness created by gerrymandering. [11] proposes a mechanism involving both parties, inspired by cake-cutting mechanisms, to ensure a fair districting.

Along these lines, [3] presents a solution called Fair Majority Voting: the total number of seats given to each party is based on their total share of votes and each candidate must have a score high enough in his or her district to get the seat. This system has the advantage of ensuring a representative outcome, while preserving local elections with single candidates. Michel Balinski has written many articles on the gerrymandering problem [1, 2, 4, 5]. [3] is among his many contributions on the impact of voting rules on the democratic representation of opinions, see [6].

Organization. The paper is organized as follows. The model is described in Section 2. Our results on optimal gerrymandering are in Section 3, and Section 4 presents our notion of fairness. Section 5 concludes.

2 Model

Consider a state where there are two political parties, blue and red, and a continuum of voters of mass 1 with a proportion \(p\in [0,1]\) of blue voters. A partisan gerrymanderer who favors the blue party has to cut the state into \(N\ge 1\) districts. A district \(i=1,\dots , N\) is a subset of voters with a proportion \(p_i\) of blue voters. All districts are of equal size and any proportion of voters is achievable within a district. A vector of proportions \(\vec p=(p_1,\dots ,p_N)\in [0,1]^N\) is a feasible N-districting if \(\frac{1}{N}\sum _{i=1}^Np_i=p\). We denote by \(\mathcal {D}_N(p)\subseteq [0,1]^N\) the set of feasible N-districtings with aggregate proportion p.

The voting system is first-past-the-post: a party wins a district if it receives at least half of the votes. However, the actual result of the election is only imperfectly predicted by the proportion of blue voters. We assume that district i with proportion \(p_i\) is won by the blue party if \(p_i + \sigma \epsilon _i \ge 0.5\), where \((\epsilon _1,\dots ,\epsilon _N)\) are i.i.d. random variables with standard Gaussian distribution \(\mathcal {N}(0,1)\) and \(\sigma >0\) is the standard deviation of the noise (so that its variance is \(\sigma ^2\)). For \(p\in [0,1]\) and \(\epsilon \sim \mathcal {N}(0,1)\), denote the probability of winning by

$$\begin{aligned} F^\sigma (p)=\mathbb {P}(p+\sigma \epsilon \ge 0.5)=\Phi \Big (\frac{p-0.5}{\sigma }\Big ) \end{aligned}$$

where \(\Phi \) is the cumulative distribution function of \(\mathcal {N}(0,1)\). In the limit case without noise \(\sigma =0\), we let \(F^0(p)=\mathbf {1}{\{p\ge 0.5\}}\), which means that we assume the blue party also wins a district in case of tied vote.

For any feasible districting \(\vec p\), we denote by \({\mathcal {F}}^\sigma _N(\vec p)=\frac{1}{N}\sum _{i=1}^N F^\sigma (p_i)\) the payoff of the gerrymanderer, that is, the expected fraction of districts won (the expected number of districts won divided by N). Optimal partisan gerrymandering consists in maximizing the payoff over feasible N-districtings:

$$\begin{aligned} V_N^\sigma (p)=\max \bigg \{{\mathcal {F}}^\sigma _N(\vec p) : \frac{1}{N}\sum _{i=1}^N p_i=p\bigg \} \end{aligned}$$

Any solution to this maximization problem is an optimal districting. Note that \(V_1^\sigma (p)=F^\sigma (p)\le V_N^\sigma (p)\) since the unique 1-districting can be replicated by the uniform N-districting \((p,\dots ,p)\). An anti-optimal districting is one that minimizes the payoff over feasible N-districtings:

$$\begin{aligned} v_N^\sigma (p)=\min \bigg \{{\mathcal {F}}^\sigma _N(\vec p) : \frac{1}{N}\sum _{i=1}^N p_i=p\bigg \} \end{aligned}$$
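The payoff \({\mathcal {F}}^\sigma _N\) is straightforward to evaluate numerically. The following Python sketch (the names `win_prob`, `is_feasible` and `payoff` are hypothetical choices of ours) computes \(F^\sigma \) through the complementary error function, treats \(\sigma =0\) with the tie-inclusive step function \(F^0\), and compares the uniform districting with a cracked-and-packed one for \(p=0.34\) and \(N=5\).

```python
import math


def win_prob(p: float, sigma: float) -> float:
    """F^sigma(p) = Phi((p - 0.5) / sigma); for sigma == 0, the step 1{p >= 0.5} (ties go to blue)."""
    if sigma == 0:
        return 1.0 if p >= 0.5 else 0.0
    # Phi(x) = erfc(-x / sqrt(2)) / 2 stays accurate far in the left tail
    return 0.5 * math.erfc(-(p - 0.5) / (sigma * math.sqrt(2)))


def is_feasible(districting, p, tol=1e-12):
    """True if all proportions lie in [0, 1] and their average equals p."""
    n = len(districting)
    in_range = all(0.0 <= q <= 1.0 for q in districting)
    return in_range and abs(sum(districting) / n - p) <= tol


def payoff(districting, sigma):
    """Payoff of the gerrymanderer: average winning probability over the N districts."""
    return sum(win_prob(q, sigma) for q in districting) / len(districting)


if __name__ == "__main__":
    p, sigma = 0.34, 0.05
    uniform = [p] * 5
    cracked = [0.5, 0.5, 0.5, 0.2, 0.0]  # three districts at 0.5, leftover in d4, d5 fully red
    for name, d in [("uniform", uniform), ("cracked", cracked)]:
        print(name, is_feasible(d, p), round(payoff(d, sigma), 4), payoff(d, 0.0))
```

With noise, a district drawn at exactly 0.5 is won only half the time, so the cracked districting yields about 0.3 for \(\sigma =5\%\), against 0.6 in the noiseless case.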

Denote by \({\mathrm {cav}}f\) (resp. \({\mathrm {vex}}f\)) the smallest concave function above f (resp. the largest convex function below f): \({\mathrm {cav}}f(p)=\sup \big \{\sum _i\lambda _if(p_i) : \sum _i\lambda _i p_i=p\big \}\) and \({\mathrm {vex}}f(p)=\inf \big \{\sum _i\lambda _if(p_i) : \sum _i\lambda _i p_i=p\big \}\). Any feasible districting yields a payoff between \({\mathrm {vex}}F^\sigma (p)\) and \({\mathrm {cav}}F^\sigma (p)\). We thus have the following inequalities:

$$\begin{aligned} {\mathrm {vex}}F^\sigma (p)\le v_N^\sigma (p)\le {\mathcal {F}}^\sigma _N(\vec p)\le V_N^\sigma (p)\le {\mathrm {cav}}F^\sigma (p). \end{aligned}\tag{1}$$

The function \({\mathrm {cav}}F^\sigma \) is an upper bound on the payoff of the gerrymanderer (and \({\mathrm {vex}}F^\sigma \) is a lower bound). In the next section, we characterize the optimal payoff and the difference with the upper bound.

Comments on the model. First, we have assumed away geographical constraints: we only require that districts have the same number of voters, and we assume that any proportion of voters can be achieved in a district. These assumptions are nearly satisfied in the US, where voting districts are only required to be of equal size and connected. All districts contain at least several thousand voters, so any proportion can be achieved on a connected territory to within \(10^{-3}\) accuracy. More precisely, US congressional districts represent around 711,000 people according to the 2010 census, while the population of State legislative districts varies widely depending on the state and the chamber (Senate or House of Representatives): from around 931,000 residents for the Senate of California to around 7,100 residents for the House of Representatives of North Dakota [21].

Second, we model uncertainty at the district level, so that the outcome of the vote is the proportion plus some noise. The noise \(\epsilon _i\) accounts for the uncertainty of political surveys, swing voters, and individual changes of opinion due to unexpected events (e.g., Covid-19) (Footnote 5). We choose the normal distribution with fixed variance to get a natural specification of the model for which we obtain tractable characterizations. Although \(p_i+\sigma \epsilon _i\) can take negative values, for numbers that are reasonable in practice (e.g., p above \(20\%\) and \(\sigma \) below \(5\%\)) the probability of a negative value is negligible. Given the uncertainty, the aim of the gerrymanderer is to maximize the expected number of districts won by its party (Footnote 6).
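The negligibility claim can be checked directly at the boundary values quoted above. The sketch below (the helper name `prob_outside` is hypothetical) computes the probabilities that \(p+\sigma \epsilon \) falls below 0 or above 1, using \(\Phi (-x)=\tfrac{1}{2}\,\mathrm {erfc}(x/\sqrt{2})\).

```python
import math


def prob_outside(p: float, sigma: float):
    """Return (P(p + sigma*eps < 0), P(p + sigma*eps > 1)) for eps ~ N(0, 1)."""
    below = 0.5 * math.erfc(p / (sigma * math.sqrt(2)))          # Phi(-p / sigma)
    above = 0.5 * math.erfc((1.0 - p) / (sigma * math.sqrt(2)))  # Phi(-(1 - p) / sigma)
    return below, above


if __name__ == "__main__":
    below, above = prob_outside(0.20, 0.05)  # p = 20%, sigma = 5%
    print(f"P(negative) = {below:.2e}, P(above 1) = {above:.2e}")
```

At \(p=20\%\) and \(\sigma =5\%\) the probability of a negative outcome is \(\Phi (-4)\approx 3.2\times 10^{-5}\), and it only shrinks as p grows or \(\sigma \) decreases.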

3 Optimal districting

3.1 The noiseless case

In this section we assume \(\sigma =0\) and consider \(F^0(p)=\mathbf {1}{\{p\ge 0.5\}}\). This amounts to assuming that the gerrymanderer has perfect information about voters’ choices. We then have

$$\begin{aligned} {\mathrm {cav}}F^0(p) = \left\{ \begin{array}{ll} 2p &{} \text{ if } p< 0.5 \\ 1 &{} \text{ if } p \ge 0.5 \end{array} \right. \quad {\mathrm {vex}}F^0(p) = \left\{ \begin{array}{ll} 0 &{} \text{ if } p < 0.5 \\ 2p - 1 &{} \text{ if } p \ge 0.5 \end{array} \right. \end{aligned}$$

The set of feasible payoffs lies between these two curves; see Fig. 2 for an illustration.

Fig. 2: Feasible payoffs with \(N=5\). The feasible payoffs are the black dashed lines

Denote by \(\mathcal {G}_N(F^0)\) the graph of the correspondence of feasible payoffs:

$$\begin{aligned} \mathcal {G}_N(F^0)=\Big \{(p,v)\in [0,1]^2 : \exists \vec p\in \mathcal {D}_N(p) \text { s.t. } \mathcal {F}^0_N(\vec p)=v\Big \}, \end{aligned}$$

and \(\mathrm {co}\, (F^0)\) the convex hull of the graph of \(F^0\).

Lemma 1

The feasible payoffs are

$$\begin{aligned} \mathcal {G}_N(F^0) = \mathrm {co}\, (F^0) \cap \Big \{\left( p,\frac{i}{N}\right) : p \in [0,1], i=0,\dots , N \Big \}. \end{aligned}$$

Proof

The inclusion \(\subseteq \) is straightforward, so we only prove that:

$$\begin{aligned} \forall p \in [0,1], \forall n\in \{0,\dots ,N\} \text { s.t. } \Big (p,\frac{n}{N}\Big )\in \mathrm {co}\, (F^0), \exists \vec {p} \in \mathcal {D}_N(p) \text { such that } \mathcal {F}^0_N(\vec {p}) = \frac{n}{N}. \end{aligned}$$

Fix such p and n; we want to find a feasible \(\vec p\) such that \(\forall i \le n, \ p_i \ge 0.5\), and \(\forall i > n, \ p_i < 0.5\). Consider the case \(p\le 0.5\). We have \(\frac{n}{N} \le {\mathrm {cav}}F^0(p)=2p\), so \(p \ge \frac{n}{2N}\). Define then

$$\begin{aligned} q:=\frac{Np-0.5n}{N-n}=p-\frac{n}{N-n}(0.5-p). \end{aligned}$$

We have \(q\le p\) and, since \(p \ge \frac{n}{2N}\), \(q\ge 0\). Let \(\vec p\) be the districting where \(p_i=0.5\) for \(i\le n\) and \(p_i=q\) for \(i>n\). This is feasible since

$$\begin{aligned} p=\frac{n}{N}0.5+\frac{N-n}{N}q. \end{aligned}$$

When \(p<0.5\), \(q<0.5\) as well and there are exactly n districts won. When \(p=0.5\) one can write

$$\begin{aligned} p=\frac{n}{N}(0.5+\delta )+\frac{N-n}{N}\left( q-\frac{n}{N-n}\delta \right) \end{aligned}$$

for a small \(\delta >0\) and the payoff is \(\frac{n}{N}\). The case \(p>0.5\) is symmetric. \(\square \)
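The constructive step of this proof can be sketched with exact rational arithmetic (Python's `fractions` module), which avoids floating-point ambiguity at the tie 0.5. The function below is a hypothetical illustration that covers only the case \(p\le 0.5\) treated explicitly above (the symmetric case \(p>0.5\) is left aside), and the shift `delta` is an arbitrary small value of our choosing.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def districting_with_n_wins(p: Fraction, N: int, n: int, delta: Fraction = Fraction(1, 10**6)):
    """Construction from the proof of Lemma 1 (sigma = 0, case p <= 1/2).

    Returns N proportions averaging p, exactly n of which are >= 1/2."""
    if not (0 <= p <= HALF and 0 <= n <= N and n <= 2 * N * p):
        raise ValueError("requires 0 <= p <= 1/2 and n/N <= cav F^0(p) = 2p")
    if p == HALF and n == 0:
        raise ValueError("(1/2, 0) is not a feasible payoff: ties go to blue")
    if n == N:  # only reachable when p == 1/2: the uniform districting wins everything
        return [p] * N
    q = (N * p - HALF * n) / (N - n)  # common proportion in the N - n losing districts
    if p < HALF:
        return [HALF] * n + [q] * (N - n)
    # p == 1/2 gives q == 1/2, which blue would win on ties: shift a small mass delta
    return [HALF + delta] * n + [q - Fraction(n, N - n) * delta] * (N - n)


def wins(districting):
    """Number of districts won by blue when sigma = 0 (ties go to blue)."""
    return sum(1 for x in districting if x >= HALF)
```

For instance, with \(p=0.34\) and \(N=5\), every \(n\in \{0,1,2,3\}\) is reachable, while \(n=4\) is rejected because \(4/5>2p\).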

The optimal payoff for the gerrymanderer is as follows.

Proposition 1

$$\begin{aligned} V_N^0(p) = \left\{ \begin{array}{ll} \frac{\lfloor 2Np \rfloor }{N} &{} \text{ if } p < 0.5 \\ 1 &{} \text{ if } p \ge 0.5. \end{array} \right. \end{aligned}$$

Proof

If \(p\ge 0.5\), then the uniform districting is optimal, so suppose \(p<0.5\) and let \(n=\lfloor 2Np\rfloor \). Consider the districting \(\vec {p}\) such that

$$\begin{aligned} p_i = \left\{ \begin{array}{ll} 0.5 &{}\text { if } i\le n \\ Np - \frac{n}{2} &{}\text { if } i = n+1\\ 0 &{}\text { if } i >n+1. \end{array} \right. \end{aligned}$$

This is a feasible districting since \(p = \frac{n}{N}0.5 + \frac{1}{N}\big (Np - \frac{n}{2} \big ) \). Also,

$$\begin{aligned} 2Np< n+1 \Rightarrow Np - \frac{n}{2}< 0.5 \Rightarrow p_{n+1} < 0.5. \end{aligned}$$

Thus, exactly n districts are won and the payoff is \(\frac{n}{N}\). Having \(n+1\) districts with \(p_i \ge 0.5\) is not feasible since \(2Np < n+1 \Rightarrow \frac{n+1}{N}0.5>p\), so the gerrymanderer cannot win more than n districts. \(\square \)

The graph of \(V_N^0(p)\) is illustrated in Fig. 3 with \(N = 5\).

Fig. 3: Optimal payoff function

To illustrate the optimal districting, consider Figs. 4 and 5. On the right panel (Fig. 5), we see that for a proportion \(p = 0.34\), a possible optimal districting consists in putting most blue voters in districts \(d_1,d_2\) and \(d_3\) so that their proportion reaches 0.5 (cracking tactic), putting the blue leftovers in \(d_4\) and leaving \(d_5\) empty of blue voters, so that it contains 100% red voters (packing tactic). When \(p \ge 0.5\), the cracking tactic is sufficient. Notice that the optimal districting is not unique, since the location of the leftover (here 4% of voters) is irrelevant.
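The districting used in the proof of Proposition 1 can be written down directly. The sketch below (the name `optimal_noiseless` is hypothetical) uses exact fractions and returns both the districting and its payoff \(V_N^0(p)\).

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def optimal_noiseless(p: Fraction, N: int):
    """Optimal districting of Proposition 1 (sigma = 0) and its payoff V_N^0(p)."""
    if p >= HALF:
        return [p] * N, Fraction(1)  # uniform districting wins every district
    n = math.floor(2 * N * p)  # number of districts brought exactly to 1/2 (cracking)
    leftover = N * p - Fraction(n, 2)  # proportion in district n + 1, strictly below 1/2
    districting = [HALF] * n + [leftover] + [Fraction(0)] * (N - n - 1)  # rest packed red
    return districting, Fraction(n, N)
```

For example, `optimal_noiseless(Fraction(34, 100), 5)` returns the proportions \((1/2, 1/2, 1/2, 1/5, 0)\) with payoff \(3/5\), which matches the districting described for Fig. 5.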

Fig. 4: Uniform districting with \(p = 0.34\)

Fig. 5: An optimal districting with \(p = 0.34\)

Note that \(v_N^0(p) = 1-V_N^0(1-p)\) for all \(p\in [0,1]\backslash \{\frac{N}{2N},\frac{N+1}{2N},\dots ,\frac{2N-1}{2N}\}\); the identity fails at these points because the tie-breaking rule breaks the symmetry between the two parties. From the formula for \(V_N^0(p)\), it is immediate that the optimal payoff approaches the upper bound \({\mathrm {cav}}F^0(p)\) (and the anti-optimal payoff approaches the lower bound \({\mathrm {vex}}F^0(p)\)) as the number of districts tends to infinity.

Corollary 1

$$\begin{aligned} \forall p\in [0,1], \lim _{N\rightarrow +\infty }V_N^0(p)={\mathrm {cav}}F^0(p), \lim _{N\rightarrow +\infty }v_N^0(p)={\mathrm {vex}}F^0(p). \end{aligned}$$

Thus, \({\mathrm {cav}}F^0(p)\) is the payoff that the gerrymanderer can approach in the limit of infinitely many districts.
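The convergence in Corollary 1 comes with an explicit rate: for \(p<0.5\), the gap \(2p-\lfloor 2Np\rfloor /N\) lies in \([0,1/N)\). The short exact check below illustrates this; the helper names `V0` and `cav_F0` are hypothetical.

```python
import math
from fractions import Fraction


def V0(p: Fraction, N: int) -> Fraction:
    """Optimal noiseless payoff V_N^0(p) of Proposition 1."""
    return Fraction(1) if p >= Fraction(1, 2) else Fraction(math.floor(2 * N * p), N)


def cav_F0(p: Fraction) -> Fraction:
    """Upper bound cav F^0(p): 2p below 1/2, and 1 above."""
    return min(2 * p, Fraction(1))


if __name__ == "__main__":
    p = Fraction(34, 100)
    for N in (5, 7, 50, 501):
        # the gap cav F^0(p) - V_N^0(p) is always below 1/N
        print(N, V0(p, N), float(cav_F0(p) - V0(p, N)))
```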

3.2 The noisy case \(\sigma >0\)

The probability \(F^\sigma (p)\) of winning a district with proportion p is depicted in Figs. 6 and 7. When \(\sigma \) tends to 0, the curve converges pointwise to the step function \(F^0\), except at \(p=0.5\), where \(F^\sigma (0.5)=0.5\) for every \(\sigma >0\).

Fig. 6: \(F^\sigma \) with \(\sigma = 5\%\)

Fig. 7: \(F^\sigma \) with \(\sigma = 1\%\)

We characterize the upper and lower bounds \({\mathrm {cav}}F^\sigma \) and \({\mathrm {vex}}F^\sigma \). Define the following function

$$\begin{aligned} {\mathrm {s}^\sigma }(p)=\frac{F^\sigma (p)-F^\sigma (0)}{p} \end{aligned}$$

as the slope between \((p, F^\sigma (p))\) and \((0, F^\sigma (0))\).

Proposition 2

  1. For each \( \sigma >0\), there exists a unique \(p^* \in (0,1]\) such that \({\mathrm {s}^\sigma }(p^*)=(F^\sigma )'(p^*)\); moreover, \({\mathrm {s}^\sigma }(p)\) is maximized at \(p^*\).

  2. We have \({\mathrm {cav}}F^\sigma (p) = {\left\{ \begin{array}{ll} F^\sigma (0)+{\mathrm {s}^\sigma }(p^*)p &{} \text{ if } p <p^* \\ F^\sigma (p) &{} \text{ if } p \ge p^* \end{array}\right. }\) and \({\mathrm {vex}}F^\sigma (p) = {\left\{ \begin{array}{ll} F^\sigma (p) &{} \text{ if } p \le 1-p^*\\ 1- F^\sigma (0) -{\mathrm {s}^\sigma }(p^*)(1-p)&{} \text{ if } p >1-p^*. \end{array}\right. }\)

When \(p \ge p^*\), \(F^\sigma (p)={\mathrm {cav}}F^\sigma (p)\); therefore the uniform districting is optimal. Solving numerically gives \(p^* = 0.5890\) for \(\sigma = 5\%\) and \(p^* = 0.5246\) for \(\sigma = 1\%\); \(p^*\) tends to 0.5 as \(\sigma \) tends to 0. See Figs. 8 and 9 for an illustration.
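The threshold \(p^*\) is the root of \(G(p)=F^\sigma (p)-F^\sigma (0)-p(F^\sigma )'(p)\) on \([0.5,1]\), so it can be computed by bisection. The sketch below (function names are hypothetical) does this, builds \({\mathrm {cav}}F^\sigma \) from the chord of slope \({\mathrm {s}^\sigma }(p^*)\), and obtains \({\mathrm {vex}}F^\sigma \) from the symmetry \(F^\sigma (1-p)=1-F^\sigma (p)\), i.e. \({\mathrm {vex}}F^\sigma (p)=1-{\mathrm {cav}}F^\sigma (1-p)\), which places its switch point at \(1-p^*\).

```python
import math


def F(p, sigma):
    """Winning probability F^sigma(p) = Phi((p - 0.5) / sigma)."""
    return 0.5 * math.erfc(-(p - 0.5) / (sigma * math.sqrt(2)))


def f(p, sigma):
    """Derivative (F^sigma)'(p): Gaussian density with mean 0.5 and std sigma."""
    x = (p - 0.5) / sigma
    return math.exp(-0.5 * x * x) / (sigma * math.sqrt(2 * math.pi))


def G(p, sigma):
    """G(p) = F(p) - F(0) - p F'(p); its root in (0.5, 1) is p*."""
    return F(p, sigma) - F(0.0, sigma) - p * f(p, sigma)


def p_star(sigma, tol=1e-12):
    """Bisection on [0.5, 1], where G(0.5) < 0 < G(1) and G is increasing."""
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid, sigma) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cav_F(p, sigma):
    """Concave closure: chord from (0, F(0)) with slope s(p*) below p*, F itself above."""
    ps, F0 = p_star(sigma), F(0.0, sigma)
    return F0 + (F(ps, sigma) - F0) / ps * p if p < ps else F(p, sigma)


def vex_F(p, sigma):
    """Convex closure via the symmetry F(1 - p) = 1 - F(p)."""
    return 1.0 - cav_F(1.0 - p, sigma)


if __name__ == "__main__":
    for sigma in (0.05, 0.01):
        print(sigma, round(p_star(sigma), 4))
```

The computed thresholds agree with the reported values 0.5890 and 0.5246 to about three decimals.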

Proof

1. The function \({\mathrm {s}^\sigma }\) is differentiable on (0, 1] and \(({\mathrm {s}^\sigma })'(p) = \frac{-G(p)}{p^2}\) with \(G(p)=F^\sigma (p)-F^\sigma (0)-p(F^\sigma )'(p)\). Also, \(G'(p)=-p(F^\sigma )''(p)= \frac{p(p - 0.5)}{\sigma ^3 \sqrt{2\pi }}e^{-(\frac{1}{\sigma }(p - 0.5))^2/2}\) since \(F^\sigma (p)=\Phi (\frac{p-0.5}{\sigma })\) with \(\Phi \) the c.d.f. of \(\mathcal {N}(0,1)\). It follows that G is strictly decreasing on [0, 0.5] and strictly increasing on [0.5, 1]. We have \(G(0)=0\), \(G(0.5)<0\) and

$$\begin{aligned} G(1) = \Phi \Big (\frac{1}{2 \sigma }\Big ) - \frac{1}{\sigma } \Phi '\Big (\frac{1}{2\sigma }\Big ) - \Phi \Big (\frac{-1}{2 \sigma }\Big ) = 2 \Phi \Big (\frac{1}{2 \sigma }\Big ) - \frac{1}{\sigma } \Phi '\Big (\frac{1}{2\sigma }\Big ) - 1. \end{aligned}$$

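To see that \(G(1)>0\), let \(H(x) = 2 \Phi (x) - 2x\Phi '(x) - 1\), so that \(G(1) = H\big (\frac{1}{2\sigma }\big )\). H is differentiable and, since \(\Phi ''(x) = -x\Phi '(x)\),

$$\begin{aligned} H'(x)= 2\Phi '(x) - 2\Phi '(x) - 2x\Phi ''(x) = \frac{2x^2}{\sqrt{2\pi }}e^{-x^2/2} \ge 0, \end{aligned}$$

with equality only at \(x=0\).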

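H is thus strictly increasing and \(H(0) = 2\Phi (0)- 1 = 0\). This implies \(\forall x> 0, \ H(x) > 0\) and thus \(G(1) > 0\). Since G is negative on (0, 0.5] and continuous and strictly increasing on [0.5, 1], the intermediate value theorem yields a unique \(p^*\in (0.5,1)\) such that \(G(p^*) = 0\). As \(({\mathrm {s}^\sigma })' = -G/p^2\) is positive on \((0,p^*)\) and negative on \((p^*,1]\), \({\mathrm {s}^\sigma }\) is maximized at \(p^*\); moreover, \({\mathrm {s}^\sigma }(p)-(F^\sigma )'(p)=G(p)/p\), so \(G(p^*)=0\) is equivalent to \({\mathrm {s}^\sigma }(p^*)=(F^\sigma )'(p^*)\).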

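2. Observe that \(F^\sigma \) is concave on [0.5, 1], thus also on \([p^*,1]\). Since \({\mathrm {s}^\sigma }\) is maximized at \(p^*\), for every \(p\in (0,p^*]\) we have \(F^\sigma (p)=F^\sigma (0)+{\mathrm {s}^\sigma }(p)p\le F^\sigma (0)+{\mathrm {s}^\sigma }(p^*)p\), so the line joining \((0,F^\sigma (0))\) to \((p^*,F^\sigma (p^*))\) lies above the graph of \(F^\sigma \) on \([0,p^*]\); it is moreover tangent to \(F^\sigma \) at \(p^*\), since \({\mathrm {s}^\sigma }(p^*)=(F^\sigma )'(p^*)\). Hence \({\mathrm {cav}}F^\sigma \) coincides with this line on \([0,p^*]\) and with \(F^\sigma \) on \([p^*,1]\). The formula for \({\mathrm {vex}}F^\sigma \) follows by symmetry: since \(F^\sigma (1-p)=1-F^\sigma (p)\), we have \({\mathrm {vex}}F^\sigma (p)=1-{\mathrm {cav}}F^\sigma (1-p)\). \(\square \)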
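The derivative formulas used in this proof can be sanity-checked numerically. The sketch below, with a hypothetical noise level \(\sigma =5\%\) and helper names of our own, compares the closed form \(G'(p)=\frac{p(p-0.5)}{\sigma ^3\sqrt{2\pi }}e^{-((p-0.5)/\sigma )^2/2}\) to a central finite difference, and checks that \(H(x)=2\Phi (x)-2x\Phi '(x)-1\) is positive for \(x>0\) and satisfies \(G(1)=H(\frac{1}{2\sigma })\).

```python
import math

SIGMA = 0.05  # hypothetical noise level used only for this check


def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def phi(x):
    """Standard normal density Phi'(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def F(p, sigma=SIGMA):
    return Phi((p - 0.5) / sigma)


def G(p, sigma=SIGMA):
    """G(p) = F(p) - F(0) - p F'(p)."""
    return F(p, sigma) - F(0.0, sigma) - p * phi((p - 0.5) / sigma) / sigma


def G_prime(p, sigma=SIGMA):
    """Closed form G'(p) = p (p - 0.5) phi((p - 0.5) / sigma) / sigma^3."""
    return p * (p - 0.5) * phi((p - 0.5) / sigma) / sigma**3


def H(x):
    """H(x) = 2 Phi(x) - 2 x Phi'(x) - 1."""
    return 2 * Phi(x) - 2 * x * phi(x) - 1


def central_diff(g, x, h=1e-6):
    """Second-order central finite difference of g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)
```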

Fig. 8: \(F^\sigma \) and its closures with \(\sigma = 5\%\)

Fig. 9: \(F^\sigma \) and its closures with \(\sigma = 1\%\)

To characterize the optimal gerrymandering payoff \(V_N^\sigma (p)\), we introduce the following sequence of functions. For \(n=1,\dots , N\) and \(p \in [0,\frac{n}{N}]\), define

$$\begin{aligned} F_n(p) = \frac{n}{N}\Big [F^\sigma \Big (\frac{Np}{n}\Big ) - F^\sigma (0)\Big ] + F^\sigma (0) = {\mathrm {s}^\sigma }\Big (\frac{Np}{n}\Big )p + F^\sigma (0) \end{aligned}$$

Notice that \(F_N = F^\sigma \).

Lemma 2

For each \(n=2,\dots , N\), there is a unique \(p^*_n\in \big (0,\frac{n}{N}\big )\) such that \(F_{n-1}(p^*_n) = F_n(p^*_n)\). Moreover, this sequence is such that for each n, \(p^*_n\in \big (\frac{n-1}{N}p^*, \frac{n}{N}p^*\big )\) and:

  1. \(p^*_2 = \frac{1}{N}\);

  2. \(F_{n-1}(p) > F_n(p)\) for \(p < p^*_n\), and \(F_{n-1}(p) < F_n(p)\) for \(p > p^*_n\);

  3. For \(n=2,\dots , N-1\), \(\frac{N}{n}p^*_n< \frac{N}{n+1}p^*_{n+1}< p^*< \frac{N}{n}p^*_{n+1} < \frac{N}{n-1}p^*_n\).

Proof

Since \({\mathrm {s}^\sigma }\) is increasing on \([0,p^*]\) and then decreasing, for each \(n=2,\dots , N\), \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) is also increasing then decreasing over its domain. It follows that the graphs of \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) and of \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) cross at a unique \(p^*_n \in (0,\frac{n-1}{N})\). More precisely:

  • For \(0< p < \frac{n-1}{N}p^*\), \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) and \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) strictly increase, and \({\mathrm {s}^\sigma }(\frac{N}{n-1}p) > {\mathrm {s}^\sigma }(\frac{N}{n}p)\) because \(\frac{N}{n-1}p > \frac{N}{n}p\) and \({\mathrm {s}^\sigma }\) is increasing on \([0,p^*]\).

  • For \(\frac{n-1}{N}p^* \le p \le \frac{n}{N}p^*\), \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) strictly decreases while \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) increases. The former ranges from \((F^\sigma )'(p^*)\) to \({\mathrm {s}^\sigma }(\frac{n}{n-1}p^*)\), while the latter ranges from \({\mathrm {s}^\sigma }(\frac{n-1}{n}p^*) < (F^\sigma )'(p^*)\) to \((F^\sigma )'(p^*) > {\mathrm {s}^\sigma }(\frac{n}{n-1}p^*)\). From the intermediate value theorem, there is a unique \(p^*_n\) in \((\frac{n-1}{N}p^*,\frac{n}{N}p^*)\) such that \({\mathrm {s}^\sigma }(\frac{N}{n-1}p^*_n) = {\mathrm {s}^\sigma }(\frac{N}{n}p^*_n)\).

  • For \(\frac{n}{N}p^*< p < \frac{n-1}{N}\), both \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) and \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) decrease, \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) decreases faster and has a smaller value at \(\frac{n}{N}p^*\). Hence they do not cross on this interval.

  1.

    \(p^*_2\) is the only proportion such that

    $$\begin{aligned} F_1(p^*_2)= & {} F_2(p^*_2) \Leftrightarrow \frac{1}{N}\Big [F^\sigma (Np^*_2) - F^\sigma (0)\Big ] = \frac{2}{N}\Big [F^\sigma \Big (\frac{N}{2}p^*_2\Big ) - F^\sigma (0)\Big ]\\\Leftrightarrow & {} F^\sigma (Np^*_2) + F^\sigma (0) = 2F^\sigma \Big (\frac{N}{2}p^*_2\Big ) \end{aligned}$$

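    The latter equality is satisfied for \(p^*_2 = \frac{1}{N}\): then \(Np^*_2 = 1\) and \(\frac{N}{2}p^*_2 = 0.5\), and \(F^\sigma (1) + F^\sigma (0) = 1 = 2F^\sigma (0.5)\) since \(F^\sigma (1) = 1 - F^\sigma (0)\) and \(F^\sigma (0.5) = 0.5\).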

  2.

    This point follows from \(F_{n-1}(p) - F_n(p) = [{\mathrm {s}^\sigma }(\frac{N}{n-1}p) - {\mathrm {s}^\sigma }(\frac{N}{n}p)]p\).

  3.

    For \(n=2,\dots , N-1\), the function \(p\mapsto {\mathrm {s}^\sigma }(\frac{N}{n}p) - {\mathrm {s}^\sigma }(\frac{N(n-1)}{n^2}p)\) is obtained from \(q\mapsto {\mathrm {s}^\sigma }(\frac{N}{n-1}q) - {\mathrm {s}^\sigma }(\frac{N}{n}q)\) by setting \(q=\frac{n-1}{n}p\). Since \({\mathrm {s}^\sigma }(\frac{N}{n}q) ={\mathrm {s}^\sigma }(\frac{N}{n-1}q)\) for \(q = p^*_n\), the functions \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) and \({\mathrm {s}^\sigma }(\frac{N(n-1)}{n^2}p)\) cross at the point \(p = \frac{n}{n-1}p^*_n\).

    Now, \({\mathrm {s}^\sigma }(\frac{N(n-1)}{n^2}p)\) and \({\mathrm {s}^\sigma }(\frac{N}{n+1}p)\) are obtained from \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) by change of variable with respective coefficients \(\frac{n-1}{n}\) and \(\frac{n}{n+1}\). Since \(\frac{n-1}{n}< \frac{n}{n+1} < 1\), \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) first reaches its peak at point \(p = \frac{n}{N}p^*\), then \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) and \({\mathrm {s}^\sigma }(\frac{N}{n+1}p)\) cross at point \(p = p^*_{n+1}\), then \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) and \({\mathrm {s}^\sigma }(\frac{N(n-1)}{n^2}p)\) cross at point \(p = \frac{n}{n-1}p^*_n\). Hence \(\frac{n}{N}p^*< p^*_{n+1} < \frac{n}{n-1}p^*_n \), which implies \(p^*< \frac{N}{n}p^*_{n+1} < \frac{N}{n-1}p^*_n\).

    Similarly, setting \(q=\frac{n+1}{n}p\) in \(q\mapsto {\mathrm {s}^\sigma }(\frac{N}{n}q) - {\mathrm {s}^\sigma }(\frac{N}{n+1}q)\), which vanishes at \(q = p^*_{n+1}\), shows that \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) and \({\mathrm {s}^\sigma }(\frac{N(n+1)}{n^2}p)\) cross at \(p = \frac{n}{n+1}p^*_{n+1}\). Moreover, \({\mathrm {s}^\sigma }(\frac{N(n+1)}{n^2}p)\) and \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) are obtained from \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) by change of variable with respective coefficients \(\frac{n+1}{n}\) and \(\frac{n}{n-1}\), with \(\frac{n}{n-1}> \frac{n+1}{n} > 1\). Therefore \({\mathrm {s}^\sigma }(\frac{N}{n-1}p)\) and \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) first cross at \(p = p^*_n\), then \({\mathrm {s}^\sigma }(\frac{N(n+1)}{n^2}p)\) and \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) cross at \(p = \frac{n}{n+1}p^*_{n+1}\), then \({\mathrm {s}^\sigma }(\frac{N}{n}p)\) reaches its peak at \(p = \frac{n}{N}p^*\). Thus \(p^*_n<\frac{n}{n+1}p^*_{n+1} < \frac{n}{N}p^*\), implying \(\frac{N}{n}p^*_n< \frac{N}{n+1}p^*_{n+1} < p^*\).

\(\square \)
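The crossing points \(p^*_n\) of Lemma 2 can be located numerically by bisection inside the brackets \(\big (\frac{n-1}{N}p^*,\frac{n}{N}p^*\big )\) given by the lemma. The sketch below is a hypothetical illustration: it evaluates \(F^\sigma \) beyond \([0,1]\) (the Gaussian formula is defined everywhere), which is only needed at the upper end of the bracket for \(n=2\), and it lets one check items 1 and 3 for, e.g., \(N=5\) and \(\sigma =5\%\).

```python
import math


def F(p, sigma):
    """F^sigma(p) = Phi((p - 0.5) / sigma), defined for all real p."""
    return 0.5 * math.erfc(-(p - 0.5) / (sigma * math.sqrt(2)))


def f(p, sigma):
    """Derivative (F^sigma)'(p)."""
    x = (p - 0.5) / sigma
    return math.exp(-0.5 * x * x) / (sigma * math.sqrt(2 * math.pi))


def bisect(g, lo, hi, tol=1e-13):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    sign_lo = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_star(sigma):
    """Root of G(p) = F(p) - F(0) - p F'(p) on [0.5, 1] (Proposition 2)."""
    return bisect(lambda p: F(p, sigma) - F(0.0, sigma) - p * f(p, sigma), 0.5, 1.0)


def F_n(p, n, N, sigma):
    """Payoff of n districts at N p / n and N - n districts at 0."""
    F0 = F(0.0, sigma)
    return n / N * (F(N * p / n, sigma) - F0) + F0


def crossing_points(N, sigma):
    """p*_n for n = 2..N, searched in the bracket ((n-1) p*/N, n p*/N) of Lemma 2."""
    ps = p_star(sigma)
    return {n: bisect(lambda p: F_n(p, n - 1, N, sigma) - F_n(p, n, N, sigma),
                      (n - 1) * ps / N, n * ps / N)
            for n in range(2, N + 1)}
```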

These functions are illustrated in Fig. 10 for \(N = 5\) and \(\sigma = 5\%\). Figure 11 shows the lines \(L_n\) going through \((0,F^\sigma (0))\) with slope \({\mathrm {s}^\sigma }(\frac{N}{n}p^*_n)\), which is also equal to \({\mathrm {s}^\sigma }(\frac{N}{n-1}p^*_n)\). \(L_n\) is the line of indifference between the districtings \((\frac{N}{n-1}p,\dots ,\frac{N}{n-1}p,0,\dots ,0)\) and \((\frac{N}{n}p,\dots ,\frac{N}{n}p,0,\dots ,0)\).

Fig. 10: The functions \(F_n\) when \(1 \le n \le 5\)

Fig. 11: The lines \(L_n\) when \(2 \le n \le 5\)

Theorem 1

The optimal gerrymandering payoff \(V_N^\sigma (p)\) is given by:

$$\begin{aligned} \forall n=1,\dots , N-1,\ \forall p \in [p^*_n,p^*_{n+1}),\quad V_N^\sigma (p) = F_n(p) = \frac{n}{N}\Big [F^\sigma \Big (\frac{N}{n}p\Big ) - F^\sigma (0)\Big ] + F^\sigma (0), \end{aligned}$$

where by convention \(p_1^* = 0\), and \(V_N^\sigma (p)=F_N(p)=F^\sigma (p)\) on \( [p^*_N,1]\).
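Since Lemma 2 orders the crossing points, one reading of Theorem 1 (our interpretation, not a formula stated in the text) is that \(V_N^\sigma \) is the upper envelope of the feasible \(F_n\), namely those with \(\frac{N}{n}p\le 1\). The sketch below uses this reading and checks it against a brute-force grid search for \(N=2\), where feasible districtings form a one-dimensional family \((p_1, 2p-p_1)\). The value \(\sigma =5\%\) and all helper names are hypothetical choices.

```python
import math


def F(p, sigma):
    """F^sigma(p) = Phi((p - 0.5) / sigma)."""
    return 0.5 * math.erfc(-(p - 0.5) / (sigma * math.sqrt(2)))


def F_n(p, n, N, sigma):
    """Payoff of n districts at N p / n and N - n districts at 0."""
    F0 = F(0.0, sigma)
    return n / N * (F(N * p / n, sigma) - F0) + F0


def V(p, N, sigma):
    """Upper envelope of F_n over the feasible n (those with N p / n <= 1)."""
    n_min = max(1, math.ceil(N * p - 1e-12))  # tolerance guards against float round-up
    return max(F_n(p, n, N, sigma) for n in range(n_min, N + 1))


def brute_force_two_districts(p, sigma, steps=4001):
    """Grid search over feasible (p1, 2p - p1) for N = 2, an independent check of V."""
    lo, hi = max(0.0, 2 * p - 1), min(1.0, 2 * p)
    best = 0.0
    for k in range(steps):
        p1 = lo + (hi - lo) * k / (steps - 1)
        best = max(best, 0.5 * (F(p1, sigma) + F(2 * p - p1, sigma)))
    return best
```

For \(N=5\), \(\sigma =5\%\) and \(p=0.32\), the envelope is attained at \(n=3\), i.e. three districts at about 0.533 and two packed districts.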

Proof

Finding the optimal gerrymandering is trivial for \(p=0\). Also for \(p\ge p^*\), \(F^\sigma (p)={\mathrm {cav}}F^\sigma (p)\) so the uniform districting is optimal. Fix thus \(n\in \{1,\dots , N\}\) and \(p\in [p^*_n,p^*_{n+1})\). Consider the maximization problem

$$\begin{aligned} \max \bigg \{\frac{1}{N}\sum _{i=1}^{N}F^\sigma (p_i) : \forall i, 0\le p_i \le 1, \sum _{i=1}^{N} p_i = Np\bigg \} \end{aligned}$$

and write its Lagrangian: \(\forall \lambda \in \mathbb {R}\), \(\forall \vec {\mu } = (\mu _1,\dots ,\mu _N)\), \( \vec {\nu } = (\nu _1,\dots ,\nu _N) \in \mathbb {R}_+^N\),

$$\begin{aligned} \mathcal {L}(\vec {p},\lambda ,\vec {\mu },\vec {\nu }) = \frac{1}{N}\sum _{i=1}^{N} F^\sigma (p_i) -\lambda \Big (\sum _{i=1}^{N} p_i - Np\Big ) + \sum _{i=1}^{N}\mu _ip_i+\sum _{i=1}^{N}\nu _i(1-p_i) \end{aligned}$$

Denoting \(f^\sigma := (F^\sigma )'\), the KKT necessary conditions are:

$$\begin{aligned} \left\{ \begin{array}{ll} \forall i, \ \frac{1}{N}f^\sigma (p_i) - \lambda + \mu _i - \nu _i = 0 \\ \sum _{i=1}^{N} p_i = Np \\ \forall i, \ \mu _i \ge 0 \text { and }\nu _i \ge 0 \\ \forall i, \ 0 \le p_i \le 1 \\ \forall i, \ \mu _i p_i = 0 \text { and } \nu _i (1-p_i )= 0 \end{array}\right. \end{aligned}$$

Without loss of generality, let us assume \( p_1 \ge \dots \ge p_N\) and let m be the number of strictly positive \(p_i\)’s so that \(\vec {p} = (p_1,\dots ,p_m,0,\dots ,0)\). Notice that for \(p_i = 1\), the first equation gives \(\frac{1}{N}f^\sigma (p_i) = \lambda + \nu _i\), for \(0< p_i < 1\), we have \(\frac{1}{N}f^\sigma (p_i) = \lambda \) and for \(p_i = 0\), we have \(\frac{1}{N}f^\sigma (p_i) = \lambda - \mu _i\).

Suppose first that there exists a \(p_i\) s.t. \(0< p_i < 1\). Then no \(p_j\) can be equal to 1. Indeed, since \(f^\sigma \) is a Gaussian density centered at 0.5, \(\frac{1}{N}f^\sigma (1) = \lambda + \nu _j \ge \lambda = \frac{1}{N}f^\sigma (p_i)\) would imply that 1 lies between \(p_i\) and \(1 - p_i\), which is absurd. It follows from complementary slackness that \( \nu _j=0\) for all j. Second, suppose that all \(p_i\)’s are 0 or 1. Since \(p\in (0,1)\), the \(p_i\)’s cannot all be 0 or all be 1. For any i, j such that \(p_i = 1\) and \(p_j = 0\), we have

$$\begin{aligned} \frac{1}{N}f^\sigma (1) = \lambda + \nu _i \ge \lambda - \mu _j = \frac{1}{N}f^\sigma (0) \end{aligned}$$

Since \(f^\sigma (0)=f^\sigma (1)\), this implies \(\mu _j = \nu _i = 0\). Therefore, in both cases \(\nu _i = 0\) for all i. Thus \(\forall p_i > 0\), \(\frac{1}{N}f^\sigma (p_i) = \lambda \). Denoting by \( p_{\lambda }\) the unique solution in [0.5, 1] of \(\frac{1}{N}f^\sigma (p) = \lambda \), we know that each \(p_i>0\) is either \(p_\lambda \) or \(1-p_\lambda \).
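The root \(p_\lambda \) can be made explicit. Assuming, consistently with Remark 1 below where \(F^\sigma (0)=\Phi (-0.5/\sigma )\), that \(F^\sigma (x)=\Phi ((x-0.5)/\sigma )\), the density \(f^\sigma \) is Gaussian and \(\frac{1}{N}f^\sigma (p)=\lambda \) gives \(p_\lambda = 0.5+\sigma \sqrt{-2\ln (N\lambda \sigma \sqrt{2\pi })}\). The Python sketch below (helper names are hypothetical) computes this root and checks that \(1-p_\lambda \) has the same density value.

```python
import math


def f_sigma(x: float, sigma: float) -> float:
    """Assumed density f^sigma: Gaussian with mean 0.5 and standard deviation sigma."""
    z = (x - 0.5) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def p_lambda(lam: float, n_districts: int, sigma: float) -> float:
    """Root p >= 0.5 of f_sigma(p) / N = lam (hypothetical helper)."""
    ratio = n_districts * lam * sigma * math.sqrt(2.0 * math.pi)
    if not 0.0 < ratio <= 1.0:
        raise ValueError("lambda lies outside the range of f_sigma / N")
    return 0.5 + sigma * math.sqrt(-2.0 * math.log(ratio))


N, sigma = 5, 0.05
lam = f_sigma(0.6, sigma) / N        # multiplier induced by an interior p_i = 0.6
p_l = p_lambda(lam, N, sigma)
print(p_l, 1.0 - p_l)                # the two interior candidates p_lambda and 1 - p_lambda
```

The closed form only returns a point of [0.5, 1] when \(\sigma \sqrt{-2\ln (N\lambda \sigma \sqrt{2\pi })}\le 0.5\); otherwise no interior solution at level \(\lambda \) exists.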

We assume for now \(p \ge \frac{1}{2N}\) and check the second-order necessary KKT conditions. We have \(\frac{\partial ^2 \mathcal {L}}{\partial p_i \partial p_j} = 0\) for \(i \ne j\) and \(\frac{\partial ^2 \mathcal {L}}{\partial p_i^2} = \frac{1}{N}(f^\sigma )'(p_i)\). It is thus necessary that \((f^\sigma )'(p_i)\le 0\) for \(p_i>0\). Since \((f^\sigma )'(p_\lambda )<0\) and \((f^\sigma )'(1-p_\lambda )>0\), the only possible districting is

$$\begin{aligned} \vec {p} = (\underbrace{p_{\lambda },\dots ,p_{\lambda }}_{m \text { times }},0,\dots ,0). \end{aligned}$$

The feasibility constraint gives:

$$\begin{aligned} \frac{m}{N}p_{\lambda } + \frac{N-m}{N}\times 0 = p \Leftrightarrow p_{\lambda } = \frac{Np}{m}. \end{aligned}$$

Since \(0.5 \le p_{\lambda } < 1\), we have \(Np < m \le 2Np\), thus if we let \(m_{\min }\) and \(m_{\max }\) be the minimal and maximal values of m:

$$\begin{aligned} m_{\min } := \lfloor Np \rfloor +1\le m\le m_{\max } := \min (\lfloor 2Np \rfloor , N) \end{aligned}$$

and this interval of m’s is not empty. It follows that

$$\begin{aligned} p_{\lambda } \in \Big \{ \frac{Np}{m_{\max }}, \frac{Np}{m_{\max } - 1}, \dots , \frac{Np}{m_{\min }}\Big \} \end{aligned}$$
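For concrete values of N and p, the admissible numbers m of non-empty districts and the corresponding levels \(p_\lambda = Np/m\) can be enumerated directly. This minimal Python sketch (the function name is hypothetical) uses exact rational arithmetic so that \(\lfloor Np \rfloor \) and \(\lfloor 2Np \rfloor \) are not affected by rounding; it assumes \(\frac{1}{2N}\le p<1\).

```python
import math
from fractions import Fraction


def candidate_levels(N: int, p: Fraction) -> list[tuple[int, Fraction]]:
    """Pairs (m, Np/m) for m_min <= m <= m_max, assuming 1/(2N) <= p < 1."""
    Np = N * p
    m_min = math.floor(Np) + 1
    m_max = min(math.floor(2 * Np), N)
    return [(m, Np / m) for m in range(m_min, m_max + 1)]


levels = candidate_levels(5, Fraction(2, 5))
print(levels)  # [(3, Fraction(2, 3)), (4, Fraction(1, 2))]
```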

Since \(p \in [p^*_n,p^*_{n+1})\), we have \(m_{\min }\le n\le m_{\max }\). Indeed, \(n \le N\) and we know from Lemma 2 that \(\frac{n}{2N} < \frac{n}{N}p^* \le p^*_n \le p\), which implies \(n \le 2Np\Rightarrow n \le \lfloor 2Np \rfloor \), hence \(n \le m_{\max }\). Also, when \(n \le N-1\), we have \(p < p^*_{n+1} \le \frac{n}{N} \ \Rightarrow \ n > Np\), hence \(n \ge \lfloor Np\rfloor +1 = m_{\min }\). If \(n=N\), we have \(n = N > Np\) hence \(n \ge m_{\min }\) by the same argument. Now,

$$\begin{aligned} {\mathcal {F}}^\sigma _N(\vec {p})= & {} \frac{1}{N}\sum _{i=1}^{N} F^\sigma (p_i) = \frac{m}{N}[F^\sigma (p_{\lambda }) - F^\sigma (0)] + F^\sigma (0) \\= & {} \frac{m}{N}\left[ F^\sigma \big (\frac{Np}{m}\big ) - F^\sigma (0)\right] + F^\sigma (0) = F_m(p) \end{aligned}$$

and we also know from the previous lemma that for \(p \in [p^*_n,p^*_{n+1})\), \(F_n(p)> F_{n+1}(p)> \dots > F_N(p)\) and \(F_n(p) \ge F_{n-1}(p)> \dots > F_{n_{\min }}(p)\) where \(n_{\min }\) is the smallest integer such that \(\frac{Np}{n_{\min }} < 1\). Also, \(F_n(p) > F_{n-1}(p)\) for \(p \in (p^*_n,p^*_{n+1})\). In this case, the optimal districting is obtained for \(m=n\). For \(p = p^*_n\), there are two solutions: for \(m = n\) and for \(m = n-1\).

Lastly, consider the case \(p< \frac{1}{2N} < p^*_2\), so that \(p \in (p^*_1,p^*_2)\). Since \(p_{\lambda }\ge 0.5\) would require \(m \le 2Np < 1\), \(p_{\lambda }\) is infeasible, and we must have \(p_i = 1 - p_{\lambda } < 0.5\) for \(i \le m\); hence the feasibility constraint implies

$$\begin{aligned} 1 - p_{\lambda } = \frac{Np}{m} \Rightarrow \frac{Np}{m} < 0.5 \ \Rightarrow \ m > 2Np\Rightarrow m \ge \lfloor 2Np\rfloor + 1 = 1. \end{aligned}$$

Since \(Np < \frac{1}{2}\), every \(F_n\) with \(n\ge 1\) is defined at such a p, and we know from the lemma that when \(p < p^*_2\), \(F_1(p) > F_n(p)\) for any \(n \ge 2\). The unique solution is thus obtained for \(m=1\). \(\square \)
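Numerically, an optimal districting can be found by comparing the payoffs \(F_m(p)\) of all districtings with m non-empty districts at level \(Np/m\). The sketch below assumes \(F^\sigma (x)=\Phi ((x-0.5)/\sigma )\) with \(\sigma >0\) (consistent with \(F^\sigma (0)=\Phi (-0.5/\sigma )\) in Remark 1) and scans every m with \(Np/m\le 1\), a superset of the candidates kept in the proof; the helper names are hypothetical.

```python
import math


def F_sigma(x: float, sigma: float) -> float:
    """Assumed win probability F^sigma(x) = Phi((x - 0.5) / sigma), sigma > 0."""
    # erfc avoids cancellation in the far left tail.
    return 0.5 * math.erfc(-(x - 0.5) / (sigma * math.sqrt(2.0)))


def F_m(m: int, N: int, p: float, sigma: float) -> float:
    """Payoff of (Np/m, ..., Np/m, 0, ..., 0) with m non-empty districts."""
    F0 = F_sigma(0.0, sigma)
    return m / N * (F_sigma(N * p / m, sigma) - F0) + F0


def optimal_gerrymander(N: int, p: float, sigma: float) -> tuple[int, float]:
    """Return (m, payoff) maximizing F_m(p) over feasible m (Np/m <= 1)."""
    feasible = [m for m in range(1, N + 1) if N * p <= m]
    best = max(feasible, key=lambda m: F_m(m, N, p, sigma))
    return best, F_m(best, N, p, sigma)


m_opt, value = optimal_gerrymander(5, 0.4, 0.05)
print(m_opt, round(value, 4))
```

With \(N=5\), \(p=0.4\) and \(\sigma =5\%\), the scan selects \(m=3\), i.e. three districts at proportion \(\frac{2}{3}\) and two empty ones.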

The proof also shows that for p different from all \(p^*_n\), \(n=2,\dots ,N\), the optimal districting is unique up to a permutation of the districts. It will be denoted \(\vec p^*\) without risk of confusion. The uniform and optimal districtings are shown in Figs. 12 and 13 for \(N=5\) and \(p = 0.4\). Figure 14 displays \(V_N^\sigma \) (in blue) when \(N=5\) and \(\sigma = 5\%\).

Fig. 12. Uniform districting with \(p = 0.40\)

Fig. 13. Optimal districting with \(p = 0.40\)

Fig. 14. The optimal payoff function

Note that in the noisy case, the symmetry between optimality and anti-optimality holds everywhere: \(\forall p \in [0,1], v_N^\sigma (p) = 1 - V_N^\sigma (1-p)\). We deduce the optimal and anti-optimal payoffs with infinitely many districts.

Corollary 2

$$\begin{aligned} \forall p\in [0,1], \lim _{N\rightarrow +\infty }V_N^\sigma (p)={\mathrm {cav}}F^\sigma (p), \lim _{N\rightarrow +\infty }v_N^\sigma (p)={\mathrm {vex}}F^\sigma (p). \end{aligned}$$

Proof

The function \(V_N^\sigma (p)\) coincides with \({\mathrm {cav}}F^\sigma (p)\) for \(p\ge p^*\). Consider \(p\in [0,p^*]\) and n such that \(p^*_n\le p<p^*_{n+1}\). From Lemma 2, we know that for each m, \(\frac{m-1}{N}p^*\le p^*_m\le \frac{m}{N}p^*\), thus

$$\begin{aligned} \frac{n-1}{N}p^*\le p^*_n\le p\le p^*_{n+1}\le \frac{n+1}{N}p^* \end{aligned}$$

and it follows that

$$\begin{aligned} \frac{p}{p^*}-\frac{1}{N}\le \frac{n}{N}\le \frac{p}{p^*}+\frac{1}{N}. \end{aligned}$$

Thus, for fixed p, letting N tend to infinity with n such that \(p^*_n\le p<p^*_{n+1}\), we get that \(\frac{n}{N}\) tends to \(\frac{p}{p^*}\). Passing to the limit in \(V_N^\sigma (p) = \frac{n}{N}\Big [F^\sigma \Big (\frac{N}{n}p\Big ) - F^\sigma (0)\Big ] + F^\sigma (0)\), using the continuity of \(F^\sigma \), gives

$$\begin{aligned} \lim _{N\rightarrow +\infty }V_N^\sigma (p) = \frac{p}{p^*}[F^\sigma (p^*) - F^\sigma (0)] + F^\sigma (0) \end{aligned}$$

as desired. Note that since \(|\frac{n}{N}-\frac{p}{p^*}|\le \frac{1}{N}\), the convergence is uniform in p. The result for \(v_N^\sigma (p)\) follows by symmetry. \(\square \)
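Corollary 2 can be checked numerically. Under the same assumption \(F^\sigma (x)=\Phi ((x-0.5)/\sigma )\), the sketch below locates \(p^*\) by bisection as the point where the tangent to \(F^\sigma \) passes through \((0,F^\sigma (0))\), i.e. \(p^*f^\sigma (p^*)=F^\sigma (p^*)-F^\sigma (0)\), builds \({\mathrm {cav}}F^\sigma \) from it, and compares it with \(V_N^\sigma (p)\) obtained by brute force over m. The bracket \([0.5,1]\) and all function names are assumptions of this illustration.

```python
import math


def F(x: float, sigma: float) -> float:
    """Assumed F^sigma(x) = Phi((x - 0.5) / sigma), via erfc for accurate tails."""
    return 0.5 * math.erfc(-(x - 0.5) / (sigma * math.sqrt(2.0)))


def f(x: float, sigma: float) -> float:
    """Derivative of F: Gaussian density with mean 0.5 and std sigma."""
    z = (x - 0.5) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def tangency_point(sigma: float, tol: float = 1e-12) -> float:
    """p* with p* f(p*) = F(p*) - F(0), found by bisection on [0.5, 1]."""
    def g(x: float) -> float:
        return x * f(x, sigma) - (F(x, sigma) - F(0.0, sigma))

    lo, hi = 0.5, 1.0
    if g(lo) <= 0.0 or g(hi) >= 0.0:
        raise ValueError("no sign change on [0.5, 1] for this sigma")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cav_F(p: float, sigma: float, p_star: float) -> float:
    """Concave closure: chord from (0, F(0)) to (p*, F(p*)), then F itself."""
    if p >= p_star:
        return F(p, sigma)
    F0 = F(0.0, sigma)
    return F0 + p / p_star * (F(p_star, sigma) - F0)


def V(N: int, p: float, sigma: float) -> float:
    """Optimal payoff by brute force: best F_m(p) over m with Np/m <= 1."""
    F0 = F(0.0, sigma)
    return max(m / N * (F(N * p / m, sigma) - F0) + F0
               for m in range(1, N + 1) if N * p <= m)


sigma, p = 0.05, 0.3
p_star = tangency_point(sigma)
gaps = {N: abs(V(N, p, sigma) - cav_F(p, sigma, p_star)) for N in (5, 50, 1000)}
print(p_star, gaps)
```

Bisection is used because the chord slope from \((0,F^\sigma (0))\) increases up to \(p^*\) and decreases after it, so the function g changes sign exactly once on the bracket.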

4 Fairness

4.1 Representation, Influence and Perfect Fairness

We now introduce measures of quality of districtings in terms of efficiency for the gerrymanderer and of fairness of representation of parties.

Definition 1

  • The representation of the blue party in districting \(\vec {p}\) is the ratio of the payoff of the districting to the aggregate proportion: \(R_N(\vec p)={\mathcal {F}}^\sigma _N(\vec p)/p\).

    The representation of the red party in districting \(\vec {p}\) is \(\overline{R}_N(\vec p) = (1-{\mathcal {F}}^\sigma _N(\vec p))/(1-p)\).

    The optimal representation of the blue party is given by an optimal districting: \(R_N^*(p) = V^\sigma _N(p)/p\). The anti-optimal representation of the red party is \(\overline{R}_N^*(p) =(1 - V^\sigma _N(p))/(1-p)\).

  • The influence of the blue party is the ratio of the representations of the two parties:

    $$\begin{aligned} \forall p \in (0,1), \ \forall \vec {p} \in \mathcal {D}_N(p), \quad I_N(\vec {p}) = \frac{R_N(\vec {p})}{\overline{R}_N(\vec {p})} = \left\{ \begin{array}{ll} \frac{{\mathcal {F}}^\sigma _N(\vec {p})(1 - p)}{(1 - {\mathcal {F}}^\sigma _N(\vec {p}))p} &{} \text{ if } {\mathcal {F}}^\sigma _N(\vec {p}) < 1 \\ \infty &{} \text{ if } {\mathcal {F}}^\sigma _N(\vec {p}) = 1 \end{array}\right. \end{aligned}$$

    The influence of the red party is: \(\overline{I}_N(\vec {p}) = 1/I_N(\vec {p}) = \overline{R}_N(\vec {p})/R_N(\vec {p})\).

    The optimal influence of the blue party is \(I^*_N(p) = R^*_N(p) /\overline{R}^*_N(p) \), and the anti-optimal influence of the red party is \(\overline{I}^*_N(p) = 1/I^*_N(p) = \overline{R}^*_N(p) /R^*_N(p) \).

  • Districting \(\vec {p}\) yields a perfectly fair representation if both parties have the same representation, \(R_N(\vec {p}) = \overline{R}_N(\vec {p})\), or equivalently if both parties’ influences are equal to one, \(I_N(\vec {p}) = \overline{I}_N(\vec {p}) = 1\).

Representation has a natural interpretation: if, due to gerrymandering, the blue party wins 50% of the seats in the House with 25% of the votes, its political representation is twice its share of the electorate. Influence tells how much the vote of a blue voter weighs against that of a red voter. For instance, if blue voters, who make up 25% of the electorate, win 50% of the seats while red voters, who make up 75%, also win 50% of the seats, then the blue representation is 2, the red representation is \(\frac{2}{3}\), and the influence of a blue voter is 3 times that of a red voter. If instead the blue party wins 100% of the seats with 55% of the votes and red voters get nothing, the influence of a blue voter is infinitely larger than that of a red voter.
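These numerical examples can be reproduced with a few lines of Python. This sketch (function names are hypothetical) takes the seat share \({\mathcal {F}}^\sigma _N(\vec p)\) and the vote share p directly and uses exact fractions.

```python
import math
from fractions import Fraction


def representation(payoff: Fraction, p: Fraction) -> tuple[Fraction, Fraction]:
    """Blue and red representations (R, R_bar) for seat share `payoff` and vote share p."""
    return payoff / p, (1 - payoff) / (1 - p)


def influence(payoff: Fraction, p: Fraction):
    """Blue influence I = R / R_bar; infinite when blue wins every seat."""
    if payoff == 1:
        return math.inf
    R, R_bar = representation(payoff, p)
    return R / R_bar


# 25% of the votes, 50% of the seats
print(representation(Fraction(1, 2), Fraction(1, 4)))  # (Fraction(2, 1), Fraction(2, 3))
print(influence(Fraction(1, 2), Fraction(1, 4)))       # 3
# 55% of the votes, 100% of the seats
print(influence(Fraction(1), Fraction(55, 100)))       # inf
```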

We have the following properties.

Properties 1

  1.

    Optimal districtings yield at least as much representation and influence for the blue party, and at most as much for the red party, as any other districting: for any p and \(\vec p\in \mathcal {D}_N(p)\),

    $$\begin{aligned} R^*_N(p)\ge R_N(\vec p), \;\;\overline{R}^*_N(p)\le & {} \overline{R}_N(\vec p)\\ I^*_N(p)\ge I_N(\vec p), \;\; \overline{I}^*_N(p)\le & {} \overline{I}_N(\vec p) \end{aligned}$$
  2.

    \(\forall p \in (0,1), \ \forall \vec {p} \in \mathcal {D}_N(p), \ R_N(\vec {p}) \le \frac{{\mathrm {cav}}F^\sigma (p)}{p} \ \) and \( \ I_N(\vec {p}) \le \frac{{\mathrm {cav}}F^\sigma (p)(1-p)}{(1-{\mathrm {cav}}F^\sigma (p))p}\)

  3.

    Districting \(\vec {p} \in \mathcal {D}_N(p)\) yields a perfectly fair representation iff \(R_N(\vec {p}) = 1\) iff \(\overline{R}_N(\vec {p}) = 1\).

Proof

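The first property follows from the definition of optimal districtings, since \(R_N\) and \(I_N\) are increasing functions of the payoff \({\mathcal {F}}^\sigma _N(\vec p)\) while \(\overline{R}_N\) and \(\overline{I}_N\) are decreasing ones. The second follows in the same way from \({\mathcal {F}}^\sigma _N(\vec p)\le {\mathrm {cav}}F^\sigma (p)\), which holds because \({\mathcal {F}}^\sigma _N(\vec p)\) is the average of the values \(F^\sigma (p_i)\le {\mathrm {cav}}F^\sigma (p_i)\), \({\mathrm {cav}}F^\sigma \) is concave and \(\frac{1}{N}\sum _i p_i=p\). To see the third one, note that \(\vec {p}\) yields a perfectly fair representation iff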

$$\begin{aligned} {\mathcal {F}}^\sigma _N(\vec {p})/p {=} (1 {-} {\mathcal {F}}^\sigma _N(\vec {p}))/(1 {-} p)\Leftrightarrow {\mathcal {F}}^\sigma _N(\vec {p})(1 - p) = (1 - {\mathcal {F}}^\sigma _N(\vec {p}))p\Leftrightarrow {\mathcal {F}}^\sigma _N(\vec p)=p. \end{aligned}$$

\(\square \)

4.2 The noiseless case \(\sigma =0\)

Proposition 3

  1.

    In the model without noise, \(R^*_N\) and \(\overline{R}^*_N\) are given by

    $$\begin{aligned} R^*_N(p) = \left\{ \begin{array}{ll} \frac{\lfloor 2Np \rfloor }{Np} &{} \text{ if } p< 0.5 \\ \frac{1}{p} &{} \text{ if } p \ge 0.5 \end{array} \right. \ \text { and } \ \overline{R}^*_N(p) = \left\{ \begin{array}{ll} \frac{N - \lfloor 2Np \rfloor }{N(1 -p)} &{} \text{ if } p < 0.5 \\ 0 &{} \text{ if } p \ge 0.5. \end{array} \right. \end{aligned}$$
  2.

    For all \(p, \ R^*_N(p) \le 2\) and \(\overline{R}^*_N(p) < 1+ \frac{1}{N}.\)

  3.

    For \(p > 0\), \(R^*_N(p) = 2 \Leftrightarrow p \in \{\frac{1}{2N},\frac{2}{2N},\dots ,0.5\}.\)

  4.

    For \(p < 0.5, \ R^*_N(p) \underset{N \rightarrow \infty }{\longrightarrow } 2\) and \(\overline{R}^*_N(p) \underset{N \rightarrow \infty }{\longrightarrow } \frac{1-2p}{1-p}.\)

This follows directly from the formula \(V_N^0(p) = \min (\lfloor 2Np \rfloor , N)/N\). \(R^*_N(p)\) is capped at 2 because a party needs at least 50% of the votes in a district to win it, so its share of seats can at most double its share of votes. Figure 15 shows the representations \(R_N(\vec {u}(p))\) and \(\overline{R}_N(\vec {u}(p))\) of both parties under uniform districting, where we denote \(\vec {u}(p) = (p,\dots , p)\). Figure 16 shows the optimal representations \(R^*_N(p)\) and \(\overline{R}^*_N(p)\), i.e. the representations of both parties under a districting that is optimal for the blue party, and thus anti-optimal for the red party. Both parties are pictured in their respective colors, once again with \(N = 5\).
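The closed forms of Proposition 3 are easy to check on a grid. The Python sketch below (hypothetical names, exact fractions) evaluates \(R^*_N\) and \(\overline{R}^*_N\) for \(N=5\) and verifies items 2 and 3 on the points \(p=k/1000\).

```python
import math
from fractions import Fraction


def optimal_representation_noiseless(N: int, p: Fraction) -> tuple[Fraction, Fraction]:
    """(R*_N(p), R_bar*_N(p)) from Proposition 3, sigma = 0, 0 < p < 1."""
    if p >= Fraction(1, 2):
        return 1 / p, Fraction(0)
    k = math.floor(2 * N * p)          # number of seats won: floor(2Np)
    return Fraction(k) / (N * p), Fraction(N - k) / (N * (1 - p))


N = 5
grid = [Fraction(k, 1000) for k in range(1, 1000)]
values = [optimal_representation_noiseless(N, p) for p in grid]
print(max(R for R, _ in values), max(Rb for _, Rb in values))
# Points where the blue representation reaches its cap of 2 (item 3)
print([p for p in grid if optimal_representation_noiseless(N, p)[0] == 2])
```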

Fig. 15. Representation of uniform districting

Fig. 16. Optimal representation

Proposition 4

  1.

    In the model without noise, \(I^*_N\) and \(\overline{I}^*_N\) are given by

    $$\begin{aligned} I^*_N(p) = \left\{ \begin{array}{ll} \frac{\lfloor 2Np \rfloor (1 - p)}{(N - \lfloor 2Np \rfloor )p} &{} \text{ if } p< 0.5 \\ \infty &{} \text{ if } p \ge 0.5 \end{array} \right. \ \text { and } \ \overline{I}^*_N(p) = \left\{ \begin{array}{ll} \frac{(N - \lfloor 2Np \rfloor )p}{\lfloor 2Np \rfloor (1 - p)} &{} \text{ if } p < 0.5 \\ 0 &{} \text{ if } p \ge 0.5 \end{array} \right. \end{aligned}$$
  2.

    For each \(n=0,\dots , N-1\), \(I^*_N\) is decreasing and \(\overline{I}^*_N\) is increasing on the interval \([\frac{n}{2N},\frac{n+1}{2N})\).

  3.

    For all \( p \in (0,0.5), \ I^*_N(p) \le N+1\).

  4.

    For all \( p \ge \frac{1}{2N}, \ \overline{I}^*_N(p) < 1\).

  5.

    For all \( p < 0.5, \ I^*_N(p) \underset{N \rightarrow \infty }{\longrightarrow } \frac{2(1-p)}{1-2p}\) and \(\overline{I}^*_N(p) \underset{N \rightarrow \infty }{\longrightarrow } \frac{1-2p}{2(1-p)}.\)

The graphs of \(I_N(\vec {u}(p))\), \(\overline{I}_N(\vec {u}(p))\), \(I^*_N(p)\) and \(\overline{I}^*_N(p)\) are shown in Figs. 17 and 18 with \(N=5\).

Proof

  1.

    Follows directly from Proposition 1.

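  2.

    On the interval \([\frac{n}{2N},\frac{n+1}{2N})\), \( \lfloor 2Np \rfloor = n\) and \(I^*_N(p)=\frac{n(1 - p)}{(N - n )p}\), which is decreasing in p when \(n\ge 1\) (and identically 0 when \(n=0\)); for \(n\ge 1\), \(\overline{I}^*_N = 1/I^*_N\) is therefore increasing.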

  3.

    For \(p \in (0,0.5)\), let \(n= \lfloor 2Np \rfloor \). If \(n=0\), then \(I^*_N(p)=0\). Otherwise, from the previous point \(I^*_N(p)\le I^*_N(\frac{n}{2N})=\frac{2N-n}{N-n}\), which increases with n, so that

    $$\begin{aligned} I^*_N(p)\le I^*_N\Big (\frac{n}{2N}\Big )\le I^*_N\Big (\frac{N-1}{2N}\Big )=N+1. \end{aligned}$$
  4.

    For \(p \ge 0.5\), \(\overline{I}^*_N(p) = 0\). Take now \(p \in [\frac{1}{N}, 0.5)\). Since \(\lfloor 2Np \rfloor > 2Np - 1\ge 1\),

    $$\begin{aligned} \overline{I}^*_N(p)=\frac{(N - \lfloor 2Np \rfloor )p}{\lfloor 2Np \rfloor (1 - p)}<\frac{(N - (2Np - 1))p}{(2Np - 1)(1 - p)}=\frac{Np-p(2Np-1)}{2Np-1-p(2Np-1)}\le 1 \end{aligned}$$

    where the last inequality follows from \(Np-1\ge 0\implies Np\le 2Np-1\).

    For \(p \in [\frac{1}{2N}, \frac{1}{N})\), \(\lfloor 2Np \rfloor = 1\) hence \(\overline{I}^*_N(p) = \frac{(N - 1)p}{1 - p} = \frac{Np - p}{1 - p}<1\) since \(Np < 1\).

  5.

    Follows directly from Proposition 1.

\(\square \)
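Items 2 to 4 of Proposition 4 can also be checked numerically for \(N=5\) on the grid \(p=k/1000\), \(0<p<0.5\). This is a sketch with hypothetical names and exact fractions, not a proof.

```python
import math
from fractions import Fraction


def optimal_influence_noiseless(N: int, p: Fraction):
    """I*_N(p) from Proposition 4 (sigma = 0, 0 < p < 1); math.inf when p >= 1/2."""
    if p >= Fraction(1, 2):
        return math.inf
    k = math.floor(2 * N * p)                  # seats won by the gerrymanderer
    return Fraction(k) * (1 - p) / ((N - k) * p)


N = 5
grid = [Fraction(k, 1000) for k in range(1, 500)]             # 0 < p < 1/2
I = {p: optimal_influence_noiseless(N, p) for p in grid}
print(max(I.values()))                                        # item 3: at most N + 1
# item 4: the red party's anti-optimal influence 1 / I* is below 1 for p >= 1/(2N)
print(all(1 / I[p] < 1 for p in grid if p >= Fraction(1, 2 * N)))
```

The bound \(N+1\) is attained at \(p=\frac{N-1}{2N}\), here \(p=0.4\).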

Fig. 17. Influence with uniform districting, \(\sigma = 0\)

Fig. 18. Optimal influence, \(\sigma = 0\)

4.3 The noisy case \(\sigma >0\)

From Theorem 1, we derive formulas for optimal representation and influence.

Proposition 5

In the model with noise \(\sigma \), \(R^*_N\) and \(\overline{R}^*_N\) are given as follows:

$$\begin{aligned} \forall p \in [p^*_n,p^*_{n+1}), R^*_N(p)= & {} {\mathrm {s}^\sigma }\Big (\frac{N}{n}p\Big ) + \frac{F^\sigma (0)}{p} \text { and }\\&\overline{R}^*_N(p) = \frac{1 - F^\sigma (0) - p{\mathrm {s}^\sigma }(\frac{N}{n}p)}{1 - p} \end{aligned}$$

\(I^*_N\) and \(\overline{I}^*_N\) are such that:

$$\begin{aligned} \forall p \in [p^*_n,p^*_{n+1}), I^*_N(p)= \frac{(1-p)\big ({\mathrm {s}^\sigma }(\frac{N}{n}p)p + F^\sigma (0)\big )}{p\big (1 - {\mathrm {s}^\sigma }(\frac{N}{n}p)p - F^\sigma (0)\big )} \text { and } \overline{I}^*_N(p) = 1/I^*_N(p). \end{aligned}$$

These curves are shown in Figs. 19, 20, 21 and 22.
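Comparing with \(F_n(p)=\frac{n}{N}[F^\sigma (\frac{N}{n}p)-F^\sigma (0)]+F^\sigma (0)\), the function \({\mathrm {s}^\sigma }\) in Proposition 5 is the chord slope \({\mathrm {s}^\sigma }(x)=(F^\sigma (x)-F^\sigma (0))/x\). The sketch below evaluates the three closed forms, assuming \(F^\sigma (x)=\Phi ((x-0.5)/\sigma )\) and finding n by brute force over m; names are hypothetical. As a consistency check, \(I^*_N\) should equal \(R^*_N/\overline{R}^*_N\).

```python
import math


def F(x: float, sigma: float) -> float:
    """Assumed F^sigma(x) = Phi((x - 0.5) / sigma)."""
    return 0.5 * math.erfc(-(x - 0.5) / (sigma * math.sqrt(2.0)))


def s(x: float, sigma: float) -> float:
    """Chord slope from (0, F(0)) to (x, F(x))."""
    return (F(x, sigma) - F(0.0, sigma)) / x


def optimal_n(N: int, p: float, sigma: float) -> int:
    """Number n of non-empty districts in an optimal districting (brute force)."""
    F0 = F(0.0, sigma)
    return max((m for m in range(1, N + 1) if N * p <= m),
               key=lambda m: m / N * (F(N * p / m, sigma) - F0) + F0)


def prop5(N: int, p: float, sigma: float) -> tuple[float, float, float]:
    """(R*, R_bar*, I*) from the closed forms of Proposition 5."""
    n = optimal_n(N, p, sigma)
    F0, slope = F(0.0, sigma), s(N * p / n, sigma)
    R = slope + F0 / p
    R_bar = (1 - F0 - p * slope) / (1 - p)
    I = (1 - p) * (slope * p + F0) / (p * (1 - slope * p - F0))
    return R, R_bar, I


R, R_bar, I = prop5(5, 0.4, 0.05)
print(R, R_bar, I, R / R_bar)
```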

Fig. 19. Representation with uniform districting, \(\sigma = 5\%\)

Fig. 20. Optimal representation, \(\sigma = 5\%\)

Fig. 21. Influence with uniform districting, \(\sigma = 5\%\)

Fig. 22. Optimal influence, \(\sigma = 5\%\)

Remark 1

For reasonable values of \(\sigma \) and non-negligible p, \(R^*_N(p) \le (F^\sigma )'(p^*)\) up to a negligible term, with near equality when \(p \in \Big \{\frac{1}{N}p^*,\frac{2}{N}p^*,\dots ,p^*\Big \}\). First, \(F^\sigma (0)=\Phi (-0.5/\sigma )\) is very small in practice (0.02275 for \(\sigma =25\%\) and \(7.6\times 10^{-24}\) for \(\sigma = 5\%\)), so the term \(\frac{F^\sigma (0)}{p}\) can be neglected as long as p is not too small. Then \(R^*_N(p) \approx {\mathrm {s}^\sigma }(\frac{N}{n}p) \le (F^\sigma )'(p^*)\), and \({\mathrm {s}^\sigma }(\frac{N}{n}p) = (F^\sigma )'(p^*) \Leftrightarrow p \in \Big \{\frac{1}{N}p^*,\frac{2}{N}p^*,\dots ,p^*\Big \}\).
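The two values of \(F^\sigma (0)\) quoted above can be reproduced as follows. Computing \(\Phi \) through the complementary error function `math.erfc` avoids the cancellation that \(1+\mathrm {erf}(x)\) would suffer in the far tail, where it would simply return 0 for \(\sigma =5\%\).

```python
import math


def F0(sigma: float) -> float:
    """F^sigma(0) = Phi(-0.5 / sigma), computed with erfc to keep tiny tails accurate."""
    return 0.5 * math.erfc(0.5 / (sigma * math.sqrt(2.0)))


for sigma in (0.25, 0.05):
    print(f"sigma = {sigma:.2f}: F(0) = {F0(sigma):.4g}")
```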

Optimal representation and influence for infinite districting (\(N\rightarrow \infty \)) are as follows.

Corollary 3

For all \(\sigma \ge 0\) and \(p\in (0,1)\),

$$\begin{aligned}&\lim _{N\rightarrow +\infty }R^*_N(p)=\frac{{\mathrm {cav}}F^\sigma (p)}{p}, \;\;\lim _{N\rightarrow +\infty }\overline{R}^*_N(p)=\frac{1-{\mathrm {cav}}F^\sigma (p)}{1-p},\\&\quad \lim _{N\rightarrow +\infty }I^*_N(p)=\frac{{\mathrm {cav}}F^\sigma (p)(1-p)}{(1-{\mathrm {cav}}F^\sigma (p))p},\;\; \lim _{N\rightarrow +\infty }\overline{I}^*_N(p)=\frac{(1-{\mathrm {cav}}F^\sigma (p))p}{{\mathrm {cav}}F^\sigma (p)(1-p)}. \end{aligned}$$
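In the noiseless case, \({\mathrm {cav}}F^0(p)=\min (2p,1)\): the chord from \((0,0)\) to \((\frac{1}{2},1)\), followed by the constant 1. Plugging it into Corollary 3 recovers the limits given in Propositions 3 and 4, as the following exact check (hypothetical names) illustrates for a few values of \(p<0.5\).

```python
from fractions import Fraction


def cav_F0(p: Fraction) -> Fraction:
    """Concave closure of the noiseless payoff F^0 (step at 1/2)."""
    return min(2 * p, Fraction(1))


def limits(p: Fraction) -> tuple[Fraction, Fraction, Fraction]:
    """Limits of R*_N, R_bar*_N and I*_N as N -> infinity (Corollary 3), 0 < p < 1/2."""
    c = cav_F0(p)
    R = c / p
    R_bar = (1 - c) / (1 - p)
    I = c * (1 - p) / ((1 - c) * p)
    return R, R_bar, I


for p in (Fraction(1, 10), Fraction(1, 4), Fraction(2, 5)):
    R, R_bar, I = limits(p)
    # Compare with the limits stated in Propositions 3 and 4
    assert R == 2
    assert R_bar == (1 - 2 * p) / (1 - p)
    assert I == 2 * (1 - p) / (1 - 2 * p)
print("Corollary 3 agrees with Propositions 3 and 4 at sigma = 0")
```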

4.4 Measure of fairness

As gerrymandering distorts representation and influence of both parties, a natural question is to evaluate the fairness or unfairness of a given districting. We have already defined a districting as perfectly fair if both parties have equal influence. We extend this idea by introducing general measures of fairness to compare districtings. We abuse notation by omitting the dependence on N, as the following definition applies to any number of districts including the infinite case.

Definition 2

A measure of fairness is a function m from \((0,1)^N\) to \(\mathbb {R}\) that verifies:

  • Continuity over \((0,1)^N\);

  • Symmetry between parties: if we let \(1 - \vec {p} = (1 - p_1,\dots ,1 - p_N)\), \(m(1 - \vec {p}) = m(\vec {p})\);

  • Fairness ordering: given \(\vec {p_1}\) and \(\vec {p_2}\) with the same aggregate proportion p, if \({\mathcal {F}}^\sigma _N(\vec {p_1})< {\mathcal {F}}^\sigma _N(\vec {p_2}) < p\) or \({\mathcal {F}}^\sigma _N(\vec {p_1})> {\mathcal {F}}^\sigma _N(\vec {p_2}) > p\), then \(m(\vec {p_1}) < m(\vec {p_2})\);

  • Perfect fairness of perfect representation: for any districting \(\vec {p^*}\) such that \(R(\vec {p^*}) = 1\), we have \(m(\vec {p^*}) = \underset{\vec {p}\in (0,1)^N}{\max }\{m(\vec {p})\}\).

This definition allows many possible measures and induces a complete preorder (a fairness preorder) on districtings: \(\vec {p_1}\) is fairer than \(\vec {p_2}\) if \(m(\vec {p_1})\ge m(\vec {p_2})\). Possible measures include \(\mathrm {Gap}(\vec {p}) {:}{=}1 - |{\mathcal {F}}^\sigma _N(\vec {p}) - p |\) for which unfairness is proportional to the gap between the expected result and the aggregate proportion of voters, as well as \(\mathrm {Fair}(\vec {p}) {:}{=}\min (I(\vec {p}), \overline{I}(\vec {p}))\) which is based on each party’s influence. \(\mathrm {Gap}(\cdot )\) is an “absolute” measure since it does not take the sizes of each proportion of voters into account, whereas \(\mathrm {Fair}(\cdot )\) is relative to such proportions.

Definition 3

A measure of fairness is relative if it also verifies perfect unfairness of no representation: for any districting \(\vec {p^*}\) such that \(R(\vec {p^*}) = 0\), we have \(m(\vec {p^*}) = \underset{\vec {p}\in (0,1)^N}{\min }\{m(\vec {p})\}\).

For any strictly increasing function s, \(s(\mathrm {Fair}(\vec {p})) {:}{=}\min (s(I(\vec {p})), s(\overline{I}(\vec {p})))\) is relative. \(\mathrm {Gap}(\cdot )\) and \(\mathrm {Fair}(\cdot )\) yield quite different results when the proportions of voters are highly unbalanced. For instance, if \(p = 2\%\) and \({\mathcal {F}}^\sigma _N(\vec {p}) = 1\%\), then \(\mathrm {Gap}(\vec {p}) = 0.99\), which is very fair, whereas \(\mathrm {Fair}(\vec {p}) \approx 0.495\), which is much more unfair. However, such asymmetries of votes are unlikely in a bipartisan state. For a more reasonable difference such as \(p = 30\%\) and \({\mathcal {F}}^\sigma _N(\vec {p}) = 15\%\), \(\mathrm {Gap}(\vec {p})\) returns 0.85, which is unfair but still acceptable, while \(\mathrm {Fair}(\vec {p})\) returns 0.412, which is very unfair. \(\mathrm {Fair}(\cdot )\) is thus more sensitive to the welfare of minority voters. For practical use, \(\mathrm {Gap}(\cdot )\) has the advantage of being easier to explain to a general audience.
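The two measures are straightforward to compute from the payoff \({\mathcal {F}}^\sigma _N(\vec p)\) and the aggregate proportion p. This Python sketch (hypothetical names) reproduces the examples above; it assigns \(\mathrm {Fair}=0\) when one party wins all seats or none, consistently with \(I=\infty \) or \(I=0\).

```python
def gap(payoff: float, p: float) -> float:
    """Gap = 1 - |payoff - p|."""
    return 1.0 - abs(payoff - p)


def fair(payoff: float, p: float) -> float:
    """Fair = min(I, 1/I), with value 0 when one party gets every seat."""
    if payoff in (0.0, 1.0):
        return 0.0
    I = payoff * (1 - p) / ((1 - payoff) * p)
    return min(I, 1 / I)


for p, payoff in ((0.02, 0.01), (0.30, 0.15)):
    print(p, payoff, round(gap(payoff, p), 3), round(fair(payoff, p), 3))
```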

Definition 4

  • A districting method \(\vec d\) is a mapping that associates a feasible districting to each proportion in (0, 1). Given a relative measure of fairness m, a districting method is perfectly fair (resp. perfectly unfair) if \(\vec d(p)\) is perfectly fair (resp. perfectly unfair) for all \(p \in (0,1)\).

  • Given a relative measure of fairness m, we define the average fairness of a districting method \(\vec d(p)\) as \(\displaystyle \int _0^1 m(\vec d(p)) dp\).

  • Given a relative measure of fairness m, we say that districting method \(\vec d_1\) is always fairer than method \(\vec d_2\) if

    $$\begin{aligned} \forall p\in (0,1), m(\vec d_1(p))\ge m(\vec d_2(p)) \end{aligned}$$

    and that it is fairer on average if

    $$\begin{aligned} \displaystyle \int _0^1 m(\vec d_1(p)) dp \ge \int _0^1 m(\vec d_2(p)) dp. \end{aligned}$$

The simplest districting method consists in having equal proportions in all districts. This is the uniform districting method \(\vec {u}\). Optimal gerrymandering yields a class of optimal partisan districting methods, all denoted by \(\vec {o}\), which are equivalent to each other in terms of expected payoff: for \(\sigma > 0\) and \(p_1 \ge p_2 \ge \cdots \ge p_N\), the solution is unique except when p belongs to \( \{p_1^*, p_2^*, \dots , p_N^*\}\). Community districting \(\vec {c}\) is another method, which consists in segregating blue and red voters as much as possible: \(\vec {c}(p) = (1,\dots ,1,p_{n+1},0,\dots ,0)\), where \(n = \lfloor Np \rfloor \) and \(p_{n+1} = Np - n\). The first n districts are completely blue, only one district is mixed, and the rest are completely red. This method is known as full packing, whereas uniform districting is full cracking.
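The three methods can be written down explicitly. The sketch below (hypothetical names, exact fractions) builds \(\vec u(p)\), \(\vec c(p)\) and a districting with the cracking-and-packing shape of an optimal one, \((\frac{Np}{m},\dots ,\frac{Np}{m},0,\dots ,0)\); the choice \(m=3\) for \(N=5\), \(p=0.34\) is illustrative only. Each vector satisfies the feasibility constraint \(\frac{1}{N}\sum _i p_i=p\).

```python
import math
from fractions import Fraction


def uniform(N: int, p: Fraction) -> list[Fraction]:
    """Full cracking: every district has proportion p."""
    return [p] * N


def community(N: int, p: Fraction) -> list[Fraction]:
    """Full packing: floor(Np) fully blue districts, one mixed, the rest red (0 < p < 1)."""
    n = math.floor(N * p)
    mixed = N * p - n                       # p_{n+1}, in [0, 1)
    return [Fraction(1)] * n + [mixed] + [Fraction(0)] * (N - n - 1)


def packed_cracked(N: int, p: Fraction, m: int) -> list[Fraction]:
    """Shape of an optimal districting: m districts at Np/m, the others empty (m >= Np)."""
    return [N * p / m] * m + [Fraction(0)] * (N - m)


N, p = 5, Fraction(34, 100)
for d in (uniform(N, p), community(N, p), packed_cracked(N, p, 3)):
    assert sum(d) / N == p                  # feasibility: average proportion is p
    print([str(x) for x in d])
```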

Fig. 23. Community districting, \(p = 0.34\)

Proposition 6

Given a relative measure of fairness m, the uniform districting method is perfectly unfair when \(\sigma \) converges to 0.

Proof

Suppose \(\sigma =0\) and consider a uniform districting. Then one party wins all the seats so the other party’s representation is 0. By continuity, the other party’s representation converges to 0 when \(\sigma \rightarrow 0\). \(\square \)

The uniform districting method (full cracking) is thus the most unfair of all when noise is reasonably small. This has a surprising yet immediate consequence: optimal partisan gerrymandering is fairer than uniform districting. Figures 24 and 25 illustrate the fairness curve of the uniform districting method, with \(\mathrm {Fair}(\cdot )\) represented by the dotted green line.

Fig. 24. Fairness of uniform districting, \(\sigma =0\)

Fig. 25. Fairness of uniform districting, \(\sigma = 5\%\)

Intuitively, a symmetry argument would suggest that community districting (full packing) is the fairest method. This is not true for all p and N. The payoffs and fairness of community districting are shown in Figs. 26, 27 and 29, and the fairness of optimal gerrymandering in Fig. 28.

Fig. 26. Payoff of community districting, \(\sigma =0\)

Fig. 27. Payoff of community districting, \(\sigma =5\%\)

Fig. 28. Fairness of gerrymandering, \(\sigma =5\%\)

Fig. 29. Fairness of community districting, \(\sigma = 5\%\)

Proposition 7

Suppose that the number of districts is infinite (\(N\rightarrow +\infty \)) and take any measure of fairness m. The community districting method tends to be perfectly fair when \(\sigma \) converges to 0.

Proof

Fix p, let \(n_N(p)=\lfloor Np \rfloor \in \mathbb {N}\) and denote \(u_N = \frac{n_N(p)}{N}\), \(v_N=\frac{n_N(p)+1}{N}\). Both \(u_N\) and \(v_N\) tend to p when N goes to infinity and \(u_N \le p< v_N\). We have

$$\begin{aligned} {\mathcal {F}}^\sigma _N(\vec {c}(u_N)) = \frac{n_N(p)}{N}F^\sigma (1) + \frac{N - n_N(p)}{N}F^\sigma (0) \underset{N \rightarrow +\infty }{\longrightarrow } pF^\sigma (1) + (1 -p)F^\sigma (0). \end{aligned}$$

and

$$\begin{aligned} {\mathcal {F}}^\sigma _N(\vec {c}(v_N)) = \frac{n_N(p)+1}{N}F^\sigma (1) + \frac{N - n_N(p)-1}{N}F^\sigma (0) \underset{N \rightarrow +\infty }{\longrightarrow } pF^\sigma (1) + (1 -p)F^\sigma (0). \end{aligned}$$

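Since \(F^\sigma \) is nondecreasing and \(u_N \le p< v_N\), \({\mathcal {F}}^\sigma _N(\vec {c}(p))\) lies between these two payoffs and has the same limit. Since \(\lim \nolimits _{\sigma \rightarrow 0}F^\sigma (1)=F^0(1)=1\) and \(\lim \nolimits _{\sigma \rightarrow 0}F^\sigma (0) = F^0(0)= 0\), this limit tends to p, i.e. the representation tends to 1, and the conclusion follows by continuity of m. \(\square \)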
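The convergence in Proposition 7 can also be observed numerically. Assuming \(F^\sigma (x)=\Phi ((x-0.5)/\sigma )\), the sketch below (hypothetical names) evaluates the payoff of \(\vec c(p)\) for increasing N and decreasing \(\sigma \); the values approach \(pF^\sigma (1)+(1-p)F^\sigma (0)\), hence p as \(\sigma \rightarrow 0\).

```python
import math
from fractions import Fraction


def F(x: float, sigma: float) -> float:
    """Assumed F^sigma(x) = Phi((x - 0.5) / sigma)."""
    return 0.5 * math.erfc(-(x - 0.5) / (sigma * math.sqrt(2.0)))


def community_payoff(N: int, p: Fraction, sigma: float) -> float:
    """Expected seat share of c(p): n blue districts, one mixed, N - n - 1 red."""
    n = math.floor(N * p)
    mixed = float(N * p - n)                 # p_{n+1}
    return (n * F(1.0, sigma) + F(mixed, sigma) + (N - n - 1) * F(0.0, sigma)) / N


p = Fraction(37, 100)
for sigma in (0.1, 0.05, 0.01):
    for N in (10, 100, 10_000):
        print(sigma, N, round(community_payoff(N, p, sigma), 4))
```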

This result is useful since, in practice, the number of districts is quite large. Figures 30 and 31 show that for \(N=20\), community districting is already close to being perfectly fair.

Fig. 30. Payoff of community districting, \(N=20\) and \(\sigma = 5\%\)

Fig. 31. Fairness of community districting, \(N=20\) and \(\sigma = 5\%\)

The previous result is asymptotic. Notice that:

Lemma 3

When N is finite, no districting method is perfectly fair with respect to a relative measure of fairness.

Proof

If \(\sigma = 0\), any method yields a payoff of 0 when \(p < \frac{1}{2N}\), so the representation of the blue party is 0 and the fairness given by a relative measure is minimal. As the fairness ordering property implies that m is not constant, the method cannot be perfectly fair.

If \(\sigma > 0\), the payoff of any method tends to \(F^\sigma (0) > 0\) when p tends to 0, hence for p sufficiently small the representation of the blue party is above 1 and the districting is not perfectly fair. \(\square \)

We now evaluate the fairness of optimal gerrymandering.

Proposition 8

  1.

    When \(\sigma = 0\) and \(p \ge 0.5\), optimal gerrymandering is perfectly unfair.

  2.

    Given the measure \(\mathrm {Fair}\), when \(\sigma = 0\) and N is infinite, the average fairness of optimal gerrymandering is \(\displaystyle \frac{1-\ln (2)}{2}\).

  3.

    Given the measure \(\mathrm {Fair}\), when \(\sigma = 0\) and \(p \le \frac{1}{N}\) or \(p\ge \frac{3}{2N}\), the community districting method is fairer than optimal gerrymandering.

Notice that \(\frac{1-\ln (2)}{2} \approx 0.153\), which is quite small: optimal gerrymandering is thus a very unfair method of districting. This result focuses on the noiseless case, for which we are able to compare optimal gerrymandering with community districting for a wide range of parameter values. For \(\sigma > 0\), showing that the community districting method is fairer than optimal gerrymandering is more difficult, and so is showing that this holds on average for every \(\sigma \). Numerical values give some insight into the average fairness of the three methods: for \(N = 5\) and \(\sigma = 5 \%\), the average fairness of \(\vec {c}\), \(\vec {o}\) and \(\vec {u}\) is respectively 0.661, 0.243 and 0.066. As N approaches infinity and \(\sigma \) approaches 0, these values converge respectively to 1, \(\frac{1 - \ln (2)}{2}\) and 0.
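The value \(\frac{1-\ln (2)}{2}\) can be cross-checked by numerical integration of \(\mathrm {Fair}(\vec o(p))=\frac{1-2p}{2(1-p)}\) for \(p<\frac{1}{2}\) (and 0 otherwise), the integrand used in the proof below. This is a plain midpoint-rule sketch with hypothetical names.

```python
import math


def fair_optimal_infinite(p: float) -> float:
    """Fair(o(p)) for sigma = 0 and N infinite: (1 - 2p) / (2(1 - p)) below 1/2, else 0."""
    return (1 - 2 * p) / (2 * (1 - p)) if p < 0.5 else 0.0


def midpoint_integral(f, a: float, b: float, steps: int = 100_000) -> float:
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


approx = midpoint_integral(fair_optimal_infinite, 0.0, 1.0)
print(approx, (1 - math.log(2)) / 2)
```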

We conclude that according to this measure, optimal gerrymandering is closer to uniform districting than to community districting.

Proof

  1.

    This follows from the fact that for \(p \ge 0.5\), \(\vec {o}(p) = \vec {u}(p)\), so the blue party wins every seat and the representation of the red party is 0.

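  2.

    When N is infinite, the optimal payoff is \({\mathrm {cav}}F^0(p) = \min (2p,1) \ge p\) (Corollary 2), so that \(\mathrm {Fair}(\vec o(p)) = \overline{I}(\vec o(p))\) is given by Corollary 3, hence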

    $$\begin{aligned} \displaystyle \int _0^1 \mathrm {Fair}(\vec o(p)) dp&= \int _0^{\frac{1}{2}} \frac{(1-2p)p}{2p(1-p)} dp + \int _{\frac{1}{2}}^1 0 \, dp \\&= \int _0^{\frac{1}{2}} \left( 1 - \frac{1}{2(1-p)}\right) dp \\&= \frac{1 - \ln (2)}{2} \end{aligned}$$
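  3.

    When \(p < \frac{1}{N}\), both methods are equivalent, and when \(p = \frac{1}{N}\), \(\vec {c}(p) = (1,0,\dots ,0)\) yields a payoff equal to p and is therefore perfectly fair. When \(p \ge \frac{3}{2N}\), \(R(\vec {o}(p)) \ge 1\), since \(\lfloor 2Np \rfloor > 2Np - 1 \ge Np\).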

    We have either \(R(\vec {c}(p)) \ge 1\), in which case optimality yields \(R(\vec {o}(p)) \ge R(\vec {c}(p)) \ge 1\), hence \(\mathrm {Fair}(\vec {o}(p)) \le \mathrm {Fair}(\vec {c}(p))\), or \(R(\vec {c}(p)) < 1\). In the latter case, if we let \(\vec {c}(p) = (1,\dots ,1,p_{n+1}, 0,\dots ,0)\), this implies \(p_{n+1} < 0.5\). If \(p \ge 0.5\), we know that \(\vec {o}(p)\) is perfectly unfair, therefore \(\mathrm {Fair}(\vec {o}(p)) \le \mathrm {Fair}(\vec {c}(p))\). Also notice that for \(N\in \{1,2,3,4\}\), the computation is straightforward. Let now \(N\ge 5\) and \(p < 0.5\).

    The feasibility constraint is \(\frac{n}{N} + \frac{p_{n+1}}{N} = p\). Since \(p < 0.5\), we know that \(n < \frac{N}{2}\). We have then,

    $$\begin{aligned} \mathrm {Fair}(\vec {c}(p))> & {} \mathrm {Fair}(\vec {o}(p)) \\&\Leftrightarrow \; \; \frac{R(\vec {c}(p))}{\overline{R}(\vec {c}(p))}> \frac{\overline{R}(\vec {o}(p))}{R(\vec {o}(p))} \\&\Leftrightarrow \; \; \frac{\mathcal {F}^0_N(\vec {c}(p))}{1 - \mathcal {F}^0_N(\vec {c}(p))} \frac{1-p}{p}> \frac{1 - \mathcal {F}^0_N(\vec {o}(p))}{\mathcal {F}^0_N(\vec {o}(p))} \frac{p}{1-p} \\&\Leftrightarrow \; \; \frac{\frac{n}{N}}{1 - \frac{n}{N}} \frac{1-p}{p}> \frac{1 - \frac{2n}{N}}{\frac{2n}{N}} \frac{p}{1-p}\\&\Leftrightarrow \; \; \frac{(1 -p)^2}{p^2}> \frac{(N-2n)(N-n)}{2n^2} \\&\Leftrightarrow \; \; \frac{(1 - \frac{n}{N} - \frac{p_{n+1}}{N})^2}{(\frac{n}{N} + \frac{p_{n+1}}{N})^2}> \frac{(N-2n)(N-n)}{2n^2} \\&\Leftrightarrow \; \; \frac{(N - n - p_{n+1})^2}{(n+p_{n+1})^2} > \frac{(N-2n)(N-n)}{2n^2} \\ \end{aligned}$$

    Since \(p_{n+1}<\frac{1}{2}\),

    $$\begin{aligned} \frac{(N - n - p_{n+1})^2}{(n+p_{n+1})^2} > \frac{(N - n - \frac{1}{2})^2}{(n+\frac{1}{2})^2}. \end{aligned}$$

    We conclude the proof by showing that \(\frac{(N - n - \frac{1}{2})^2}{(n+\frac{1}{2})^2} >\frac{(N-2n)(N-n)}{2n^2}\). To see this, note that \(n + p_{n+1} = Np \ge \frac{3}{2}\) and \(p_{n+1} < \frac{1}{2}\) imply \(n\ge 2\). Then,

    $$\begin{aligned} (\sqrt{2}-1)n\ge 2(\sqrt{2}-1)>\frac{1}{2}\Rightarrow N\Big ((\sqrt{2}-1)n-\frac{1}{2}\Big )>0, \end{aligned}$$

    while

    $$\begin{aligned} \Big (\sqrt{2}-\frac{3}{2}\Big )n^2+\Big (\frac{1}{\sqrt{2}}-\frac{3}{4}\Big )n<0, \end{aligned}$$

    thus we get

    $$\begin{aligned} N\Big ((\sqrt{2}-1)n-\frac{1}{2}\Big )>\Big (\sqrt{2}-\frac{3}{2}\Big )n^2+\Big (\frac{1}{\sqrt{2}}-\frac{3}{4}\Big )n. \end{aligned}$$

    Re-organizing gives

    $$\begin{aligned} \sqrt{2}n\Big (N-n-\frac{1}{2}\Big )>\Big (N-\frac{3}{2}n\Big )\Big (n+\frac{1}{2}\Big ) \Rightarrow \frac{(N-n-\frac{1}{2})^2}{(n+\frac{1}{2})^2}> \frac{(N-\frac{3}{2}n)^2}{2n^2}, \end{aligned}$$

    and since \((N-\frac{3}{2}n)^2\ge (N-2n)(N-n)\), the proof is complete.

\(\square \)
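Item 3 of Proposition 8 can be checked exhaustively on a grid. The sketch below assumes the noiseless rule that a district is won by the blue party iff its blue proportion is at least \(\frac{1}{2}\) (so that \(V_N^0(p)=\min (\lfloor 2Np\rfloor ,N)/N\)), uses exact fractions, and compares \(\mathrm {Fair}(\vec c(p))\) with \(\mathrm {Fair}(\vec o(p))\) for \(N\le 30\) and \(p=k/600\), skipping the excluded range \(\frac{1}{N}<p<\frac{3}{2N}\). Function names are hypothetical.

```python
import math
from fractions import Fraction


def fair(payoff: Fraction, p: Fraction) -> Fraction:
    """Fair = min(I, 1/I); 0 when one party wins every seat or none."""
    if payoff in (0, 1):
        return Fraction(0)
    I = payoff * (1 - p) / ((1 - payoff) * p)
    return min(I, 1 / I)


def payoff_optimal(N: int, p: Fraction) -> Fraction:
    """Noiseless optimal payoff V_N^0(p) = min(floor(2Np), N) / N."""
    return Fraction(min(math.floor(2 * N * p), N), N)


def payoff_community(N: int, p: Fraction) -> Fraction:
    """Noiseless payoff of c(p): n blue seats plus the mixed district if p_{n+1} >= 1/2."""
    n = math.floor(N * p)
    return Fraction(n + (1 if N * p - n >= Fraction(1, 2) else 0), N)


def check(N_max: int = 30, denom: int = 600) -> bool:
    """True if Fair(c(p)) >= Fair(o(p)) at every grid point outside (1/N, 3/(2N))."""
    for N in range(1, N_max + 1):
        for k in range(1, denom):
            p = Fraction(k, denom)
            if Fraction(1, N) < p < Fraction(3, 2 * N):
                continue                          # range excluded by Proposition 8
            if fair(payoff_community(N, p), p) < fair(payoff_optimal(N, p), p):
                return False
    return True


print(check())
```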

5 Conclusion

As in real life, optimal gerrymandering in our model consists in cracking and packing voters to maximize the number of seats, whatever the parameters N and \(\sigma \). Given any relative measure of fairness (e.g. the function \(\mathrm {Fair}(\cdot )\)), full cracking is the most unfair districting method while full packing is one of the fairest. Since a gerrymandered districting mixes cracking and packing when \(p < p^*\), its fairness lies between these two extremes. When N goes to infinity, the optimal payoff function converges to its concave closure, so that the problem becomes equivalent to a simple problem of Bayesian persuasion. This parallel is not only mathematical: gerrymandering is an intuitive illustration of the optimal signal of a Bayesian persuasion game as a mix of cracking and packing a given level of belief.

Our model differs from the current literature in that we assume a binary type of voters instead of a continuum, a finite number of districts, and a random fluctuation at the district level instead of a combination of individual noise and global change in opinions. A random fluctuation at the state level \(\tau \epsilon _{agg}\) could be added in order to simulate aggregate shocks of opinion. Even though our model is less general than that of [19], it is designed to be more practical, with few parameters, and applies to a finite number of districts. We also introduce the notions of representation, influence and measure of fairness, since we find it valuable to discuss the optimality of gerrymandering together with the fairness of districtings. Offering a way to evaluate the fairness of districtings may contribute to the debate about tolerating or banning gerrymandering.

Interestingly, our aim of evaluating the fairness of gerrymandering led us to the conclusion that districtings with political diversity are often less fair than those with political homogeneity. This gives some insight into the American democratic system. There is an overall consistency between the electoral system, which is based on first-past-the-post elections and districting, and US society, which is more structured around communities than, for instance, European societies. This poses an interesting chicken-and-egg problem: whether the electoral system is the product of the social organization or the other way round.

Many research questions lie ahead of this work. First, one could study fairness on more sophisticated versions of the model taking into account graphical constraints, or requirements of minimal and maximal proportions of blue and red voters (to prevent the formation of districts with 100% voters of only one party). Second, one could revisit optimal design of districtings from the point of view of fairness. This opens up practical questions such as: Who should design districtings, politicians, neutral commissions or direct participation of citizens? Is it acceptable to create districts with total political uniformity or would that broaden the gap of partisanship even further? What level of “unfairness” can be tolerated in order to prevent such a partisan fragmentation from happening?

We hope that this work contributes to Michel Balinski’s research agenda: how the theoretical study of voting systems helps improve democratic institutions.