Random-Sample Voting with a Finite Population


Posted

Putting a proposition before a large population of voters can be expensive, so an organization wishing to do so would like to have a reasonable assurance that a given proposition will pass. One approach is to take a survey of a randomly chosen subset of voters and use the results to estimate the proposition’s chances amongst the general population. The larger the survey size and the larger the margin that the proposition passes in the survey, the larger the chances are that the proposition would pass for a general vote.

The mathematics for computing the size of the survey required was already worked out in Surveying for a Voter Proposition, assuming an infinite population. The exact formula for a finite population is worked out here. The result still requires simplification.

Now assume that for the general population

  • n – the total number of voters
  • k – the number of voters who would vote yes.

The formula for the distribution of votes is the same as for the survey,

P ( k | n , p ) = ( n k ) p k ( 1 p ) n k P(k "|" n,p) = left ( stack { n # k } right ) p sup k (1-p) sup { n - k }

(1)

Except that p, rather than being fixed, has the distribution given by equation (3). This combined distribution has the name Beta-Binomial, and is given by

P ( k | n , α , β ) = ( n k ) B ( k + α , n k + β ) B ( α , β ) P( k "|" n, %alpha , %beta ) = left ( stack { n # k } right ) { B( k + %alpha , n - k + %beta ) } over { B( %alpha , %beta ) }

(2)

with α=y+1 and β=sy+1.

What the organization wants to know is the smallest number of polled people y required such that the chances of the vote not passing by a majority is less than some given number δ. This means that we need the cumulative distribution, which is given by

CDF ( k | n , α , β ) = 1 B ( β + n k 1, α + k + 1 ) B ( α , β ) B ( n k , k + 2 ) ( n + 1 ) F CDF( k "|" n, %alpha , %beta ) = 1 - { B( %beta +n -k - 1, %alpha + k + 1) } over { B( %alpha , %beta ) B( n - k, k + 2) (n+1)} F

(3)

where F is the generalized hypergeometric function

F = 3 F 2 [ 1, α + k + 1 , n + k + 1 k + 2 , β n + k + 2 ; 1 ] . F = "" sub 3 F sub 2 left [ stack { 1, %alpha + k + 1,-n+k+1 # k+2,-%beta-n+k+2 } ;1 right ] "."

(4)

Now substituting in for α and β gives

CDF ( k | n , s , y ) = 1 B ( s y + n k , y + k + 2 ) B ( y + 1 , s y + 1 ) B ( n k , k + 2 ) ( n + 1 ) F CDF( k "|" n, s , y ) = 1 - { B( s-y +n -k, y + k + 2) } over { B( y+1 , s-y+1 ) B( n - k, k + 2) (n+1)} F

(5)

F = 3 F 2 [ 1, y + k + 2 , n + k + 1 k + 2 , s + y n + k + 1 ; 1 ] . F = "" sub 3 F sub 2 left [ stack { 1, y + k + 2,-n+k+1 # k+2,-s+y-n+k+1 } ;1 right ] "."

(6)

Given that a majority vote is required, the CDF needs to be evaluated at n = 2k. We can use this to eliminate n from the CDF formula, giving

δ = CDF ( k | n = 2 k , y , s ) = 1 B ( s y + k , y + k + 2 ) B ( y + 1 , s y + 1 ) B ( k , k + 2 ) ( 2 k + 1 ) F %delta = CDF( k "|" n=2 k, y , s ) = 1 - { B( s-y +k, y + k + 2) } over { B( y+1 , s-y+1 ) B( k, k + 2) (2 k+1)} F

(7)

F = 3 F 2 [ 1, y + k + 2,1 k k + 2 , s + y k + 1 ; 1 ] . F = "" sub 3 F sub 2 left [ stack { 1, y + k + 2,1-k # k+2,-s+y-k+1 } ;1 right ] "."

(8)

The generalized hypergeometric function F may be expanded out as a sum,

F = i = 0 k ( y + k + 2 ) i ( 1 k ) i ( k + 2 ) i ( s + y k + 1 ) i F = sum from i=0 to k { { (y+k+2) sub i (1-k) sub i } over { (k+2) sub i (-s+y-k+1) sub i } }

(9)

where (a)i is the rising factorial or Pochhammer symbol, (a)0 = 1, (a)i = a(a+1)(a+2)...(a+i-1). The summation series terminates at the first zero term.

Next eliminate the Beta functions, giving the final “closed form” solution,

δ = 1 ( s y + k 1 ) ! ( y + k + 1 ) ! ( s + 1 ) ! ( 2 k ) ! ( s + 2 k + 1 ) ! y ! ( s y ) ! ( k 1 ) ! ( k + 1 ) ! i = 0 k ( y + k + 2 ) i ( 1 k ) i ( k + 2 ) i ( s + y k + 1 ) i . %delta = 1 - { (s-y+k-1)! (y+k+1)! (s+1)! (2k)! } over { (s+2k+1)! y! (s-y)! (k-1)! (k+1)! } sum from i=0 to k { { (y+k+2) sub i (1-k) sub i } over { (k+2) sub i (-s+y-k+1) sub i } } "."

(10)

The number of terms in the sum is k+1. Summing a series which has as many terms as half the general population may not be realistic. Another approach is required for practical computations regarding large populations.

This computation was suggested to me by David Chaum, who wanted to know if a closed-form solution existed. See rsvoting.org for more information.


Categories Voting, Probability

← Older Newer →