on
Prediction for the 2022 House Midterms
Introduction
In this week’s blog, I present my final prediction model for the 2022 House Midterms. My final prediction model is a pooled linear regression model using data of almost all 435 districts from 1950-2020 to then predict for 2022 at the district level. I predict that the Democrats will win 216 seats, while the Republicans will win 219 seats, resulting in a Republican victory in the House of Representatives.
The Model
The dependent variable is Democratic Two-Party Vote Share (%) at the district level.
The independent variables are as follows: 1. Generic Ballot: the average generic ballot score for Democrats (filtered for 52 days before the election) 2. GDP % Difference: The percent difference in GDP from Q6 to Q7 3. Dem Incumbent in District: whether the incumbent in the district at the time of the election is a Democrat (1) or not (0) 4. Dem Party is in Power in House: whether the party controlling the house at the time of the election is the Democratic Party (1) or not 5. Dem President is in Power: whether the president at the time of the election is a Democrat (1) or not (0) 6. Previous Dem Two Party Vote Share: the previous Democratic two-party vote share (%) in the district 7. Expert Ratings: the average of Cook Political Report, Inside Elections, and Sabato’s Crystal Ball ratings for the district (1 - solid Democrat → 7 - solid Republican)
Using this pooled linear regression, I then predicted for the Democratic two-party vote share at the district level. Then, I aggregated the data so that any districts with less than or equal to 50% for Democratic two-party vote share would be considered a Republican seat, while any districts with more than 50% for Democratic two-party vote share would be considered a Democratic seat. This allowed me to predict the Democratic Party’s seat share at the national level.
Model Justification from Previous Literature
For the generic ballot, I drew upon Gelman and King (1993) who found that the number of days left until the election matters for the accuracy of the polls. They state, “In most years, early public opinion polls give fairly miserable forecasts of the actual election outcome… Additionally, in virtually every presidential election in the last forty years, the polls converge to a point near the actual election outcome shortly before election day” (Gelman & King, 1993). Thus, I decided to filter the generic ballot polling data for only polls 52 days away from or closer to the election because the closer the poll is to the actual election day, the more likely it is to be closer to the actual election result. The 52 day cutoff was chosen because I was limited by the data; the closest poll to election day in 1952 was 51.5 days.
For GDP, I drew upon Achen and Bartels (2017) who analyzed the economic model of voting behavior of presidential races. They found that, “it is possible to account for recent presidential election outcomes with a fair degree of precision solely on the basis of how long the incumbent party had been in power and how much real income growth voters experienced in the six months leading up to Election Day.” This led them to argue that this evidence challenges the idea that citizens are rational retrospective voters and instead argued that, “while they vote on the basis of how they feel at the moment, they forget or ignore how they felt over the course of the incumbents’ term in office” (Achen & Bartels, 2017). Thus, I decided to use the GDP percent difference from Q6 to Q7 to account for this behavior in voters. I chose GDP over RDI because in my exploration, I found that GDP was better at explaining House elections, whereas RDI Was better at explaining presidential elections.
For the incumbency variables, I wanted to text Adam Brown (2014)’s research. House of Representatives incumbents in the United States enjoy electoral and structural advantages such as media coverage and campaign finance access. However, Adam Brown points out that “existing research has not asked whether individual voters actually prefer incumbents over newcomers, other things being equal.” Thus, he ran a randomized survey experiment to test this and found that “voters respond only minimally—if at all—to incumbency status once the structural advantages are held constant” (Brown, 2014). This is perhaps contrary to many people’s initial thoughts, especially when it comes to the commonly-held notion that voters like their local congressperson just because they are their local congressperson.
For previous Democratic vote share, I used this variable based on research of voter behavior and party identification. For the expert prediction, I used the average of these three organizations because I had data for all the districts and they all used the 7-point scale where 1 is Solid Democrat, 2 is Lean Democrat, 3 is Tilt Democrat, 4 is Toss-Up, 5 is Tilt Republican, 6 is Lean Republican, and 7 is Solid Democrat.
Model Results
##
## Pooled Linear Regression Model of Democratic Two-Party Vote Share
## ========================================================================
## Dependent variable:
## -------------------------------
## Democratic Two-Party Vote Share
## ------------------------------------------------------------------------
## Generic Ballot for Democrats 0.282***
## (0.025)
##
## GDP % Difference from Q6-Q7 -0.119***
## (0.028)
##
## Democratic Incumbent in District 3.022***
## (0.422)
##
## Democratic Party is in Power in House -1.939***
## (0.245)
##
## Democratic President is in Power -3.495***
## (0.226)
##
## Previous Democratic Two-Party Vote Share 0.328***
## (0.006)
##
## Average Expert Ratings -4.442***
## (0.066)
##
## Constant 40.832***
## (1.291)
##
## ------------------------------------------------------------------------
## Observations 14,436
## R2 0.720
## Adjusted R2 0.720
## Residual Std. Error 12.907 (df = 14428)
## F Statistic 5,312.176*** (df = 7; 14428)
## ========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Interpretation of Coefficients
All of the coefficients are statistically significant at the 99% confidence level.
- Generic ballot: The coefficient is 0.282, which means that for every percent increase in the generic ballot, there is a 0.282% point increase in the Democratic two-party vote share.
- GDP % Difference from Q6-Q7: The coefficient is -0.119, which means that for every percent increase in the GDP difference from Q6 to Q7, there is a 0.119% point decrease in the Democratic two-party vote share. At first glance, this might seem interesting because one might assume that a better-performing economy would benefit Democrats. However, the economy and if it does well is often attributed to Republicans.
- Democratic Incumbent in District: The coefficient is 3.022, which means that if the incumbent in the district is a Democrat, the Democratic two-party vote share is 3.022% points greater than if the incumbent is NOT a Democrat. This makes sense because incumbents tend to win their elections compared to non-incumbents and contributes to the commonly-held belief that incumbency can explain most of the elections.
- Democratic Party is in Power in House: The coefficient is -1.939, which means that if the party controlling the House at the time of the election is the Democratic party, the Democratic two-party vote share is 1.939% points less than if the Democratic party was NOT controlling the House. This was the most interesting incumbency variable because the other two (district and President) followed commonly-held notions. This perhaps argues that it is better for democrats at the district level if the Democrats are not controlling the House.
- Democratic President is in Power: The coefficient is -3.495, which means that if the president is a Democrat at the time of the election, the Democratic two-party vote share is 3.495% points less than if the president was NOT a Democrat. This contributes to the long-held notion that the President’s party is punished during midterm elections.
- Previous Democratic Two-Party Vote Share: The coefficient is 0.328, which means that every percent increase in previous Democratic two-party vote share is associated with a 0.328% points increase in the Democratic two-party vote share for the current election.
- Expert Ratings : The coefficient is -4.442, which means that as the district becomes more Republican (7 is solid Republican), this is associated with a 4.442% points decrease in the Democratic two-party vote share.
Model Validation
The adjusted R-squared of the model is 0.720 which indicates that the independent variables explain 72% of the variation. Considering that some of my earlier blogs had negative adjusted R-squared values, I am relatively at peace with this adjusted R-squared. The adjusted R-squared is the in-sample testing of the model’s performance that I have chosen to do.
I tried to include leave one out cross validation as the out-of-sample testing, but because my linear regression model was a pooled model with over 15,000 observations, it takes way too long to run the LOOCV and it crashes my laptop every time. This also was the same for my attempts at other forms of cross-validation. Thus, I unfortunately can only rely on in-sample testing of my model’s performance.
Predictions for the 2022 House Midterms
Using my Democratic two-party vote share model at the district level, I was also able to predict the Democratic seat share at the national level by assigning districts with less than or equal to 50% as Republican seats and more than 50% as Democratic seats.
Thus, I predict that Democrats will win 216 seats, while the Republicans will win 219 seats, resulting in a Republican victory in the House of Representatives.
Here is a map of the predicted 2022 Democratic vote shares by districts. The districts that are more red have a lower Democratic vote share, thus I predict them to be Republican seats. The districts that are more blue have a higher Democratic vote share, thus I predict them to be Democratic seats. The districts that are more white are more toss-up seats.
Predictive Interval of Predictions
I created a predictive interval of my predictions at the 95% confidence interval.
At the lower bound, I predict that Democrats will win 213 seats, while the Republicans will win 222 seats, resulting in a larger Republican victory than my prediction.
At the upper bound, I predict that Democrats will win 218 seats, while the Republicans will win 217 seats, resulting in the narrowest of Democratic victory in the House.
Conclusion
This final model is a culmination of more than 8 weeks of learning and exploring. There was a lot of trial and error throughout this time, but I am overall happy with the work that I have put into this class and my final model. I look forward to seeing how my model holds up to the actual results on Tuesday, November 8. However, whether or not my model was accurate, I am still impressed that I was able to even create a model, especially considering where I started from in terms of skill and knowledge.
There are limitations with my model that range from data constrictions (doing at the district level is HARD) and statistical inference choices (was pooled model the best model?). Nonetheless, I look forward to unpacking these limitations in the reflective paper after Election Day.
Sources
Christopher H Achen and Larry M Bartels. Democracy for Realists: Why Elections Do Not Produce Responsive Government, volume 4. Princeton University Press, 2017.
Adam R. Brown. Voters Don’t Care Much About Incumbency. Journal of Experimental Political Science, 1(2):132–143, 2014.
Andrew Gelman and Gary King. Why are American presidential election campaign polls so variable when votes are so predictable? British Journal of Political Science, 23(4): 409–451, 1993.