11 November 2008

Broadband penetration vs. Obama vote

I was reading about how the Obama campaign talked about the "net roots." How about plotting internet broadband penetration against Obama's percentage of the vote, state by state:



I added a perfect fit line that shows where the states would be if the two measures were perfectly correlated. The red linear trend line shows what the states actually tended to cluster around: it was not parallel to the perfect fit line, i.e., the correlation was only so-so. I know I probably should have used normalized numbers for both measures, but I ran out of time. Still, I found this intriguing: the correlation did not seem too bad.

I identified the outliers on the Democratic side: DC (always heavily Democratic), Vermont (Socialist senator!) and Hawaii (Obama's childhood home state); on the Republican side: Alaska (Palin's home state), Utah (heavily Mormon) and Wyoming (Cheney's home state). These were expected. Look, then, at the states that displayed the best correlation between broadband penetration and Obama vote (from left to right): West Virginia, Mississippi, South Carolina, Montana, Missouri (still not officially settled!), North Carolina, Ohio, Minnesota, Virginia, Oregon and New Jersey. They were a mixture of "toss-up" states (MO, NC, OH, VA), "blue" states (MN, OR, NJ) and "red" states (WV, MS, SC, MT). Note that the blue states were at the high end of the scale (to the right), the red ones at the low end (to the left) and the toss-ups in the centre. Delaware (Biden's home state), Illinois (Obama's other home state) and Arizona (McCain's home state) ended up on their partisan side, but not decisively.
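A linear trend line like the red one is the kind of straight line a spreadsheet fits by ordinary least squares. The following is a minimal Python sketch of that fit, refit after dropping one point flagged as an outlier, to show how a single distant state can move the line. The numbers and labels are made-up placeholders, not the 2008 broadband or vote figures, and the helper names `least_squares` and `fit` are hypothetical.

```python
# Sketch: ordinary least-squares trend line, refit after dropping an outlier.
# All data below are hypothetical placeholders, NOT the 2008 state figures.

def least_squares(xs, ys):
    """Return (slope, intercept) of the OLS line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (broadband %, Obama %) pairs keyed by made-up labels.
data = {
    "A": (40.0, 42.0),
    "B": (45.0, 47.0),
    "C": (50.0, 52.0),
    "D": (55.0, 57.0),
    "E": (60.0, 62.0),
    "X": (50.0, 85.0),  # hypothetical outlier far above the others
}

def fit(labels):
    """Fit the trend line using only the points whose labels are given."""
    xs = [data[k][0] for k in labels]
    ys = [data[k][1] for k in labels]
    return least_squares(xs, ys)

print(fit(data))                           # (1.0, 7.5): all points
print(fit([k for k in data if k != "X"]))  # (1.0, 2.0): outlier dropped
```

With these toy numbers the outlier sits at the average broadband level, so it lifts the whole line (intercept 7.5 instead of 2.0) without tilting it; an outlier near either end of the broadband axis would generally change the slope as well.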

I tried a second approach using the Obama minus McCain percentage of the vote:



The perfect fit line ran parallel to the blue trend line: a good correlation this time around. Overall, the distribution of the states was very similar to that in the previous graph. Conclusion: the higher the broadband penetration, the higher the Democratic vote seemed to be. I guess that does confirm some of the analyses of the election I read. Nevertheless, I would caution against reading too much into these graphs, all the more since my grasp of this type of statistics is a bit shaky...
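How strong a linear correlation is gets measured by the trend line's coefficient of determination, R², rather than by how closely its slope matches a reference line. The following is a minimal Python sketch, assuming simple one-variable linear regression, where R² equals the square of Pearson's correlation coefficient r. The toy data and the function names `pearson_r` and `r_squared` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical toy data, not the state figures.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_squared(xs, ys))       # about 0.6
print(pearson_r(xs, ys) ** 2)  # same value: R^2 == r^2 with one predictor

# Multiplying y by 10 makes the slope ten times steeper but leaves R^2 alone.
print(r_squared(xs, [10 * y for y in ys]))  # still about 0.6
```

Because R² = r² here, the two are easy to convert: an R² of 0.15, for example, corresponds to |r| ≈ 0.39.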

By the way, a "classic" face-off of broadband vs. Obama yielded this interesting result:



Broadband has been popular with Googlers for a long time. Obama was a more recent phenomenon, one that ended up overtaking broadband in aggregate.

Update 11-14-08: My statistical skills were indeed lacking. See the comments for corrections and more.

5 comments:

  1. hey, I found this post via your comment on the Princeton site. I run a blog with a fair-sized US readership, and I've been comparing the state-by-state statistics from Google Analytics with the election margins. I figure my blog (about punk music) has an obvious, and strong, blue bias, but I hadn't found information about broadband penetration until now, so hopefully I can adjust my figures for that.

    however, I had already noticed Alaska as a strange outlier - in fact, the only state with an above-average readership (relative to basic population figures) that voted McCain. Taking it out of the regression analysis changed the correlation coefficient from around 0.45 to 0.55 (see below).

    if you're using Excel, you should choose to display the R-squared value for your trendline. that's what tells you how well your data correlates, not the slope of the line (which is irrelevant: a 'perfect fit' isn't necessarily y=x, but y=ax+b where all the y's fit that equation, or in other words sit on the trendline). R-squared tells you how far your data points are scattered away from the regression line, which is what you want to know: how much of the variation in your dependent variable (vote/vote margin) is explained by your independent variable (broadband penetration).

  2. (followed the same link here)
    Even if the correlation proves real, there's always the chance that both are just correlated with a third variable. For example, I'm sure rural voters both favour McCain and are less likely to have broadband (and same thing for more heavily rural states). But of course perhaps that's what "net roots" is supposed to mean to some extent? I'm not really sure...

  3. Thanks for the comments, here and via email (Sam Wang from the Princeton Election Consortium). I'm looking into correcting/updating this post, but I'm not sure when I'll get it done...

  4. I can already respond to gabbagabbahey's suggestion: the trend line in the 1st graph (broadband penetration vs. Obama vote) has an R-squared (coefficient of determination) of 0.1505, and the one in the 2nd graph (broadband penetration vs. Obama margin) 0.1538. Since a perfect fit corresponds to an R-squared of 1.0 for a linear trend line, what we have here is anything but :-) So much for that simplistic theory! Of course, that doesn't mean broadband penetration isn't a factor, just that it is not, by itself, highly correlated with Obama's electoral results. I'm still looking into Dr. Wang's suggestions.

  5. I'm surprised the R-squared was so low, but I think it has more to do with the distance of the outliers (which you've already identified and satisfactorily explained) than with the strength of the underlying trend. I repeated the second graph (email me if you want to see my work!), removing DC, VT, WY, UT and AK and also adjusting HI and DE to their 2004 margins; that brought it up to 0.42, which isn't huge, but it's a good deal more significant (medium). all part of a statistical hatchet job on my part, though! ID and OK both have margins for McCain similar to UT, WY and AK, but much lower broadband %.

    however, taking those remaining states as a 'normal' range of US politics ;), it's interesting that in a range of 33%-65% there is no red state with a broadband penetration of more than 55% (KS), or a blue state with less than 42% (IN - a swing state). you can see that on the graph here, although they aren't named.

    I'd agree with Elithrion that there are a number of ways of explaining the trend. 'correlation doesn't equal causation', so it could just reflect something else.

    thanks for pointing me to the basic figures, though, they've helped me with my own stats a good deal.

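Elithrion's third-variable point can be checked with a partial correlation: the correlation between broadband and Obama's vote after the linear effect of a third variable (say, how urban or rural a state is) has been removed from both. Below is a minimal Python sketch using the standard first-order partial correlation formula. The variable names and toy numbers are hypothetical, constructed so that the broadband/vote association comes entirely from the third variable.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data: 'urban_share' drives both columns; the extra noise
# added to each column is uncorrelated with 'urban_share' and with each other.
urban_share = [1, 2, 3, 4, 5, 6]
broadband = [2, 1, 2, 5, 5, 6]
vote = [2, 3, 1, 2, 6, 7]

print(round(pearson_r(broadband, vote), 2))    # 0.69: looks correlated
print(partial_r(broadband, vote, urban_share))  # effectively 0 (float noise)
```

With real data, a partial correlation well below the raw one would suggest that part of the broadband/vote link runs through the third variable; like the raw correlation, it cannot by itself establish causation.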