12 November 2008

Flu vs. Asthma vs. Pneumonia vs. Bronchitis vs. Sinusitis

Today, I heard on NPR about a new web site: Google Flu Trends. It shows where search volume for flu is highest in the US, so you can track flu activity by state and compare it with previous years. In other words, this is a Google Insights for Search analysis presented with a more user-friendly interface. It made me think: maybe I should face off five common diseases?

Worldwide, the flu dominated, followed in order by asthma, pneumonia, bronchitis and sinusitis. The large peaks in October 2004 and October 2005 should correspond with serious flu epidemics. The seasonal aspect of the flu is well known: it occurs in winter. Pneumonia and bronchitis also showed small increases in winter. I proceeded to perform the same analysis for nine English-speaking countries/territories: Australia, Canada, Hong Kong, India, Ireland, New Zealand, South Africa, the UK and the US.

The winter of 2005-2006 was a significant peak for the flu in all these countries; other peaks displayed more variation. Some countries, e.g., Ireland and New Zealand, really only had one flu epidemic, while Canada and the US, for instance, had a flu peak every winter. In India, the diseases' trend lines fell over time, perhaps reflecting better prevention and/or treatment through the years?

There is another way of looking at the data: by disease.

Note that the Philippines was no. 1 three times. Germany, Austria and the Netherlands popped up in the top 10 for bronchitis because the term is exactly the same in German and Dutch, unlike the terms for the other diseases.

11 November 2008

Broadband penetration vs. Obama vote

I was reading how the Obama campaign talked about the "net roots." How about plotting broadband internet penetration vs. Obama's percentage of the vote, by state:

I added a perfect fit line that shows where the states would be if the two measures were perfectly correlated. The red linear trend line showed what the states actually turned out to cluster around: it was not parallel to the perfect fit line, i.e., only a so-so correlation. I know that I probably should have used normalized numbers for both measures, but I ran out of time. Still, I think this was very intriguing: the two measures did seem to correlate not too badly.

I identified the outliers on the Democratic side: DC (always heavily Democratic), Vermont (Socialist senator!) and Hawaii (Obama's childhood home state); and on the Republican side: Alaska (Palin's home state), Utah (heavily Mormon) and Wyoming (Cheney's home state). These were expected.

Look then at the states that displayed the best correlation between broadband penetration and Obama vote (from left to right): West Virginia, Mississippi, South Carolina, Montana, Missouri (still not officially settled!), North Carolina, Ohio, Minnesota, Virginia, Oregon and New Jersey. They were a mixture of "toss-up" states (MO, NC, OH, VA) and "blue" (MN, OR, NJ) or "red" states (WV, MS, SC, MT). Note that the blue states were at the high end of the scale (to the right), the red ones at the low end (to the left) and the toss-ups in the centre. Delaware (Biden's home state), Illinois (Obama's other home state) and Arizona (McCain's home state) ended up on their partisan side but not in an outspoken way.
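For anyone who wants to put a number on that "so-so correlation" instead of eyeballing two lines, here is a minimal Python sketch. It is not necessarily how the graph above was made: it computes Pearson's r, a least-squares trend line, and the z-score normalization I mentioned skipping. The function names are my own, and the five data points are invented placeholders, not the actual state figures.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares trend line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def z_scores(values):
    """Normalize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Placeholder numbers for illustration only (NOT real state data).
broadband_pct = [40.0, 50.0, 55.0, 60.0, 70.0]
obama_pct = [45.0, 43.0, 52.0, 58.0, 61.0]

r = pearson_r(broadband_pct, obama_pct)
slope, intercept = least_squares_line(broadband_pct, obama_pct)
norm_slope, _ = least_squares_line(z_scores(broadband_pct), z_scores(obama_pct))
print(f"r = {r:.3f}, slope = {slope:.3f}, normalized slope = {norm_slope:.3f}")
```

One thing worth knowing: once both measures are converted to z-scores, the slope of the trend line equals r. So on a normalized plot, the gap between the trend line and the 45-degree diagonal does reflect how strong the correlation is; on raw percentages it mostly reflects the different scales of the two measures.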

I tried a second approach using the Obama minus McCain percentage of the vote:

The perfect fit line ran parallel to the blue trend line: a good correlation this time around. Overall, the distribution of the states was very similar to the previous graph. Conclusion: the higher the broadband penetration, the higher the Democratic vote seemed to be. I guess that does confirm some of the analyses of the election I read. Nevertheless, I have to caution against reading too much into these graphs, all the more since my grasp of this type of statistics is a bit shaky...

By the way, a "classic" face-off of broadband vs. Obama yielded this interesting result:

Broadband has been popular with Googlers for a long time. Obama was more of a recent phenomenon that ended up overtaking broadband in aggregate.

Update 11-14-08: My statistical skills were indeed lacking. Look at the comments for corrections and the like.

09 November 2008

NSF vs. NEA vs. USAID vs. NEH vs. IMLS

I faced off five grant-making US government agencies: the National Science Foundation (NSF), the National Endowment for the Arts (NEA), the Agency for International Development (USAID), the National Endowment for the Humanities (NEH) and the Institute of Museum and Library Services (IMLS).

The NSF was clearly in the lead, followed by the NEA and USAID. IMLS was at the bottom of Google-popularity. However, which one was decreasing or increasing its search volume? That was hard to see. I downloaded the data and plugged them into MS Excel. I made a graph of the cumulative weekly Google searches:

The lines were pretty smooth; in other words, none of the agencies saw extreme differences in search volume from one week to another. How about a graph showing the trend lines of the week-to-week percentage change in Google searches?
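I did all of this in Excel, but the same three steps (running total, week-to-week percentage change, linear trend) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are made up, the weekly values are placeholders rather than the downloaded data, and weeks following a zero are simply skipped because the percentage change is undefined there.

```python
from itertools import accumulate


def cumulative(values):
    """Running total of weekly search values."""
    return list(accumulate(values))


def percent_changes(values):
    """Week-to-week change in percent; skips weeks whose predecessor is 0."""
    changes = []
    for prev, cur in zip(values, values[1:]):
        if prev == 0:
            continue  # percentage change is undefined after a zero week
        changes.append((cur - prev) / prev * 100.0)
    return changes


def trend_slope(values):
    """Slope of the least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Placeholder weekly values for one agency (NOT the real downloaded data).
weekly = [10, 12, 9, 12]

totals = cumulative(weekly)
changes = percent_changes(weekly)
average_change = sum(changes) / len(changes)
print(totals, changes, average_change, trend_slope(changes))
```

A positive average change means search volume grew on average over the period, and the slope of the trend line through the changes says whether that growth was speeding up or slowing down.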

Now I was getting somewhere: obviously, the IMLS experienced the highest increase on average, then the NSF and USAID. The other two showed a decline on average, with a small decline for the NEH and a larger one for the NEA. Getting back to Google Insights for Search, I had a look at how the searches were spread geographically. First, the NSF:

The top 3 states were New Mexico, District of Columbia and Maryland. Nevadans stood out for searching for the NSF the least. Next, the NEA:

The leader here was Arkansas! Not the state I would have most linked with the arts if you'd asked me. There must be an explanation for this: any suggestions? No. 2 was Alaska (also unexpected), followed by Vermont. Let's move on to USAID.

DC led big time as was expected, followed by Maryland and Virginia which also have a high concentration of federal government people concerned with international issues. North and South Dakota, Wyoming and West Virginia had no searches for USAID at all. How did the NEH fare?

DC, Rhode Island and Massachusetts were the most interested. The "zero-searchers" of USAID were joined by Nevada, Idaho and Montana. Finally, how did the distribution pattern of the IMLS look?

Strangely enough, Idaho was the clear leader! It was followed at some distance by DC and Maryland.