04 October 2008

Library vs. Bibliothèque vs. Bibliotheek vs. Bibliothek in Canada, Belgium and Switzerland

Today, a comparison involving multilingual countries and their Google searches for certain words. Let's begin with Canada:

Library totally dominated here, although its popularity declined somewhat over time. In Canada, approximately 75% of the population speaks English and 25% French. When running a Google Insights for Search test for the same word in both languages, limited geographically to internet users from Canada, one would expect the Google-popularity to be proportional. For instance, we would expect to find library and bibliothèque in a 75%/25% proportion. Instead, the actual overall proportion (2004-present) was 94%/6%. Odd! However, the second test with bookstore/bookshop and librairie came out with a perfect 75%/25% split. I would guess that this inconsistency has more to do with too small a data set for Google searches in Canada than with a cultural distinction (Québécois don't like libraries... not likely). Still, it is puzzling. Anyone have any other ideas to explain this? I did the same search for Belgium (my native country):

This time, bibliotheek was most popular by far. Now, Belgium has about 60% Dutch-speakers and 40% French-speakers. The proportions of local Google searches were as follows: bibliotheek vs. bibliothèque (library) 83%/17% and boekwinkel/boekhandel vs. librairie (bookstore) 64%/36%. I again obtained one expected and one unexpected result, just like for Canada. The library pair proportion was off but the bookstore one was "correct." I tried one more country, Switzerland (only the two main languages out of the four):

Bibliothek was at the top. The Helvetian Federation's population consists of about 65% German-speakers, 20% French-speakers, 5% Italian-speakers, 1% Romansh-speakers and the remainder speakers of unofficial, immigrants' languages, e.g., Serbo-Croatian. For the sake of determining the proportions when facing off German and French equivalents, I recalculated the proportion of those two languages as if they were the only ones: 75%/25%. Bibliothek vs. bibliothèque (library) produced 63%/37% which was not too far off. Buchhandlung vs. librairie (bookstore) yielded 56%/44%. All words displayed a diminishing trend. In conclusion, Switzerland showed a different "proportionality" pattern than Canada and Belgium (library face-off not too bad & bookstore face-off not OK vs. library face-off not OK & bookstore face-off OK). Looking at it another way, Canada and Switzerland saw a search-quantity decline, Belgium didn't.
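The recalculation above (treating German and French as if they were the only two languages) can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `two_way_split` is my own and not part of any tool used in this post. Note that the exact result is closer to 76%/24%, which the text rounds to 75%/25%.

```python
def two_way_split(share_a, share_b):
    """Rescale two population shares so that together they sum to 100%."""
    total = share_a + share_b
    return 100 * share_a / total, 100 * share_b / total


# Swiss estimates quoted above: about 65% German-speakers, 20% French-speakers.
german, french = two_way_split(65, 20)
print(f"expected split: {german:.0f}%/{french:.0f}%")
```

The same helper works for any pair of search terms: feed it the two observed search shares and compare the result with the rescaled language shares.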

Update 10-6-08: I took the advice given to me on the LanguageHat blog and fixed the searches a bit. First, Canada:

No significant difference could be observed, except for a slightly higher occurrence of bibliothèque/bibliotheque. Next, Belgium:

Here the increase caused by adding the non-accented bibliotheque spelling was more significant: the library duo yielded an 81%/19% split, but this was still not close to the proportion of language speakers. Finally, Switzerland's new graph was as follows:

Both Buchhandlung/Buchladen and bibliothèque/bibliotheque grew. Bibliothek vs. bibliothèque/bibliotheque yielded a 64%/36% proportion, Buchhandlung/Buchladen vs. librairie a 66%/34% one. The first change doesn't seem meaningful to me; the second one does bring the numbers a bit closer to the language estimates. One final thing: the proportion of French-speakers in Canada is of course a matter of contention. I picked 25% as a reasonable number, esp. taking into consideration the "mixed" families.

03 October 2008

Democratic Party vs. Republican Party vs. Green Party vs. Libertarian Party vs. Constitution Party (Part 3)

This is the last part of a three-part series of posts (part 1, part 2). We explained the method used for this experiment, i.e., analyzing "toss-up" states in the US presidential election by comparing Google Insights for Search and Pollster.com graphs in part 1. We continue with Ohio:

For the first couple of months of this year, the Democratic Party (DP) was ahead in Google-popularity in this state while McCain was ahead in the regression trend line of the polls. For the rest of the year, the DP and Republican Party (RP) lines related to one another in the same way as the Obama and McCain trend lines: first the DP/Obama in the lead, then the RP/McCain. However, in September, Obama again gained the upper hand. The RP stayed in the lead though. I wonder if there isn't some kind of delayed effect in these cases, or maybe the high-sensitivity trend lines for the candidates overreacted to recent polls? Now we turn our attention to Pennsylvania.

This state too was kind of puzzling: the parties' graph showed a DP advantage for most of the year except in September when the RP overtook them, only to end in a tie. On the candidates' graph, McCain was ahead till Obama overtook him in April; the latter then put a lot of distance between him and his opponent after some tightening in the beginning of September. This seems related to the Republican-convention a.k.a. Palin-announcement bounce for McCain and the current financial-crisis bounce for Obama (see also, for instance, Ohio above). Next is Virginia:

Similarly to Pennsylvania, the DP had the edge till the RP bested them in September. McCain, however, started out ahead but kept losing ground till the race turned into a dead heat in June. In September, the same Palin bounce and financial-crisis bounce ultimately gave the advantage to Obama. Finally, I looked into Missouri, a state that has gone in and out of the "toss-up" category quite a few times. It was in McCain's camp when I started this series of posts but is again in contention.

The DP dominated till the RP took over in September. McCain was in the lead the whole year but it's now a tie. One more thing: Nader, Barr and McKinney are an afterthought in the polls. This is partially because they are often not included in polls in the first place and, when they are, they don't appear in enough of them to survive Pollster.com's high-sensitivity smoothing filter. To be fair, let me briefly give you their poll standings as of today when using normal smoothing, as well as their parties' Google-popularity for 2008 so far (a shorter period didn't yield good data):

I quickly tested the correlation between the two measures by producing scatter graphs with a linear regression trend line added (without the Constitution Party as I had no poll data for their candidate).

Quick conclusion: it looked like there was no correlation; in other words, the Google-popularity of third parties couldn't be used to predict the third-party candidates' Pollster.com scores or vice versa. This probably means that we didn't miss out by not having them on all but one state poll graph after all.
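For readers who want to repeat this kind of check without a spreadsheet, here is a minimal Python sketch of the two ingredients of a scatter graph with a linear trend line: the Pearson correlation coefficient and the least-squares line. The function names are my own, and the numbers in the usage lines are made-up placeholders, not the actual poll standings or Google-popularity values from this post.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares trend line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Placeholder values for illustration only (NOT the data behind the graphs).
google_popularity = [10, 20, 30]
poll_score = [2.0, 1.0, 3.0]
print(pearson_r(google_popularity, poll_score))
print(fit_line(google_popularity, poll_score))
```

A coefficient near 0 suggests no linear relationship, while values near 1 or -1 suggest a strong one. With only a few data points, as in this experiment, either outcome should be read with caution.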

02 October 2008

Sarah Palin vs. Joe Biden vs. Gwen Ifill

I'm afraid that part 3 of the Democratic Party vs. Republican Party vs. Green Party vs. Libertarian Party vs. Constitution Party series of posts will have to wait till tomorrow. Tonight I watched the vice presidential debate at Washington University in St. Louis and of course I want to present you with my word analysis again (see previous posts on the first presidential debate and the convention speeches). I found the transcript and separated out the words spoken by Palin, Biden and moderator Ifill. Let's start with the bubble graph in which I summarize how many words, sentences and characters (excl. spaces) each speaker used:

Ifill didn't use a lot of words which makes sense as she was the moderator. However, her words were notably longer even though she spoke in short sentences. This seems a bit odd at first. The explanation lies in her word choice as moderator: her most common words were "governor" and "senator" (as we'll see below). Palin and Biden were very similar in style though Palin did use longer sentences. I'm not sure what that could mean.
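The three measures behind the bubble graph (words, sentences and characters excluding spaces) are easy to compute once each speaker's text has been separated out. The sketch below is an assumption about how one might do it in Python, not the method I actually used; the name `text_stats` and the simple splitting rules are illustrative. Splitting sentences on ".", "!" and "?" is crude: abbreviations such as "Gov." will be counted as sentence ends.

```python
import re


def text_stats(text):
    """Count words, sentences and non-space characters in a speaker's text."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Naive sentence split: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(1 for c in text if not c.isspace())
    return {
        "words": len(words),
        "sentences": len(sentences),
        "chars_no_spaces": chars,
        # Punctuation is included in the character count.
        "avg_word_length": chars / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


sample = "Thank you, Governor. Senator, your response?"
print(text_stats(sample))
```

Comparing `avg_word_length` and `avg_sentence_length` across speakers gives the same kind of contrast as the bubble graph: a moderator who keeps saying "governor" and "senator" in short prompts ends up with long words but short sentences.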

Next, using Wordle, I again made a "word cloud" of each of the debate contributions.

Sarah Palin

Joe Biden

Gwen Ifill

Peculiarly, Palin's most common word was "also." My impression was that at times she sort of went on and on, just listing talking points, which then maybe were loosely connected by the word "also"... Then came "going," maybe due to her more colloquial style? Also frequent was "John" or "McCain," in other words she was praising and invoking him a lot. Biden's most common word was "John" followed by "McCain," highlighting his not addressing Palin as much as McCain as well as his familiarity with his fellow senator. Then came "going." Maybe he was more colloquial in his style than I realized? Or maybe in both cases they were talking about their team's plans and therefore looking at the near future? Biden also used "Barack" and "Obama" many times, it seems to me when he was stressing their team effort. Finally, Ifill's most common word was "governor" (sometimes abbreviated as "gov." in the transcript: this should be added to the full term's number actually). She did use it more than "senator" or "sen." (no. 2), maybe pointing at the fact that she was addressing Palin more than Biden?
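The point about "gov." and "sen." can be handled before counting by folding the abbreviations into the full terms. Here is a small Python sketch of a word count that does so. It is not how Wordle works internally; the abbreviation table and the tiny stop-word list are my own assumptions for illustration.

```python
import re
from collections import Counter

# Assumed mapping of transcript abbreviations to their full terms.
ABBREVIATIONS = {"gov": "governor", "sen": "senator"}
# Illustrative subset of common words to ignore; a real list would be longer.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "that", "is", "you"}


def word_counts(text):
    """Count words case-insensitively, merging abbreviations into full terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return Counter(t for t in tokens if t not in STOP_WORDS)


counts = word_counts("Gov. Palin, thank you. Governor, and then Sen. Biden.")
print(counts.most_common(3))
```

Calling `word_counts` on each speaker's part of the transcript and looking at `most_common(10)` gives a plain-text counterpart to the word clouds.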

Update 11-4-08: For Palin's use of "also," see the post on the Language Log blog. For the transcript I used, see CNN. "Maverick" really only appears 15 times in it: 9 times for Biden and 6 times for Palin—hard to believe, isn't it?

Update 10-5-08: As I explain in this new post, Wordle has a lot more settings than I was aware of. Here are the improved, "corrected" word clouds.

Sarah Palin (improved)

Joe Biden (improved)

Gwen Ifill (improved)

01 October 2008

Democratic Party vs. Republican Party vs. Green Party vs. Libertarian Party vs. Constitution Party (Part 2)

We explained the method used for this experiment, i.e., analyzing "toss-up" states in the US presidential election by comparing Google Insights for Search and Pollster.com graphs, in Part 1. So let's get started right away with Minnesota:

We again obtained contrasting graphs. The Republican Party (RP) was overall more Google-popular than the Democratic Party (DP). In the Pollster.com graph, however, Obama was ahead the whole time. Also, McCain's trend line followed a similar path as the RP's line while the Obama trend line was almost an inversion of the DP line. Hmm... Next is North Carolina:

This was the first state we encountered that had an actual trend line for a third-party candidate, i.e., Barr, because enough opinion polls included him for his line to still show up in Pollster.com's high-sensitivity graph. It is often said that he might take votes away from McCain. True, McCain lost his lead at the end of September but Barr's low trend line, if anything, was diminishing somewhat over time. In this state too we found a contrast between the parties' graph and the candidates' graph: the DP was ahead but was overtaken by the RP at the end of August. Let's look at New Hampshire:

The Google-popularity graph suffered from the lack of enough data: did New Hampshirites not surf the web as much as the inhabitants of other states? Maybe they were "politicked out" after their prominent role during the primaries? For what it's worth, the DP was the leader till mid-September when the RP passed them by. The candidates' trend line graph did confirm the overall picture with McCain temporarily overtaking Obama in mid-September. Remember that the Pollster.com graphs give a continuous trend line while the Google Insights graphs show basically monthly averages connected by a line. This explained in a way Obama having already taken back the lead in the end while the RP was still ahead. I give you one more toss-up state today: Nevada.

In this state also the general picture was the same in both graphs. The DP/Obama was ahead for most of the year but by September the RP/McCain was trumping them narrowly. I will post the last part tomorrow.

30 September 2008

Democratic Party vs. Republican Party vs. Green Party vs. Libertarian Party vs. Constitution Party (Part 1)

A grand face-off of the five major political parties of the US! Since 2004, their Google-popularity has evolved as follows:

Generally speaking, the Democratic Party (DP) narrowly bested the Republican Party (RP) though at pivotal moments the latter succeeded in rising above the former, e.g., the 2004 presidential elections (peak in October) and at the very end as we are again approaching presidential elections. Note the surge at the occasion of the 2006 elections for Congress. The Green Party was the leader among the so-called "third parties." When we look at the electoral results, these trend lines do seem to run in parallel: George Bush narrowly won re-election in 2004 and the DP became the majority party in Congress in 2006. I wonder whether the recent uptick in interest in the RP will turn out to be as consequential come election day (November 4). Of course, 2004 showed us that October is the pivotal month...

How about we perform an experiment and look at the interest in political parties in the states that right now are judged to be competitive, a.k.a. "toss-ups," starting in January? Then, we could compare the graphs with Pollster.com's presidential polls' state regression-trend-line graphs. On Pollster.com, I used the more sensitive smoothing setting and besides Obama and McCain added the third-party candidates when poll data were available: Ralph Nader (ex-Green Party, now independent), Cynthia McKinney (ex-DP, now Green Party) and Bob Barr (ex-RP, now Libertarian Party). I realize I have to a certain extent compared apples and oranges here: parties (Google Insights) vs. candidates (Pollster.com). However, when I attempted to rectify this by adding the candidates to the Google Insights terms, this did not work: Obama and McCain's overwhelming media presence just distorted everything. In a way, I propose that searches for the parties might be a better indicator of the real, underlying political support for the candidates. It is of course also possible that the interest in the parties reflects rather on the congressional races. Anyway, let's see what the graphs told us, starting with Colorado:

The DP/Obama was ahead of the RP/McCain in general. September did see a tightening on Pollster.com but an actual overtaking of the DP by the RP in Google Insights for Search. Next is Florida:

The RP overtook the dominant DP in August (Google Insights) while Pollster.com showed McCain on top with only a tightening of the race in August. At present, though, Pollster.com does point at a tie. So we had conflicting evidence here. The next "toss-up" state I tested was Indiana:

The DP was the most Google-popular through June but by September the RP had taken the lead. The opinion poll trend line however showed McCain ahead the whole time with some temporary tightening only in September. This state too offers a contrast between the two graphs.

Note that so far the "third parties" have not shown up yet in the Pollster.com graphs because they didn't cross the threshold to allow for a regression analysis. Tomorrow, I will continue this experiment with other "toss-up" states.

Toyota vs. General Motors vs. Volkswagen vs. Ford vs. Honda

I faced off the largest car manufacturers in the world this time. These would be in order of number of cars produced (2007):
  1. Toyota (incl. Lexus + Scion + Daihatsu + Isuzu)
  2. General Motors (incl. Buick + Cadillac + Chevrolet + Daewoo + GMC + Holden + Hummer + Opel + Pontiac + Saab + Saturn + Vauxhall + Wuling + Oldsmobile)
  3. Volkswagen (incl. Audi + Bentley + Bugatti + Škoda + SEAT + Scania + MAN)
  4. Ford (incl. Lincoln + Mercury + Aston Martin + Volvo + Mazda + Jaguar + Land Rover)
  5. Honda

I let Google Insights for Search compare these sets of names. So what was the result?

The ranking, so to speak, in Google-popularity worldwide was different from the production-numbers list above:
  1. Ford
  2. General Motors
  3. Honda
  4. Toyota
  5. Volkswagen

The discrepancy was most marked for Toyota and Ford, which switched places. Actually, Ford was ahead of everyone else. The others were constantly jostling for position in a narrow band. Most everyone declined through the years except for Volkswagen and Honda. Let us look at some country-specific graphs, beginning with the US:

There was a bit more seasonality here. Otherwise, the only major difference lay in Volkswagen's poor performance and the fact that Honda was the only car company to increase. Note however that "less than 25% of searches containing 'volkswagen+audi…' are categorized as Automotive." Still, I couldn't remove the Automotive filter because a few of the terms had significant non-automobile meanings, e.g., SEAT = seat. What about Germany?

Locally-headquartered Volkswagen was tops, then followed Ford and General Motors. Honda and esp. Toyota were stuck at the bottom. Honda and Volkswagen were the only ones to grow. Next, the United Kingdom:

Ford was solidly in the lead, followed by Volkswagen. All companies declined. Let's make one more graph, this time using data from Japanese (English-savvy) web surfers:

Honda and Toyota were clear leaders on their home turf. Volkswagen and General Motors were struggling. Still, General Motors slightly grew overall. Honda and Ford saw the biggest improvement.

28 September 2008

Nancy Pelosi vs. Henry Paulson vs. Barney Frank vs. Ben Bernanke vs. Harry Reid vs. Mitch McConnell vs. John Boehner vs. Spencer Bachus et al.

Today, I investigated the major players in the negotiations leading to the proposed "mother of all bailouts" a.k.a. the Emergency Economic Stabilization Act of 2008. First, I faced off Treasury Secretary Paulson, Federal Reserve Chairman Bernanke and the major Democratic leaders involved: House Speaker Pelosi, Senate Majority Leader Reid and House Financial Services Committee Chair Frank.

I limited the graph to the year 2008. Pelosi was the most powerful: constantly-high Google-popularity. However, towards the end, the financial crisis pushed Frank and esp. Paulson to the forefront, overtaking her. In case there were any doubt, the House of Representatives weighed heavier than the Senate: just compare Pelosi and Reid. Let's have a look at what a face-off with the major Republican leaders (House Minority Leader Boehner, Senate Minority Leader McConnell, House Financial Services Committee Ranking Member Bachus) yielded:

Paulson and Bernanke dominated the Republicans, esp. at the end. Considering the results of the first face-off, the latter were pretty much bystanders, at least according to their Google-popularity... Why didn't I look into President Bush and the two major presidential candidates? There was that high-profile meeting after all? Actually, I did:

As you can see, they were the elephants in the (negotiating) room, esp. Obama and McCain. Anyway, now comes the hard part: getting majorities in the House and the Senate to sign off on the bailout...