07 October 2008

Barack Obama vs. John McCain vs. Audience vs. Tom Brokaw

The second US presidential debate took place tonight at Belmont University in Nashville, TN. I followed the same methodology as for the first debate and the vice presidential debate, with the Wordle settings amended as described in my 10-5-08 post. So let's start with the bubble graph displaying the length of words and sentences as well as the number of words:



Obama and McCain were again close, though the latter used somewhat shorter sentences. Remarkably, the audience questions contained the longest words, longer even than Brokaw's. The moderator also spoke about twice as much as his colleague Jim Lehrer did in the first debate. While splitting up the text into the different speakers' parts, something caught my eye: Brokaw seemed to thank McCain a lot but not Obama. Just to make sure, I counted all speakers' instances of thanking somebody:



What stood out on this measure? McCain made it a point to thank the audience speakers and also thanked Brokaw quite a few times. He even thanked Obama 1.5 times (I counted a sarcastic "thank you" as only half). Maybe McCain's expressions of gratitude reflected his being more formal and/or older? Or maybe he was trying to counter his rather stern, if not grumpy, behavior during the first debate? Obama, on the other hand, didn't thank many people. He thanked two audience questioners but was mostly business. He didn't thank McCain once. Maybe that was a reaction to criticism that he had bent over backwards too much for McCain during the first debate? Brokaw thanked McCain a lot more than Obama but neglected the audience too.
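For anyone who wants to repeat the thank-you tally, here is a minimal Python sketch. It assumes the transcript has already been split into (speaker, text) turns; the `count_thanks` name and the sample turns are made up for illustration and are not quotes from the debate. It only counts how often each speaker says "thank you" or "thanks." Deciding who was being thanked, or whether a "thank you" was sarcastic and should count as half, still has to be done by hand.

```python
import re
from collections import Counter

# Matches "thank", "thanks" and "thank you" as whole words (case-insensitive).
THANK_PATTERN = re.compile(r"\bthank(?:s| you)?\b", re.IGNORECASE)

def count_thanks(turns):
    """Count thanking expressions per speaker.

    turns: list of (speaker, text) pairs; this format is an assumption.
    """
    counts = Counter()
    for speaker, text in turns:
        counts[speaker] += len(THANK_PATTERN.findall(text))
    return counts

# Hypothetical sample turns, not actual debate quotes.
sample_turns = [
    ("McCain", "Thank you, Tom. And thank you for the question."),
    ("Obama", "Well, let's talk about the economy."),
    ("Brokaw", "Thanks, Senator."),
]
thanks = count_thanks(sample_turns)
```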

Finally, I made the "word clouds" again.

Barack Obama





John McCain





Audience





Tom Brokaw



I'll conclude with a few quick observations. Obama's no. 1 word tonight was "going," followed by "got," "Sen." and "McCain." Policywise, "health" and "energy" stood out. As for McCain, "know" was tops—invocations of experience? Then came "going," "Obama" and esp. "America(n)(s)." The audience very much stressed "economic"—remember "It's the economy, stupid!" from Clinton's first presidential campaign? As expected, Brokaw said "Sen." most, followed by "McCain" and then "Obama."

Movie vs. Film vs. Flick vs. "Motion Picture" vs. "Moving Picture"

How about a face-off between five different terms for a movie?



As expected, movie was most common, followed at quite a distance by film. The other three basically didn't figure in the graph. However, I think we can try to clarify this further: instead of the interest level, let's select the "growth relative to category" (i.e., entertainment) option.



Now we can see growth and decline of all five Google search terms/phrases. Growing from most to least were flick and motion picture. Film fluctuated but stayed at almost the same level. Going down were, from least to most, movie, motion picture and moving picture. I wonder whether these observations correspond with the linguistic evolution regarding which of the words gain and which lose "market share" in the standard vocabulary.

05 October 2008

Wordle

I realize now that I didn't use the superb Wordle online text analysis website properly: my bad! Of course, one should always read the instructions completely and go over the different settings before using any software. The default graph shows only 150 words but this can be changed. Also, one can, for instance, have the words arranged more or less alphabetically: this facilitates finding specific words. Here are the complete settings I'll be using from now on:
  • language: remove nos.; remove common English words
  • font: Teen
  • layout: max. words: word count rounded up to the next thousand; prefer alphabetical order; rounder edges; horizontal
  • color: Wordly; a little variance
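To make the language and layout settings concrete, here is a rough Python sketch of the counting that underlies a word cloud. It is not Wordle's own code: the stop list is a tiny stand-in for Wordle's list of common English words, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list; Wordle's real list of common English words
# is much longer and is NOT reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
              "are", "that", "we", "i", "it"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count words after removing numbers and common English words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    kept = [t for t in tokens if not t.isdigit() and t not in stop_words]
    return Counter(kept)

def max_words_setting(word_count):
    """Round a word count up to a multiple of one thousand."""
    return math.ceil(word_count / 1000) * 1000

# Hypothetical sample sentence, not taken from a transcript.
freq = word_frequencies("We are going to fix health care and energy in 2009, going forward.")
limit = max_words_setting(3200)
```

Sorting `freq` alphabetically rather than by count would mimic the "prefer alphabetical order" option.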


The following three posts have now been amended with the improved "word clouds":
  • Sarah Palin vs. Joe Biden vs. Gwen Ifill (10-2-08)
  • Barack Obama vs. John McCain vs. Jim Lehrer (9-26-08)
  • Michelle Obama vs. Joe Biden vs. Barack Obama vs. Sarah Palin vs. Cindy McCain vs. John McCain (9-4-08)

04 October 2008

Library vs. Bibliothèque vs. Bibliotheek vs. Bibliothek in Canada, Belgium and Switzerland

Today, a comparison involving multilingual countries and their Google searches for certain words. Let's begin with Canada:



Library totally dominated here, although its popularity declined somewhat over time. In Canada, approximately 75% of the population speaks English and 25% French. When running a Google Insights for Search test for the same word in both languages, limited geographically to internet users from Canada, one would expect the Google-popularity to be proportional. For instance, we would expect to find library and bibliothèque in a 75%/25% proportion. Instead, the actual overall proportion (2004-present) was 94%/6%. Odd! However, the second test, with bookstore/bookshop and librairie, came out with a perfect 75%/25% split. I would guess that this inconsistency has something to do with a data set for Google searches in Canada that is too limited, rather than with a cultural distinction (Québécois don't like libraries... not likely). Still, it is puzzling. Anyone have any other ideas to explain this? I did the same search for Belgium (my native country):



This time, bibliotheek was most popular by far. Now, Belgium has about 60% Dutch-speakers and 40% French-speakers. The proportions of local Google searches were as follows: bibliotheek vs. bibliothèque (library) 83%/17% and boekwinkel/boekhandel vs. librairie (bookstore) 64%/36%. I again obtained one expected and one unexpected result, just like for Canada. The library pair proportion was off but the bookstore one was "correct." I tried one more country, Switzerland (only the two main languages out of the four):



Bibliothek was at the top. The Helvetian Federation's population consists of about 65% German-speakers, 20% French-speakers, 5% Italian-speakers, 1% Romansh-speakers and the remainder speakers of unofficial, immigrants' languages, e.g., Serbo-Croatian. For the sake of determining the proportions when facing off German and French equivalents, I recalculated the proportion of those two languages as if they were the only ones: 75%/25%. Bibliothek vs. bibliothèque (library) produced 63%/37% which was not too far off. Buchhandlung vs. librairie (bookstore) yielded 56%/44%. All words displayed a diminishing trend. In conclusion, Switzerland showed a different "proportionality" pattern than Canada and Belgium (library face-off not too bad & bookstore face-off not OK vs. library face-off not OK & bookstore face-off OK). Looking at it another way, Canada and Switzerland saw a search-quantity decline, Belgium didn't.
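The recalculation of the language shares can be written out as a small Python sketch. The `renormalize` helper is a name made up for illustration; the figures are the ones quoted above. Keeping only German and French and rescaling them to 100% gives roughly 76%/24%, which is close to the rounded 75%/25% used in this post.

```python
def renormalize(shares, keep):
    """Rescale the selected shares so that they add up to 100%."""
    subset = {name: shares[name] for name in keep}
    total = sum(subset.values())
    return {name: 100 * value / total for name, value in subset.items()}

# Approximate Swiss language shares as quoted in the post (percent).
swiss_shares = {"German": 65, "French": 20, "Italian": 5, "Romansh": 1}
expected = renormalize(swiss_shares, ["German", "French"])

# Observed search proportions for Bibliothek vs. bibliothèque (percent).
observed = {"German": 63, "French": 37}

# Positive gap: more searches than the language share would suggest.
gap = {name: observed[name] - expected[name] for name in expected}
```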

Update 10-6-08: I took the advice given to me on the LanguageHat blog and fixed the searches a bit. First, Canada:



No significant difference could be observed, except for a slightly higher occurrence of bibliothèque/bibliotheque. Next, Belgium:



Here the increase caused by adding the non-accented bibliotheque spelling was more significant: the library duo yielded 81%/19%, but this was still not close to the proportion of language speakers. Finally, Switzerland's new graph was as follows:



Both Buchhandlung/Buchladen and bibliothèque/bibliotheque grew. Bibliothek vs. bibliothèque/bibliotheque yielded a 64%/36% proportion, Buchhandlung/Buchladen vs. librairie 66%/34%. The first change doesn't seem meaningful to me; the second one does bring the numbers a bit closer to the language estimates. One final thing: the proportion of French-speakers in Canada is of course a matter of contention. I picked 25% as a reasonable number, esp. taking into consideration the "mixed" families.

03 October 2008

Democratic Party vs. Republican Party vs. Green Party vs. Libertarian Party vs. Constitution Party (Part 3)

This is the last part of a three-part series of posts (part 1, part 2). We explained the method used for this experiment, i.e., analyzing "toss-up" states in the US presidential election by comparing Google Insights for Search and Pollster.com graphs in part 1. We continue with Ohio:





For the first couple of months of this year, the Democratic Party (DP) was ahead in Google-popularity in this state while McCain was ahead in the regression trend line of the polls. For the rest of the year, the DP and Republican Party (RP) lines related to one another in the same way as the Obama and McCain trend lines: first the DP/Obama in the lead, then the RP/McCain. However, in September, Obama again gained the upper hand. The RP stayed in the lead though. I wonder if there isn't some kind of delayed effect in these cases, or maybe the high-sensitivity trend lines for the candidates overreacted to recent polls? Now we turn our attention to Pennsylvania.





This state too was kind of puzzling: the parties' graph showed a DP advantage for most of the year except in September when the RP overtook them, only to end in a tie. On the candidates' graph, McCain was ahead till Obama overtook him in April; the latter then put a lot of distance between him and his opponent after some tightening in the beginning of September. This seems related to the Republican-convention a.k.a. Palin-announcement bounce for McCain and the current financial-crisis bounce for Obama (see also, for instance, Ohio above). Next is Virginia:





Similarly to Pennsylvania, the DP had the edge till the RP bested them in September. McCain, however, started out ahead but lost more and more ground till the race turned into a dead heat in June. In September, the same Palin bounce followed by the financial-crisis bounce left Obama with the advantage. Finally, I looked into Missouri, a state that has gone in and out of the "toss-up" category quite a few times. It was in McCain's camp when I started this series of posts but is again in contention.





The DP dominated till the RP took over in September. McCain was in the lead the whole year but it's now a tie. One more thing: Nader, Barr and McKinney are an afterthought in the polls. This is partially because they are often not included in polls in the first place and, when they are, they don't appear in enough of them to survive Pollster.com's high-sensitivity smoothing filter. To be fair, let me briefly give you their poll standings as of today when using normal smoothing, as well as their parties' Google-popularity for 2008 so far (a shorter period didn't yield good data):



I quickly tested the correlation between the two measures by producing scatter graphs with a linear regression trend line added (without the Constitution Party as I had no poll data for their candidate).





Quick conclusion: it looked like there was no correlation; in other words, the Google-popularity of third parties couldn't be used to try to predict the third-party candidates' Pollster.com scores or vice versa. Which probably means that we didn't miss out by not having them on all but one state poll graph after all.
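The same kind of check can be sketched in Python: a Pearson correlation coefficient plus the least-squares line that a spreadsheet adds to a scatter graph as a linear trend line. The function names are hypothetical and the sample numbers are placeholders, NOT the values from the graphs above. With only a handful of data points, any correlation coefficient is unstable and should be read with caution.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept of the trend line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Placeholder values for illustration only (not the post's data).
google_popularity = [10, 40, 25, 5]
poll_score = [3.0, 1.0, 2.5, 2.0]

r = pearson_r(google_popularity, poll_score)
slope, intercept = fit_line(google_popularity, poll_score)
```

A value of `r` near 0 corresponds to the "no correlation" reading; values near +1 or -1 would indicate a strong linear relationship.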

02 October 2008

Sarah Palin vs. Joe Biden vs. Gwen Ifill

I'm afraid that part 3 of the Democratic Party vs. Republican Party vs. Green Party vs. Libertarian Party vs. Constitution Party series of posts will have to wait till tomorrow. Tonight I watched the vice presidential debate at Washington University in St. Louis and of course I want to present you with my word analysis again (see previous posts on the first presidential debate and the convention speeches). I found the transcript and separated out the words spoken by Palin, Biden and moderator Ifill. Let's start with the bubble graph in which I summarize how many words, sentences and characters (excl. spaces) each speaker used:



Ifill didn't use a lot of words which makes sense as she was the moderator. However, her words were notably longer even though she spoke in short sentences. This seems a bit odd at first. The explanation lies in her word choice as moderator: her most common words were "governor" and "senator" (as we'll see below). Palin and Biden were very similar in style though Palin did use longer sentences. I'm not sure what that could mean.
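The three numbers behind the bubble graph are easy to reproduce. Here is a minimal Python sketch, assuming each speaker's text has been collected into one string; `text_stats` is a hypothetical helper, not the tool used for this post. Sentence splitting is a rough heuristic: abbreviations such as "Sen." or "Gov." followed by a space are counted as sentence ends, so real transcripts need some cleanup first.

```python
import re

def text_stats(text):
    """Return word, sentence and character counts plus two averages."""
    words = text.split()
    # Split after ., ! or ? when followed by whitespace (rough heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Characters excluding spaces, so punctuation is included.
    characters = len(re.sub(r"\s", "", text))
    return {
        "words": len(words),
        "sentences": len(sentences),
        "characters": characters,
        "avg_word_length": characters / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }

# Hypothetical moderator-style sample, not a transcript quote.
stats = text_stats("Thank you, governor. Senator, your response?")
```

Average word length (characters per word) and average sentence length (words per sentence) are the two measures that separate a moderator's short turns, full of "governor" and "senator," from the candidates' longer answers.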

Next, using Wordle, I again made a "word cloud" of each of the debate contributions.

Sarah Palin




Joe Biden




Gwen Ifill


Peculiarly, Palin's most common word was "also." My impression was that at times she sort of went on and on, just listing talking points, which then maybe were loosely connected by the word "also"... Then came "going," maybe due to her more colloquial style? Also frequent was "John" or "McCain," in other words she was praising and invoking him a lot. Biden's most common word was "John" followed by "McCain," highlighting his not addressing Palin as much as McCain as well as his familiarity with his fellow senator. Then came "going." Maybe he was more colloquial in his style than I realized? Or maybe in both cases they were talking about their team's plans and therefore looking at the near future? Biden also used "Barack" and "Obama" many times, it seems to me when he was stressing their team effort. Finally, Ifill's most common word was "governor" (sometimes abbreviated as "gov." in the transcript: this should be added to the full term's number actually). She did use it more than "senator" or "sen." (no. 2), maybe pointing at the fact that she was addressing Palin more than Biden?


Update 11-4-08: For Palin's use of "also," see the post on the Language Log blog. For the transcript I used, see CNN. "Maverick" really only appears 15 times in it: 9 times for Biden and 6 times for Palin—hard to believe, isn't it?


Update 10-5-08: As I explain in this new post, Wordle has a lot more settings than I was aware of. Here are the improved, "corrected" word clouds.

Sarah Palin (improved)



Joe Biden (improved)



Gwen Ifill (improved)

01 October 2008

Democratic Party vs. Republican Party vs. Green Party vs. Libertarian Party vs. Constitution Party (Part 2)

We explained the method used for this experiment, i.e., analyzing "toss-up" states in the US presidential election by comparing Google Insights for Search and Pollster.com graphs, in Part 1. So let's get started right away with Minnesota:





We again obtained contrasting graphs. The Republican Party (RP) was overall more Google-popular than the Democratic Party (DP). In the Pollster.com graph, however, Obama was ahead the whole time. Also, McCain's trend line followed a path similar to the RP's line while the Obama trend line was almost an inversion of the DP line. Hmm... Next is North Carolina:





This was the first state we encountered that had an actual trend line for a third-party candidate, i.e., Barr, because enough opinion polls included him for him to still show up in Pollster.com's high-sensitivity graph. It is often said that he might take votes away from McCain. True, McCain lost his lead at the end of September, but Barr's low trend line, if anything, was diminishing somewhat over time. In this state too we found a contrast between the parties' graph and the candidates' graph: the DP was ahead but was overtaken by the RP at the end of August. Let's look at New Hampshire:





The Google-popularity graph suffered from the lack of enough data: did New Hampshirites not surf the web as much as the inhabitants of other states? Maybe they were "politicked out" after their prominent role during the primaries? For what it's worth, the DP was the leader till mid-September when the RP passed them by. The candidates' trend line graph did confirm the overall picture with McCain temporarily overtaking Obama in mid-September. Remember that the Pollster.com graphs give a continuous trend line while the Google Insights graphs show basically monthly averages connected by a line. This explained in a way Obama having already taken back the lead in the end while the RP was still ahead. I give you one more toss-up state today: Nevada.
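The lag between a continuous trend line and monthly averages can be illustrated with a toy Python example. The numbers are invented for illustration and are not poll or search data: one side leads by 2 points for the first 24 days of a month and then trails by 3 points for the last 6 days.

```python
from statistics import mean

# Hypothetical daily lead of side X over side Y during one 30-day month.
daily_lead = [2.0] * 24 + [-3.0] * 6

# What a monthly-average graph would plot for this month.
monthly_average = mean(daily_lead)

# What a continuous trend line would show at the end of the month.
latest_value = daily_lead[-1]
```

The monthly average is still positive (X ahead) even though the latest daily value is negative (Y back in front), which is the kind of mismatch described above.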





In this state also the general picture was the same in both graphs. The DP/Obama was ahead for most of the year but by September the RP/McCain was trumping them narrowly. I will post the last part tomorrow.