Steve's Blog: statistics


Monday, September 20, 2010

Habit 2 of Bad Thinking

I have to give a presentation on the "Seven Sins of Bad Thinking" and I've mostly just talked about one on this blog: ignoring either costs or benefits. I'm going to add a second to the list today and it starts with a story.

Say you have 10 employees and you're adding an 11th. You have a couple of different products that you produce and you need to put the 11th employee on one of those product teams. One of your VPs says to put them on product A because you'll get the biggest increase in production (in units) if you put the new guy on team A--and it's all about productivity. Another VP says to put the new guy on product B because product B brings in the biggest profit per unit--that's where the big money is. What do you do?

You say they both are only telling half the story. Your goal is to maximize profit and to do that you need to think about both how many units you make and how much you profit from each. The value of the employee is (marginal units)*(profit per unit) and you can't just think about one aspect or the other.
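Here's a minimal sketch of that calculation. The unit and profit figures are made up for illustration (the story doesn't give any), and the names `teams` and `marginal_profit` are just placeholders.

```python
# Hypothetical numbers: team A gains the most units, team B earns the most per unit.
teams = {
    "A": {"marginal_units": 100, "profit_per_unit": 2.0},
    "B": {"marginal_units": 30, "profit_per_unit": 5.0},
}


def marginal_profit(team):
    """Value of the new hire on a team: (marginal units) * (profit per unit)."""
    return team["marginal_units"] * team["profit_per_unit"]


# Pick the team where the product of the two factors is largest.
best = max(teams, key=lambda name: marginal_profit(teams[name]))

for name, team in teams.items():
    print(name, marginal_profit(team))
print("Put the new hire on team", best)
```

With these particular numbers team A wins, but change B's marginal units to 50 and the answer flips. Neither VP's single number settles it.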

That might seem obvious but how often do we focus solely on what is possible or solely on what is certain when we should probably be thinking about what is more likely?

(There was an example to go with this from a discussion I had with a friend. But when I wrote it down it was a mess so I'll just post the short version.)

Thursday, September 16, 2010

Do critics or viewers better predict how good a movie is?

One of my friends said he likes to look up the Rotten Tomatoes rating for a movie to decide whether to see it. I suggested he look at the Flixster ratings instead because they'll tend to better predict whether he'll like the movie.

This is something we can test empirically, as long as there is a standard for what "like the movie" means. A sensible definition is that people like a movie in proportion to the average rating of the movie on Netflix.*

So I gathered data on the ten or eleven most popular movies over the past three years and regressed the Netflix ratings on both the Rotten Tomatoes ratings and the Flixster ratings. The results (in technical terms):

(1) Netflix = 0.006*RT + 3.47
R^2 = 0.3436

(2) Netflix = 0.0191*Flixster + 2.39
R^2 = 0.6893

Both predict the Netflix ratings decently, but the Flixster rating is easily the better predictor. It can explain about 69% of the spread (variance) in the Netflix scores compared with less than 35% for the RT ratings.
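To make the two fitted lines concrete, here is a small sketch that plugs a score into each equation. It assumes RT and Flixster scores are on a 0-100 scale and that Netflix ratings are on its 1-5 star scale; the input score of 90 is a made-up example, not a movie from the data.

```python
def predict_from_rt(rt_score):
    """Equation (1): Netflix = 0.006*RT + 3.47 (R^2 = 0.3436)."""
    return 0.006 * rt_score + 3.47


def predict_from_flixster(flixster_score):
    """Equation (2): Netflix = 0.0191*Flixster + 2.39 (R^2 = 0.6893)."""
    return 0.0191 * flixster_score + 2.39


# Hypothetical movie scoring 90 on both sites (assumed 0-100 scale).
score = 90
print(round(predict_from_rt(score), 2))
print(round(predict_from_flixster(score), 2))

# The slopes show how much each predictor moves the Netflix estimate:
# a 10-point change shifts it by 0.06 stars via RT and 0.191 stars via Flixster.
```

The steeper Flixster slope and the higher R^2 tell the same story: Netflix ratings track Flixster scores more closely than they track RT scores.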

Here's a nice graph:

The conclusion is that you should never use RT when deciding whether to watch an old movie. You can just use Flixster or Netflix. On the other hand, when new movies are released the Flixster ratings are probably biased because only the bigger fans see movies on opening weekend. You'll have to wait a few weeks before you can get an accurate judgment from Flixster. So maybe Rotten Tomatoes has a use for very new releases.

* - Technical appendix: There are some biases to the Netflix ratings--Netflix customers aren't the "typical American" because they're more likely to be younger and more educated. But it's the only good available data, and with millions of ratings it shouldn't be too biased. The other problem is that movies with a small number of ratings have a selection bias: only people who expect to like the movie will watch it so all the ratings are probably slightly inflated compared to what they would be if a random sample of people were forced to watch and rate the movies. But that isn't such a big problem since, if you're considering watching the movie, you're implicitly in the set (or close to being in the set) of people who would watch the given movie.

Wednesday, August 4, 2010

Descriptive and Normative

I will never understand why the difference between descriptive statements and normative statements is so hard for people to understand.

David Berri once wrote a post on The Wages of Wins blog about who was widely considered an "MVP-candidate" but wasn't actually playing well. He was obviously making the normative statement "if you're not playing well, you don't deserve the MVP award. These guys aren't playing well [... you infer the rest]." But commenters objected, writing that Dave shouldn't title the post "Who is not the MVP" if he wasn't making a descriptive point about who is not the actual MVP. Yet we know that is a disingenuous point because if you take the title "Who is not the MVP" literally when the MVP won't be awarded for 5 months, then you expect the post to include every name in the NBA.

Still, you might be willing to cut the commenters some slack since writers often try to pass off normative statements as facts. Nicholas Kristof wrote the worst op-ed he's ever written a few days ago. It's an opinion column celebrating wider availability for an abortifacient that is very safe to use. It's obvious he's pro-choice and celebrating this fact because of the way he places facts in the article and the language he uses: "Up to 70,000 women die from complications of abortions . . . last year the World Health Organization expanded its uses as an 'essential medicine.'" But it's painful to read because, while this is an opinion column, Kristof is trying to hide his opinion and manipulate the reader by telling the story of a pill (with selectively chosen facts) as if it were a news story.

One of the most painful areas where people mix up normative and positive statements is when they talk about the arts. A third of the time when people say "that was a great movie" they mean that lots of people like it or that on average people like it (descriptive), another third of the time they mean "I like it" (normative) and another third it's hard to know what they mean but it's something along the lines of either "this movie is widely praised by the elite" (descriptive) or "I think this movie will be widely praised by the elite for its structural properties" (normative). When I tried to write all of that I started to understand, just a little, why people can be so confused about whether people are saying something about what is or what should be. Look at this story where the author makes a statement of fact "Inception is ranked #3 on IMDb" but presents it in a warped way "Inception is the third greatest movie of all time." It certainly is not the third greatest movie of all time by any reasonable metric, so does he really just mean he liked it a lot? Or does he think IMDb is a representative poll? (This would make a nice post on why people need to learn statistics in high school.)

The most painful area, though, where people confuse the normative/positive distinction, is ethics. People often use "legal" as a proxy for moral, as in "well it ain't illegal" which is meant to imply it's acceptable and you shouldn't complain. To this day, despite being reminded a hundred times, my dad still thinks that saying "the Supreme Court ruled [insert ruling, say that abortion is legal]" is a good argument for why I should think killing an infant is acceptable. Philosophers often confuse what is "natural" with what is right. This comes up often in relation to utilitarianism, where philosophers argue that, because utilitarianism places too much of a burden on people to do the right things all the time, it's the wrong moral philosophy. If ethics had a low point of complete intellectual abdication in the past 50 years, it might be that debate. Yet, somehow, a related debate is starting to drag things down even further. As scientists have understood more about people's moral intuitions, it's become clear that, much as all languages are related and share a similar structure, all the rules different societies develop to regulate behavior share a similar structure. Somehow this has evolved into "morality is a product of evolution" becoming a catch-phrase that is meant to be taken as a normative statement--whatever your brain tells you to do is acceptable because your brain evolved to know right from wrong. Or something. It doesn't make any sense to me, but I've heard people I'd consider pretty smart and reasonable try to make that point.

All of that was cringe-inducing to write.

Monday, August 2, 2010

Favorite Movies

I always wondered if there were a poll of Americans that asks what their favorite movie is, like this one for England. It turns out there is one, from 2008, and contrary to my claim in an earlier post, Gone with the Wind was #1.

Star Wars, the favorite in England, is #2. I think a response of any of the Star Wars movies counted for Star Wars, or many people consider the trilogy their favorite and just said "Star Wars," because it seems odd that Star Wars was #2 and Empire didn't make the cut.

What is very surprising is that The Godfather comes in at #9. That may be a product of The Godfather - Part 2 siphoning off votes. Many people consider these two movies their favorites, but they are strongly divided about which is better (The Godfather is il migliore.)

Also revealing is that The Shawshank Redemption didn't make the list, but Forrest Gump did. I've long thought that IMDb and other sources were biased in favor of Shawshank because young people like it but old people don't. Given that Shawshank is #3 in the English poll, I suspect that there is a big divide between Americans and foreigners too. It's probably not that shocking, though, that Gump, which crushed Shawshank at the box office, is more popular. (I think Gump is a vastly superior feel-good story.)

The box office, which I've long championed as an underrated measure of people's tastes, is also a good predictor for other films such as The Sound of Music. Gender, touched on in another post, biases most polls--with that bias removed The Notebook leaps into the top 10.

Raiders of the Lost Ark is another notable absence. My guess is that this is because, while everyone (all age groups, both genders) likes Raiders, I don't know many people who consider it their favorite.

Would The Dark Knight make the list if another poll was done today? If it did which movie would lose the most ground? My guess is The Lord of the Rings, which was #4, might drop a few spots but would stay on the list.

Sunday, August 1, 2010

Quote of the Day: Hair Edition

Blondes have around 140,000 hairs, brunettes 110,000, and redheads only 90,000

I don't know if that's true. I also don't know why black hair is ignored.

Update: I think this is the 200th (undeleted) post on this blog.

Picture of the Day

"Would you consider role-playing a rape fantasy with a partner who asked you to?"

(Green = yes, Red = No)

More here.

Wyoming might be driven by a small-sample size effect. Nevada makes a lot of sense. Florida, though, is far and away the big state most interested in rape fantasies. I wasn't that surprised. Also note the legacy of puritanism is alive and well in New England.

Saturday, July 31, 2010

Stupid Questions

They say there are no stupid questions, yet this is a story about a stupid question.

My friends and I were talking about Rotten Tomatoes, which is a review aggregator for movies. They read all the reviews and decide if they're positive or negative, then report the percentage of positive reviews. If it's over 65% or so they certify the movie as "fresh."

I was explaining that the number RT reports is biased if it's meant to predict which movies are "good." Now I didn't specify what I meant by good but I think the intuitive definition is something like "most (American) people liked it" and you could estimate the percentage p that do/did by forcing N random people to watch the movie and vote thumbs up or down. Critics aren't like typical movie-goers; they tend to have systematically different tastes, so the RT number is a biased estimator of p.
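A quick simulation shows what "biased estimator of p" means in practice. Everything here is hypothetical: the like-rates for the public and for critics are invented, and `estimate_share` is just a stand-in for running the thumbs-up poll.

```python
import random


def estimate_share(like_probability, n_raters, rng):
    """Fraction of n_raters voting thumbs up, each liking the movie
    independently with probability like_probability."""
    thumbs_up = sum(rng.random() < like_probability for _ in range(n_raters))
    return thumbs_up / n_raters


rng = random.Random(0)  # fixed seed so the run is repeatable

p_public = 0.70   # hypothetical true share of the public who like the movie
p_critics = 0.50  # hypothetical share of critics who like the same movie

# Polling random movie-goers homes in on p; polling critics homes in on
# the critics' own like-rate, no matter how many critics you add.
random_sample = estimate_share(p_public, 10_000, rng)
critic_sample = estimate_share(p_critics, 10_000, rng)

print("random viewers:", random_sample)
print("critics only:  ", critic_sample)
```

Adding more critics shrinks the noise but not the gap. That gap is the bias, and sample size can't fix it.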

The counterpoint, although it's rarely clearly stated, is that what makes things good or bad are some objective structural properties, things like complexity, pacing, lighting, or narrative structure. Critics are important because they're good at understanding these things, and the Academy Awards isn't done by a poll of the country because only the Academy actually knows good from bad. If that doesn't sound stupid then try to articulate some structural properties and then explain why they're the right ones (I'll return with an analogy later).

One of my friends asked "well, isn't there just good and bad music, so why not with films?" I quickly noted that "no, there isn't just good and bad music." He didn't think that was a good enough answer, because I "don't know enough about music." (Implicitly I think the assumption here is that if you study music for long enough you'll understand, by divine revelation, "the" structural properties and, well, presumably then craft the most perfect song possible. No, that is not meant to have a mocking tone.) A friend who played in the high school band explained that . . . well he didn't, he just talked about what kind of music he likes. I guess God just revealed to him one day that whatever he likes is also the right preference for everyone else.

After a while a light-bulb went on and everyone admitted that RT is a bad predictor (relative to say Yahoo! Movies, Netflix or Flixster) of what movies everyone likes because of selection bias. But no one informed Flixster as, ironically, they print only the RT rating when you search movies on their app, and you have to click each one-by-one to see the Flixster ratings (which are a somewhat biased version of the poll described above).

Now we return to the structural properties question. You actually can articulate a lot of properties that make any kind of art good. You can do the same for properties that make people attractive, which is illustrative. We know height makes men more attractive (up until about 6'2'' in the U.S.) and high cheek bones make women attractive (obviously likewise up to a point), and symmetric faces make both sexes more attractive. But how do we know that? Because we asked people to rate faces and those traits are correlated with higher ratings. You see, the structural approach gets things backwards.

Friday, July 30, 2010

Lying with Numbers AND Quote of the Day

First, the quote of the day, from this article on a program for humanities students to go into medicine:

I didn’t want to waste a class on physics, or waste a class on orgo. The social determinants of health are so much more pervasive than the immediate biology of it.

Why didn't she just say important? Did she think pervasive would sound more intelligent? And why is she going into medicine if she wants to deal with things like sanitation, employment, and clean water in Africa?

At the beginning of the article the author mentions a study that showed the academic performance of the humanities students matched that of the traditional students. Presumably the implication is that you don't really need to "waste a class on physics" because it won't matter.

But that's not what the study showed at all because of selection bias.

Consider this example. I ask some biology majors at MIT and some physics majors at UF to take a test on E&M (Physics II or 8.02 at MIT). They end up scoring the same (no significant difference). Majoring in physics doesn't help you do well on physics, right?

No. MIT admits students largely on their ability to do well on tests, esp. in the sciences. Since MIT students are smarter you'd hope they do better than public school students, perhaps even though they are being compared with a select group of them.

In the article the humanities students are probably smarter on average. Their average GPA is 3.74 and SAT is 1447. They'd probably mostly be competitive applicants at some of the top medical schools in the country. You'd hope they could hold their own, all else equal, with people admitted to less competitive programs.
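The argument can be put in toy arithmetic. This is only a sketch under a made-up model where a test score is baseline ability plus a fixed boost from taking the course; the numbers and the names `expected_score` and `course_boost` are invented for illustration.

```python
def expected_score(mean_ability, took_course, course_boost=10.0):
    """Toy model: score = baseline ability + a boost if the course was taken."""
    return mean_ability + (course_boost if took_course else 0.0)


# Hypothetical groups: one selected for high ability but skipping the course,
# one with lower average ability that took it.
selected_no_course = expected_score(80.0, took_course=False)
unselected_with_course = expected_score(70.0, took_course=True)

print(selected_no_course, unselected_with_course)

# The two groups tie even though the model has the course worth 10 points,
# so equal scores alone don't show that the course is a waste.
```

To actually measure the course's effect you'd need to compare students of similar ability who differ only in whether they took it.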

Wednesday, July 28, 2010

Quote of the Day

I heard if u hit a kardashian u win a championship.. Kim k holla me!!! I need ya for 17 min

- Ty Lawson

Here's the context.

I like how he specified exactly 17 minutes. Andrew Gelman had a good blog post recently about how rounded numbers communicate uncertainty. If he said 20 minutes we'd understand that as "about 20 minutes" but when someone says "I'll be there in 17 minutes" you expect it to be 17 minutes sharp.

Wednesday, July 21, 2010

Is The Godfather IMDb's real #1?

Inception is #3 on IMDb's list of top movies. Toy Story was #6 a few weeks ago, but its rank has sunk to #8 and will probably continue to plummet. Avatar started at #28 and has sunk out of the top 100. The Dark Knight, most famously, was briefly #1. There's got to be something wrong with how IMDb uses ratings for new movies. What is it?

It's called selection bias. The pool of people who vote on movies right when they come out isn't representative of the typical IMDb user who will vote on the film. They tend to be people who were so hyped about the movie that they saw it opening night or opening weekend. The result is that Inception has 40,000 votes from people very likely to enjoy the film. (IMDb corrects for this somewhat by filling in the "missing" votes that would get Inception up to 200k with an "average" opinion, but the trend shows they clearly don't compensate enough.)
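Here is a rough sketch of the "fill in the missing votes" idea as described in the parenthetical above. It is not IMDb's actual formula or parameters; the observed mean of 9.3 and the site-wide average of 7.0 are invented, and only the 40,000 and 200,000 vote counts come from the post.

```python
def padded_rating(observed_mean, n_votes, prior_mean, target_votes):
    """Pad the real votes with 'average' votes until target_votes is reached,
    then take the mean over real and padded votes together."""
    missing = max(target_votes - n_votes, 0)
    total = observed_mean * n_votes + prior_mean * missing
    return total / (n_votes + missing)


# Hypothetical ratings; vote counts as in the post.
early = padded_rating(observed_mean=9.3, n_votes=40_000,
                      prior_mean=7.0, target_votes=200_000)

# Once the real votes exceed the target, no padding is applied.
later = padded_rating(observed_mean=9.3, n_votes=250_000,
                      prior_mean=7.0, target_votes=200_000)

print(round(early, 2), round(later, 2))
```

With these made-up inputs the padding pulls the early score far below the raw mean, which shows the mechanism rather than IMDb's real settings. The weaker the padding, the more an enthusiastic opening-weekend crowd can inflate a new movie's rank.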

There are other biases, though, in the IMDb formula that can throw the ranks out of whack. One is that it doesn't take into account the age distribution of voters. In general 18-29 year olds give movies higher ratings than older people (45+) and these groups have different tastes. 18-29 year olds have substantially higher ratings than 45+s for The Shawshank Redemption (9.4 vs 8.7) and The Dark Knight (9.1 vs 7.7). In contrast, both age groups give similar ratings for The Godfather and Avatar. So I asked if it's possible IMDb has two pairs backwards once you adjust for age demographics: Shawshank vs. The Godfather, and The Dark Knight vs. Avatar.

Short answer: yes. Based on these data, if you put a gun to my head and asked which movie in each pair would have the higher rating if the entire country watched both, my guesses would be The Godfather (by a big margin) and Avatar (by the slimmest of margins).

Another bias to explore is gender bias. IMDb voters are overwhelmingly male, but the U.S. population (and movie-goers in general) isn't. Can that help explain why The Blind Side has a 4.4 on Netflix and an A on Yahoo! Movies but doesn't even make the IMDb Top 250? Sort of. The average rating for females is 8.2 vs 7.7 for men, and most of the ratings are from men. But if you give equal weight it only bumps The Blind Side up to a 7.9, not enough to make the Top 250 cut. The bias affecting The Blind Side is probably the selection bias in who visits the website (hipsters who don't like mainstream movies or football much?). The Notebook, though, is easily booted from the Top 250 by gender bias. It has a phenomenal 8.6 rating from females and a 7.8 from males, averaging to 8.2 and a would-be spot somewhere around #120. That is probably still too low, though, given its 4.2 on Netflix from 5.5 million ratings.
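The equal-weight adjustment is just a weighted average of the group means. Here's a minimal sketch using the female and male ratings quoted above; the function name `reweighted_mean` and the 50/50 population weights are my own framing of "give equal weight."

```python
def reweighted_mean(group_means, group_weights):
    """Average the group means using population weights instead of
    the site's actual (male-heavy) voter mix."""
    total_weight = sum(group_weights.values())
    weighted_sum = sum(group_means[g] * group_weights[g] for g in group_means)
    return weighted_sum / total_weight


equal_weights = {"female": 0.5, "male": 0.5}

blind_side = reweighted_mean({"female": 8.2, "male": 7.7}, equal_weights)
notebook = reweighted_mean({"female": 8.6, "male": 7.8}, equal_weights)

# These land close to the 7.9 and 8.2 figures given in the text;
# the inputs are already rounded to one decimal, so don't read much
# into the second decimal place.
print(round(blind_side, 2), round(notebook, 2))
```

The same function works for the age adjustment: swap in the 18-29 and 45+ means and whatever population shares you think are right.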

Update: Here's another interesting story about the Godfather and Shawshank rankings.

Sunday, June 27, 2010

Bad math

A study found that "39.1% (95% CI 36.6–41.7%) of men's partnerships were ‘not (yet) regular’ vs 20.0% (95% CI 18.2–21.9%) of women's partnerships." and "[s]ex occurred within 24 [hours] in 23.4% (95% CI 19.7–27.5%) of men's and 10.7% (95% CI 8.3–13.6%) of women's partnerships."

But here's the catch, which is not reported in the paper's abstract: the real percentages have to be exactly the same for men and women. For each male who is in a partnership that is "not (yet) regular" there has to be a female to complete the partnership. There was a big debate between mathematicians about this a few years back, but I think everyone eventually agreed on this point.

So what do the data really say? Well, since the first CIs don't even overlap, we have significant evidence men and women view their relationships differently, with women viewing things relatively more seriously. The data also show that either men significantly inflate the amount of casual sex they have when surveyed, or that women do the opposite. You can guess which is more probable.
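The overlap check is easy to do by hand, but here it is spelled out with the intervals quoted from the study. Treat it as a rough screen, not a formal test: non-overlapping 95% intervals are a conservative sign of a real difference. The helper name `intervals_overlap` is my own.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% CIs (in percent) as quoted from the study.
not_regular = {"men": (36.6, 41.7), "women": (18.2, 21.9)}
sex_within_24h = {"men": (19.7, 27.5), "women": (8.3, 13.6)}

for label, cis in [("not (yet) regular", not_regular),
                   ("sex within 24 hours", sex_within_24h)]:
    overlap = intervals_overlap(cis["men"], cis["women"])
    print(label, "-> intervals overlap:", overlap)

# Gap between the point estimates for 'not (yet) regular' partnerships.
gap = round(39.1 - 20.0, 1)
print("gap in percentage points:", gap)
```

Neither pair of intervals overlaps, so the gap between what men and women report is hard to blame on sampling noise.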

Both of these findings are sure to shock no one.

But one possibility I left out is shocking. Perhaps the 19.1% of irregular partnerships men have and women don't are between men. Maybe 50% of the casual sex men have is with other men. As it turns out, the study only included heterosexual partnerships so we can rule out that hypothesis.

Sunday, June 6, 2010

Worst Book Review of All Time

The worst book review I've ever read.

Ehrenreich drives me nuts. She thinks that, without having any background in a subject, she can do a little research and make novel insights that have somehow escaped everyone else. In her new book, she takes on positive psychology despite knowing nothing about psychology.

She, as the review notes, doesn't understand regressions or statistical significance. (She doesn't understand what a categorical variable is either.) She does know how to ask asinine questions about dimensional analysis and note that the functional form might be misspecified. But she doesn't understand non-parametric regression, which makes her criticisms moot. Somehow, the reviewer takes all of this as the mark of genius.

The worst part, though, is the reviewer's conclusion, a collection of platitudes about how knowing the truth is more important than being happy. It makes his ignorance about positive psychology too obvious. Does he know "[s]tudies indicate that depressed individuals have more realistic views than non-depressed people?" Is he saying we should all become depressed, lest (to paraphrase) fantasy take precedence over reality?
