Triangle Spring: Of Surveys and Scheduling

From exec-l, it seems like Fall nominations just happened. The Fall scheduling process, which happens the previous Spring, is pretty much the most important planning event of the year. In many respects, it is the only real planning event of the year, which strikes me as error-prone, since the showing schedule is determined by a consensus mechanism with no set direction or goal. But I digress (this isn't the topic I want to talk about). Accepting that scheduling is done with a pick-your-own-goal, multi-unknown-purpose kind of theme, I've always wondered what kind of information would allow each of us to come up with our own best possible decision, at least for my own goals and for the unknown (but likely overlapping) goals of other e-board voters.

The biggest problem (really a very common biggest problem) is the issue of unknown factors. In particular, most people have not personally seen all the shows and need outside input in order to come to a decision. A close second is that it's hard to figure out what the results of that decision might be. The first issue is self-explanatory. The second is more complicated. Let's say I had a goal of trying to pick shows that would increase the size of the audience, equalize the gender balance, and increase the level of education of the audience. For audience size, I might think that more popular shows (e.g. higher ANN ratings) would be more attractive, but what if everyone has seen those shows already and would be driven away? For gender balance, I might think that shoujo-genre shows would be more attractive, but what is the mechanism by which women would learn about the showing and choose to attend? For education, I might think that more "fringe" shows (e.g. Tale of Genji or Angel's Egg) might broaden people's horizons, but they might also drive people away.

Ben started the reviews site in the e-board repository last semester, which seems to give people significant information about shows they have not seen, in an easy, centralized manner. There seems to be interest in this, and if people get in the habit of contributing to it and reading it, the general level of knowledge should increase. But summaries only get us part of the way there. Dossiers are often written to attract people to a show, focusing on aspects that would convince a person to start it. While there is some attempt to structure things to balance present and future expectation (e.g. with the "how strong is the second half" question), in reality the general motivation will be to induce an unknowing person to start the show, not to induce a person in the middle of watching the show to finish it. To some extent, the quantitative ratings statistics carry some of this information. A higher-rated show is more likely to have a satisfying start and finish (since others found it satisfying), plus we also get a sense of how the overall public feels about the show. Thus, someone evaluating the show could combine the numerical rating with their own attraction to elements of the show and weigh them during a voting decision. However, these numerical scores carry their own problem. How do we know how these ratings by unknown thousands of people on the internet relate to how CJAS members will react? (People won't like a show just because we show it. Say you have a bunch of people who don't like comedy. If you inundate them with comedy, they won't learn to like it. They'll just leave.)

Of course, these issues are nothing new. But, I was re-visiting the topic of surveys recently, and realized that the answers to a lot of the issues could be found in existing survey data. So, I took the FA03, FA04, and SP05 survey data and did some analysis on the derived statistics for each show. In particular, for each show, I analyzed the following variables:

Fall mean rating
Fall std. dev.
Fall # responses
Spring mean rating
Spring std. dev.
Spring # responses
ANN # ratings
ANN # people indicating they've seen part or all of the show
ANN arithmetic mean
ANN arithmetic std. dev.
ANN weighted mean
ANN Bayesian score
Was the show shown as a series (0/1 dummy variable)
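
One way to hold these per-show variables is a small record type. This is only a sketch: the class name, the field names, and the paired_columns helper are hypothetical, not the layout of the actual survey spreadsheet. Fields default to None because not every show has both Fall and Spring data, and the helper keeps only shows where both requested variables are present.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ShowStats:
    """One row per show; field names are illustrative, not from the real data."""
    title: str
    fall_mean: Optional[float] = None
    fall_std: Optional[float] = None
    fall_n: Optional[int] = None
    spring_mean: Optional[float] = None
    spring_std: Optional[float] = None
    spring_n: Optional[int] = None
    ann_num_ratings: Optional[int] = None
    ann_num_seen: Optional[int] = None
    ann_mean: Optional[float] = None
    ann_std: Optional[float] = None
    ann_weighted_mean: Optional[float] = None
    ann_bayesian: Optional[float] = None
    shown_as_series: int = 0  # 0/1 dummy variable


def paired_columns(shows: List[ShowStats], x_field: str,
                   y_field: str) -> Tuple[List[float], List[float]]:
    """Return two aligned lists, dropping shows missing either variable."""
    xs, ys = [], []
    for show in shows:
        x, y = getattr(show, x_field), getattr(show, y_field)
        if x is not None and y is not None:
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys
```

Dropping incomplete pairs this way means each correlation may rest on a different number of shows, so the n behind each coefficient should be reported alongside it.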

My analysis is not yet complete; so far I've done a Pearson's r correlation table and some initial regressions. There are some interesting results that I want to note before I dig for more data and write up a more detailed analysis:

Moderate (0.5 <>
High correlation with high significance between Spring mean and ANN Bayesian score
No relationship between Fall # responses and ANN Bayesian score
High correlation with high significance between Spring # responses and ANN Bayesian score
Low (0.3 <>
High correlation with high significance between Spring # responses and shown as a series
No relationship between Fall mean and Fall # responses
High correlation (R > 0.7) with high significance between Spring mean and ANN # ratings
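
For anyone who wants to reproduce this kind of table, here is a minimal sketch of a single cell of it in plain Python: Pearson's r for one pair of variables, plus the t statistic used to judge significance. The function names are my own, and the two short lists at the bottom are made-up placeholders, not the survey data. The one outside number is the two-sided 5% critical t value for 9 degrees of freedom (2.262), which is what applies when n=11.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (df = n - 2)."""
    remainder = 1.0 - r * r
    if remainder <= 0.0:
        # Perfect (or numerically perfect) correlation.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / remainder)


def critical_r(t_crit, n):
    """Smallest |r| that reaches the critical t value at sample size n."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


# Placeholder values only, invented to show the calling convention.
example_cjas_mean = [3.1, 3.8, 4.2, 2.9, 3.5]
example_ann_score = [6.9, 7.8, 8.5, 7.1, 7.4]

r = pearson_r(example_cjas_mean, example_ann_score)
t = t_statistic(r, len(example_cjas_mean))
print(f"r = {r:.3f}, t = {t:.3f}")

# With n = 11, |r| has to be roughly 0.6 or more to clear the 5% level.
print(f"critical r at n=11: {critical_r(2.262, 11):.3f}")
```

The critical_r helper shows why a small sample matters: with only 11 shows, a correlation in the "moderate" range barely reaches significance, so only the strong relationships can be trusted.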

The differences between Fall and Spring are particularly notable, but also preliminary. There is one big problem: I can't find my SP04 survey data. Thus, my Spring semester data set is small (n=11). Another issue is the stats having to do with # responses. By itself, # responses isn't necessarily that meaningful except to determine statistical significance and perhaps to do chi-square tests between shows. What I really want to use it for is a "butts in seats" estimate for a given show, under the assumption that you rated a show only if you saw it. But in order to capture the "drifters," who enter only for their favorite shows and then leave, I have to assume that we surveyed everyone who walked in that night, without missing anyone. If we only survey at the beginning and at break, there will be a significant time-based bias. For these particular semesters, I'm fairly confident that the survey results are thorough, since I administered them. For future semesters, I'm hoping that surveyors have been and will be similarly aggressive in administering the survey.

Caveats aside, it's quite interesting that series drive increased # responses so much more in the spring than in the fall. It's also interesting that ANN ratings drive # responses significantly in the spring but not in the fall, and that those same ratings are much more strongly correlated with CJAS ratings in the spring than in the fall.

When I did some preliminary work on regression models to predict the CJAS rating based on ANN data, it was interesting and shocking to me that the multiple R on my best multiple regression on Fall mean was around 0.6, whereas it was above 0.9 on the Spring mean. This would suggest that the ANN data are a good predictor of the audience's impression of a show after seeing the whole thing, but not as good a predictor of the audience's first-half reaction.
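
To make "multiple R" concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python that returns the coefficients, the multiple R (the square root of R-squared), and the adjusted R-squared. It is not the exact procedure I used; the function names are hypothetical, and which ANN variables go in as predictors is left to the caller. The adjusted figure is included because, with a sample as small as n=11, adding predictors inflates the plain multiple R.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; drop one and refit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y on the predictors in rows (one list of predictor values per show).

    Returns (beta, multiple_r, adjusted_r2); beta[0] is the intercept.
    """
    n = len(y)
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])  # number of coefficients, including the intercept
    if n <= k:
        raise ValueError("need more shows than coefficients")
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)

    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r2 = 1.0 - ss_res / ss_tot
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return beta, math.sqrt(max(r2, 0.0)), adjusted_r2
```

A call would look like fit_ols(rows, spring_mean), where each entry of rows holds one show's chosen ANN variables. Comparing the adjusted R-squared across the Fall and Spring fits is a fairer check than the raw multiple R when the two samples differ in size.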

What I'm eventually aiming for is a statistically significant model that can predict the audience's reaction to a show based on factors we can discover or control at the time of scheduling.

Sunday, March 05, 2006
