Skill acquisition in online games

Today, we're talking to Tom Stafford, Senior Lecturer in Psychology and Cognitive Science at University of Sheffield. His recent work explored skill acquisition using a massive dataset of online gamers. He shares a bit about his project and his experience working with big data below.

What's your one-sentence summary of the project? 

Visualising how practice amount, practice quality and initial ability affect skill development

What overarching cognitive question were you trying to answer with this project? 

Online games allow us to connect the outcome of skill development with a player's total history of practice up to that point. With a large number of players we can precisely quantify the influence on performance of factors which affect skill development.

How did you discover (or build) this dataset, and what made you decide to work with it to answer your question? 

I knew that I wanted to get data from gamers, since, unlike many other domains, it is one where all actions taken during practice can be unobtrusively recorded. A friend recommended I contact Preloaded, who were working on a game funded by the Wellcome Trust, and both organisations were kind enough to share the data from the game.

What skills were most valuable to you during the project? Do you have any suggestions for how others might acquire those skills? 

The most valuable skills for this project were (in order of nebulousness):

  • Coding in Python. Fortunately there is plenty of advice online for this. The best way is probably to have a project where you’re excited about the result, so you are motivated to keep climbing the learning curve.
  • Thinking statistically. Exploring a large dataset requires deciding on summary measures, visualising them, and then double-checking that they reveal what you really want to see and aren’t confounded with something else. All three of those stages require thinking carefully about distributions and potential relationships between variables. I don’t know how you get good at this other than through experience. One necessity is using tools that allow rapid generation of many different visualisations (this rules out most GUI stats applications, and reinforces the case for scriptable tools such as Python).
  • Caring about theory. Exploring a large dataset means choosing from myriad possible analyses and visualisations. Knowing what previous work has suggested, and what would be interesting or surprising, is essential for narrowing your focus to analyses that go beyond superficial portrayals of relationships between variables and speak to underlying constructs. In turn, a focus on underlying constructs is essential if your results are to have validity that generalises beyond your specific data.
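As a concrete (and entirely toy) illustration of that statistical workflow, here is a minimal Python sketch. The data are made up and stand in for the game logs; the variable names are my own, not those from the project. It computes a summary measure (the mean learning curve across players) and then double-checks one potential confound (whether first-attempt scores differ, which would suggest differences in who plays rather than an effect of practice).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in for the game logs: each player has a fixed initial
# ability and improves with the log of their attempt count.
n_players, n_attempts = 200, 15
player = np.arange(n_players).repeat(n_attempts)
attempt = np.tile(np.arange(1, n_attempts + 1), n_players)
ability = rng.normal(0, 5, n_players)
score = 50 + 10 * np.log(attempt) + ability[player] + rng.normal(0, 5, player.size)

# Summary measure: the mean learning curve across players.
curve = np.array([score[attempt == a].mean() for a in range(1, n_attempts + 1)])

# Double-check for a confound: first-attempt scores should centre on the
# same baseline for everyone, since in this toy dataset no one drops out.
first = score[attempt == 1]
print(f"curve rises from {curve[0]:.1f} to {curve[-1]:.1f}; "
      f"first-attempt mean {first.mean():.1f}")
```

From here, a single extra line such as `plt.plot(curve)` gives the visualisation, which is exactly the rapid iterate-and-inspect loop that scriptable tools make possible.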

What prior experience did you have working with big data before this project? 

None. When I started on this project, all my existing tools, or the way I used them, immediately broke (e.g. GUI stats and spreadsheet packages, Matlab, serial processing).

How did ethical considerations for this study differ from laboratory studies? Did the IRB or ethics board have any new concerns? 

This data was anonymised at point of collection, and concerns a highly artificial online world (no persistent identity, no links to real-world actions), so ethical concerns were minimal.

What objections or obstacles did you have to overcome in the review process that were unique to working with big data? 

Because the data wasn’t from a lab study, we lacked certain controls and participant details which would be standard for a normal experimental psychology paper. I had to argue in the response to reviewers about why, despite these limitations, some of our claims were still reliable (for example, by pointing out that participant variability due to undocumented factors would only act to decrease our statistical power; it did not in itself give cause to suspect that the effects we did find were false alarms).

Did the recent movement toward open science and reproducibility play any role in planning or executing this project? If so, how? 

Yes, it absolutely did. Our aim, from the beginning, was to produce a paper based on open data, and to publish the analysis code alongside the paper.

One benefit of this was that, when a reviewer queried one of the thresholds we selected for our analysis, commenting that it seemed arbitrary, we were able to write in response: “Yes, it is arbitrary, but our code is online, and you can check that the main result still holds if you change the threshold up or down by an order of magnitude.” That was very satisfying.
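That kind of robustness check is easy to sketch. The toy Python example below assumes made-up data and a hypothetical inclusion threshold (a minimum number of plays per player); it is not the paper's actual analysis, just the general pattern of re-running a headline result with the threshold moved an order of magnitude either way.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: total plays per player, and a skill estimate that
# improves with the log of practice, plus noise.
n_players = 5000
plays = rng.integers(1, 1000, n_players)
skill = 0.5 * np.log(plays) + rng.normal(0, 0.5, n_players)

# Hypothetical inclusion threshold: minimum plays required to count a
# player. Re-run the headline analysis (here, a simple correlation)
# with the threshold an order of magnitude lower and higher.
results = {}
for threshold in (5, 50, 500):
    keep = plays >= threshold
    r = np.corrcoef(plays[keep], skill[keep])[0, 1]
    results[threshold] = r
    print(f"threshold={threshold:3d}  n={keep.sum():4d}  r={r:.2f}")
```

If the sign and rough size of the effect survive all three thresholds, the reviewer's concern is answered empirically rather than by argument.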

What did a big data perspective afford you for this project that a more traditional perspective might not have? 

A large part of analysing a lab study seems to be trying to work out which effects are real (and then maybe thinking about what they mean). With a large dataset the balance between these two activities reverses dramatically: all real effects will be significant, if you do the right analysis. The effort now lies in figuring out, once you find an effect, what it means. This has got to be good for the status of theory within cognitive science.

Do you have any advice for those interested in using big data for cognitive science? 

Computer scientists, make friends with people who are interested in cognitive theory.
Cognitivists, make friends with people who have studied computer science.

Project publication: 

Stafford, T., & Dewar, M. (2014). Tracing the trajectory of skill learning with a very large sample of online game players. Psychological Science, 25(2), 511-518.

For more, contact: 
Tom Stafford
Department of Psychology
University of Sheffield
Mike Dewar