Skill acquisition in online games
Today, we're talking to Tom Stafford, Senior Lecturer in Psychology and Cognitive Science at the University of Sheffield. His recent work explored skill acquisition using a massive dataset of online gamers. Below, he shares a bit about his project and his experience of working with big data.
Visualising how practice amount, practice quality and initial ability affect skill development
Online games allow us to connect the outcome of skill development with a player's total history of practice up to that point. With a large number of players we can precisely quantify the influence on performance of factors which affect skill development.

I knew that I wanted to get data from gamers because, unlike many domains, gaming is one where all actions taken during practice can be unobtrusively recorded. A friend recommended I contact Preloaded, who were working on a game funded by the Wellcome Trust, and both organisations were kind enough to share the data from the game.
The most valuable skills for this project were (in order of nebulousness):
- Coding in Python. Fortunately there is plenty of advice online for this. The best way is probably to have a project where you’re excited about the result, so you are motivated to keep climbing the learning curve.
- Thinking statistically. Exploring a large dataset requires deciding on summary measures, visualising them and then double-checking that they reveal what you really want to see and aren’t confounded with something else. All three of those stages require thinking carefully about distributions and potential relationships between variables. I don’t know how you get good at this other than experience. Certainly a necessity is that you use tools which allow rapid generation of multiple different visualisations (this counts out most GUI stats applications, and reinforces the necessity for command line tools such as Python).
- Caring about theory. Exploring a large dataset means choosing from a myriad possible analyses and visualisations. Knowing what previous work has suggested, and what would be interesting or surprising, is essential to narrow your focus on analyses that go beyond superficial portrayals of relationships between variables and speak to underlying constructs. In turn, focus on underlying constructs is essential if your results are to have validity which generalises beyond your specific data.
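The point about summary measures being confounded can be made concrete with a small simulation. The sketch below is purely illustrative and is not taken from the project's analysis code: the player counts, score distribution and function names are all assumptions. It simulates players who do not learn at all, then shows that a "best score" summary still rises with the number of plays (the maximum of more draws is larger), while the mean score does not.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation, written out so the sketch needs only the standard library."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_no_learning(n_players=3000, seed=2):
    """Hypothetical players whose scores never improve with practice."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_players):
        n_plays = rng.randint(2, 50)
        # Every play is an independent draw from the same distribution.
        scores = [rng.gauss(100, 15) for _ in range(n_plays)]
        rows.append((n_plays, max(scores), statistics.fmean(scores)))
    return rows


rows = simulate_no_learning()
plays = [row[0] for row in rows]
r_best = pearson(plays, [row[1] for row in rows])
r_mean = pearson(plays, [row[2] for row in rows])

print(f"plays vs best score: r = {r_best:.2f}")
print(f"plays vs mean score: r = {r_mean:.2f}")
```

Under these assumptions, a naive plot of best score against amount of practice would suggest learning where none exists, which is the kind of thing the double-checking stage is meant to catch.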
None. When I started on this project, all my existing tools and/or the way I used them immediately broke (e.g. GUI stats and spreadsheet packages, Matlab, serial processing).
This data was anonymised at point of collection, and concerns a highly artificial online world (no persistent identity, no links to real-world actions), so ethical concerns were minimal.
Because the data wasn’t from a lab study, we lacked certain controls and participant details which would be standard for a normal experimental psychology paper. In the response to reviewers I had to argue why, despite these limitations, some of our claims were still reliable (for example, by pointing out that participant variability due to undocumented factors would only act to decrease our statistical power; it didn’t in itself give cause to suspect that the effects we did find were false alarms).
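The power argument can be illustrated with a simulation. This is a minimal sketch under stated assumptions, not the analysis from the paper: it models undocumented participant factors as extra noise that is independent of the comparison being made, uses a simple two-group comparison with a normal-approximation test, and all names and parameter values are hypothetical.

```python
import math
import random
import statistics


def rejection_rate(effect, extra_noise_sd, n=50, n_sims=2000, seed=1):
    """Share of simulated two-group studies with |z| > 1.96.

    extra_noise_sd stands in for undocumented participant variability,
    assumed here to be independent of group membership.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 + extra_noise_sd ** 2)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, sd) for _ in range(n)]
        b = [rng.gauss(effect, sd) for _ in range(n)]
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        z = (statistics.fmean(b) - statistics.fmean(a)) / se
        if abs(z) > 1.96:
            rejections += 1
    return rejections / n_sims


# A real effect: extra variability lowers the chance of detecting it.
power_clean = rejection_rate(effect=0.5, extra_noise_sd=0.0)
power_noisy = rejection_rate(effect=0.5, extra_noise_sd=1.5)

# No effect: extra variability leaves the false alarm rate near 5%.
false_alarm_clean = rejection_rate(effect=0.0, extra_noise_sd=0.0)
false_alarm_noisy = rejection_rate(effect=0.0, extra_noise_sd=1.5)

print(f"power:        clean {power_clean:.2f}, noisy {power_noisy:.2f}")
print(f"false alarms: clean {false_alarm_clean:.2f}, noisy {false_alarm_noisy:.2f}")
```

The sketch does not cover undocumented factors that are correlated with the variable of interest. Those would be confounds rather than noise, and would need a separate argument.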
Yes, they absolutely did. Our aim, from the beginning, was to produce a paper based on open data, and to publish the analysis code alongside the paper.
One benefit of this was that, when a reviewer queried one of the thresholds we selected for our analysis, commenting that it seemed arbitrary, we were able to write in response, “Yes, it is arbitrary, but our code is online, and you can check that the main result still holds if you change the threshold up or down by an order of magnitude.” That was very satisfying.
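A threshold check of this kind might look something like the sketch below. It is not the published analysis code: the data are simulated, and the threshold (a hypothetical minimum number of plays for a player to be included), the baseline value of 10 and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation using only the standard library."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_players(n_players=20000, seed=0):
    """Hypothetical players whose score grows with the log of practice."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        n_plays = int(rng.expovariate(1 / 50)) + 1
        best_score = math.log(n_plays) + rng.gauss(0, 1.5)
        players.append((n_plays, best_score))
    return players


def practice_effect(players, min_plays):
    """Correlation of log(plays) with score among players at or above the threshold."""
    kept = [(math.log(n), score) for n, score in players if n >= min_plays]
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return pearson(xs, ys), len(kept)


players = simulate_players()
baseline = 10  # hypothetical inclusion threshold

# Re-run the same analysis with the threshold an order of magnitude lower and higher.
results = {
    threshold: practice_effect(players, threshold)
    for threshold in (baseline // 10, baseline, baseline * 10)
}
for threshold, (r, n_kept) in results.items():
    print(f"min_plays={threshold:>3}: r = {r:.2f} (n = {n_kept})")
```

Keeping the threshold as a single parameter of the analysis function is what makes this kind of check cheap for a reader who has the code.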
A large part of analysing a lab study seems to be trying to work out which effects are real (and then maybe thinking about what they mean). With a large dataset you dramatically reverse the balance of time you devote to these two things: all real effects will be significant, if you do the right analysis. Now the effort goes into figuring out what an effect means once you have found it. This has got to be good for the status of theory within cognitive science.
Computer scientists, make friends with people who are interested in cognitive theory.
Cognitivists, make friends with people who have studied computer science.
Stafford, T. & Dewar, M. (2014). Tracing the trajectory of skill learning with a very large sample of online game players. Psychological Science, 25(2), 511-518.