Predicting engagement, interaction, and revenue from players in a mobile social game.

 
 

Data access and competition rules

You will be working with REAL DATA from Uken Games! To access the dataset, please see the Rules and Eligibility and Registration pages.



Background

Video game analytics is an important area for statistics, and this case study has been developed by Uken Games. Freemium games are free to download and play, but users have the option of making in-app purchases to enhance or accelerate their performance. After the app is downloaded, users typically work through a tutorial stage, then move on to open play, where they earn in-game currency. When users have enough in-game currency they can progress to the next stage. In-game currency can be earned by completing tasks or it can be purchased. Users have the option of connecting their game account to Facebook, which unlocks the ability to interact with friends in-game by sending and receiving gifts. This case study is modified, with permission from Uken Games, from a 2014 Statistical Society of Canada case study.



Measurements

Three target variables are available for each user: revenue, engagement (time played), and retention (whether the player returns to the game after a number of days).


Times are recorded for events such as when the user makes different in-app purchases, sends or receives gifts, or unlocks different achievements.  Demographic data is also available for some users. 

For privacy reasons some of the variables have been masked.  For example, the total length of the observation time period recording when achievements were unlocked is the same for all users but the exact length of that time period has been removed.  Also, revenue and engagement numbers have been rescaled.



Analysis



Below are some questions to guide the analysis:

  1. Can you come up with a good way to visualize this data?

  2. What are some of the exploratory insights you can obtain from this data?

  3. How do the user demographics and user actions affect the response variables (engagement, revenue, retention)? Which are the strongest predictors? What interactions are present? The data is split into training and testing datasets (if you don't know what this means, try here or ask your graduate student mentor).

  4. In the mobile gaming industry, the gold standard for evaluating product changes is the randomized control-treatment experiment (often called an A/B test). Common metrics to test include revenue and retention, which are included in this dataset. Based on your analysis of the data, what A/B tests would you follow up with if you had access to our full data stream? What would the experimental design be, including the sample size, experimental groups, and the statistical model you would use?

  5. What other insights can you provide?
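As a starting point for the experimental design in question 4, the sample size for a two-group A/B test on a binary metric such as retention can be sketched with R's built-in power.prop.test. The baseline rate, the hoped-for rate under treatment, the significance level, and the power used here are illustrative assumptions, not values taken from the Uken data.

```r
# Sketch: per-group sample size for an A/B test on a binary metric
# such as return_player. All numeric inputs below are hypothetical.
baseline_rate  <- 0.20  # assumed control-group retention rate
treatment_rate <- 0.22  # assumed retention rate under the product change

ab_design <- power.prop.test(
  p1 = baseline_rate,
  p2 = treatment_rate,
  sig.level = 0.05,  # two-sided type I error rate
  power = 0.80       # probability of detecting the assumed difference
)

# Round up, since a fraction of a user cannot be enrolled.
n_per_group <- ceiling(ab_design$n)
print(n_per_group)
```

A heavy-tailed metric such as revenue would likely need a different calculation (for example, one based on simulation from the training data), so this sketch applies only to a comparison of proportions.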



Dataset Characteristics


The distributions of revenue and engagement are very heavy tailed. Most users don't make in-app purchases, and most of the players who do make in-app purchases do not make large ones. However, the small number of users who make large in-app purchases account for a large part of the revenue, in a sense subsidizing the game for the other players.
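One way to quantify this concentration is to compute the fraction of users who pay at all and the share of total revenue coming from the top spenders. The sketch below uses a hypothetical helper, top_revenue_share, and a small made-up revenue vector; neither comes from the case study itself.

```r
# Sketch: share of total revenue contributed by the top-spending users.
# `revenue` is a numeric vector; `top_fraction` is the fraction of users
# (ranked by revenue) to treat as the top spenders.
top_revenue_share <- function(revenue, top_fraction = 0.01) {
  revenue <- revenue[!is.na(revenue)]  # drop missing values
  n_top <- max(1, ceiling(length(revenue) * top_fraction))
  sorted <- sort(revenue, decreasing = TRUE)
  sum(sorted[seq_len(n_top)]) / sum(revenue)
}

# Made-up toy data: 7 non-payers, 2 small spenders, 1 large spender.
toy_revenue <- c(0, 0, 0, 0, 0, 0, 0, 1, 2, 97)

paying_fraction <- mean(toy_revenue > 0)           # fraction of users who pay
top_share <- top_revenue_share(toy_revenue, 0.10)  # share from the top 10% of users
```

On the real data, the same function could be applied to the revenue column of the training rows, for example with top_fraction = 0.01.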

The game economy is closed - the conversion price between real currency and in-game currency is controlled by the game company, as well as the number and type of available purchases.


For each user, measurements are taken from the time they install the app until a certain number of days has passed. For privacy reasons, we cannot reveal the exact observation period, but note that the length of the observation period is the same for every user.


The dataset consists of a single table, user_stats.csv, with one record for each user. There is a header containing the variable names listed below. In R you can read the data using the command:


fulldata <- read.csv("user_stats.csv", header = TRUE)


There are 300,000 rows; the response variables for 50,000 of these rows were withheld as a validation data set.

The data includes the following columns:


Demographic features
user_id - integer uniquely identifying each user
install_date - in the format of year, month, date
platform - (ipad, iphone) the platform on which the user installs the game
platform2_install_date - date when a user installs on a second platform (NA if they only install on one platform throughout the observation period)
fb_connect - date when the user connects their game account to Facebook (NA if they don't do so during the observation period)
country - string specifying the country the user is from (NA if unknown)
gender - (male, female, NA). Gender is known if and only if the user connects to Facebook. Note that if a user connects to Facebook after the observation period, their gender is known but fb_connect will be NA.


Metrics
return_player - (0,1) 1 if a player plays a session on the last day of the observation period, 0 otherwise
engagement - number of minutes the game was played during the observation period
revenue - amount of money the user spent during the observation period


Event features
tutorial_completed - date when the user completes the tutorial
first_game_player - date when the user plays their first round of the game (note that some users quit before ever starting a game)
first_type_1_game - there are four variations of the game, each with a different intensity. Each round, a user chooses which variation they would like to play. first_type_1_game is the date of the first time a user played the first variation.
first_type_2_game
first_type_3_game
first_type_4_game
first_win - date of the first round the player won
first_bonus - when a user accumulates enough energy, they can exercise a bonus which allows them to win a game faster and accrue more in-game currency. first_bonus is the date when this first happens.

first_special_purchase - date of the first in-app purchase of any kind that the user has made
first_purchase_A - date of the first in-app purchase of type A that the user has made
first_purchase_B
first_purchase_C
first_purchase_D
first_purchase_E
first_purchase_F
first_purchase_G
first_purchase_H
first_gift_sent - if a user connects their account to Facebook, they can send and receive gifts with their Facebook friends. There are two types of gifts they can receive (corresponding to different in-game currencies). The dates on which these events first occur are coded by first_gift_sent, first_gift_received
first_gift_2_sent
first_gift_received
first_gift2_received
first_uken_gift_received - our company can also send a gift to the players (for example, during holiday promotions). This feature indicates the date of the first such gift they received from us.
first_collection - users have the option of collecting some artifacts in the game. Once enough artifacts are gathered, a collection is complete, and the user gets a bonus of virtual currency. first_collection is the date when this first happens.
first_prize_A - in each round played, a user may win one of three prizes: prize A, prize B, or prize C
first_prize_B
first_prize_C
stage1 - date when the user first plays stage 1
stage2 - date when the user first plays stage 2
stage3 - etc.
stage4
stage5
stage6
stage7


Training_Validation - 0 if it is part of the training data set and 1 if the response variables (return_player,  engagement, and revenue) were withheld as part of the validation dataset.
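The Training_Validation indicator can be used to separate the rows used for model fitting from the rows whose responses were withheld. The sketch below uses a hypothetical helper, split_by_indicator, and a made-up toy data frame in place of user_stats.csv. It assumes that withheld responses appear as NA, which the description above does not state explicitly.

```r
# Sketch: split a data frame by the Training_Validation indicator
# (0 = training row, 1 = validation row with withheld responses).
split_by_indicator <- function(data) {
  list(
    training   = data[data$Training_Validation == 0, , drop = FALSE],
    validation = data[data$Training_Validation == 1, , drop = FALSE]
  )
}

# Made-up toy data standing in for user_stats.csv; withheld responses
# are ASSUMED to be coded as NA.
toy <- data.frame(
  user_id = 1:5,
  revenue = c(0, 3.5, NA, 0, NA),
  Training_Validation = c(0, 0, 1, 0, 1)
)

parts <- split_by_indicator(toy)
n_training   <- nrow(parts$training)
n_validation <- nrow(parts$validation)
```

With the real file, the same call would be split_by_indicator(fulldata), after which models are fit on the training part only.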


Remarks:
1. The revenue and engagement numbers have been rescaled.
2. Stage 1 becomes available as soon as the user completes the tutorial. Subsequent stages become available as a player plays rounds on the stages available to them. A player may choose whichever unlocked stage they like, and it is possible, for example, that they unlock and play stage 4 without ever playing stage 3.
3. For all event features, NA indicates that the event did not occur in the observation period.
4. In-app purchases provide users with virtual currency that allows them to continue playing when they run out of currency, or to increase the intensity of the game. They can also be used to change the game aesthetics.
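Because event features are dates with NA meaning "did not occur", it can help to turn each one into an occurrence indicator and a number of days since install before modeling. The sketch below uses a hypothetical helper, event_features, and made-up dates. The "%Y-%m-%d" date format is an assumption, since the exact format in the file is described only as year, month, date.

```r
# Sketch: derive modeling features from one event-date column.
# ASSUMES dates are strings like "2014-01-31"; change `date_format`
# if the file uses a different layout.
event_features <- function(install_date, event_date,
                           date_format = "%Y-%m-%d") {
  install <- as.Date(install_date, format = date_format)
  event   <- as.Date(event_date, format = date_format)
  data.frame(
    occurred      = as.integer(!is.na(event)),  # 1 if the event happened
    days_to_event = as.numeric(event - install) # NA if it never happened
  )
}

# Made-up toy dates: the second user never completes the tutorial.
toy_install  <- c("2014-01-01", "2014-01-01", "2014-01-02")
toy_tutorial <- c("2014-01-01", NA, "2014-01-05")

feats <- event_features(toy_install, toy_tutorial)
```

On the real data this could be called as event_features(fulldata$install_date, fulldata$tutorial_completed), and repeated for the other event columns.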