Oct 23, 2020

Machine Learning and Consumer Behavior Prediction

This post covers the capstone project required for the Udacity Data Science Nanodegree program.

Project Definition

Project Overview

The project required combining several data sets on consumer demographics and consumer behavior toward two types of promotion, Buy One Get One (BOGO) and discount, to find out which group responds best to which offer type.

Problem Statement

Every week Starbucks sends customers promotions that vary in difficulty and duration to encourage them to visit a branch and claim them. However, not all people respond to offers the same way, so further analysis is needed to predict the preference of a given consumer.

Metrics

The model built to predict a customer's reaction to an offer needs to be reliable at identifying the right offer type for that customer. The classification model is judged by its accuracy rate. As a starting point, I set the target at an accuracy of more than 60%.

Analysis

Data Exploration & Visualization:

We had three data sets that are related to each other in one way or another. The first contained the promotions that Starbucks ran over time, each of one of three types: Buy One Get One (BOGO), Discount and Informational advertisement. The next data set contained information about registered customers. The final set contained events carried out by customers or targeted toward them, whether members or not.

I decided to find answers to the questions below:

What is the distribution of people completing offers by offer type and demographics (age, gender, income)?
There is a specific group who complete offers without knowing about them. What are their key characteristics?
Is there a specific channel type that affects the success of an offer (quantity vs. quality)?

An offer is considered complete if the customer followed the sequence below. The data set, however, marks an offer as completed even if it was never viewed. That is why the W/ Promotion column was added.

The data in the table below shows that female customers complete offers at a higher rate than male customers. The last column is in percentage format.

In general, as difficulty increases, fewer offers are completed. When social media is used as a channel, the offer is more likely to be completed, as shown in the table below.

Social media use for offer vs. completion

Finally, I decided to look into heavy spenders who spend and complete offers without knowing the offers exist in the first place. There was no major difference in terms of gender. However, the charts below show that the middle-age and middle-income groups are heavy spenders, and they spend heavily regardless of whether there is an offer or not. This can be attributed to the fact that they constitute the majority of the data set.

Methodology

Data Preprocessing

Before building the model, the data required some cleaning. The main focus was consumer demographics, so records where demographic data was not available were discarded. Next, the completed offers were analyzed to make sure each one was completed after the person viewed the offer. To do this, the W/ Promotion column was introduced and the transaction data was sorted by time and by offer. Offer by offer, the timestamps were analyzed to check whether the offer was completed with awareness of it or whether the completion just happened. Then the data sets were merged to produce the final DF.
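The timestamp check described above can be sketched roughly as follows. This is only an illustration under assumptions: the event tuples, the event names and the `flag_completions` helper are hypothetical and not taken from the project code, and the sketch assumes each person receives a given offer only once.

```python
from collections import defaultdict

# Hypothetical event log: (person, offer_id, event, time_in_hours).
events = [
    ("p1", "bogo_1", "offer received", 0),
    ("p1", "bogo_1", "offer viewed", 6),
    ("p1", "bogo_1", "offer completed", 30),
    ("p2", "disc_1", "offer received", 0),
    ("p2", "disc_1", "offer completed", 12),
    ("p2", "disc_1", "offer viewed", 18),
]


def flag_completions(events):
    """Return {(person, offer_id): bool} for every completed offer.

    True means a view event precedes the completion in the time-sorted
    log, i.e. the completion counts as "with promotion".
    """
    # Group the events of each (person, offer) pair.
    grouped = defaultdict(list)
    for person, offer_id, event, time in events:
        grouped[(person, offer_id)].append((time, event))

    flags = {}
    for key, rows in grouped.items():
        # Stable sort by time only; ties keep the order of the log.
        rows.sort(key=lambda row: row[0])
        viewed = False
        for _, event in rows:
            if event == "offer viewed":
                viewed = True
            elif event == "offer completed":
                flags[key] = viewed
    return flags


with_promotion = flag_completions(events)
```

Here `p1` viewed the offer before completing it, so the completion is flagged as with promotion, while `p2` completed the offer before viewing it and is flagged as without.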

Offers for the same person were grouped by offer type. At a later stage, each offer type was analyzed to check whether the offer was successful, i.e. whether the offers completed for that type exceeded 75% of the offers received. I merged the transaction data with the offer data and customer characteristics. If a user completed more than 75% of the offers received for a given type, it suggests that this user and people with similar characteristics are more likely to claim future offers. Gender and age range are categorical values, so they are represented as binary columns.
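A minimal pandas sketch of the 75% rule and the binary encoding might look like this. The column names (`completed_with_promotion`, `age_range` and so on) and the sample rows are assumptions made for illustration, not the project's actual schema.

```python
import pandas as pd

# Hypothetical rows: one row per offer a customer received.
offers = pd.DataFrame({
    "person": ["p1", "p1", "p1", "p1", "p2", "p2", "p1"],
    "offer_type": ["bogo"] * 4 + ["bogo", "bogo", "discount"],
    "completed_with_promotion": [1, 1, 1, 1, 1, 0, 0],
})

# Share of received offers that were completed, per person and offer type.
rates = (
    offers.groupby(["person", "offer_type"])["completed_with_promotion"]
    .mean()
    .rename("completion_rate")
    .reset_index()
)

# Label 1 when more than 75% of the received offers were completed.
rates["successful"] = (rates["completion_rate"] > 0.75).astype(int)

# Hypothetical customer characteristics.
profile = pd.DataFrame({
    "person": ["p1", "p2"],
    "gender": ["F", "M"],
    "age_range": ["30-39", "50-59"],
    "income": [72000, 54000],
})

# Represent the categorical columns as 0/1 indicator columns.
features = pd.get_dummies(profile, columns=["gender", "age_range"], dtype=int)

# One row per (person, offer type) with the label and the features.
final_df = rates.merge(features, on="person")
```

In this toy data, `p1` completed all four BOGO offers and gets a label of 1 for BOGO, while `p2` completed only half and gets 0.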

This way, the data set for building the model was ready. We need to identify how each person would react to each offer type, BOGO and discount. The BOGO set contains the BOGO offers sent, with a value of 1 if the person used more than 75% of them and 0 otherwise. The same was applied to the discount set. This requirement is why the values are split into separate groups based on the offer type.

Once the data was in place, it was split into two groups, BOGO and discount: the BOGO set contained the BOGO offers sent and whether the person used more than 75% of them, and the same thing was applied to the discount set. Each set was then split into two groups, one for learning and one for testing, to build the prediction model.

Implementation

Using supervised machine learning, the model takes a customer's demographics and predicts which offer is more likely to succeed with this user, whether both are, or whether there is no preference. To train and test, the values are divided into 80% for learning and 20% for testing. In other words, we train the model on part of the data and test it on the remaining test sample.
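The 80/20 split can be sketched with scikit-learn as shown here. The data is synthetic and random, so the resulting score is meaningless. The feature layout, the random seeds and the scaling step are assumptions of this sketch rather than details from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in features: age, income, gender flag (all hypothetical).
n = 200
X = np.column_stack([
    rng.integers(18, 80, size=n),
    rng.integers(30_000, 120_000, size=n),
    rng.integers(0, 2, size=n),
])
# Synthetic label: 1 if the customer used more than 75% of the offers.
y = rng.integers(0, 2, size=n)

# 80% of the rows for learning, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling is an assumption added here because age and income
# live on very different scales.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

# Mean accuracy on the held-out 20%.
test_accuracy = model.score(X_test, y_test)
```

In this setup, one such model would be trained for the BOGO set and another for the discount set.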

The model takes in a user's age, income and gender and predicts which offer that person is more likely to complete.

Refinement

Two classification methods, Gaussian and SVC, were used and compared against each other, and the testing score of each was used to select the better algorithm. The outputs of both models were analyzed in combination, since a single algorithm would be used for both the BOGO and the discount predictors. The accuracy score is better for the SVC model: (71% & 68%) compared to (80% & 80%). The same applies to the F1 score: (62% & 62%) compared to (62% & 58%). The recall, which is the share of actual positives that the model correctly identifies, is about the same for all at around 70%.
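For reference, the three scores compared above can be computed from the counts of true and false positives and negatives. The helper below is a plain-Python sketch with made-up labels; the function name and the sample data are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = offer type was successful for the customer.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Accuracy alone can hide a weak model when one class dominates, which is one reason to look at F1 and recall alongside it.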

Conclusion

Working with the provided data sets was an interesting task; it is not an everyday opportunity to work with real-world data that you can combine and make sense of. The predictor model should give each customer an offer that is more likely to be used, which can translate into profit for the company. With more than 60% accuracy, the predictor would provide the right offer type in most cases.

Working with a limited amount of information and building on what could be used to predict the right offer type was a challenging task, especially the process of separating true completions from false positives. The model could do better if it continuously learned about user behavior toward the offers given, so that it could be fine-tuned to provide better predictions in the future. In addition, more information about customers could help improve the predictions.

To learn more about this analysis, see the link to my GitHub, available here.