Groupon Deals Data Analysis
Setup
A. Background:
- Some Groupon deals have a minimal requirement, e.g., the deal only works when there are at least 100 committed buyers.
- Groups:
- Control group: deals without the minimal requirement
- Treatment group: deals with minimal requirement
B. Question at Hand
- Does having the minimal requirement affect the deal outcomes, such as revenue, quantity sold, and Facebook likes received?
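C. Need for propensity score matching
- Treatment is unevenly distributed across outcome levels, so a raw comparison of the two groups may reflect differences between the deals rather than the effect of the requirement:
- High revenue vs. low revenue
- High quantity sold vs. low quantity sold
- High Facebook likes received vs. low Facebook likes received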
D. Features to be used
Which features to select: as we will illustrate later, the following features/variables should be excluded:
- Features/variables that predict treatment status perfectly, such as min_req, from which the treatment variable is directly derived (see the code notebook for the result of adding min_req).
- Features/variables that may themselves be affected by the treatment.
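The two exclusion rules above can be sketched as a simple column filter. This is a minimal illustration, not the notebook's actual code: the helper select_features is hypothetical, and apart from min_req, treatment, and quantity_sold the column names are assumed for the example.

```python
def select_features(columns, treatment_col, perfect_predictors, post_treatment):
    """Return the columns that are safe to use as matching covariates."""
    # Drop the treatment flag itself, anything that predicts it perfectly,
    # and anything that could be a consequence of the treatment.
    excluded = {treatment_col, *perfect_predictors, *post_treatment}
    return [c for c in columns if c not in excluded]


# Hypothetical column list; only min_req, treatment and quantity_sold
# are named in the text above.
all_columns = ['price', 'discount_pct', 'featured', 'min_req',
               'treatment', 'revenue', 'quantity_sold', 'fb_likes']

features = select_features(
    all_columns,
    treatment_col='treatment',
    perfect_predictors=['min_req'],
    post_treatment=['revenue', 'quantity_sold', 'fb_likes'],
)
print(features)
```

Treating the outcome columns as post-treatment variables follows from the question at hand: they are exactly what the requirement might change.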
Data Analysis
1. Read the groupon data
import pandas as pd
# Load the deals data and inspect column types and missing values
df = pd.read_csv('./data/groupon.csv')
df.info()
2. Extract features for propensity score matching
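The matching step itself is not shown here, so the following is only a sketch of one common approach: greedy 1:1 nearest-neighbor matching on the propensity score with a caliper. It assumes each deal already has a propensity score (for example, from a logistic regression of treatment on the selected features); the function match_nearest, the deal ids, the scores, and the caliper value are all hypothetical.

```python
def match_nearest(treated, control, caliper=0.05):
    """Greedy 1:1 matching without replacement.

    treated, control: dicts mapping deal id -> propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls not yet used
    pairs = []
    # Process treated deals in score order so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        # Accept the match only if it lies within the caliper.
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical propensity scores.
treated_scores = {'t1': 0.30, 't2': 0.62, 't3': 0.95}
control_scores = {'c1': 0.28, 'c2': 0.33, 'c3': 0.60, 'c4': 0.10}

pairs = match_nearest(treated_scores, control_scores)
print(pairs)  # t3 stays unmatched: no control lies within the caliper
```

The caliper trades sample size for match quality: a tighter caliper discards more treated deals but keeps the matched groups more comparable.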
3. Visualize effect size using Cohen's d
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(figsize=(15, 5))
sns.barplot(data=all_stats_df, x='effect_size', y='feature', hue='matching', orient='h', ax=ax)
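How the effect_size column was computed is not shown here. As a reference point, a common definition of Cohen's d divides the difference in group means by the pooled standard deviation; the sketch below assumes that definition and uses made-up numbers.

```python
import statistics
from math import sqrt


def cohens_d(treated, control):
    """Cohen's d using the pooled (sample) standard deviation."""
    n1, n2 = len(treated), len(control)
    v1 = statistics.variance(treated)  # sample variance, n - 1 denominator
    v2 = statistics.variance(control)
    pooled_sd = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Made-up feature values for the two groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)
```

A value of d close to zero for a feature after matching suggests that the two groups are balanced on that feature.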
4. Visualize p-value significance of the t-test
import numpy as np
fig, ax = plt.subplots(figsize=(15, 5))
# The x-axis is -log10(p), so longer bars mean smaller p-values
sns.barplot(data=all_stats_df, x='log_P', y='feature', hue='matching', orient='h', ax=ax)
ax.set_xlabel('-log10(P-value) of t-test between control and treatment groups')
# Bars extending past this line correspond to p < 0.05
ax.axvline(x=-np.log10(0.05), color='r', linestyle='--', label='alpha = 0.05')
ax.legend()
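The threshold line in the plot above follows from the -log10 transform: a p-value below 0.05 maps to a value above roughly 1.301. A small sketch with hypothetical p-values (the feature names and numbers are invented for illustration):

```python
import math


def neg_log10(p):
    """Transform a p-value so that smaller p gives a larger value."""
    return -math.log10(p)


alpha = 0.05
threshold = neg_log10(alpha)  # position of the dashed line

# Hypothetical per-feature p-values from a two-sample t-test.
p_values = {'price': 0.001, 'discount_pct': 0.20, 'featured': 0.04}

# Features whose bar would extend past the threshold line.
significant = [f for f, p in p_values.items() if neg_log10(p) > threshold]
print(round(threshold, 3), significant)
```

After successful matching, one would expect most features to fall to the left of the line, since the groups should no longer differ significantly on the covariates.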
5. Distribution of Quantity Sold
col = 'quantity_sold'
ax = sns.histplot(matched_df[col], kde=True, stat='density')
# Tukey fences: values below Q1 - k*IQR or above Q3 + k*IQR are outliers (k = 3.0 flags extreme ones)
q1, q3 = np.percentile(matched_df[col], [25, 75])
iqr = q3 - q1
upper_bound = q3 + 3.0 * iqr
lower_bound = q1 - 3.0 * iqr
ax.axvline(x=np.mean(matched_df[col]), color='r', linestyle='--', label='mean')
ax.axvline(x=upper_bound, color='g', linestyle='--', label='tukey upper bound')
ax.axvline(x=lower_bound, color='g', linestyle='--', label='tukey lower bound')
ax.legend()
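The Tukey bounds drawn above can also be used to set aside extreme values before comparing outcomes. The sketch below is a standalone illustration using only the standard library and made-up quantities; the helper tukey_fences is hypothetical, and k = 3.0 is assumed to match the upper bound in the plot. With method='inclusive', statistics.quantiles interpolates linearly in the same way as the default np.percentile.

```python
import statistics


def tukey_fences(values, k=3.0):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method='inclusive')
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up quantities sold, with one extreme deal.
quantities = [10, 12, 14, 16, 18, 20, 22, 24, 500]

lower, upper = tukey_fences(quantities)
kept = [q for q in quantities if lower <= q <= upper]
print(lower, upper, kept)
```

Because quantity sold is typically right-skewed, the mean can sit well above the bulk of the data, which is why the plot marks both the mean and the fences.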
Conclusion