O2O Coupon Usage Prediction

程序员文章站 2022-07-01 23:27:10

Preface:

This is a beginner practice competition on Tianchi. The original page is at https://tianchi.aliyun.com/getStart/information.htm?spm=5176.100067.5678.2.e1321db7ydQmSB&raceId=231593 . This post walks through the following steps.

1. Data preprocessing

2. Feature selection

3. Description of the algorithm

4. Analysis of the results

5. Miscellaneous

Part 1: Data preprocessing

The raw data can be downloaded from the link above. It comes as .csv files, which can be processed with pandas.

For example:

import pandas as pd

dfoff = pd.read_csv('ccf_offline_stage1_train.csv', keep_default_na=False)

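The keep_default_na parameter defaults to True. When it is True, the string 'null' in the file is read as NaN, so a test such as dfoff['Date'] != 'null' no longer identifies the missing values. Setting keep_default_na=False here keeps 'null' as a plain string, so that "==" and "!=" can be used on it.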
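The effect of keep_default_na can be seen on a small in-memory CSV. The following is a minimal sketch: the three-column sample is made up for illustration and only mimics the column names of the real file.

```python
import io
import pandas as pd

# Made-up sample that mimics the column names of the competition file.
csv_text = "User_id,Date_received,Date\n1,20160101,null\n2,null,20160105\n"

# Default behaviour: the string 'null' is parsed as NaN.
df_default = pd.read_csv(io.StringIO(csv_text))
nan_count_default = int(df_default['Date'].isna().sum())

# keep_default_na=False: 'null' stays a plain string and can be compared with ==.
df_raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
null_count_raw = int((df_raw['Date'] == 'null').sum())

print(nan_count_default, null_count_raw)
```

With the default setting the missing value has to be detected with isna(); with keep_default_na=False the string comparison used throughout this post works.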

We need to relate coupon receipt to purchases, and derive the label from that relationship.

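There are four combinations:

  coupon received, purchase made (row count)
  no coupon, purchase made (row count)
  coupon received, no purchase (row count)
  no coupon, no purchase (row count)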

The code is as follows:

print('Coupon received, purchase made:', dfoff[(dfoff['Date_received'] != 'null') & (dfoff['Date'] != 'null')].shape[0])
print('No coupon, purchase made:', dfoff[(dfoff['Date_received'] == 'null') & (dfoff['Date'] != 'null')].shape[0])
print('Coupon received, no purchase:', dfoff[(dfoff['Date_received'] != 'null') & (dfoff['Date'] == 'null')].shape[0])
print('No coupon, no purchase:', dfoff[(dfoff['Date_received'] == 'null') & (dfoff['Date'] == 'null')].shape[0])
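The post does not show how the label itself is built from these combinations. One possible rule is sketched below, under the assumption that a coupon counts as "used" only when the purchase happens within 15 days of receipt. The 15-day window, the function name label and the -1/0/1 encoding are all assumptions made for this illustration, not something stated in the text above.

```python
import pandas as pd

def label(row, window_days=15):
    """Hypothetical rule: 1 = coupon used within the window, 0 = not used, -1 = no coupon."""
    if row['Date_received'] == 'null':
        return -1  # no coupon was received, so there is nothing to predict
    if row['Date'] != 'null':
        received = pd.to_datetime(row['Date_received'], format='%Y%m%d')
        bought = pd.to_datetime(row['Date'], format='%Y%m%d')
        if bought - received <= pd.Timedelta(days=window_days):
            return 1
    return 0

# Made-up rows covering the combinations listed above.
demo = pd.DataFrame({
    'Date_received': ['null', '20160101', '20160101', '20160101'],
    'Date':          ['20160105', 'null', '20160110', '20160301'],
})
demo['label'] = demo.apply(label, axis=1)
print(demo)
```

On the real data the same function could be applied with dfoff.apply(label, axis=1), provided the file was read with keep_default_na=False.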

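The file stores some discounts in "spend X, get Y off" form (written as X:Y), which need to be converted to a discount rate. The distance to the store also needs to be converted to a number, and so on.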

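from datetime import date

def convertRate(row):
    # Convert a raw Discount_rate value to a discount rate.
    if row == 'null':
        # no coupon: treat as no discount
        return 1.0
    elif ':' in row:
        # "man:jian" means spend `man`, get `jian` off
        rows = row.split(':')
        return 1.0 - float(rows[1])/float(rows[0])
    else:
        # the value is already a rate
        return float(row)

def getDiscountMan(row):
    # Spending threshold of a "man:jian" coupon, 0 otherwise.
    if ':' in row:
        rows = row.split(':')
        return int(rows[0])
    else:
        return 0

def getDiscountJian(row):
    # Amount taken off by a "man:jian" coupon, 0 otherwise.
    if ':' in row:
        rows = row.split(':')
        return int(rows[1])
    else:
        return 0

def getDiscountType(row):
    # Called by processData below but not defined in the original post.
    # Assumed encoding: 'null' = no coupon, 1 = "man:jian" coupon, 0 = plain rate.
    if row == 'null':
        return 'null'
    elif ':' in row:
        return 1
    else:
        return 0

def getWeekday(row):
    # Day of the week for a 'YYYYMMDD' string, 1 = Monday ... 7 = Sunday.
    if row == 'null':
        return row
    else:
        return date(int(row[0:4]), int(row[4:6]), int(row[6:8])).weekday() + 1


def processData(df):
    # Derive numeric discount features from the raw Discount_rate column.
    df['discount_rate'] = df['Discount_rate'].apply(convertRate)
    df['discount_man'] = df['Discount_rate'].apply(getDiscountMan)
    df['discount_jian'] = df['Discount_rate'].apply(getDiscountJian)
    df['discount_type'] = df['Discount_rate'].apply(getDiscountType)
    print(df['discount_rate'].unique())

    # A missing distance is encoded as -1.
    df['distance'] = df['Distance'].replace('null', -1).astype(int)
    return df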

Calling dfoff = processData(dfoff) formats the fields described above.
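To make the conversions concrete, here is a small worked example on made-up values. It is only a sketch: convert_rate repeats the rule of convertRate so that the snippet runs on its own, and the distance is converted with map rather than replace, which is a choice made for this illustration.

```python
import pandas as pd

def convert_rate(value):
    """Same rule as convertRate, repeated here so the snippet is self-contained."""
    if value == 'null':
        return 1.0
    if ':' in value:
        man, jian = value.split(':')
        return 1.0 - float(jian) / float(man)
    return float(value)

# Made-up sample values: no coupon, a "spend 150, get 20 off" coupon, a plain rate.
demo = pd.DataFrame({
    'Discount_rate': ['null', '150:20', '0.95'],
    'Distance': ['null', '0', '10'],
})
demo['discount_rate'] = demo['Discount_rate'].apply(convert_rate)
# A missing distance becomes -1, everything else is parsed as an integer.
demo['distance'] = demo['Distance'].map(lambda d: -1 if d == 'null' else int(d))
print(demo)
```

For the value '150:20' the rate is 1 - 20/150, roughly 0.867, meaning the customer pays about 86.7% of the price once the threshold is reached.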

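Note the apply() function used in the code: it is the most flexible function in pandas. The code above calls it on a single column (Series.apply), which passes each value to the function in turn. The DataFrame version has the following signature in older pandas releases (the broadcast and reduce parameters were deprecated in 0.23 and removed in pandas 1.0):

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)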
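The difference between calling apply on a column and on a whole DataFrame can be sketched as follows. The frame and the helper row_rate are made up for this illustration.

```python
import pandas as pd

# Made-up thresholds and reductions.
df = pd.DataFrame({'man': [150, 20, 0], 'jian': [20, 5, 0]})

# Series.apply: the function receives one scalar value at a time.
doubled = df['man'].apply(lambda v: v * 2)

def row_rate(row):
    """Discount rate for one row; rows without a threshold get rate 1.0."""
    if row['man'] == 0:
        return 1.0
    return 1.0 - row['jian'] / row['man']

# DataFrame.apply with axis=1: the function receives one row (a Series) at a time.
rate = df.apply(row_rate, axis=1)
print(doubled.tolist(), rate.tolist())
```

The row-wise form is convenient when a feature depends on several columns, but it is usually slower than vectorized column arithmetic on large frames.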

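Processing the dates on which coupons were received:

date_received = dfoff['Date_received'].unique()  # unique() drops duplicates
date_received = sorted(date_received[date_received != 'null'])  # drop 'null' and sort ascending
print('Coupons were received from', date_received[0], 'to', date_received[-1])  # earliest and latest dates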

The purchase dates are handled the same way:

date_buy = dfoff['Date'].unique()
date_buy = sorted(date_buy[date_buy != 'null'])  # drop 'null' and sort ascending
print('Purchases were made from', date_buy[0], 'to', date_buy[-1])
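Because the dates are stored as 'YYYYMMDD' strings, sorting them as text already gives chronological order, so min() and max() are enough to find the range. A minimal sketch on made-up dates, which also converts the endpoints to real timestamps to measure the span:

```python
import pandas as pd

# Made-up date strings in the same 'YYYYMMDD' format, with one missing value.
dates = pd.Series(['20160315', 'null', '20160101', '20160630', '20160101'])

valid = dates[dates != 'null']
first, last = valid.min(), valid.max()  # text order equals date order for this format

# Convert the endpoints to timestamps to compute the length of the period.
span_days = (pd.to_datetime(last, format='%Y%m%d')
             - pd.to_datetime(first, format='%Y%m%d')).days
print(first, last, span_days)
```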

Plot the number of coupons issued against the number of coupons used:

import matplotlib.pyplot as plt
import seaborn as sns

# Coupons received per day ('null' is a string here, so count() counts every row).
couponbydate = dfoff[dfoff['Date_received'] != 'null'][['Date_received', 'Date']].groupby(['Date_received'], as_index=False).count()
couponbydate.columns = ['Date_received','count']
# Coupons received per day that were later used.
buybydate = dfoff[(dfoff['Date'] != 'null') & (dfoff['Date_received'] != 'null')][['Date_received', 'Date']].groupby(['Date_received'], as_index=False).count()
buybydate.columns = ['Date_received','count']

sns.set_style('ticks')
sns.set_context("notebook", font_scale= 1.4)
plt.figure(figsize = (12,8))
date_received_dt = pd.to_datetime(date_received, format='%Y%m%d')

# Upper panel: counts on a log scale.
# Note: this assumes every receipt date has at least one used coupon,
# so that both frames have one row per date in date_received_dt.
plt.subplot(211)
plt.bar(date_received_dt, couponbydate['count'], label = 'number of coupon received' )
plt.bar(date_received_dt, buybydate['count'], label = 'number of coupon used')
plt.yscale('log')
plt.ylabel('Count')
plt.legend()

# Lower panel: share of received coupons that were used.
plt.subplot(212)
plt.bar(date_received_dt, buybydate['count']/couponbydate['count'])
plt.ylabel('Ratio(coupon used/coupon received)')
plt.tight_layout()
plt.show()

This produces a figure with the two panels described above.
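The plotting code divides two count columns that are matched by row position, which only works when every receipt date has at least one used coupon. A more defensive variant, sketched here on made-up rows, indexes both counts by date and fills the missing dates with zero before dividing. This is an alternative formulation and not the code that produced the figure.

```python
import pandas as pd

# Made-up rows: 20160102 has a received coupon but no used coupon.
demo = pd.DataFrame({
    'Date_received': ['20160101', '20160101', '20160102', '20160103', 'null'],
    'Date':          ['20160105', 'null',     'null',     '20160104', '20160106'],
})

has_coupon = demo[demo['Date_received'] != 'null']
# Coupons received per day, indexed by the receipt date.
received = has_coupon.groupby('Date_received').size()
# Coupons used per day; days without any use are missing from this result.
used = has_coupon[has_coupon['Date'] != 'null'].groupby('Date_received').size()
# Align on the receipt date and fill the missing days with zero.
used = used.reindex(received.index, fill_value=0)

ratio = used / received
print(ratio)
```

The resulting received, used and ratio Series share the same date index and can be passed to plt.bar in place of the count columns.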

Part 2: Feature selection

Part 3: Description of the algorithm

Part 4: Analysis of the results

Part 5: Miscellaneous
