Logistic Regression for Age Classification of Off-Network Unicom and Telecom Users
Logistic regression solves binary classification problems efficiently. It can also handle multiclass problems, just not as naturally as KNN, which is inherently multiclass; KNN, however, is too simplistic, and its applicability is not as good as logistic regression's.
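To make the comparison concrete, here is a minimal sketch contrasting the two classifiers on synthetic binary data (the data, seed, and thresholds are illustrative assumptions, not the author's actual dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 200 users, 2 features, linearly separable binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

lr = LogisticRegression().fit(X, y)           # learns a linear decision boundary
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # votes among 5 nearest samples

print("LR: %.3f  KNN: %.3f" % (lr.score(X, y), knn.score(X, y)))
```

Both fit this toy set well; the practical difference shows up in multiclass settings (KNN needs no changes, logistic regression needs a one-vs-rest or softmax extension) and in how each generalizes.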
My idea was to use Mobile users' social circles to judge whether an off-network user is under 25, to see just how many young people have gone to Unicom.
I extracted two features: the average age of a user's top-5 contacts, and the average contact intimacy.
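A minimal sketch of how these two features could be derived from a user's contact circle (the record layout, field names, and values here are assumptions for illustration, not the author's actual data):

```python
# Hypothetical input: for each off-net user, their top-5 on-net contacts,
# each contact given as an (age, intimacy_score) pair.
contacts = {
    "user_A": [(23, 0.9), (25, 0.7), (24, 0.8), (31, 0.4), (22, 0.6)],
}

def make_features(top5):
    """Return [average age of top-5 contacts, average contact intimacy]."""
    avg_age = sum(age for age, _ in top5) / len(top5)
    avg_intimacy = sum(score for _, score in top5) / len(top5)
    return [avg_age, avg_intimacy]

print(make_features(contacts["user_A"]))
```

The intuition is homophily: a user whose closest contacts average 25 years old and who interacts with them intensively is itself likely to be young.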
The model is then trained on on-net-to-on-net data, where both sides' ages are known; once accuracy is acceptable, it is applied to off-network users.
This technique has plenty of other uses, such as predicting whether a user will order a service, is about to leave the network, or will downgrade their plan. For real multiclass user problems, deep neural networks would be the best fit, but the computation is too heavy for my low-spec machine.
Applying logistic regression in practice requires quite a few improvements; unlike textbook data, a simple run does not get you 90%-plus accuracy.
I still need to keep studying other people's optimization techniques.
I am on Python 3.6 while the reference code was written for 2.7, so it took some time to understand it and port the code and parameters. Still not fluent, I searched each error as it came up, and after an hour or so it finally ran.
Code and results below:
# -*- coding: utf-8 -*-
"""
Created on Wed Apr 11 19:49:13 2018
"""
from numpy import *
import matplotlib.pyplot as plt
import time

# calculate the sigmoid function
def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

# train a logistic regression model using some optional optimization algorithm
# input: train_x is a mat datatype, each row stands for one sample
#        train_y is mat datatype too, each row is the corresponding label
#        opts is the optimization option, including step size and maximum number of iterations
def trainLogRegres(train_x, train_y, opts):
    # calculate training time
    # startTime = time.time()
    numSamples, numFeatures = shape(train_x)
    alpha = opts['alpha']; maxIter = opts['maxIter']
    weights = ones((numFeatures, 1))

    # optimize through gradient descent algorithm
    for k in range(maxIter):
        if opts['optimizeType'] == 'gradDescent':  # batch gradient descent
            output = sigmoid(train_x * weights)
            error = train_y - output
            weights = weights + alpha * train_x.transpose() * error
        elif opts['optimizeType'] == 'stocGradDescent':  # stochastic gradient descent
            for i in range(numSamples):
                output = sigmoid(train_x[i, :] * weights)
                error = train_y[i, 0] - output
                weights = weights + alpha * train_x[i, :].transpose() * error
        elif opts['optimizeType'] == 'smoothStocGradDescent':  # smooth stochastic gradient descent
            # randomly select samples to optimize, reducing cyclic fluctuations
            dataIndex = list(range(numSamples))
            for i in range(numSamples):
                alpha = 4.0 / (1.0 + k + i) + 0.01
                randIndex = int(random.uniform(0, len(dataIndex)))
                output = sigmoid(train_x[randIndex, :] * weights)
                error = train_y[randIndex, 0] - output
                weights = weights + alpha * train_x[randIndex, :].transpose() * error
                del(dataIndex[randIndex])  # during one iteration, delete the optimized sample
        else:
            raise NameError('Not support optimize method type!')

    print('Congratulations, training complete!')
    return weights

# test your trained logistic regression model given a test set
def testLogRegres(weights, test_x, test_y):
    numSamples, numFeatures = shape(test_x)
    matchCount = 0
    for i in range(numSamples):
        predict = sigmoid(test_x[i, :] * weights)[0, 0] > 0.5
        if predict == bool(test_y[i, 0]):
            matchCount += 1
    accuracy = float(matchCount) / numSamples
    return accuracy

# show your trained logistic regression model; only available for 2-D data
def showLogRegres(weights, train_x, train_y):
    # notice: train_x and train_y are mat datatype
    numSamples, numFeatures = shape(train_x)
    if numFeatures != 3:
        print("Sorry! I can not draw because the dimension of your data is not 2!")
        return 1

    # draw all samples
    for i in range(numSamples):
        if int(train_y[i, 0]) == 0:
            plt.plot(train_x[i, 1], train_x[i, 2], 'or')
        elif int(train_y[i, 0]) == 1:
            plt.plot(train_x[i, 1], train_x[i, 2], 'ob')

    # draw the classification line
    min_x = min(train_x[:, 1])[0, 0]
    max_x = max(train_x[:, 1])[0, 0]
    weights = weights.getA()  # convert mat to array
    y_min_x = float(-weights[0] - weights[1] * min_x) / weights[2]
    y_max_x = float(-weights[0] - weights[1] * max_x) / weights[2]
    plt.plot([min_x, max_x], [y_min_x, y_max_x], '-g')
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()
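As a standalone sanity check, the `smoothStocGradDescent` update above (a step size that decays with pass `k` and sample `i`, and sampling without replacement inside each pass) can be reproduced in plain NumPy on synthetic data; the data, seed, and pass count here are illustrative assumptions:

```python
import numpy as np

# Synthetic stand-in: 100 samples, bias column plus 2 features, separable labels.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
true_w = np.array([[0.5], [2.0], [-1.0]])
y = (X @ true_w > 0).astype(float)

w = np.ones((3, 1))
for k in range(20):                              # maxIter = 20, as in opts
    idx = list(range(len(X)))
    for i in range(len(X)):
        alpha = 4.0 / (1.0 + k + i) + 0.01       # decaying step size
        j = idx.pop(rng.integers(len(idx)))      # sample without replacement
        out = 1.0 / (1.0 + np.exp(-X[j] @ w))    # sigmoid prediction
        w += alpha * (y[j] - out) * X[j].reshape(-1, 1)

acc = np.mean(((X @ w > 0).astype(float)) == y)
print("training accuracy:", acc)
```

The decaying `alpha` damps the oscillation that plain stochastic gradient descent shows near convergence, while the random sample order breaks any periodic pattern in the data.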
from numpy import *
import matplotlib.pyplot as plt
import time

def loadData():
    train_x = []
    train_y = []
    fileIn = open('E:/ceshi/luoji001.txt')
    for line in fileIn.readlines():
        lineArr = line.strip().split()
        train_x.append([1.0, float(lineArr[0]), float(lineArr[1])])
        train_y.append(float(lineArr[2]))
    return mat(train_x), mat(train_y).transpose()

## step 1: load data
print("step 1: load data...")
train_x, train_y = loadData()
test_x = train_x; test_y = train_y

## step 2: training
print("step 2: training...")
opts = {'alpha': 0.01, 'maxIter': 20, 'optimizeType': 'smoothStocGradDescent'}
optimalWeights = trainLogRegres(train_x, train_y, opts)

## step 3: testing
print("step 3: testing...")
accuracy = testLogRegres(optimalWeights, test_x, test_y)

## step 4: show the result
print("step 4: show the result...")
print('The classify accuracy is: %.3f%%' % (accuracy * 100))
showLogRegres(optimalWeights, train_x, train_y)
Output:
step 1: load data...
step 2: training...
Congratulations, training complete!
step 3: testing...
step 4: show the result...
The classify accuracy is: 86.262%
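As one route for the improvements mentioned above, scikit-learn's `LogisticRegression` bundles a tuned optimizer and regularization, and a `StandardScaler` step addresses the mismatched feature scales (ages around 20-30 versus intimacy scores). A sketch assuming the same two-feature binary setup; the synthetic data stands in for luoji001.txt:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for "feature1 feature2 label" rows (no bias column needed;
# LogisticRegression fits the intercept itself).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (1.5 * X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=200) > 0).astype(int)

# Standardize features, then fit a regularized logistic regression.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)
print("accuracy: %.3f" % clf.score(X, y))
```

Scaling plus built-in regularization is often the cheapest accuracy gain over a hand-rolled gradient descent loop before reaching for anything heavier.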
Edited on Zhihu, 2018-04-11