Coursera Machine Learning Course Notes
Machine Learning
Week 1
I. Introduction
Types of machine learning
II. Linear Regression with One Variable
Notation
Possible h ("hypothesis") functions
Cost function
Learning rate
Gradient descent for two-parameter linear regression (repeat until convergence)
"Batch" gradient descent
III. Linear Algebra Revision (optional)
Week 2
IV. Linear Regression with Multiple Variables
Large number of features
Multi-parameter linear regression in vector notation
Picking features
Cost function for multiple features
Gradient descent for the cost function of multi-parameter linear regression
Feature scaling
Mean normalization
How to pick learning rate α
Polynomial regression
"Normal equation": using matrix multiplication to solve for θ that gives min J(θ)
Gradient descent vs. solving for θ ("normal equation")
Using matrices to process multiple training cases at once
V. Octave Tutorial
Week 3
VI. Logistic Regression
Classification problems
Two-class (or binary-class) classification
Logistic regression
Decision boundary
Cost function for logistic regression
Gradient descent for logistic regression
Advanced optimization algorithms and concepts
Multi-class classification
VII. Regularization - the problem of overfitting
Underfitting vs. overfitting
Addressing overfitting
Regularization
How to pick λ (lambda)
Linear gradient descent with regularization
Normal equation with regularization
Logistic gradient descent with regularization
Week 4
VIII. Neural Networks: Representation
Neural networks
Neurons modelled as logistic units
Neural network
Notation
Calculating the hypothesis for a sample neural network
Vectorized forward propagation
Week 5
IX. Neural Networks: Learning
Cost function for multi-class classification neural network
Forward propagation
Minimizing cost function for neural networks: back-propagation
Back-propagation for multiple training samples
Back-propagation intuition...
Use of advanced minimum cost optimization algorithms
Numerical gradient checking
Initial values of Θ
Network architecture
Steps in training a neural network
Week 6
X. Advice for Applying Machine Learning
What to do when you get unacceptably large errors after learning
Machine learning diagnostic
Evaluating the hypothesis function
Calculating misclassification error rate
Cross-validation - evaluating alternative hypothesis function models
Distinguish high bias from high variance (underfitting vs. overfitting)
Choosing the regularization parameter λ
Too small training set?
Learning curves
Selecting model for neural networks
XI. Machine Learning System Design
Improving a machine learning system - what to prioritize
Recommended approach for building a new machine learning system
Error analysis
Error measure for skewed classes
Prediction metrics: precision and recall
Prediction metrics: average and F1 score
Week 7
XII. Support Vector Machines
Week 8
XIII. Clustering
Types of unsupervised learning
Notation
K-means clustering algorithm
K-means cost function (distortion function)
Practical considerations for K-means
XIV. Dimensionality Reduction
Data compression (data dimensionality reduction)
Principal component analysis (PCA)
How to choose k for PCA
Decompressing (reconstructing) PCA-compressed data
More about PCA
Bad use of PCA: to prevent overfitting
Recommendation on applying PCA
Week 9
XV. Anomaly Detection
Examples of anomaly detection
How anomaly detection works
Fraud detection
Gaussian (Normal) distribution
Density estimation
Anomaly detection algorithm
Training the anomaly detection algorithm
Evaluating the anomaly detection algorithm
XVI. Recommender Systems
Week 10
XVII. Large Scale Machine Learning
XVIII. Application Example: Photo OCR
Useful resources
Week 1
I. Introduction
Types of machine learning
● Supervised learning (the "right answer" is provided as input, in the "training set")
○ Regression problem (expected output is a real value)
○ Classification problem (answer is a class, such as yes or no)
● Unsupervised learning
II. Linear Regression with One Variable
Notation
m: number of training examples
x's: "input" variables (features)
y's: "output" (target) variables
(x, y): one training example
(x(i), y(i)): the ith training example
h(): function ("hypothesis") found by the learning algorithm
θ: (theta) the "parameters" used in h() together with the features x
hθ(): not just generally the function h(), but specifically parameterized with θ
J(θ): the cost function of hθ()
n: number of features (inputs)
x(i): inputs (features) of the ith training example
xj(i): the value of feature j in the ith training example
λ: (lambda) regularization parameter
:= means assignment in an algorithm, rather than mathematical equality
Possible h ("hypothesis") functions
● Linear regression with one variable (a.k.a. univariate linear regression):
○ hθ(x) = θ0 + θ1x
Shorthand: h(x). Where θ0 and θ1 are "parameters".
● Linear regression with multiple variables (a.k.a. multivariate linear regression):
○ hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn
(for n features)
● Polynomial regression
○ e.g. hθ(x) = θ0 + θ1x + θ2x² + θ3x³
(e.g. by just making up new features that are the square and cube of an existing feature)
Cost function
The cost function J() evaluates how closely h(x) matches y given the parameters θ used by h(). For linear regression (here with one feature x):
pick θ0 and θ1 so that hθ(x) is close to y for our training examples (x, y),
i.e. minimize the "sum of squared errors" cost function:
J(θ0, θ1) = (1/2m) Σ (hθ(x(i)) − y(i))²   (sum over the m training examples, i = 1..m)
By taking the square of the error (i.e. hθ(x) − y), we avoid having too-small results from hθ cancelling out too-large results (since (−1)² == 1²), thus yielding a truer "cost" of the errors.
N.B. the ½ factor "makes some of the math easier" (see explanation why).
"Squared error" cost function: a reasonable choice for the cost function, and the most common one for regression problems.
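A minimal Octave sketch of this cost function (the function and variable names are my own, not from the course):
% Squared-error cost for univariate linear regression (illustrative sketch).
% x and y are m×1 vectors of inputs and targets; theta0 and theta1 are scalars.
function J = computeCost(x, y, theta0, theta1)
  m = length(y);                          % number of training examples
  h = theta0 + theta1 * x;                % h_theta(x) for every example
  J = (1 / (2*m)) * sum((h - y) .^ 2);    % halved mean of squared errors
end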
Gradient descent
Iterative algorithm for finding a local minimum of the cost function. Works for linear regression with any number of parameters, but also for other kinds of "hypothesis" functions. Scales better for cases with a large number of features than solving directly for the optimal min J().
θj := θj − α · ∂/∂θj J(θ0, θ1)   (for j = 0 and j = 1, repeat until convergence)
where
θj is the "parameter" in hθ that is used for feature j
α is the learning rate
∂/∂θj J(θ0, θ1) is the partial derivative (slope) at the current point θj
J(θ0, θ1) is the cost function (in this case with two parameters, for cases with only one feature)
N.B. update θ0 and θ1 simultaneously! (i.e. as one atomic operation)
Learning rate α
The size of each step when iterating towards a solution. N.B. there is no need to vary α between iterations: gradient descent will naturally take smaller and smaller steps the closer we get to a solution.
Gradient descent for two-parameter linear regression (repeat until convergence)
For j = 0 and j = 1, θj := θj − α · ∂/∂θj J(θ0, θ1) simplifies to
θ0 := θ0 − α (1/m) Σ (hθ(x(i)) − y(i))
θ1 := θ1 − α (1/m) Σ (hθ(x(i)) − y(i)) · x(i)
(sums over the m training examples)
"Batch" gradient descent
Just means each iteration of the gradient descent is applied to all the training examples (in a "batch").
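A possible Octave sketch of this batch update (learning rate, iteration count and variable names are illustrative assumptions, not values from the notes; x and y are the m×1 training vectors):
% Batch gradient descent for h_theta(x) = theta0 + theta1*x (sketch).
alpha = 0.01;                  % learning rate (assumed value)
num_iters = 1500;              % fixed iteration count for simplicity
m = length(y);
theta0 = 0; theta1 = 0;
for iter = 1:num_iters
  h = theta0 + theta1 * x;                              % predictions for all m examples
  temp0 = theta0 - alpha * (1/m) * sum(h - y);          % gradient step for theta0
  temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);   % gradient step for theta1
  theta0 = temp0; theta1 = temp1;                       % simultaneous (atomic) update
end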
III. Linear Algebra Revision (optional)
[...]
Week 2
IV. Linear Regression with Multiple Variables
Large number of features
For problems involving many "features" (i.e. x1, x2, x3 ... xn), linear algebra vector notation is more efficient.
Multi-parameter linear regression in vector notation
For convenience of notation, and to allow use of vector multiplication, define a 0th feature x0 = 1; thus we can write
hθ(x) = θ0x0 + θ1x1 + ... + θnxn
Now, defining an ((n+1) × 1) vector x containing all the features, and an ((n+1) × 1) vector θ containing all the parameters of the hypothesis function hθ, we can efficiently multiply the two (yielding a scalar result) if we first transpose (rotate) the θ into θT:
hθ(x) = θTx
in Octave: theta' * x
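A tiny worked example in Octave (the numbers are made up purely for illustration):
theta = [80; 0.1; 50];     % theta0, theta1, theta2 (illustrative values)
x = [1; 2104; 3];          % x0 = 1, then two features of one training example
h = theta' * x;            % h_theta(x) = 80 + 0.1*2104 + 50*3 = 440.4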
Picking features
Use your domain insights and intuitions to pick features. E.g. deriving a combined feature might help. There are also automatic algorithms for picking features.
Cost function for multiple features
For n+1 features (where x0 = 1), the combined cost function over m training samples will be
J(θ) = (1/2m) Σ (hθ(x(i)) − y(i))²   (sum over i = 1..m)
which really means (note that i starts from 1 and j starts from 0)
J(θ) = (1/2m) Σ_i ( Σ_j θj·xj(i) − y(i) )²   with i = 1..m and j = 0..n
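A vectorized Octave sketch of this cost (assuming X is the m×(n+1) design matrix with a leading column of ones, y is m×1 and theta is (n+1)×1; the names are mine):
% Vectorized squared-error cost for multiple features (sketch).
function J = computeCostMulti(X, y, theta)
  m = length(y);
  errors = X * theta - y;                  % h_theta(x(i)) - y(i) for every example
  J = (1 / (2*m)) * (errors' * errors);    % sum of squared errors, halved
end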
Gradient descent for the cost function of multi-parameter linear regression
With the number of features n >= 1 and x0 = 1, one iteration of the gradient descent for parameter j is
θj := θj − α (1/m) Σ (hθ(x(i)) − y(i)) · xj(i)   (sum over i = 1..m)
in Octave (with X the m×(n+1) design matrix whose rows are the training examples):
theta = theta - alpha * (1/m) * X' * (X*theta - y)
thus (atomically updating θj for j = 0, ..., n)
θ0 := θ0 − α (1/m) Σ (hθ(x(i)) − y(i)) · x0(i)
θ1 := θ1 − α (1/m) Σ (hθ(x(i)) − y(i)) · x1(i)
...
Practical considerations for gradient descent
Feature scaling
Make sure the values in each group of features xn are of the same order of magnitude (i.e. scale some groups if necessary), or the gradient descent will take a long time to converge (because the contour plot is too elongated). Typically, make all groups contain values roughly in the range −1 ≤ xi ≤ 1.
Mean normalization
To make sure the values for feature xi range between −1 and +1, replace xi with xi − μi, where μi is the mean of all values in the training set for feature xi (excluding x0, as it is always 1).
So together, xi := (xi − μi) / si,
where μi is the mean of all values for feature i in the training set and si is the range of values (i.e. max(x) − min(x)) for feature i.
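A possible Octave sketch of feature scaling with mean normalization, applied column-wise to a feature matrix X that does not yet contain the x0 column (names are illustrative):
mu = mean(X);                 % 1×n row vector of per-feature means
s  = max(X) - min(X);         % 1×n row vector of per-feature ranges
X_norm = (X - mu) ./ s;       % each feature now roughly centred on 0 with a range of about 1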
How to pick learning rate α
The number of iterations before gradient descent converges can vary a lot (anything between 30 and 3 million is normal).
To make sure the gradient descent works, plot the running minimum of the cost function J(θ) and make sure it decreases on every iteration.
Automatic convergence test: e.g. declare convergence if J(θ) changes by less than 10⁻³ in one iteration. However, looking at the plot is usually better.
If J(θ) is increasing rather than decreasing (or oscillating), then the usual reason is that α is too big.
Too small an α will result in too slow a change in J(θ).
To determine α heuristically, try steps of ≈ ×3, e.g. 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, ...
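A sketch of the kind of plot meant here, in Octave (computeCostMulti, X, y, alpha, m and num_iters are assumed from the earlier sketches):
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  theta = theta - alpha * (1/m) * X' * (X*theta - y);   % one gradient descent step
  J_history(iter) = computeCostMulti(X, y, theta);      % record the cost after this step
end
plot(1:num_iters, J_history);                           % should decrease on every iteration
xlabel('iteration'); ylabel('J(theta)');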
Polynomial regression
If a simple straight line does not fit the training data well, then polynomial regression can be used.
Just define some new features that are powers of an existing feature (e.g. its square and cube) and fit the usual linear regression to them.
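For example, in Octave (x being a single m×1 feature column; an illustrative sketch only, and note that feature scaling matters even more here because x, x² and x³ have very different ranges):
m = length(x);                           % number of training examples
X_poly = [ones(m, 1), x, x.^2, x.^3];    % columns: x0 = 1, x, x squared, x cubed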