Learning Apache Mahout Classification by Ashish Gupta




Learning Apache Mahout Classification

Table of Contents
Learning Apache Mahout Classification
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Classification in Data Analysis
Introducing the classification
Application of the classification system
Working of the classification system
Classification algorithms
Model evaluation techniques
The confusion matrix
The Receiver Operating Characteristics (ROC) graph
Area under the ROC curve
The entropy matrix
Summary
2. Apache Mahout
Introducing Apache Mahout
Algorithms supported in Mahout
Reasons for Mahout being a good choice for classification
Installing Mahout
Building Mahout from source using Maven
Installing Maven
Building Mahout code
Setting up a development environment using Eclipse
Setting up Mahout for a Windows user
Summary
3. Learning Logistic Regression / SGD Using Mahout
Introducing regression
Understanding linear regression
Cost function
Gradient descent
Logistic regression
Stochastic Gradient Descent
Using Mahout for logistic regression
Summary
4. Learning the Naïve Bayes Classification Using Mahout
Introducing conditional probability and the Bayes rule
Understanding the Naïve Bayes algorithm
Understanding the terms used in text classification
Using the Naïve Bayes algorithm in Apache Mahout
Summary
5. Learning the Hidden Markov Model Using Mahout
Deterministic and nondeterministic patterns
The Markov process
Introducing the Hidden Markov Model
Using Mahout for the Hidden Markov Model
Summary
6. Learning Random Forest Using Mahout
Decision tree
Random forest
Using Mahout for Random forest
Steps to use the Random forest algorithm in Mahout
Summary
7. Learning Multilayer Perceptron Using Mahout
Neural network and neurons
Multilayer Perceptron
MLP implementation in Mahout
Using Mahout for MLP
Steps to use the MLP algorithm in Mahout
Summary
8. Mahout Changes in the Upcoming Release
Mahout new changes
Mahout Scala and Spark bindings
Apache Spark
Using Mahout’s Spark shell
H2O platform integration
Summary
9. Building an E-mail Classification System Using Apache Mahout
Spam e-mail dataset
Creating the model using the Assassin dataset
Program to use a classifier model
Testing the program
Second use case as an exercise
The ASF e-mail dataset
Classifiers tuning
Summary
Index



Learning Apache Mahout Classification
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, and its
dealers and distributors will be held liable for any damages caused or alleged to be caused
directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2015
Production reference: 1210215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78355-495-9
www.packtpub.com



Credits
Author
Ashish Gupta
Reviewers
Siva Prakash
Tharindu Rusira
Vishnu Viswanath
Commissioning Editor
Akram Hussain
Acquisition Editor
Reshma Raman
Content Development Editor
Merwyn D’souza
Technical Editors
Monica John
Novina Kewalramani
Shruti Rawool
Copy Editors
Sarang Chari
Gladson Monteiro
Aarti Saldanha
Rashmi Sawant
Project Coordinator
Neha Bhatnagar
Proofreaders
Simran Bhogal
Steve Maguire
Indexer
Monica Ajmera Mehta
Graphics
Sheetal Aute
Abhinash Sahu
Production Coordinator
Conidon Miranda
Cover Work
Conidon Miranda



About the Author
Ashish Gupta has been working in the field of software development for the last 8 years.
He has worked in different companies, such as SAP Labs and Caterpillar, as a software
developer. While working for a start-up where he was responsible for predicting potential
customers for new fashion apparels using social media, he developed an interest in the
field of machine learning. Since then, he has worked on using big data technologies and
machine learning for different industries, including retail, finance, insurance, and so on.
He has a passion for learning new technologies and sharing the knowledge thus gained
with others. He has organized many boot camps for the Apache Mahout and Hadoop
ecosystem.
First of all, I would like to thank open source communities for their continuous efforts in
developing great software for all. I would like to thank Merwyn D’Souza and Reshma
Raman, my editors for this project. Special thanks to the reviewers of this book.
Nothing can be accomplished without the support of family, friends, and loved ones. I
would like to thank my friends, family, and especially my wife and my son for their
continuous support throughout the writing of this book.



About the Reviewers
Siva Prakash is working as a tech lead in Bangalore. He has extensive development
experience in the analysis, design, development, implementation, and maintenance of
various desktop, mobile, and web-based applications. He loves trekking, traveling, music,
reading books, and blogging.
You can find him on LinkedIn.
Tharindu Rusira is currently a computer science and engineering undergraduate at the
University of Moratuwa, Sri Lanka. As a student researcher, he has strong interests in
machine learning, compilers, and high-performance computing.
Tharindu has also worked as a research and development software engineering intern at
Zaizi Asia (Pvt) Ltd., where he first started using Apache Mahout during the
implementation of an enterprise-level content management and information retrieval
system.
He sees the potential of Apache Mahout as a scalable machine learning library for
industry-level implementations and has even contributed to the Mahout 0.9 release, the
latest stable release of Mahout.
He is available on LinkedIn.
Vishnu Viswanath is a senior big data developer who has many years of industrial
expertise in the arena of machine learning. He is a tech enthusiast and is passionate about
big data and has expertise on most big-data-related technologies.
You can find him on LinkedIn.


www.PacktPub.com

Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and
ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as
a print book customer, you are entitled to a discount on the eBook copy. Get in touch with
us for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up
for a range of free newsletters and receive exclusive discounts and offers on Packt books
and eBooks.
Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital
book library. Here, you can search, access, and read Packt’s entire library of books.

Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access
PacktLib today and view 9 entirely free books. Simply use your login credentials for
immediate access.



Preface
Thanks to the progress made in the hardware industries, our storage capacity has
increased, and because of this, there are many organizations who want to store all types of
events for analytics purposes. This has given birth to a new era of machine learning. The
field of machine learning is very complex and writing these algorithms is not a piece of
cake. Apache Mahout provides us with ready-made algorithms in the area of machine
learning and saves us from the complex task of algorithm implementation.
The intention of this book is to cover classification algorithms available in Apache
Mahout. Whether you have already worked on classification algorithms using some other
tool or are completely new to the field, this book will help you. So, start reading this book
to explore the classification algorithms in one of the most popular open source projects
which enjoys strong community support: Apache Mahout.

