<span class='text_page_counter'>(6)</span><div class='page_container' data-page=6>
Chapter 14: Organizational Ramifications
Chief Data Monetization Officer
Privacy, Trust, and Decision Governance
Unleashing Organizational Creativity
Summary
Homework Assignment
Notes
Chapter 15: Stories
Customer and Employee Analytics
Product and Device Analytics
Network and Operational Analytics
Characteristics of a Good Business Story
Summary
Homework Assignment
Notes
Chapter 1: The Big Data Business Mandate
Figure 1.1 Big Data Business Model Maturity Index
Figure 1.2 Modern data/analytics environment
Chapter 2: Big Data Business Model Maturity Index
Figure 2.1 Big Data Business Model Maturity Index
Figure 2.3 Packaging and selling audience insights
Figure 2.4 Optimize internal processes
Figure 2.5 Create new monetization opportunities
Chapter 3: The Big Data Strategy Document
Figure 3.1 Big data strategy decomposition process
Figure 3.2 Big data strategy document
Figure 3.3 Chipotle's 2012 letter to the shareholders
Figure 3.4 Chipotle's “increase same store sales” business initiative
Figure 3.5 Chipotle key business entities and decisions
Figure 3.6 Completed Chipotle big data strategy document
Figure 3.7 Business value of potential Chipotle data sources
Figure 3.8 Implementation feasibility of potential Chipotle data sources
Figure 3.9 Chipotle prioritization of use cases
Figure 3.10 San Francisco Giants big data strategy document
Figure 3.11 Chipotle's same store sales results
Chapter 4: The Importance of the User Experience
Figure 4.1 Original subscriber e-mail
Figure 4.2 Improved subscriber e-mail
Figure 4.3 Actionable subscriber e-mail
Figure 4.5 Traditional Business Intelligence dashboard
Figure 4.6 Actionable store manager dashboard
Figure 4.9 Local events use case
Figure 4.10 Local weather use case
Figure 4.11 Financial advisor dashboard
Figure 4.12 Client personal information
Figure 4.13 Client financial information
Figure 4.14 Client financial goals
Figure 4.15 Financial contributions recommendations
Figure 4.16 Spend analysis and recommendations
Figure 4.17 Asset allocation recommendations
Figure 4.18 Other investment recommendations
Chapter 5: Differences Between Business Intelligence and Data Science
Figure 5.1 Schmarzo TDWI keynote, August 2008
Figure 5.2 Oakland A's versus New York Yankees cost per win
Figure 5.3 Business Intelligence versus data science
Figure 5.4 CRISP: Cross Industry Standard Process for Data Mining
Figure 5.5 Business Intelligence engagement process
Figure 5.6 Typical BI tool graphic options
Figure 5.7 Data scientist engagement process
Figure 5.8 Measuring goodness of fit
Figure 5.9 Dimensional model (star schema)
Figure 5.10 Using flat files to eliminate or reduce joins on Hadoop
Figure 5.11 Sample customer analytic profile
Figure 5.12 Improve customer retention example
Chapter 6: Data Science 101
Figure 6.1 Basic trend analysis
Figure 6.2 Compound trend analysis
Figure 6.3 Trend line analysis
Figure 6.4 Boxplot analysis
Figure 6.5 Geographical (spatial) trend analysis
Figure 6.6 Pairs plot analysis
Figure 6.8 Cluster analysis
Figure 6.9 Normal curve equivalent analysis
Figure 6.10 Normal curve equivalent seller pricing analysis example
Figure 6.11 Association analysis
Figure 6.12 Converting association rules into segments
Figure 6.13 Graph analysis
Figure 6.14 Text mining analysis
Figure 6.15 Sentiment analysis
Figure 6.16 Traverse pattern analysis
Figure 6.17 Decision tree classifier analysis
Figure 6.18 Cohorts analysis
Chapter 7: The Data Lake
Figure 7.1 Characteristics of a data lake
Figure 7.2 The analytics dilemma
Figure 7.3 The data lake line of demarcation
Figure 7.4 Create a Hadoop-based data lake
Figure 7.5 Create an analytic sandbox
Figure 7.6 Move ETL to the data lake
Figure 7.7 Hub and Spoke analytics architecture
Figure 7.8 Data science engagement process
Figure 7.9 What does the future hold?
Figure 7.10 EMC Federation Business Data Lake
Chapter 8: Thinking Like a Data Scientist
Figure 8.1 Foot Locker's key business initiatives
Figure 8.2 Examples of Foot Locker's in-store merchandising
Figure 8.3 Foot Locker's store manager persona
Figure 8.4 Foot Locker's strategic nouns or key business entities
Figure 8.7 Foot Locker's recommendations worksheet
Figure 8.9 Thinking like a data scientist decomposition process
Chapter 9: “By” Analysis Technique
Figure 9.1 Identifying metrics that may be better predictors of performance
Figure 9.2 NBA shooting effectiveness
Figure 9.3 LeBron James's shooting effectiveness
Chapter 10: Score Development Technique
Figure 10.1 FICO score considerations
Figure 10.2 FICO score decision range
Figure 10.3 Recommendations worksheet
Figure 10.4 Updated recommendations worksheet
Figure 10.5 Completed recommendations worksheet
Figure 10.6 Potential Foot Locker customer scores
Figure 10.7 Foot Locker recommendations worksheet
Figure 10.8 CLTV based on sales
Figure 10.9 More predictive CLTV score
Chapter 11: Monetization Exercise
Figure 11.1 “A day in the life” customer persona
Figure 11.2 Fitness tracker prioritization
Figure 11.3 Monetization road map
Chapter 12: Metamorphosis Exercise
Figure 12.1 Big Data Business Model Maturity Index
Figure 12.2 Patient actionable analytic profile
Chapter 13: Power of Envisioning
Figure 13.1 Big Data Vision Workshop process and timeline
Figure 13.2 Big Data Vision Workshop illustrative analytics
Figure 13.3 Big Data Vision Workshop user experience mock-up
Figure 13.4 Prioritize Healthcare Systems's use cases
Figure 13.5 Prioritization matrix template
Figure 13.6 Prioritization matrix process
Chapter 14: Organizational Ramifications
Chapter 1: The Big Data Business Mandate
Table 1.1 Exploiting Technology Innovation to Create Economic-Driven Business Opportunities
Table 1.2 Evolution of the Business Questions
Chapter 2: Big Data Business Model Maturity Index
Table 2.1 Big Data Business Model Maturity Index Summary
Chapter 3: The Big Data Strategy Document
Table 3.1 Mapping Chipotle Use Cases to Analytic Models
Chapter 5: Differences Between Business Intelligence and Data Science
Table 5.1 BI Analyst Versus Data Scientist Characteristics
Chapter 6: Data Science 101
Table 6.1 2014–2015 Top NBA RPM Rankings
Table 6.2 Case Study Summary
Chapter 7: The Data Lake
Table 7.1 Data Lake Data Types
Chapter 8: Thinking Like a Data Scientist
Table 8.1 Evolution of Foot Locker's Business Questions
Chapter 9: “By” Analysis Technique
Table 9.1 LeBron James's Shooting Percentages
Chapter 10: Score Development Technique
Table 10.1 Potential Scores for Other Industries
Chapter 11: Monetization Exercise
Table 11.1 Potential Fitness Tracker Recommendations
Table 11.2 Recommendation Data Requirements
Table 11.3 Recommendations Value Versus Feasibility Assessment
Chapter 12: Metamorphosis Exercise
The days when business stakeholders could relinquish control of data and
analytics to IT are over. The business stakeholders must be front and center in
championing and monetizing the organization's data collection and analysis
efforts. Business leaders need to understand where and how to leverage big data,
exploiting the collision of new sources of customer, product, and operational data
coupled with data science to optimize key business processes, uncover new
monetization opportunities, and create new sources of competitive differentiation.
And while it's not realistic to convert your business users into data scientists, it's
<i>critical that we teach the business users to think like data scientists so they can</i>
collaborate with IT and the data scientists on use case identification, requirements
definition, business valuation, and ultimately analytics operationalization.
This book provides a business-hardened framework with supporting methodology
and hands-on exercises that not only will help business users to identify where
and how to leverage big data for business advantage but will also provide
<b>Part I: Business Potential of Big Data.</b> Part I includes Chapters 1 through
4 and sets the business-centric foundation for the book. Here is where I
introduce the Big Data Business Model Maturity Index and frame the big data
discussion around the perspective that “organizations do not need a big data
strategy as much as they need a business strategy that incorporates big data.”
<b>Part II: Data Science.</b> Part II includes Chapters 5 through 7 and covers the
principle behind data science. These chapters introduce some data science
basics and explore the complementary nature of Business Intelligence and data
science and how these two disciplines are both complementary and different in
the problems that they address.
<b>Part III: Data Science for Business Stakeholders.</b> Part III includes
Chapters 8 through 12 and seeks to teach the business users and business
leaders to “think like a data scientist.” This part introduces a methodology and
several exercises to reinforce the data science thinking and approach. It has a
lot of hands-on work.
This book is targeted toward business users and business management. I wrote
this book so that I could use it in teaching my Big Data MBA class, so I included all
of the hands-on exercises and templates that my students would need to
successfully earn their Big Data MBA graduation certificate.
<i>I think folks would benefit by also reading my first book, Big Data:</i>
You can download the “Thinking Like a Data Scientist” workbook from the book's
website at www.wiley.com/go/bigdatamba. And oh, there might be another surprise
As students from my class at USF have told me, this material allows them to take a
problem or challenge and use a well-thought-out process to drive cross-organizational collaboration to come up with ideas they can turn into actions
Chapters 1 through 4 set the foundation for driving business strategies with data
science. In particular, the Big Data Business Model Maturity Index highlights the
realm of what's possible from a business potential perspective by providing a road
map that measures the effectiveness of your organization to leverage data and
analytics to power your business models.
Chapter 1: The Big Data Business Mandate
Chapter 2: Big Data Business Model Maturity Index
Chapter 3: The Big Data Strategy Document
<i>Having trouble getting your senior management team to understand the</i>
<i>business potential of big data? Can't get your management leadership to</i>
<i>consider big data to be something other than an IT science experiment? Are</i>
<i>understanding how data and analytics can power their top initiatives?</i>
<i>If so, then this “Big Data Senior Executive Care Package” is for you!</i>
<i>And for a limited time, you get an unlimited license to share this care</i>
<i>package with as many senior executives as you desire. But you must act</i>
<i>NOW! Become the life of the company parties with your extensive</i>
<i>knowledge of how new customer, product, and operational insights can</i>
<i>guide your organization's value creation processes. And maybe, just maybe,</i>
<i>get a promotion in the process!!</i>
<b>Figure 1.1</b> Big Data Business Model Maturity Index
The Big Data Business Model Maturity Index provides a road map for how
organizations can integrate data and analytics into their business models. The Big
Data Business Model Maturity Index is composed of the following five phases:
<b>Phase 1: Business Monitoring.</b> In the Business Monitoring phase,
organizations are leveraging data warehousing and Business Intelligence to
monitor the organization's performance.
<b>Phase 2: Business Insights.</b> The Business Insights phase is about
leveraging predictive analytics to uncover customer, product, and operational
insights buried in the growing wealth of internal and external data sources. In
this phase, organizations aggressively expand their data acquisition efforts by
coupling all of their detailed transactional and operational data with internal
data such as consumer comments, e-mail conversations, and technician notes,
as well as external and publicly available data such as social media, weather,
traffic, economic, demographics, home values, and local events data.
<b>Phase 3: Business Optimization.</b> In the Business Optimization phase,
organizations apply prescriptive analytics to the customer, product, and
operational insights uncovered in the Business Insights phase to deliver
actionable insights or recommendations to frontline employees, business
managers, and channel partners, as well as customers. The goal of the Business
Optimization phase is to enable employees, partners, and customers to
optimize their key decisions.
<b>Phase 4: Data Monetization.</b> In the Data Monetization phase,
organizations leverage the customer, product, and operational insights to
create new sources of revenue. This could include selling data (or insights)
into new markets (a cellular phone provider selling customer behavioral data
to advertisers), integrating analytics into products and services to create
<b>Phase 5: Business Metamorphosis.</b> The holy grail of the Big Data
Business Model Maturity Index is when an organization transitions its
business model from selling products to selling “business-as-a-service.” Think
GE selling “thrust” instead of jet engines. Think John Deere selling “farming
optimization” instead of farming equipment. Think Boeing selling “air miles”
instead of airplanes. And in the process, these organizations will create a
platform enabling third-party developers to build and market solutions on top
of the organization's business-as-a-service business model.
Ultimately, big data only matters if it helps organizations make more money and
improve operational effectiveness. Examples include increasing customer
acquisition, reducing customer churn, reducing operational and maintenance
costs, optimizing prices and yield, reducing risks and errors, improving
compliance, improving the customer experience, and more.
innovations like Hadoop and Spark, the real discussion should be about the
economic impact of big data. New technologies don't disrupt business models; it's
what organizations do with these new technologies that disrupts business models
and enables new ones. Let's review an example of one such economic-driven
business transformation: the steam engine.
The steam engine enabled urbanization, industrialization, and the conquering of
new territories. It literally shrank distance and time by reducing the time required
to move people and goods from one side of a continent to the other. The steam
engine enabled people to leave low-paying agricultural jobs and move into cities
for higher-paying manufacturing and clerical jobs that led to a higher standard of
living.
For example, cities such as London shot up in terms of population. In 1801, before
the advent of George Stephenson's Rocket steam engine, London had 1.1 million
residents. After the invention, the population of London more than doubled to 2.7
million residents by 1851. London transformed the nucleus of society from small
tight-knit communities where textile production and agriculture were prevalent
into big cities with a variety of jobs. The steam locomotive provided quicker
transportation and more jobs, which in turn brought more people into the cities
and drastically changed the job market. By 1861, only 2.4 percent of London's
manufacturing or transportation business. The steam locomotive was a major
turning point in history as it transformed society from largely rural and
agricultural into urban and industrial.<sup>2</sup>
warehousing) and analytics (data science) environments differently. These two
environments have very different characteristics and serve different purposes. The
data lake can make both the BI and data science environments more agile and
more productive (Figure 1.2).
<b>Figure 1.2</b> Modern data/analytics environment
Chapter 7 (“The Data Lake”) introduces the concept of a data lake and the role
the data lake plays in supporting your existing data warehouse and Business
Intelligence investments while providing the foundation for your data science
environment. Chapter 7 discusses how the data lake can un-cuff your data
scientists from the data warehouse to uncover those variables and metrics that
might be better predictors of business performance. It also discusses how the
data lake can free up expensive data warehouse resources, especially those
resources associated with Extract, Transform, and Load (ETL) data processes.
Business users have been trained to contemplate business questions that monitor
the current state of the business and to focus on retrospective reporting on what
happened. Business users have become conditioned by their BI and data
warehouse environments to only consider questions that report on current
business performance, such as “How many widgets did I sell last month?” and
“What were my gross sales last quarter?”
Unfortunately, that type of thinking has led to siloed data fiefdoms, siloed
decisions, and an un-empowered and frustrated business team. Organizations
need to think differently about how they empower all of their employees.
Organizations need to find a way to promote and nurture creative thinking and
groundbreaking ideas across all levels of the organization. There is no edict that
states that the best ideas only come from senior management.
The key to big data success is empowering cross-functional collaboration and
exploratory thinking to challenge long-held organizational rules of thumb,
heuristics, and “gut” decision making. The business needs an approach that is
inclusive of all the key stakeholders: IT, business users, business management,
channel partners, and ultimately customers. The business potential of big data is
only limited by the creative thinking of the organization.
Chapter 13 (“Power of Envisioning”) discusses how the BI and data science
teams can collaborate to brainstorm, test, and refine new variables that might
be better predictors of business performance. We will introduce several
techniques and concepts that can be used to drive collaboration between the
business and IT stakeholders and ultimately help your data science team
uncover new customer, product, and operational insights that lead to better
business performance. Chapter 14 (“Organizational Ramifications”)
Use the following exercises to apply what you learned in this chapter.
<b>Exercise #1:</b> Identify a key business initiative for your organization,
something the business is trying to accomplish over the next 9 to 12 months. It
might be something like improve customer retention, optimize customer
acquisition, reduce customer churn, optimize predictive maintenance, reduce
revenue theft, and so on.
<b>Exercise #2:</b> Brainstorm and write down what (1) customer, (2) product, and
(3) operational insights your organization would like to uncover in order to
support the targeted business initiative. Start by capturing the different types
of descriptive, predictive, and prescriptive questions you'd like to answer about
the targeted business initiative. Tip: Don't worry about whether or not you
have the data sources you need to derive the insights you want (yet).
<b>Exercise #3:</b> Brainstorm and write down data sources that might be useful in
uncovering those key insights. Look both internally and externally for
<sup>1</sup> Hopkins, Brian, Fatemeh Khatibloo with Kyle McNabb, James Staten, Andras
Cser, Holger Kisker, Ph.D., Leslie Owens, Jennifer Belissent, Ph.D., Abigail
Organizations do not understand how far big data can take them from a business
transformation perspective. Organizations don't have a way of understanding
what the ultimate big data end state would or could look like or answering
questions such as:
Where and how should I start my big data journey?
How can I create new revenue or monetization opportunities?
How do I compare to others with respect to my organization's adoption of big
data as a business enabler?
How far can I push big data to power (even transform) my business models?
<i><b>To help address these types of questions, I've created the Big Data Business</b></i>
<i><b>Model Maturity Index. Not only can organizations use this index to</b></i>
understand where they sit with respect to other organizations in exploiting big
data and advanced analytics to power their business models, but the index
provides a road map to help organizations accelerate the integration of data and
analytics into their business models.
Introduce the Big Data Business Model Maturity Index as a framework for
organizations to measure how effective they are at leveraging data and
analytics to power their business models
Discuss the objectives and characteristics of each of the five phases of the
Big Data Business Model Maturity Index: Business Monitoring, Business
Insights, Business Optimization, Data Monetization, and Business
Metamorphosis
<i><b>Discuss how the economics of big data and the four big data value</b></i>
<i><b>drivers can enable organizations to cross the analytics chasm and</b></i>
advance past the Business Monitoring phase into the Business Insights
and Business Optimization phases
Organizations are moving at different paces with respect to where and how they
are adopting big data and advanced analytics to create business value. Some
organizations are moving very cautiously, as they are unclear as to where and how
to start and which of the bevy of new technology innovations they need to deploy
in order to start their big data journeys. Others are moving at a more aggressive
pace by acquiring and assembling a big data technology foundation built on many
new big data technologies such as Hadoop, Spark, MapReduce, YARN, Mahout,
Hive, HBase, and more.
However, a select few are looking beyond just the technology to identify where and
how they should be integrating big data into their existing business processes.
monetization opportunities; that is, seeking out business opportunities where they
can
Package and sell their analytic insights to others
Integrate advanced analytics into their products and services to create
“intelligent” products
Create entirely new products and services that help them enter new markets
and target new customers
These are the folks who realize that they don't need a big data strategy as much as
they need a business strategy that incorporates big data. And when organizations
“flip that byte” on the focus of their big data initiatives, the business potential is
almost boundless.
Organizations can use the Big Data Business Model Maturity Index as a
<b>Figure 2.1</b> Big Data Business Model Maturity Index
Organizations tend to find themselves in one of five phases on the Big Data
Business Model Maturity Index:
<b>Phase 1: Business Monitoring.</b> In the Business Monitoring phase,
organizations are applying data warehousing and Business Intelligence
techniques and tools to monitor the organization's business performance (also
called Business Performance Management).
<b>Phase 2: Business Insights.</b> In the Business Insights phase, organizations
<i>aggressively expand their data assets by amassing all of their detailed</i>
transactional and operational data and coupling that transactional and
operational data with new sources of internal data (e.g., consumer comments,
e-mail conversations, technician notes) and external data (e.g., social media,
weather, traffic, economic, data.gov). Organizations in the Business
Insights phase then use predictive analytics to uncover customer, product, and
operational insights buried in and across these data sources.
<b>Phase 3: Business Optimization.</b> In the Business Optimization phase,
organizations build on the customer, product, and operational insights
uncovered in the Business Insights phase by applying prescriptive analytics to
optimize key business processes. Organizations in the Business Optimization
phase push the analytic results (e.g., recommendations, scores, rules) to
frontline employees and business managers to help them optimize the targeted
business process through improved decision making. The Business
Optimization phase also provides opportunities for organizations to push
analytic insights to their customers in order to influence customer behaviors.
An example of the Business Optimization phase is a retailer that delivers
analytic-based merchandising recommendations to the store managers to
optimize merchandise markdowns based on purchase patterns, inventory,
<b>Phase 4: Data Monetization.</b> The Data Monetization phase is where
the average campaign performance in certain markets on certain days of
the week
Customers that are reacting two to three standard deviations outside the
norm in their purchase patterns for certain product categories in certain
weather conditions
Suppliers whose components are operating outside the upper or lower
limits of a control chart in extreme cold weather situations
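The kinds of outliers listed above (behavior several standard deviations from the norm, readings outside control chart limits) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the book: the function names `flag_unusual` and `control_limits`, the two-standard-deviation threshold, the three-sigma control limits, and all sample numbers are hypothetical.

```python
from statistics import mean, stdev


def flag_unusual(values, threshold=2.0):
    """Return (index, value, z_score) for points far from the mean.

    A point is flagged when it lies more than `threshold` sample
    standard deviations from the mean of `values` (an assumed rule).
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v, (v - mu) / sigma)
        for i, v in enumerate(values)
        if abs(v - mu) > threshold * sigma
    ]


def control_limits(baseline, k=3.0):
    """Lower and upper limits: baseline mean minus/plus k standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


# Hypothetical weekly spend for one customer; the last week is a spike.
weekly_spend = [100, 102, 98, 101, 99, 100, 103, 97, 100, 160]
outliers = flag_unusual(weekly_spend)

# Hypothetical in-control readings for a supplier component.
baseline_readings = [10.0, 10.2, 9.8, 10.1, 9.9]
lower, upper = control_limits(baseline_readings)
new_reading = 10.6
out_of_control = not (lower <= new_reading <= upper)
```

One design point worth noting: a large outlier inflates the standard deviation it is measured against, so a baseline taken from a period known to be normal (as in the control limit half of the sketch) is usually the safer reference.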
For the predictive analytics to be effective, organizations need to build
detailed analytic profiles for each individual business entity: customers,
patients, students, wind turbines, jet engines, ATMs, etc. The creation and
role of analytic profiles is a topic covered in Chapter 5, “Differences Between
Business Intelligence and Data Science.”
<i><b>Business Insights Phase Challenge</b></i>
The Business Insights phase is the most difficult stage of the Big Data Business
Model Maturity Index because it requires organizations to “think differently”
about how they approach data and analytics. The rules, techniques, and
approaches that worked in the Business Intelligence and data warehouse worlds
do not necessarily apply to the world of big data. This is truly the “crossing the
<b>Figure 2.2</b> Crossing the analytics chasm
<b>Figure 2.3</b> Packaging and selling audience insights
Integrating analytic insights directly into an organization's products and
services to create “intelligent” products or services, such as:
Cars that learn a customer's driving patterns and behaviors and adjust
driver controls, seats, mirrors, brake pedals, suspension, steering,
dashboard displays, etc. to match the customer's driving style
Televisions and DVRs that learn what types of shows and movies a
customer likes and search across the different cable and Internet channels
to find and automatically record similar shows for that customer
Ovens that learn how a customer likes certain foods prepared and cook
them in that manner automatically and also include recommendations for
other foods and recipes that “others like you” enjoy
Jet engines that can ingest weather, elevation, wind speed, and other
environmental data to make adjustments to blade angles, tilt, yaw, and
rotation speeds to minimize fuel consumption during flight
Repackaging insights to create entirely new products and services that help
organizations to enter new markets and target new customers or audiences.
For example, organizations can capture, analyze, and package customer,
product, and operational insights across the overall market in order to help
channel partners to more effectively market and sell to their customers, such as:
Online digital marketplaces (Yahoo, Google, eBay, Facebook) could
leverage general market trends and other merchant performance data to
provide recommendations to small merchants on inventory, ordering,
merchandising, marketing, and pricing.
patterns, local weather, local water quality, and local environmental conditions
such as local water conservation efforts and energy costs
Retailers moving into the “Shopping Optimization” business by recommending
specific products given customers' current buying patterns as compared with
others like them, including recommendations for products that they may not
even sell (think “Miracle on 34th Street”)
Airlines moving into the “Travel Delight” business of not only offering
There are some interesting lessons that organizations will discover as they
progress through the phases of the Big Data Business Model Maturity Index.
Understanding these lessons ahead of time should help prepare organizations for
their big data journey.
The first three phases of the Big Data Business Model Maturity Index seek to
extract more financial or business value out of the organization's internal
processes or business initiatives. The first three phases drive business value and a
Return on Investment (ROI) by seeking to integrate new sources of customer,
product, operational, and market data with advanced analytics to improve the
decisions that are made as part of the organization's key internal process and
business initiatives (see Figure 2.4).
<b>Figure 2.4</b> Optimize internal processes
<i><b>The internal process optimization efforts start by seeking to leverage the</b></i>
organization's Business Intelligence and data warehouse assets. This includes
building on the data warehouse's data sources, data extraction and enrichment
algorithms, dimensions, metrics, key performance indicators, reports, and
1. Access to all the organization's detailed transactional and operational data
at the lowest level of granularity (at the individual customer, machine, or
device level).
2. Integration of unstructured data from both internal (consumer comments,
e-mail threads, technician notes) and external sources (social media,
mobile, publicly available) with the detailed transactional and operational
data to provide new metrics and new dimensions against which to
optimize key business processes.
3. Leverage real-time (or right-time) data analysis to accelerate the
organization's ability to identify and act on customer, product, and market
4. Apply predictive analytics and data mining to uncover customer, product,
and operational insights or areas of “unusualness” buried in the massive
volumes of detailed structured and unstructured data that are worthy of
further business investigation.
Organizations must leverage these four big data value drivers to cross the analytics
chasm by uncovering new customer, product, and operational insights that can be
used to optimize key business processes, whether delivering actionable
recommendations to frontline employees and business managers or delivering
“Next Best Offer” or recommendations to delight customers and business
partners.
The last two phases of the Big Data Business Model Maturity Index are focused on
external market opportunities; opportunities to create new monetization or
<b>Figure 2.5</b> Create new monetization opportunities
This is the part of the big data journey that catches most organizations' attention:
the opportunity to leverage the insights gathered through the optimization of their
key business processes to create new revenue or monetization opportunities.
Organizations are eager to leverage new corporate assets (data, analytics, and
business insights) in order to create new sources of revenue. This is the “4 Ms”
phase of the Big Data Business Model Maturity Index where organizations focus
To fully exploit the big data opportunity, subtle organizational and cultural
changes will be necessary for the organization to advance along the maturity
index. If organizations are serious about integrating data and analytics into their
business models, then three organizational or cultural transformations will need
to take place:
<b>1. Treat Data as an Asset.</b> Organizations must start to treat data as an asset
to be nurtured and grown, not a cost to be minimized. Organizations must
develop an insatiable appetite for more and more data, even if they are
unclear as to how they will use that data. This is a significant cultural change
from the data warehouse days where we treated data as a cost to be minimized.
<b>2. Legally Protect Your Analytics Intellectual Property.</b> Organizations
must put into place formal processes and procedures to capture, track, refine,
and even legally protect their analytic assets (e.g., analytic models, data
enrichment algorithms, and analytic results such as scores, recommendations,
and association rules) as key organizational intellectual property. While the
underlying technologies may change over time, the resulting data and analytic
<b>3. Get Comfortable Using Data to Guide Decisions.</b> Business
management and business users must gain confidence in using data and
analytics to guide their decision making. Organizations must get comfortable
with making business decisions based on what the data and the analytics tell
them versus defaulting to the “Highest Paid Person's Opinion” (HIPPO). The
organization's investments in data, analytics, people, processes, and
technology will be for naught if the organization isn't prepared to make
decisions based on what the data and the analytics tell them. With that said,
it's important that the analytic insights are positioned as “recommendations”
that business users and business management can accept, reject, or modify. In
that way, organizations can leverage analytics to establish organizational
through improved decision making (or improved operational effectiveness for
non-profit organizations). Big data holds the potential to both optimize key
business processes and create new monetization or revenue opportunities.
In summary:
The Big Data Business Model Maturity Index provides a framework for
organizations to measure how effective they are at leveraging data and
analytics to power their business models.
The five phases of the Big Data Business Model Maturity Index are Business
Monitoring, Business Insights, Business Optimization, Data Monetization, and
Business Metamorphosis.
The economics of big data and the four big data value drivers can enable
organizations to cross the analytics chasm.
Use the following exercises to apply and reinforce the information presented in
this chapter:
<b>Exercise #1:</b> List two or three of your organization's key business processes.
That is, write down two or three business processes that uniquely differentiate
your organization from your competition.
<b>Exercise #2:</b> List the four big data value drivers that are enabled by the
economics of big data and describe how each might impact one of your
organization's key business processes identified in Exercise #1.
<b>Exercise #3:</b> For the selected key business processes identified in Exercise
#1, describe how each key business process might be improved as it transitions
along the five phases of the Big Data Business Model Maturity Index. Identify
the customer, product, and operational ramifications that each of the five
phases might have on the selected key business process.
<b>Exercise #4:</b> List the cultural changes that your organization must address if
One of the biggest challenges organizations face with respect to big data is
<i><b>identifying where and how to start. The big data strategy document, detailed</b></i>
in this chapter, provides a framework for linking an organization's business
strategy and supporting business initiatives to the organization's big data efforts.
The big data strategy document guides the organization through the process of
breaking down its business strategy and business initiatives into potential big data
business use cases and the supporting data and analytic requirements.
The big data strategy document first appeared in my book <i>Big Data:
Understanding How Data Powers Big Business</i>. Since then and courtesy of
several client engagements, significant improvements have been made to help
users to uncover big data use cases. In particular, the process has been
enhanced to clarify the business value and implementation feasibility
assessments of the different data sources and use case prioritization (see
Figure 3.1).
Establish common terminology for big data.
Examine the concept of a business initiative and provide some examples of
where to find these business initiatives.
Introduce the big data strategy document as a framework for helping
organizations to identify the use cases that guide where and how they can
start their big data journeys.
Provide a hands-on example of the big data strategy document in action
Introduce worksheets to help organizations to determine the business
value and implementation feasibility of the data sources that come out of
the big data strategy document process.
<i><b>Introduce the prioritization matrix as a tool that can drive business</b></i>
and IT alignment around prioritizing the use cases based on business value
and implementation feasibility over a 9- to 12-month window.
Before we launch into the big data strategy document discussion, we need to
define a few critical terms to ensure that we are using consistent terminology
throughout the chapter and the book:
<i><b>Corporate Mission. Why the organization exists; defines what an</b></i>
organization is and the organization's reason for being. For example, The Walt
Disney Company's corporate mission is “to be one of the world's leading
producers and providers of entertainment and information.”1
<i><b>Business Strategy. How the organization is going to achieve its mission over</b></i>
the next two to three years.
<i><b>Strategic Business Initiatives. What the organization plans to do to</b></i>
achieve its business strategy over the next 9 to 12 months; usually includes
business objectives, financial targets, metrics, and time frames.
<b>Business Entities. The physical objects or entities (e.g., customers, patients,</b>
students, doctors, wind turbines, trucks) around which the business initiative
will try to understand, predict, and influence behaviors and performance
<i><b>(sometimes referred to as the strategic nouns of the business).</b></i>
<b>Business Stakeholders. Those business functions (sales, marketing,</b>
finance, store operations, logistics, and so on) that impact or are impacted by
the strategic business initiative.
<b>Business Decisions. The decisions that the business stakeholders need to</b>
make in support of the strategic business initiative.
<i><b>Big Data Use Cases. The analytic use cases (decisions and corresponding</b></i>
actions) that support the strategic business initiative.
<b>Data. The structured and unstructured data sources, both internal and</b>
The big data strategy document helps organizations address the challenge of
identifying where and how to start their big data journeys. The big data strategy
document uses a single-page format that any organization can use (profit or non-profit) that links an organization's big data efforts to its business strategy and key
It's concise. It fits on a single page so that anyone in the organization can
quickly review it to ensure he or she is working on the top priority items.
It's clear. It clearly defines what the organization needs to do in order to
achieve the organization's key business initiatives.
It's business relevant. It starts by focusing on the business strategy and
supporting initiatives before it dives into the data and technology
requirements.
The big data strategy document is composed of the following sections (see Figure
3.2):
Business strategy
Key business initiatives
Key business entities
Key decisions
<b>Figure 3.2</b> Big data strategy document
The rest of the chapter will detail each of these sections and provide guidelines for
how the organization can triage the organization's business strategy into the
financial drivers (or use cases) on which the organization can focus its big data
efforts. We will use a case study around Chipotle Mexican Grill to reinforce the
triage and analysis process.
The starting point for the big data strategy document process is to identify the
organization's business initiatives over the next 9 to 12 months. That is, what is
the business trying to accomplish over the next 9 to 12 months? This 9- to 12-month time frame is critical, as it
Focuses the organization's big data efforts on something that is of immediate
value and relevance to the business
Creates a sense of urgency for the organization to move quickly and diligently
Gives the big data project a more realistic chance of delivering a positive
Return on Investment (ROI) and a financial payback in 12 months or less
A business initiative supports the business strategy and has the following
characteristics:
Critical to immediate-term business and/or financial performance (usually 9- to 12-month time frame)
<b>Figure 3.3</b> Chipotle's 2012 letter to the shareholders
From the President's Letter, we can identify at least four key business initiatives
for the coming year:
Improve employee (talent) acquisition, maturation, and retention (which is
especially important for an organization where 90 percent of its management
has come up through the ranks of the store).
Continue double-digit revenue growth (up 20.3 percent in 2012) by opening
Increase same store sales growth (7.1 percent growth in 2012).
Improve marketing effectiveness in building the Chipotle brand and engaging
with customers in ways that create stronger, deeper bonds.
While any of these four business initiatives is ripe for the big data strategy
document, for the remainder of this exercise, we'll focus on the “increase same
store sales” business initiative because increasing sales of a business entity or
outlet is relevant across a number of different industries (e.g., hospitality, gaming,
banking, insurance, retail, higher education, health care providers).
It is around these business entities that we are going to want to capture the
behaviors, tendencies, patterns, trends, preferences, etc. at the individual
entity level. For example, a credit card company would want to capture Bill
Schmarzo's specific travel and buying patterns and tendencies in order to
better detect fraud and improve merchant marketing offers.
Figure 3.4 shows the template that we are going to use to support the big data
strategy document process. We have already captured our targeted “increase same
store sales” business initiative.
<b>Figure 3.4</b> Chipotle's “increase same store sales” business initiative
Take a moment to write down what you think might be the key business entities or
Here are three business entities that I came up with:
Stores
Business entity: Local events
Decision:
Decision:
Business entity: Local competitors
Decision:
Decision:
Some of the decisions will be very similar. That's good because it allows the
organization to approach the decisions from multiple perspectives.
Figure 3.5 shows what the Chipotle big data strategy document looks like at this
point in the exercise with the addition of some of the business decisions.
<b>Figure 3.5</b> Chipotle key business entities and decisions
While it is hard to actually do this grouping process in a book, the use of
For Chipotle's “increase same store sales” business initiative, the following are
likely financial drivers or use cases:
Increase store traffic (acquire new customers, increase frequency of repeat
customers)
Increase shopping bag revenue and margins (cross-sell complementary
products, up-sell)
Increase number of corporate events (catering, repeat catering events)
Improve promotional effectiveness (Halloween Boo-ritto, Christmas gift cards,
graduation, holiday, and special event gift cards)
Improve new product introduction effectiveness (seasonal, holiday)
The entire big data strategy document process has been designed to uncover these
use cases, that is, to identify those financial drivers that support our targeted business
initiative. The use cases and financial drivers are the point of the big data strategy
<i>document where we focus the organization on the “Make Me More Money” big</i>
data opportunities.
<b>Figure 3.6</b> Completed Chipotle big data strategy document
With the use cases and financial drivers identified, we are now ready to move into
the data and metrics envisioning process. We want to brainstorm data sources
(regardless of whether or not you currently have access to these data sources) that
<i><b>might yield new insights that support the targeted business initiative. We want to</b></i>
unleash the business and IT teams' creative thinking to brainstorm data sources
that might yield new customer, product, store, campaign, and operational insights
that could improve the effectiveness of the different use cases.
For example, Chipotle data sources that were identified as part of the envisioning
exercises could include:
Point of Sales Transactions
Market Baskets
Weather
Traffic Patterns
Yelp
Zillow/Realtor.com
Twitter/Facebook/Instagram
Twellow/Twellowhood
Zip Code Demographics
EventBrite
MaxPreps
Mobile App
But not all data sources are of equal business value or have equal implementation
feasibility. The data sources need to be evaluated in light of
The business value that data source could provide in support of the individual
use case
The feasibility (or ease) of acquiring, cleaning, aligning, normalizing,
enriching, and analyzing those data sources
So we want to add two processes (worksheets) to the big data strategy document
process that will evaluate the business value and implementation feasibility of
each of the potential data sources.
<b>Figure 3.7</b> Business value of potential Chipotle data sources
You'd want to go through a group brainstorming process with the business
stakeholders to assess the relative value of each data source with respect to each
<i>use case. The business users own the business value determination because they</i>
are best positioned to be able to understand and quantify the business value that
each data source could provide to the use cases.
I like using Harvey Balls (in
both the data value and the feasibility assessment charts. The Harvey Balls
quickly and easily communicate the relative value of each data source with
respect to each use case.
Reviewing the data value assessment chart in Figure 3.7, you can quickly uncover
some key observations, such as the following:
Detailed point-of-sale data is important to all of the use cases.
Insights from the Store Demographics data are important to four of the five
use cases.
Mining Consumer Comments has a surprisingly strong impact across four of the
five use cases.
promotional effectiveness” use cases but has little impact on the “increase
shopping bag revenue,” “increase number of corporate events,” or “improve
new product introduction effectiveness” use cases.
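The data value assessment can be sketched as a small score matrix. The sketch below assumes each Harvey Ball is encoded as an integer from 0 (empty) to 4 (full). The scores are illustrative placeholders, not the values shown in Figure 3.7, and the function names are hypothetical.

```python
# Assumption: a Harvey Ball is encoded as an integer from 0 (empty) to 4 (full).
USE_CASES = [
    "Increase store traffic",
    "Increase shopping bag revenue",
    "Increase number of corporate events",
    "Improve promotional effectiveness",
    "Improve new product introduction effectiveness",
]

# Illustrative placeholder scores, one per use case, in USE_CASES order.
VALUE_SCORES = {
    "Point of Sales Transactions": [4, 4, 4, 4, 4],
    "Weather": [3, 1, 1, 3, 1],
    "Social Media": [2, 2, 1, 3, 2],
}


def important_use_case_count(scores, threshold=3):
    """Count the use cases for which a data source scores at or above threshold."""
    return sum(1 for score in scores if score >= threshold)


def rank_sources(value_scores):
    """Order data sources by total score across all use cases, highest first."""
    return sorted(value_scores, key=lambda name: sum(value_scores[name]), reverse=True)
```

A table like this makes observations such as "important to four of the five use cases" a matter of counting rather than reading the chart by eye.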
Next, you want to understand the implementation feasibility for each of the
potential data sources. This part of the exercise is primarily driven by the IT
organization since it is best positioned to understand the implementation
challenges and risks associated with each of the data sources, such as ease of data
acquisition, cleanliness of the data, data accuracy, data granularity, cost of
acquiring the data, organizational skill sets, tool proficiencies, and other risk
<i><b>factors. The implementation feasibility assessment chart for Chipotle's</b></i>
“increase same store sales” business initiative looks like Figure 3.8.
<b>Figure 3.8</b> Implementation feasibility of potential Chipotle data sources
From the Chipotle implementation feasibility assessment chart in Figure 3.8, we
can quickly make the following observations:
Point of Sales, Market Baskets, and Store Manager Demographics data is
readily available and easy to integrate (likely due to the master data
management and data governance efforts necessary to load this data into a
data warehouse).
Consumer Comments data, which was very valuable in the business value
assessment, has several implementation risks. Lack of organizational
Social Media data, which was rated about mid-value in the value assessment
exercise, also looks to be a real challenge. Many of the same cleanliness,
accuracy, and granularity issues exist, with the added issue that this is data
that will need to be “acquired” through some means. Probably not the first data
source you want to deal with in this use case.
The final step in the big data strategy document process is to take the business and
IT stakeholders through a use case prioritization process. While we will cover the
<i><b>prioritization matrix in detail in Chapter 13, I want to introduce the concept</b></i>
here as the natural point of concluding the big data strategy document process.
As part of the big data strategy document, we have now done the work to identify
the use cases that support the organization's key business initiative, brainstormed
additional data sources, and determined the applicability of those data sources
from a business value and implementation feasibility assessment. We are now
ready to prioritize the use cases based on their relative business value and
implementation feasibility over the next 9 to 12 months (see Figure 3.9).
<b>Figure 3.9</b> Chipotle prioritization of use cases
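One way to picture the prioritization step is as a two-axis scoring exercise. The sketch below is only an illustration under stated assumptions: each use case carries a business value score and a feasibility score on a 0-to-4 scale, the quadrant labels and the ranking rule are hypothetical, and the scores are placeholders rather than the values in Figure 3.9.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: float  # assumption: 0-4, agreed by the business stakeholders
    feasibility: float     # assumption: 0-4, agreed by the IT stakeholders


def quadrant(use_case, midpoint=2.0):
    """Place a use case in one of four quadrants of a value/feasibility grid."""
    high_value = use_case.business_value >= midpoint
    high_feasibility = use_case.feasibility >= midpoint
    if high_value and high_feasibility:
        return "start here"
    if high_value:
        return "high value, hard to implement"
    if high_feasibility:
        return "easy to implement, low value"
    return "defer"


def prioritize(use_cases):
    """Rank by the weaker of the two scores, breaking ties by their sum."""
    return sorted(
        use_cases,
        key=lambda u: (min(u.business_value, u.feasibility),
                       u.business_value + u.feasibility),
        reverse=True,
    )


# Illustrative placeholder scores.
candidates = [
    UseCase("Improve promotional effectiveness", 4.0, 1.5),
    UseCase("Increase store traffic", 3.5, 3.0),
    UseCase("Increase number of corporate events", 1.5, 3.5),
]
ranked = [u.name for u in prioritize(candidates)]
```

Ranking on the weaker score reflects the idea that a use case needs both business value and feasibility to be a sensible starting point within a 9- to 12-month window; other weighting schemes are equally plausible.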
Preserve starting pitching effectiveness throughout the regular season and
playoffs by optimizing pitch counts, pitcher rotations, pitcher rests, etc.
Improve batting and slugging proficiency by optimizing trades, free agent
signings, minor league promotions, and contract extensions
Increase in-game “small ball” runs scored effectiveness through the optimal
combination of batters, hitting, stealing, base running, and sacrifice hitting
strategies
Accelerate minor league player development through player strength and
conditioning training, game situations, and minor league assignments
Optimize in-game pitch selection decisions through improved understanding
of batter and pitcher matchups
Figure 3.10 shows the resulting big data strategy document.
<b>Figure 3.10</b> San Francisco Giants big data strategy document
Next, we would brainstorm the potential data sources to support the use cases,
including:
<b>Personnel Player Health. This should include personal health history</b>
(weight, health, BMI, injuries, therapy, medications), physical performance
metrics (60-foot dash time, long toss distances, fastball velocity), and workout
history (bench press, dead lift, crunches and pushups in 60 seconds, frequency
<b>Starting Pitcher Performance. This should include a detailed pitching</b>
This chapter focused on the big data strategy document and key related topics.
It:
Introduced the concept of a business initiative and provided some examples of
where to find these business initiatives
Introduced the big data strategy document as a framework for helping
organizations to identify the use cases that guide where and how they can start
their big data journeys
Provided a hands-on example of the big data strategy document in action using
Chipotle, a chain of organic Mexican food restaurants
Introduced worksheets to help organizations to determine the business value
and implementation feasibility of the data sources that come out of the big
data strategy document process
Introduced the prioritization matrix as a tool to help drive business and IT
alignment around the top priority use cases over a 9- to 12-month window
Had some fun by applying the big data strategy document to the world of
professional baseball
This chapter outlined the big data strategy document as a framework to help an
document is a tool to ensure that your big data journey is valuable and relevant
from a business perspective.
To swing back around to the Chipotle case study, Figure 3.11 shows some initial
results of the company's success with its “increase same store revenues” business
initiative. (For more information, see the article at
<b>Figure 3.11</b> Chipotle's same store sales results
It's nice to see that our Chipotle use case actually has a real business story behind
it. But then again, every big data initiative should have a real business story
Use the following exercises to apply the big data strategy document to your
organization (or one of your favorite organizations).
<b>Exercise #1: Start by identifying your organization's key business initiatives</b>
over the next 9 to 12 months.
<b>Exercise #2: Select one of your business initiatives, and then brainstorm the</b>
key business entities or strategic nouns that impact that selected business
initiative. As a reminder, it is around the individual business entities that we
want to capture the behaviors, tendencies, patterns, trends, preferences, etc. at
the individual business entity level.
<b>Exercise #3: Next, brainstorm the key decisions that need to be made about</b>
each key business entity with respect to the targeted business initiative.
<b>Exercise #4: Next we want to group the decisions into common use cases;</b>
that is, cluster those decisions that seem similar in their business or financial
objectives.
<b>Exercise #5: Then brainstorm the different data sources that you might need</b>
to support those use cases:
Identify potential internal structured (transactional data sources,
operational data sources) and unstructured (consumer comments, notes,
work orders, purchase requests) data sources
Identify potential external data sources (social media, blogs, publicly
available, data.gov, websites, mobile apps) that you also might want to
consider
<b>Exercise #6: Use the data assessment worksheets to determine the relative</b>
business value and implementation feasibility of each of the identified data
sources with respect to the different use cases.
<b>Exercise #7: Finally, use the prioritization matrix to rank each of the use</b>
The user experience is one of the secrets to big data success, and one of my
favorite topics. If organizations cannot deliver insights to their employees,
managers, partners, and customers in a way that is actionable, then why even
<i>bother? One of the keys to success in the Big Data MBA is to “begin with an end in</i>
mind” with respect to understanding how the analytic results are going to be
delivered to frontline employees, business managers, channel partners, and
<i>customers in a way that is actionable. The Big Data MBA seeks to “close the</i>
Review an example of an “unintelligent” user experience.
Highlight the importance of “thinking differently” with respect to creating
an actionable dashboard versus building a traditional Business Intelligence
dashboard.
Review a sample actionable dashboard targeting frontline store managers.
Review another sample actionable dashboard (financial advisor
dashboard) targeting business-to-business channel partners.
This chapter will challenge the traditional Business Intelligence approaches to
One of my favorite subjects against which I love to rail is the “unintelligent” user
experience. This is a problem caused by, in my humble opinion, the lack of effort
by organizations to understand their key business stakeholders well enough to be
able to deliver actionable insights in support of the organizations' key business
initiatives. And this user experience problem is often only exacerbated by big data.
Here is a real-world example of how NOT to leverage actionable analytics in your
organization's engagements with your customers. The names have been changed
to protect the guilty.
My daughter Amelia got the e-mail (see Figure 4.1) from our cell phone provider
warning her that she was about to exceed her monthly data usage limit of 2 GB.
She was very upset that she was about to go over her limit, and it would start
costing her (actually, me) an additional $10.00 per GB over the limit. (Note: The
“Monday, August 13, 2012” date in the figure will play an important role in this
story.)
<b>Figure 4.1</b> Original subscriber e-mail
I asked Amelia what information she thought she needed in order to make a
<i><b>decision about altering her Facebook, Pandora, Vine, Snapchat, and Instagram</b></i>
usage (since those are the main data hog culprits in her case) so that she would
not exceed her data plan limits. She thought for a while and then said that she
thought she needed the following information:
How much of her data plan does she have left in the current month?
When does her new month or billing period start?
experience that organizations should be targeting.
Our cellular provider could have provided a user experience that highlighted the
information and insights necessary to help Amelia make a decision about data
usage. The user experience could have looked something like the e-mail message
shown in Figure 4.2.
<b>Figure 4.2</b> Improved subscriber e-mail
This sample e-mail has all the information that Amelia needs to make a decision
about usage behaviors including:
Actual usage to date (65 percent)
A forecast of usage by the end of the period (67 percent)
The date when the data plan will reset (in 1 day on August 14)
But let's take this case study one step further. Let's say that there actually was
going to be a problem with Amelia's usage and her data plan. What if 82 percent of
data usage had been consumed with 50 percent of usage period remaining? How
do we make the user experience and the customer engagement useful, relevant,
and actionable?
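The usage forecast in both scenarios can be illustrated with a simple linear projection. This is a minimal sketch that assumes usage continues at the same average rate for the rest of the billing period; the function names are hypothetical, and the provider's actual forecasting method is not stated.

```python
def projected_usage(used_fraction, elapsed_fraction):
    """Linearly project end-of-period usage as a fraction of the plan limit.

    used_fraction: share of the plan consumed so far (0.82 means 82 percent).
    elapsed_fraction: share of the billing period already elapsed, in (0, 1].
    """
    if not 0 < elapsed_fraction <= 1:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    return used_fraction / elapsed_fraction


def will_exceed(used_fraction, elapsed_fraction):
    """True if the linear projection ends the period above the plan limit."""
    return projected_usage(used_fraction, elapsed_fraction) > 1.0
```

With 82 percent consumed and half the period elapsed, the projection is 164 percent of the plan, so a warning is warranted. With 65 percent consumed and one day left in an assumed 31-day period, the projection is about 67 percent, which is consistent with the forecast listed above.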
The mock-up shown in Figure 4.3 offers one potential approach based on the
same principles discussed earlier: provide enough information to help Amelia
For example, Future Telco could offer prescriptive advice about how to reduce
data consumption such as:
Transitioning to apps that are more data usage efficient (e.g., transitioning
from Pandora to Rdio or iHeartRadio for streaming radio, assuming that Rdio
and iHeartRadio are more efficient in their usage of the data bandwidth)
Turning off apps in the background that are unnecessarily consuming data
such as mapping apps (like Apple Maps or Waze) or apps that are using GPS
tracking
Future Telco could even offer Amelia options to avoid paying an overage penalty
(see Figure 4.3) such as:
Purchase a 1-month data usage upgrade for $2.00 (which is cheaper than the
$10 overage penalty)
<b>Figure 4.3</b> Actionable subscriber e-mail
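The trade-off behind the upgrade offer can be sketched as simple arithmetic. The 2 GB limit, the $10.00 per GB overage charge, and the $2.00 upgrade price come from the example; the size of the upgrade (1 GB here) and billing overage in whole gigabytes are assumptions made only for illustration, and the function names are hypothetical.

```python
import math

PLAN_LIMIT_GB = 2.0     # from the example: 2 GB monthly limit
OVERAGE_PER_GB = 10.00  # from the example: $10.00 per GB over the limit
UPGRADE_PRICE = 2.00    # from the example: one-month data usage upgrade
UPGRADE_EXTRA_GB = 1.0  # assumption: the example does not state the upgrade size


def overage_charge(projected_gb, limit_gb=PLAN_LIMIT_GB):
    """Charge for projected usage above the limit, assuming whole-GB billing."""
    over = max(0.0, projected_gb - limit_gb)
    return math.ceil(over) * OVERAGE_PER_GB


def cheaper_option(projected_gb):
    """Compare doing nothing with buying the upgrade for the projected usage."""
    no_change = overage_charge(projected_gb)
    with_upgrade = UPGRADE_PRICE + overage_charge(
        projected_gb, PLAN_LIMIT_GB + UPGRADE_EXTRA_GB
    )
    if with_upgrade < no_change:
        return ("upgrade", with_upgrade)
    return ("no change", no_change)
```

A recommendation engine would only surface the upgrade offer when the projected usage makes it the cheaper option, which keeps the e-mail relevant rather than promotional.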
But wait, there is even more that Future Telco could do to improve the customer
experience. Future Telco could analyze Amelia's app usage tendencies and
recommend new apps based on other apps that users like Amelia use, similar to
what Amazon and Netflix do (see Figure 4.4).
<b>Figure 4.4</b> App recommendations
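The idea of recommending apps that similar users rely on can be sketched with a simple overlap count. This is a toy illustration, not a description of how Amazon or Netflix build their recommendations; the function name is hypothetical, and "MusicAppX" and "PhotoAppY" are made-up app names.

```python
from collections import Counter


def recommend_apps(target_apps, other_users_apps, top_n=3):
    """Suggest apps used by subscribers who share at least one app with the target."""
    target = set(target_apps)
    votes = Counter()
    for apps in other_users_apps:
        apps = set(apps)
        overlap = len(target & apps)
        if overlap == 0:
            continue  # this subscriber has nothing in common with the target
        for app in apps - target:
            votes[app] += overlap  # weight by how similar the other subscriber is
    return [app for app, _ in votes.most_common(top_n)]


amelia = ["Pandora", "Instagram"]
others = [
    ["Pandora", "Instagram", "MusicAppX"],  # shares two apps with Amelia
    ["Pandora", "PhotoAppY"],               # shares one app with Amelia
    ["Waze"],                               # shares none, so it is ignored
]
suggestions = recommend_apps(amelia, others)
```

Apps used by subscribers with more apps in common with Amelia rank higher, which is the intuition behind "users like Amelia."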
This level of customer intimacy can open up all sorts of new monetization
opportunities such as:
Help app developers to be more successful while collecting referral fees, co-marketing fees, and other monetization ideas that align with the app
developers' business objectives
Cellular providers are not alone in missing opportunities to leverage customer
insights in order to provide a more relevant, more meaningful customer
experience. Many organizations are sitting on gold mines of insights about their
customers' buying and usage patterns, tendencies, propensities, and areas of
Big data can transform the business by enabling a completely new user experience
(UEX) built around insight and recommendations versus just traditional Business
Intelligence charts and tables. Retailers, like most organizations, can leverage
detailed, historical transactional data, coupled with new sources of “right-time”
data like local competitors' promotions (e.g., “best food days,” which is the day
when grocery stores post their weekly promotions), weather, and events, to
uncover new insight about their customers, products, merchandising, competitors,
and operations. Big data provides organizations the ability to (1) rapidly ingest
these new sources of customer, product, and operational data and then (2)
leverage data science to yield real-time, actionable insights.
Let's walk through an example of integrating big data with a traditional BI
dashboard to create a more actionable user experience that empowers frontline
employees and managers.
<b>Figure 4.5</b> Traditional Business Intelligence dashboard
The challenge with these traditional BI dashboards is that unless you are an
analyst, it's not clear what action the user is supposed to take. Arrows up,
sideways, and down… I can see my performance, but the dashboard doesn't
provide any insights to tell the store manager what actions to take.
The other challenge is that the store manager (like most frontline employees and
managers) likely does not have a BI or an analytics background (likely worked his
way up the ranks in the grocery store). As a result, UEX and the actionable
insights and recommendations are critical because the store manager does not
know how to drill into the BI reports and dashboards to uncover insights based on
the raw data.
<b>Figure 4.6</b> Actionable store manager dashboard
In Figure 4.6, Section A shows specific product, promotion, placement, and
pricing recommendations based on the layout of a specific store. Section B
provides specific recommendations concerning pricing, merchandising, inventory,
staffing, promotions, etc. for the store manager.
Each recommendation in Section B is presented with Accept [+] or Reject [-]
options. If the store manager accepts the recommendation by selecting [+], that
recommendation is executed (e.g., raise prices, add promotion, add inventory,
etc.). However, if the store manager rejects the recommendation, then the
actionable dashboard captures the reason for the rejection so that the supporting
analytic models can be constantly fine-tuned (see Figure 4.7).
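The accept/reject loop described above amounts to logging each decision together with its reason so the models can learn from it. The sketch below shows one possible shape for such a log; the class, field, and function names are hypothetical, and the sample entries are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    recommendation_type: str  # e.g., "pricing", "inventory", "promotion"
    accepted: bool
    rejection_reason: Optional[str] = None  # captured only when rejected


def acceptance_rate(log, recommendation_type):
    """Share of recommendations of one type that the store manager accepted."""
    relevant = [f for f in log if f.recommendation_type == recommendation_type]
    if not relevant:
        return None  # no feedback of this type has been recorded yet
    return sum(f.accepted for f in relevant) / len(relevant)


def top_rejection_reasons(log, n=3):
    """Most frequent reasons given for rejecting recommendations."""
    reasons = Counter(
        f.rejection_reason for f in log if not f.accepted and f.rejection_reason
    )
    return reasons.most_common(n)


# Illustrative entries only.
log = [
    Feedback("pricing", True),
    Feedback("pricing", False, "local event this weekend"),
    Feedback("inventory", False, "no shelf space"),
    Feedback("pricing", False, "local event this weekend"),
]
```

A low acceptance rate for one recommendation type, or a rejection reason that keeps recurring, points to a variable the underlying analytic model is probably missing.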
Finally, the store manager can select the More option in Section B and modify the
recommendation based on his own experience. Allowing the store manager to
modify the recommendations based on his personal experiences allows the
underlying analytic models to constantly learn what works and what doesn't work
and build on the best practices and learnings from the organization's most
<b>Figure 4.7</b> Store manager accept/reject recommendations
One use case for the store manager dashboard enables the store manager to
monitor local competitive activity and promotions. The grocery industry is very
locally competitive. Competitors, for the most part, are within just a few miles or
even blocks of each other. In this competitive analysis use case, the dashboard
provides a map of the local grocery and beverage competitors (see Section C of
Figure 4.7). Hovering over any particular competitor on the map immediately
brings up its current marketing flyer. The store manager (or his business analyst)
can browse through each of the competitors' flyers and make custom store
<b>Figure 4.8</b> Competitive analysis use case
Like the other recommendations, the store manager's custom recommendations
will be monitored for effectiveness so that the analytic models can be constantly
updated and refined.
<b>Figure 4.9</b> Local events use case
Another use case is to integrate the local weather forecast into the store manager
dashboard. The store manager can analyze the local weather forecasts and make
adjustments for inventory, merchandising, and promotions based on whether the
weather will be warmer or colder than expected (see Figure 4.10). The dashboard
can automatically analyze similar weather conditions and predict the impact on
store traffic and product category sales and deliver relevant recommendations to
the store manager.
<b>Figure 4.10</b> Local weather use case
Adjust investment strategies (short-term, long-term)
Reallocate financial portfolio
Change investment vehicles (stocks, bonds, mutual funds, etc.)
<b>Figure 4.11</b> Financial advisor dashboard
The goal of the financial advisor dashboard is to uncover insights about the client's
investment performance and provide client-specific recommendations that help
these clients reach their financial goals. To generate actionable, accurate
recommendations, we're going to need to know as much about the client as
Current and historical personal background information (e.g., marital status,
spouse's financial and employment situation, number and age of children,
outstanding mortgage on home(s) and any secondary real estate investments)
Current financial investments and other assets (e.g., stocks, bonds, mutual
funds, IRAs, 401(k)s, REITs)
Current and historical income (and expenditures, if possible)
Financial goals with specific timelines
We need to ensure that the financial advisor dashboard provides enough value to
both the financial advisor and the advisor's clients in order to incent the clients to
share as much of this data as possible.
<b>Client Personal Information: The first part of the dashboard presents</b>
relevant client personal and financial information. FSI wants to gather as much
personal information as is relevant when the client first opens his accounts. But
after the client opens his account, there needs to be a concerted effort to keep the
data updated and capture new lifestyle, life stage, employment, and family
information. Much of that client data can be captured via discussions and
interactions that the financial advisor is having with the client (e.g., informational
calls, e-mail dialogues, office visits, annual reviews). While this information is
gold to FSI, much of this data never gets past the financial advisors' personal
contact management and e-mail systems. FSI must provide compelling reasons to
Some leading-edge organizations are providing incentives (e.g., discounts,
promotions, contests, rewards) for clients to share their social media interactions.
Obviously, access to the client's current situation and plans as posted on social
media sites is gold when it can be mined to uncover activities that might affect his
financial needs (e.g., vacations, buying a new car, upcoming wedding plans,
promotions, job changes, children changing schools).
<b>Client Financial Status: The next section of the dashboard provides an</b>
overview of the client's current financial status. Again, the more data that can be
gathered about the client's financial situation (e.g., investments, home, spending,
debt), the more accurate and prescriptive the analytic models will be (see Figure
4.13).
<b>Figure 4.13</b> Client financial information
In this example, we have details on all the client's financial investments with FSI.
However, the client might (and likely does) have financial investments with other
firms courtesy of his employer's 401(k) programs, whole life insurance policies, and
other stocks, bonds, and funds. And that doesn't even consider substantial
investments in nonfinancial instruments like his primary residence, vacation
home, antiques, and collectibles.
Incenting clients to share their entire financial portfolio is complicated by how
recommendations.
<b>Client Financial Goals: The final informational section of the dashboard</b>
contains the client's financial goals. There are likely only a small number of goals,
and they probably don't change that often. However, it is difficult to develop
meaningful client financial recommendations without up-to-date client financial
goals. From a data collection perspective, this is probably the easiest data to
<b>Figure 4.14</b> Client financial goals
However, let's say that the client either won't share his financial goals or hasn't
even thought through what his financial goals need to be. This is common when
dealing with retirement planning, since many clients aren't clear or realistic about
their retirement goals. In these situations, FSI could leverage the information that
it has about “similar” clients to make retirement goal recommendations. If FSI has
the client's current financial investments and current salary, FSI could make a
pretty intelligent guess as to the client's retirement goals.
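One simple way to turn "similar" clients into a retirement goal guess is a nearest-neighbor lookup. The sketch below assumes each client is described only by salary and current investments (both positive) and takes the median goal of the closest peers; the field names, the distance measure, and the sample figures are all hypothetical.

```python
from statistics import median


def suggest_retirement_goal(client, peers, k=3):
    """Median retirement goal of the k peers closest in salary and investments."""

    def distance(peer):
        # Relative differences keep salary and investments on comparable scales.
        salary_gap = abs(peer["salary"] - client["salary"]) / client["salary"]
        invest_gap = (
            abs(peer["investments"] - client["investments"]) / client["investments"]
        )
        return salary_gap + invest_gap

    nearest = sorted(peers, key=distance)[:k]
    return median(p["retirement_goal"] for p in nearest)


# Illustrative figures only.
client = {"salary": 100_000, "investments": 200_000}
peers = [
    {"salary": 95_000, "investments": 210_000, "retirement_goal": 1_500_000},
    {"salary": 105_000, "investments": 190_000, "retirement_goal": 1_700_000},
    {"salary": 110_000, "investments": 220_000, "retirement_goal": 2_000_000},
    {"salary": 300_000, "investments": 2_000_000, "retirement_goal": 9_000_000},
]
suggested = suggest_retirement_goal(client, peers)
```

Using the median rather than the mean keeps one unusually ambitious peer from distorting the suggestion. In practice the advisor would present the result as a starting point for discussion, not as a fixed target.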
Now let's get into the meat of the financial advisor dashboard. The client
information sections of the dashboard were meant to provide an easy and efficient
way to capture the client's key lifestyle, demographic, and financial data, as well as
results of different financial options and actions, and then create prescriptive
models in order to deliver client-specific recommendations that help the client to
reach his financial goals. This financial advisor dashboard covers four different
areas for delivering client-specific financial recommendations:
Financial contributions
Spending analysis
Asset allocation
Other financial investments
<i><b>Financial Contributions Recommendations</b></i>
The first set of recommendations is focused on helping the client optimize
financial contributions (see Figure 4.15). The types of client decisions that could
be modeled include:
One-time payments to jump-start lagging financial goals
Reallocate monthly or periodic payments against different financial goals
Change retirement, new car, and new home target dates
<b>Figure 4.15</b> Financial contributions recommendations
We could employ data science to analyze the client's detailed financial data,
<i><b>Spending Analysis Recommendations</b></i>
The second set of recommendations is focused on helping the client optimize
spending habits. This is where access to the client's credit card and banking
statements (maybe via Mint.com and/or his checking accounts) could yield
valuable insights to help the client minimize cash outflow and increase financial
investments (see Figure 4.16). The types of spending decisions that would need to
be modeled include:
Consolidating expenditures of similar products and services
Flagging expenditures that are abnormally high given the client's family
situation, home location, etc.
Integrating customer loyalty program information to find retailers who can
provide best prices on food and household staples
Increasing insurance deductibles to lower premiums
<b>Figure 4.16</b> Spend analysis and recommendations
There are lots of opportunities to leverage external data sources and best practices
across the FSI client base to find better deals in an attempt to reduce the client's
discretionary spending. There are several retail, insurance, travel, hospitality,
entertainment, cell phone, and other websites from which data could be gathered.
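Flagging abnormally high expenditures can be sketched by comparing the client's spend in each category with that of comparable clients. The sketch below uses a z-score against peer spending; the threshold, the category names, and the dollar figures are illustrative assumptions, and the function name is hypothetical.

```python
from statistics import mean, stdev


def flag_high_spend(client_spend, peer_spend, z_threshold=2.0):
    """Return the categories where the client's spend is unusually high versus peers."""
    flagged = []
    for category, amount in client_spend.items():
        peers = peer_spend.get(category, [])
        if len(peers) < 2:
            continue  # not enough peer data to estimate the spread
        spread = stdev(peers)
        if spread == 0:
            continue  # identical peer values give no meaningful z-score
        z = (amount - mean(peers)) / spread
        if z > z_threshold:
            flagged.append(category)
    return flagged


# Illustrative monthly figures only.
client_spend = {"cell phone": 220, "groceries": 610}
peer_spend = {
    "cell phone": [90, 100, 110, 95, 105],
    "groceries": [580, 600, 620, 640, 560],
}
flagged = flag_high_spend(client_spend, peer_spend)
```

Here the peers would be clients with a similar family situation and home location, so that a large household is not flagged simply for buying more groceries.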
<i><b>Asset Allocation Recommendations</b></i>
The third set of recommendations is focused on helping clients optimize their
asset allocation in light of their financial goals. By leveraging best practices across
other clients, portfolios, and investment instruments, prescriptive analytics can be
developed to make specific asset allocation recommendations that support asset
allocation decisions such as (see Figure 4.17):
Which stocks and bonds to sell or buy against specific financial goal portfolios
Portfolio allocation decisions that properly balance the risk-return ratio of the
client's portfolio in light of risk tolerance and financial goals
Other financial instruments that can accelerate the client's progress against
financial goals or reduce risk for those short-term financial goals
<b>Figure 4.17</b> Asset allocation recommendations
client's desired risk level. To further protect the client's investment assets, an
aggregated view of the marketplace could yield more timely insights into stocks
and bonds that are suddenly hot or cold. This is also an area where real-time
analytics can be leveraged to ensure that no sudden market movements expose the
client to unnecessary asset allocation risks. The dashboard could also support an
interactive “what if” collaboration directly with the client to glean even more data
and insights about the client's investment preferences and tolerance for risk.
<i><b>Other Investment Recommendations</b></i>
The fourth set of recommendations is focused on other assets that clients need to
consider as part of their overall financial strategy. Real estate (the client's home
and any vacation homes) is probably the most obvious. This is an area where
recommendations about other investment options can be delivered to help
support client decisions regarding (see Figure 4.18):
Identifying the ideal amount of insurance needed given home valuation
changes
Home improvement projects that yield the best ROI for particular house types,
budgets, and locations over time
Identifying the right time to buy or sell a home, and even making
recommendations as to what price to bid for homes in select areas
Best areas to look for secondary and/or vacation home investments
Most cost-effective locations to live in after retirement
<b>Figure 4.18</b> Other investment recommendations
There is a bevy of external data sources that can be leveraged to help facilitate
analytics in this area. For example, Zillow and Realtor.com provide real estate
valuations and monthly changes in real estate valuations that could be
Big data can power a more relevant and more actionable user experience. Instead
of overwhelming business users with an endless array of charts, reports, and
dashboards and forcing users to “slice and dice” their way to insights, we can
operational insights buried in the data. We can leverage those insights to create
frontline employee, manager, and customer recommendations and then measure
the effectiveness of those recommendations so that we are continuously refining
our analytic models.
Use the following exercises to apply what you learned in this chapter.
<b>Exercise #1: Select one of your organization's outward-facing dashboards,</b>
websites, or mobile apps. If not something from your organization, then select
a website or dashboard that you use regularly. That might include something
from your bank, credit card provider, cellular provider, or utility company.
Grab a few screen captures of the dashboard or website.
<b>Exercise #2: Think through how you as the user use this dashboard, website,</b>
or mobile app to make decisions. Write down those decisions that you try to make
from the website. For example, from your utility, you might want to make
decisions about energy and water consumption, your waste/garbage plan, and
maybe even which of the different appliance rebates you might want to
consider.
<b>Exercise #3: Next, add a recommendations panel that has suggestions for</b>
each of the decisions that you captured in Exercise #2. For our utility example, one
recommendation might be “Only water 3 days a week from 6:00 a.m. to 7:00
a.m. to save approximately $12.50 per month on your monthly water bill.” Or
another recommendation might be “Replace your existing dryer with a more
efficient model like the Samsung DV457 to save $21.75 on your monthly
energy bill.”
<b>Exercise #4: Finally, identify potential external data sources that might</b>
These three chapters introduce data science as a key business discipline that helps
organizations “cross the analytics chasm” from the Business Monitoring phase to
the Business Insights and Business Optimization phases. These chapters will
introduce the concept of data science and then broaden the discussion to cover
what data science techniques to use in which business scenarios.
Chapter 5: Differences Between Business Intelligence and Data Science
Chapter 6: Data Science 101
I was hired by a large Internet portal company in 2007 to head up efforts to
develop its advertiser analytics. The objective of the advertiser analytics project
was to help the Internet portal company's advertisers and agencies optimize their
advertising spend across the Internet portal's ad network. The internal code name
for the project was “Looking Glass” because we wanted to take the advertisers and
agencies through an “Alice in Wonderland” type of experience in how we delivered
actionable insights to help our key business stakeholders (Media Planners &
Buyers and Campaign Managers) successfully optimize their advertising spend on
the Internet portal's ad network. But in many ways, it was me that went through
the looking glass.
Several months later (August 2008), I had the opportunity to keynote at The Data
Warehouse Institute (TDWI) conference in San Diego. I taught a class at TDWI on
how to build analytic applications, so I was both familiar with and a big fan of the
TDWI conferences (and still am). However, in my keynote, I told the audience that
everything that I had taught them about how to build analytic applications was
wrong (see Figure 5.1).
<b>Figure 5.1</b> Schmarzo TDWI keynote, August 2008
As with my own personal experience, many organizations and individuals are
confused by the differences introduced by big data, especially the differences
between Business Intelligence (BI) and data science. Big data is not big BI. Big
data is a key enabler of a new discipline called data science that seeks to leverage
new sources of structured and unstructured data, coupled with predictive and
prescriptive analytics, to uncover new variables and metrics that are better
This chapter discusses the differences between BI and data science:
The analytic characteristics are different.
The analytic engagement processes are different.
The data models are different.
The business view is different.
Data science is a complicated new discipline that requires advanced skills and
competencies in areas such as statistics, computer science, data mining,
mathematics, and computer programming. As has been stated countless times,
data scientists are the business “rock stars” of the 21st century.
Although what data scientists do can be quite complex, what they are trying to
achieve is not. In fact, I find that the very best introductory book to data science is
<i>Moneyball: The Art of Winning an Unfair Game</i> by Michael Lewis (W. W. Norton
& Company, 2004). The book is about the Oakland A's General Manager Billy
Beane's use of sabermetrics to help the small-market Oakland A's professional
baseball team outperform competitors with significantly larger bankrolls. The
book yields the most accurate description of data science:
<i>Data science is about finding new variables and metrics that are better</i>
<i>predictors of performance.</i>
That's it, nothing more, and yes, data science is that simple. But the power of
that simple statement is game changing, as can be seen in Figure 5.2 and the
<b>Figure 5.2</b> Oakland A's versus New York Yankees cost per win
The book also has another valuable lesson: good ideas can be copied. So
predictive “on-base percentage” metric.
When clients ask me to explain the difference between a Business Intelligence
analyst and a data scientist, I start by explaining that the two disciplines have
different objectives and seek to answer different types of questions (see Figure
5.3).
<b>Figure 5.3</b> Business Intelligence versus data science
BI focuses on descriptive analytics: that is, the “What happened?” types of
questions. Examples include:
How many widgets did I sell last month?
What were sales by zip code for Christmas last year?
How many units of Product X were returned last month?
What were company revenues and profits for the past quarter?
BI focuses on reporting on the current state of the business, or, as it is now
commonly called, Business Performance Management (BPM). BI provides
retrospective reports to help business users monitor the current state of the
business and answer questions about historical business performance. These
reports and questions are critical to the business and are sometimes required for
regulatory and compliance reasons.
of under- and over-performance. But even these analytics are focused on
monitoring what happened to the business.
On the other hand, data scientists are in search of variables and metrics that are
better predictors of business performance. Consequently, data scientists focus on
predictive analytics (“What is likely to happen?”) and prescriptive analytics
(“What should I do?”) types of questions. For example:
Predictive Questions (What is likely to happen?)
How many widgets will I sell next month?
What will sales by zip code be over this Christmas season?
How many units of Product X will be returned next month?
What are projected company revenues and profits for next quarter?
How many employees will I need to hire next year?
Prescriptive Questions (What should I do?)
Order [5,000] Component Z to support widget sales for next month.
Hire [Y] new sales reps by these zip codes to handle projected Christmas
sales.
Set aside [$125K] in financial reserve to cover Product X returns.
Sell the following product mix to achieve quarterly revenue and margin
goals.
Increase hiring pipeline by 35 percent to achieve hiring goals.
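The step from a predictive question to a prescriptive one can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sales history is invented, the moving-average forecast is only a stand-in for whatever model a data science team would actually build, and the two-components-per-widget ratio and on-hand inventory are hypothetical.

```python
from math import ceil

# Hypothetical monthly widget sales (illustrative numbers only).
monthly_widget_sales = [2300, 2450, 2400, 2500, 2600]

def forecast_next_month(history, window=3):
    """Predictive step: naive moving-average forecast over the last `window` months."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def recommend_component_order(forecast, units_per_widget=2, on_hand=0):
    """Prescriptive step: turn the forecast into a concrete order quantity."""
    required = ceil(forecast * units_per_widget)
    return max(required - on_hand, 0)

predicted_sales = forecast_next_month(monthly_widget_sales)
order_quantity = recommend_component_order(predicted_sales, on_hand=1000)
print(f"Predicted widget sales next month: {predicted_sales:.0f}")
print(f"Recommended Component Z order: {order_quantity}")
```

The point of the sketch is the hand-off: the prediction is an input, and the recommendation is the action a business user can take.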
To answer these predictive and prescriptive questions, data scientists build
Another area of difference between BI and data science is in the attitudinal
characteristics and work approach of the people who fill those roles (see Table
5.1).
<b>Table 5.1</b> BI Analyst Versus Data Scientist Characteristics
<b>Area</b> | <b>BI Analyst</b> | <b>Data Scientist</b>
Focus | Reports, KPIs, trends | Patterns, correlations, models
Process | Static, comparative | Exploratory, experimentation, visual
Data sources | Pre-planned, added slowly | On the fly, as needed
Transform | Up front, carefully planned | In-database, on demand, enrichment
Data model | Schema on load | Schema on query
Analysis | Retrospective, descriptive | Predictive, prescriptive
<b>Figure 5.4</b> CRISP: Cross Industry Standard Process for Data Mining
Data science takes a very similar approach: establish a business hypothesis or
question; explore different combinations of data and analytics to build, test, and
refine the analytic model; and wash, rinse, and repeat until the model proves that
it can provide the required “analytic lift” while reaching a satisfactory goodness of
fit. Finally, the analytics are deployed or operationalized, including possibly
Unfortunately, these explanations are insufficient to answer satisfactorily the
question of what's different between Business Intelligence and data science. So
let's examine closely the different engagement approaches (including goals, tools,
and techniques) that the BI analyst and the data scientist use to do their jobs.
The BI analyst engagement process is a discipline that has been documented,
taught, and refined over three decades of building data warehouses and BI
environments. Figure 5.5 provides a high-level view of the process that a typical BI
analyst uses when engaging with the business users to build out the BI and
supporting data warehouse environments.
<b>Figure 5.5</b> Business Intelligence engagement process
<b>Step 1: Pre-build Data Model.</b> The process starts by building the
foundational data model. Whether you use a data warehouse or data mart or
hub-and-spoke approach, whether you use a star, snowflake, normalized, or
dimensional schema, the BI analyst must go through a formal requirements
gathering process with the business users to identify all (or at least the vast
majority of) the questions that the business users want to answer. In this
requirements gathering process, the BI analyst must identify the first- and
second-level questions the business users want to address in order to build a
robust and extensible data model. For example:
SQL request. The BI analysts can also specify graphical rendering options (bar
charts, line charts, pie charts) until they get the exact report and/or graphic
that they want (see Figure 5.6).
<b>Figure 5.6</b> Typical BI tool graphic options
The BI tools are very powerful and relatively easy to use if the data model is
configured properly. By the way, this is a good example of the power of schema on
load. This traditional schema on load approach removes much of the underlying
data complexity from the business users, who can then use the BI tool's graphical
user interface to more easily query and explore the data (think self-service BI).
In summary, the BI approach relies on a pre-built data model (schema on load),
which enables users to quickly and easily query the data, as long as the data that
they want to query is already defined and loaded into the data warehouse. If the
data is not in the data warehouse, then adding data to an existing warehouse can
The data science process is significantly different. In fact, there is very little from
the BI analyst engagement process that can be reused in the data science
<b>Figure 5.7</b> Data scientist engagement process
<b>Step 1: Define Hypothesis to Test.</b> Step 1 of the data science engagement
process starts with the data scientists identifying the prediction they want to
make or hypothesis that they want to test. This is a result of collaborating with
the business subject matter expert to understand the key sources of business
differentiation (e.g., how the organization delivers value) and then construct
the associated hypotheses or predictions.
<b>Step 2: Gather Data…and More Data.</b> In step 2 of the data science
engagement process, the data scientist gathers relevant or potentially
interesting data from a multitude of sources (both internal and external to the
organization) and pushes that data into the data lake or analytic sandbox. The
data lake is a great foundational capability for this process, as the data
scientists can acquire and ingest any data they want (as-is), test the data for its
value given the hypothesis or prediction, and then decide whether to include
that data in the analytic model. This is where an envisioning exercise can add
considerable value in facilitating the collaboration between the business users
and the data scientists to identify data sources that may help improve
predictive results.
<b>Step 3: Build Data Model.</b> Step 3 is where the data scientists define and
The data models that are used in the data warehouse to support an organization's
BI efforts are significantly different from the data models the data scientists prefer
to use.
The world of BI (aka query, reporting, dashboards) requires a data modeling
technique that allows business users to create their own reporting and queries. To
support this need, Ralph Kimball pioneered dimensional modeling (or star
schemas) while at Metaphor Computers back in the 1980s (see Figure 5.9).
<b>Figure 5.9</b> Dimensional model (star schema)
The dimensional model was designed to accommodate the analysis needs of the
business users, with two important design concepts:
<b>Fact tables</b> (populated with metrics or measures) correspond to transactional
<b>Dimension tables</b> (populated with attributes about that dimension)
represent the “nouns” of that particular transactional system such as products,
markets, stores, employees, customers, and different variations of time.
Dimensions are groups of hierarchies and descriptors that describe the facts. It
is these dimensional attributes that enable analytic exploration, attributes such
as size, weight, location (street, city, state, zip), age, gender, tenure, etc.
Dimensional modeling is ideal for business users because it supports their natural
question-and-answer exploration processes. Dimensional modeling supports BI
concepts such as drill across (navigating across dimensions) and drill up/drill
down (navigating up and down the dimensional hierarchies such as the product
dimension hierarchy of product ⇨ brand ⇨ category).
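To make the fact table, dimension table, and drill up/drill down ideas concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. The table names (fact_sales, dim_product), the products, and the revenue figures are all hypothetical; a real data warehouse would have many more dimensions and far more rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        product TEXT, brand TEXT, category TEXT
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_month TEXT,
        units INTEGER,
        revenue REAL
    );
""")
# Dimension rows carry the descriptive hierarchy: product -> brand -> category.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
    (1, "Cola 12oz", "FizzCo", "Beverages"),
    (2, "Cola 2L", "FizzCo", "Beverages"),
    (3, "Lemon Tea", "LeafCo", "Beverages"),
    (4, "Potato Chips", "CrunchCo", "Snacks"),
])
# Fact rows carry the measures, keyed to the dimension.
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, "2015-01", 100, 150.0),
    (2, "2015-01", 40, 120.0),
    (3, "2015-01", 30, 60.0),
    (4, "2015-01", 80, 200.0),
])

def revenue_by(level):
    """Aggregate the fact table at one level of the product hierarchy."""
    if level not in {"product", "brand", "category"}:
        raise ValueError(f"unknown hierarchy level: {level}")
    query = (
        f"SELECT d.{level}, SUM(f.revenue) FROM fact_sales f "
        f"JOIN dim_product d ON d.product_id = f.product_id "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    )
    return dict(conn.execute(query).fetchall())

print(revenue_by("category"))  # drill up to the top of the hierarchy
print(revenue_by("brand"))     # drill down one level
```

Changing only the GROUP BY column moves the same measure up and down the hierarchy, which is the behavior that makes the dimensional model comfortable for question-and-answer exploration.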
Today, all BI tools use dimensional modeling as the standard way for interacting
with the underlying data warehouse.
In the world of data science, Hadoop provides an opportunity to think differently
about how we do data modeling. Hadoop was originally designed by Yahoo to deal
with very long, flat web logs. Hadoop was designed with very large data blocks
(Hadoop default block size is 64 MB to 128 MB versus relational database block
sizes that are typically 32 KB or less). To optimize this block size advantage, the
data science team wants very long, flat records and long, flat data models.<sup>1</sup>
<b>Figure 5.10</b> Using flat files to eliminate or reduce joins on Hadoop
As an example in Figure 5.10, instead of three different star schemas with
conformed or shared dimensions to link the different star schemas, the data
science team wants three long, flat files with the following customer data:
Customer demographics (age, gender, current and previous home addresses,
value of current and previous home, history of marital status, kids and their
ages and genders, current and previous income, etc.)
Customer purchase history (annual purchases including items purchased,
returns, prices paid, discounts, coupons, location, day of week, time of day,
weather condition, temperatures)
visit a store, recency of store visit, frequency of store visits in past
week/month/quarter, how long do I stay at which stores (“pass thru” or
“linger”), etc.
<b>Classifications.</b> Now we want to create some “classifications” about Bill
Schmarzo's life that might have impact on Starbucks's key business initiatives
such as life stage classification (long marriage, kid in college, kid at home,
weight/diet conscious, etc.), lifestyle classification (heavy traveler, heavy chai
tea drinker, light exerciser, and so on), or product classification (morning
coffee/oatmeal consumer, afternoon frap/cookie consumer, etc.).
<b>Association Rules.</b> We might also want to capture some propensities about
Bill's usage patterns that we can use to support Starbucks's key business
initiatives, including propensity to buy oatmeal when he buys his venti chai
latte when traveling in the morning, propensity to buy a cookie/pastry when
<b>Scores.</b> We also may want to create scores to support decision-making and
process optimization. Scores that we might want to create (again, depending
on Starbucks's key business initiatives) could include advocacy score (which
measures my likelihood to recommend Starbucks and make positive comments
for Starbucks on social media), loyalty score (which measures my likelihood to
continue to visit Starbucks stores and buy Starbucks products versus
competitors), product usage score (which is a measure of how much Starbucks
product I consume, and revenue I generate, when I visit a Starbucks store),
etc.
A profile could be made up of hundreds of metrics and scores that, when used in
combination against a specific business initiative like customer retention,
<b>Figure 5.11</b> Sample customer analytic profile
Some metrics and scores are more important than others, depending on the
business initiative being addressed. For example, for a financial services firm focused
on customer acquisition, data such as disposable income, retirement readiness,
life stage, age, education level, and number of family members may be the most
important predictive metrics. However, for that same financial services firm focused on
customer retention, metrics such as advocacy, customer satisfaction, attrition risk,
social network associations, and select social media relationships may be the most
important predictive metrics.
<b>Figure 5.12</b> Improve customer retention example
The analysis process works like this:
<b>Step 1:</b> Establish a hypothesis that you want to test. In our customer retention
example, our test hypothesis is that “Premium gold card members with greater
than five days without a purchase or mobile app engagement have 25 to 30
percent higher probability of churn than similar customers.”
<b>Step 2:</b> Identify and quantify the most important metrics or scores to predict a
certain business outcome. In our example, the metrics and scores that we're
going to use to test our customer attrition hypothesis include Customer
Tenure (in months), Customer Satisfaction Score, Average Monthly Purchases,
and Customer Loyalty Score. Notice that the metrics do not have the same
weight (or confidence level). Some metrics and scores are more important than
others in predicting performance given the test hypothesis.
<b>Step 3:</b> Employ the predictive metrics to build detailed profiles for each
individual customer with respect to the hypothesis to be tested.
<b>Step 4:</b> Compare an individual's recent activities and current state with his or
her profile in order to flag unusual behaviors and actions that may be
indicative of a customer retention problem. In our customer retention
example, we might want to create a “Customer Attrition” score that quantifies
specific recommendations as to what actions or “next best offers” can be
delivered to retain that customer.
<b>Step 5:</b> Continue to seek out new data sources and new metrics that may be
and scores using sensitivity analysis and simulations such as Monte Carlo
experiments.
<b>Step 6:</b> Integrate the analytic insights, scores, and recommendations into the
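The steps above can be sketched in a few lines of Python. This is only an illustration of weighting metrics into a single “Customer Attrition” score: the weights, the 0 to 100 scales, the 60-month tenure cap, and the threshold of 50 are all hypothetical assumptions, not values from the example, and a real score would be fitted and validated by the data science team.

```python
# Hypothetical weights: how strongly each metric is assumed to predict attrition.
WEIGHTS = {
    "tenure_risk": 0.15,
    "satisfaction_risk": 0.35,
    "purchase_drop_risk": 0.30,
    "loyalty_risk": 0.20,
}

def attrition_score(profile, recent):
    """Return a 0-100 score comparing recent behavior with the customer's profile."""
    risks = {
        # Shorter tenure is treated as riskier; tenure is capped at 60 months.
        "tenure_risk": 1 - min(profile["tenure_months"], 60) / 60,
        # Satisfaction and loyalty scores are assumed to be on a 0-100 scale.
        "satisfaction_risk": 1 - profile["satisfaction_score"] / 100,
        "loyalty_risk": 1 - profile["loyalty_score"] / 100,
        # Fraction by which this month's purchases fall short of the usual level.
        "purchase_drop_risk": max(
            0.0, 1 - recent["purchases_this_month"] / profile["avg_monthly_purchases"]
        ),
    }
    return round(100 * sum(WEIGHTS[name] * risk for name, risk in risks.items()), 1)

def needs_retention_offer(profile, recent, threshold=50.0, inactive_days=5):
    """Flag customers whose score or inactivity suggests a retention problem."""
    inactive = recent["days_since_last_activity"] > inactive_days
    return inactive or attrition_score(profile, recent) >= threshold

profile = {"tenure_months": 30, "satisfaction_score": 80,
           "loyalty_score": 70, "avg_monthly_purchases": 10}
recent = {"purchases_this_month": 4, "days_since_last_activity": 7}
print(attrition_score(profile, recent), needs_retention_offer(profile, recent))
```

The five-day inactivity check mirrors the test hypothesis in Step 1; the weighted sum reflects the observation in Step 2 that the metrics do not carry the same weight.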
Organizations are realizing that data science is very different from BI and that one
does not replace the other. Both combine to provide the “dynamic duo” of
analytics: one focused on monitoring the current state of the business and the
other trying to predict what is likely to happen and then prescribe what actions to
take.
Big data is a key enabler of a new discipline called data science. Data science seeks
to leverage new sources of structured and unstructured data, coupled with
advanced predictive and prescriptive analytics, to uncover new variables and
metrics that are better predictors of performance.
As discussed in this chapter, BI is different from data science in the following
ways:
The questions are different.
The analytic characteristics are different.
The analytic engagement processes are different.
The data models are different.
The business view is different.
This chapter also introduced the very important data science concept called
analytic profiles. Organizations are learning that more important than trying to
create a 360-degree profile of the customer is identifying and quantifying those
fewer but more important metrics that are better predictors of business or
customer performance such as optimizing key business processes, influencing
customer behaviors, and uncovering new monetization opportunities.
Use the following exercises to apply what you learned in this chapter.
<b>Exercise #1:</b> Describe the key differences between BI and data science and
what those differences mean to your organization.
<b>Exercise #2:</b> List sample descriptive (What happened?), predictive (What is
likely to happen?), and prescriptive (What actions should I take?) questions
that are relevant to the targeted business initiative that you identified in
Chapter 2.
<sup>1</sup> Apache Hadoop is an open-source software framework written in Java for
distributed storage and distributed processing of very large data sets on
computer clusters built from commodity hardware. All the modules in Hadoop
are designed with a fundamental assumption that hardware failures (of
There are many excellent books and courses focused on teaching people how to
become a data scientist. Those books and courses provide detailed material and
exercises that teach the key capabilities of data science such as statistical analysis,
data mining, text mining, SQL programming, and other computing, mathematical,
and analytic techniques. That is not the purpose of this chapter.
The purpose of Chapter 6 is to introduce some different analytic algorithms that
business users should be aware of and to discuss when it might be most
appropriate to use which types of algorithms. You do not need to be a data
scientist to understand when and why to apply these analytic algorithms. A more
detailed understanding of these different analytic algorithms will help the
start viral marketing campaigns.
This chapter reviews a number of different analytic techniques. You are not
expected to become an expert in these different analytic algorithms. However, the
more you understand what these analytic algorithms can do, the better position
you are in to collaborate with your data science team and suggest the art of the
possible to your business leadership team.
Fundamental exploratory analytic algorithms that are covered in Chapter 6 are:
Trend analysis
Boxplots
Geography (spatial) analysis
Pairs plot
Time series decomposition
More advanced analytic algorithms that are covered in this chapter are:
Cluster analysis
Normal curve equivalent (NCE) analysis
Association analysis
Graph analysis
Text mining
Sentiment analysis
Traverse pattern analysis
Decision tree classifier analysis
Throughout the chapter, you will contemplate how The Parks could leverage each
of these different analytic techniques.
Throughout this chapter I provide links to sites that can help you get
Let's start by covering some basic statistical analysis that was likely covered in
your first statistics course (yes, I realize that you probably sold your stats book the
minute the stats class was over). Trend analysis, boxplots, geographical analysis,
pairs plot, and time series decomposition are examples of exploratory analytic
algorithms that the data scientists use to get a “feel for the data.” These
exploratory analytic algorithms help the data science team to better understand
the data content and gain a high-level understanding of relationships and patterns
in the data.
Trend analysis is a fundamental visualization technique to spot patterns, trends,
relationships, and outliers across a time series of data. One of the most basic yet
very powerful exploratory analytics, trend analysis (applying different plotting
techniques and graphic visualizations) can quickly uncover customer, operational,
or product trends and events that tend to happen together or happen at some
<b>Figure 6.1</b> Basic trend analysis
In Figure 6.1, the data scientist manually tested a number of different trending
options in order to identify the “best fit” trend line (in this example, using
Microsoft Excel). Once the data scientist identifies the best trending option, the
data scientist can automate the generation of the trend lines using R.
different business dimensions (e.g., products, geographies, sales territories,
markets) in order to uncover patterns and trends at the next level of
granularity. The data scientist can then write a program to juxtapose the detailed
trend lines into the same chart so that it is easier to spot trends, patterns,
relationships, and outliers buried in the granular data (see Figure 6.2).
<b>Figure 6.2</b> Compound trend analysis
Finally, trend analysis yields mathematical models for each of the trend lines.
These mathematical models can be used to quantify reoccurring patterns or
behaviors in the data. The most interesting insights from the trend lines can then
be flagged for further investigation by the data science team (see Figure 6.3).
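As a minimal illustration of a trend line as a mathematical model, the following Python sketch fits a straight line by ordinary least squares and flags the periods that sit far from it. It assumes a linear trend (only one of the trending options a data scientist might test), and the wait-time numbers and the tolerance of 10 minutes are hypothetical.

```python
def fit_trend_line(values):
    """Ordinary least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def flag_outliers(values, slope, intercept, tolerance):
    """Indexes of periods whose value deviates from the trend by more than `tolerance`."""
    return [i for i, y in enumerate(values)
            if abs(y - (slope * i + intercept)) > tolerance]

# Hypothetical average wait times (minutes) per week for one attraction.
wait_times = [20, 22, 24, 26, 45, 30, 32, 34]
slope, intercept = fit_trend_line(wait_times)
print(f"trend: wait = {slope:.2f} * week + {intercept:.2f}")
print("weeks worth investigating:", flag_outliers(wait_times, slope, intercept, 10))
```

The fitted slope and intercept are the "mathematical model" of the trend; the flagged weeks are candidates for further investigation rather than conclusions.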
The Parks could use trend analysis to identify the variables (e.g., wait times,
social media posts, consumer comments) that are highly correlated to the
increase or decrease in guest satisfaction for each attraction, restaurant, retail
outlet, and entertainment. The Parks could leverage the results from the trend
analysis to
1. Flag problem areas and take corrective actions such as opening more lines,
promoting less busy attractions, moving kiosks that are blocking traffic
flow, and resituating characters at different points in the parks;
2. Identify the location and types of future attractions, restaurants, retail
outlets, and entertainment.
For more information about how to make simple plots and graphs (line
charts, bar charts, histograms, dot charts) in R, check out
/>
Boxplots are one of the more interesting and visually creative exploratory analytic
algorithms. Boxplots quickly visualize variations in the base data and can be used
to identify outliers in the data worthy of further investigation. A boxplot is a
convenient way of graphically depicting groups of numerical data through their
quartiles. Boxplots may also have lines extending vertically from the boxes
(whiskers) indicating variability outside the upper and lower quartiles, hence the
terms <i>box-and-whisker plot</i> and <i>box-and-whisker diagram</i> (see Figure 6.4).
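The numbers behind a boxplot can be computed with Python's standard statistics module, as in the sketch below. The choice of whiskers that reach the most extreme values within 1.5 times the interquartile range is a common convention that is assumed here, not something stated in the text, and the ride counts are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, whisker ends (1.5 x IQR rule), and outliers for one group."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

# Hypothetical number of rides per guest per season on one attraction.
rides_per_guest = [1, 2, 2, 3, 3, 3, 4, 4, 5, 18]
summary = box_plot_summary(rides_per_guest)
print(summary)
```

In this invented data set, the guest with 18 rides falls outside the whiskers and is exactly the kind of outlier a boxplot surfaces for further investigation.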
The Parks can employ boxplots to determine its most loyal guests for each of
the park's attractions (e.g., Canyon Copter Ride, Monster Mansion, Space
Adventure, Ghoulish Gulch). The Parks can use the results of the boxplot
analysis to create guest current and maximum lifetime value scores against
which to prioritize whom to reward with Priority Access passes and other
coupons and discounts.
For more information about creating boxplots in R, check out
/>
Geographical or spatial analysis includes techniques for analyzing geographical
activities and conditions using a business entity's topological, geometric, or
geographic properties. For example, geographical analysis supports the
integration of zip code and data.gov economic data with a client's internal data to
provide insights about the success of the organization's geographical reach and
market penetration (see Figure 6.5).
<b>Figure 6.5</b> Geographical (spatial) trend analysis
In the example in Figure 6.5, geographical analysis is combined with trend
The Parks can conduct geographical trend analysis to spot any changes (at
both the zip+4 and household levels) in the geo-demographics of guests over
time and by seasonality and holidays. The Parks can use the results of this
geographical plus seasonality analysis to create geo-specific campaigns and
promotions with the objective of increasing attendance from under-penetrated
geographical areas by day of week, holidays, and seasonality.
Pairs plot analysis may be my favorite analytics algorithm. Pairs plot analysis
allows the data scientist to spot potential correlations using pairwise comparisons
across multiple variables. Pairs plot analysis provides a deep view into the
different variables that may be correlated and can form the basis for guiding the
data science team in the identification of key variables or metrics to include in the
development of predictive models (see Figure 6.6).
<b>Figure 6.6</b> Pairs plot analysis
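A pairs plot is a grid of scatter plots, one per pair of variables. The numeric counterpart is a table of pairwise correlations, which the following Python sketch computes with the Pearson coefficient. The variable names and daily observations are hypothetical, and Pearson correlation is just one assumed way to quantify what the eye sees in the plot.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def pairwise_correlations(columns):
    """Correlation for every pair of variables, strongest relationships first."""
    pairs = {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
    return sorted(pairs.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical daily observations for one attraction.
observations = {
    "wait_minutes": [10, 20, 30, 40, 50],
    "satisfaction": [9, 8, 6, 5, 3],
    "temperature_f": [70, 85, 72, 90, 75],
}
for (a, b), r in pairwise_correlations(observations):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Pairs with a correlation near +1 or -1 are candidates for inclusion in a predictive model; a correlation alone does not establish cause and effect.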
The Parks can leverage pairs plot analysis to compare a multitude of variables
to identify those variables that drive guests to particular attractions,
entertainment, retail outlets, and restaurants. The Parks can use the results of
the analysis to drive in-park promotional decisions and offers that direct
guests to under-utilized attractions, entertainment, retail outlets, and
restaurants.
Additional paired plot options in R (e.g., pairs, splom, plotmatrix, ggcorplot,
panelcor) can be found at
/>
Time series decomposition expands on the basic trend analysis by decomposing
the traditional trend analysis into three underlying components that can provide
valuable customer, product, or operational performance insights. These trend
analysis components are
<b>Cyclical component</b> that describes repeated but non-periodic fluctuations,
<b>Seasonal component</b> that reflects seasonality (seasonal variation),
<b>Irregular component</b> (or “noise”) that describes random, irregular influences
and represents the residuals of the time series after the other components have
been removed.
From the time series decomposition analysis, a business user can spot particular
areas of interest in the decomposed trend data that may be worthy of further
analysis (see Figure 6.7).
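One simple way to perform such a decomposition is the classical additive method, sketched below in Python. It is an assumption that this method suits the data: a centered moving average stands in for the slowly varying (trend and cyclical) part, the seasonal component is the average deviation at each position in the cycle, and whatever is left is the irregular component. The quarterly visit counts are invented.

```python
def centered_moving_average(series, period):
    """Slowly varying component; None where the averaging window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: give the two end points half weight.
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend

def decompose(series, period):
    """Additive split: series = trend + seasonal + irregular.

    Needs enough history (roughly two full cycles or more) for every
    position in the cycle to have at least one detrended value.
    """
    trend = centered_moving_average(series, period)
    detrended = [None if m is None else y - m for y, m in zip(series, trend)]
    seasonal_means = []
    for position in range(period):
        values = [d for i, d in enumerate(detrended)
                  if d is not None and i % period == position]
        seasonal_means.append(sum(values) / len(values))
    # Center the seasonal effects so they sum to zero over one cycle.
    offset = sum(seasonal_means) / period
    seasonal = [seasonal_means[i % period] - offset for i in range(len(series))]
    irregular = [None if d is None else d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, irregular

# Hypothetical quarterly guest visits (thousands) over three years.
visits = [110, 97, 94, 111, 118, 105, 102, 119, 126, 113, 110, 127]
trend, seasonal, irregular = decompose(visits, period=4)
print("seasonal effect by quarter:", [round(s, 1) for s in seasonal[:4]])
```

Large values in the irregular component mark periods that neither the underlying trend nor seasonality explains, such as a one-off event.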
The Parks can deploy time series decomposition analysis to identify and
quantify the impact that seasonality and specific events are having on guest
visits and associated spend. The Parks can use the results of the analysis to
1. Create season-specific marketing campaigns and promotions to increase
guest visits and associated spend,
2. Determine which events outside of the theme parks (concerts, professional
sporting events, BCS football games) are worthy of promotional and
sponsorship spend.
For more information about time series decomposition in R, check out
The following analytic algorithms start to move the data scientist beyond the data
exploration stage into the more predictive stages of the analysis process. These
analytic algorithms by their nature are more actionable, allowing the data scientist
to quantify cause and effect and provide the foundation to predict what is likely to
happen and recommend specific actions to take.
Cluster analysis is used to uncover insights about how customers and/or products
cluster into natural groupings in order to drive specific actions or
recommendations (e.g., personalized messaging, target marketing, maintenance
scheduling). Cluster analysis or clustering is the exercise of grouping a set of
objects in such a way that objects in the same group are more similar to each other
than to those in other groups (clusters).
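The grouping idea can be made concrete with k-means, one widely used clustering algorithm (the text does not say which algorithm is meant, so this choice is an assumption). The Python sketch below uses a deliberately naive start, taking the first k points as initial centroids, and hypothetical guest data; in practice the features would usually be rescaled first so that no single measure dominates the distance.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Basic k-means; returns (centroids, cluster index for each point)."""
    centroids = [points[i] for i in range(k)]  # naive start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical guests: (visits per year, average spend per visit in dollars).
guests = [(1, 80), (2, 90), (1, 100), (10, 40), (12, 35), (11, 45)]
centroids, labels = k_means(guests, k=2)
print(centroids, labels)
```

Each resulting cluster (for example, infrequent high spenders versus frequent low spenders) can then be given its own messaging or offer.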
Clustering analysis can uncover potential actionable insights across massive data
volumes of customer and product transactions and events. Cluster analysis can
uncover groups of customers and products that share common behavioral
tendencies and, consequently, can be targeted with the same marketing
The Parks can leverage cluster analysis to create more actionable profiles of
the park's most profitable guest clusters and highest potential guest clusters.
The Parks can use the results of the analysis to quantify, prioritize, and focus
guest acquisition and guest activation marketing efforts.
For more information about cluster analysis in R, check out
/>
Normal curve equivalent (NCE), a technique first used in evaluating students'
testing performance, is a data transformation technique that approximately fits a
normal distribution between 0 and 100 by normalizing a data set in preparation
for percentile rank analysis. For example, an NCE data transformation is a way of
standardizing scores received on a test into a 0–100 scale similar to a percentile
rank but preserving the valuable equal-interval properties of a <i>z</i>-score (see Figure
6.9).
<b>Figure 6.9</b> Normal curve equivalent analysis
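The transformation itself is short. The conventional definition is NCE = 50 + 21.06 × z, where z is the normal deviate; the 21.06 scale factor is chosen so that percentile ranks of 1, 50, and 99 map to NCE values of 1, 50, and 99. The Python sketch below shows both routes, from a percentile rank and from raw scores; the function names are hypothetical, and the raw-score route assumes the data is roughly normally distributed.

```python
from statistics import NormalDist, mean, stdev

# Scale factor that makes percentile ranks 1, 50, and 99 coincide with NCE 1, 50, and 99.
NCE_SCALE = 21.06

def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + NCE_SCALE * z

def scores_to_nce(scores):
    """Map raw scores to NCEs through their z-scores (assumes roughly normal data)."""
    mu, sigma = mean(scores), stdev(scores)
    return [50 + NCE_SCALE * (s - mu) / sigma for s in scores]

for rank in (1, 25, 50, 75, 99):
    print(rank, round(percentile_to_nce(rank), 1))
print([round(v, 1) for v in scores_to_nce([40, 50, 60])])
```

Unlike percentile ranks, NCE values are equally spaced in z-score terms, so differences between them can be averaged and compared.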
The Parks can employ the NCE technique to understand price inflection
points for packages of attractions and restaurants. The Parks can leverage the
price inflection points to optimize pricing (e.g., create a package of attractions
and restaurants by seasonality, holiday, day of week, etc.) and create new
Priority Access packages.
For more information about how to use <i>z</i>-scores to normalize data using R,
check out
For more
insights into the NCE data transformation technique, see
/>
Association analysis is a popular algorithm for discovering and quantifying
relationships between variables in large databases. Association analysis shows
customer or product events or activities that tend to happen together, which
makes this type of analysis very actionable. For example, the association rule
{buns, ketchup} → {burger} found in the point-of-sales data of a supermarket
would indicate that if a customer buys buns and ketchup together, she is likely to
also buy hamburger meat. Such information can be used as the basis for making
pricing, product placement, promotion, and other marketing decisions.
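An association rule is usually quantified with three measures: support (how often the whole combination occurs), confidence (how often the consequent appears when the antecedent does), and lift (how much more often than chance). The Python sketch below computes them for the {buns, ketchup} → {burger} rule over a handful of invented baskets; real market basket analysis would search millions of transactions for rules rather than test one by hand.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent | consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    lift = confidence / (cons / n) if cons else 0.0
    return support, confidence, lift

# Hypothetical point-of-sale baskets.
baskets = [
    {"buns", "ketchup", "burger"},
    {"buns", "ketchup", "burger", "soda"},
    {"buns", "ketchup"},
    {"burger", "soda"},
    {"milk", "bread"},
    {"buns", "burger"},
    {"ketchup", "milk"},
    {"soda", "chips"},
]
support, confidence, lift = rule_metrics(baskets, {"buns", "ketchup"}, {"burger"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the items appear together more often than they would if purchases were independent, which is what makes the rule worth acting on.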
Association analysis is the basis for market basket analysis (identifying products
and/or services that sell in combination or sell with a predictable time lag) that is
used in many industries including retail, telecommunications, insurance, digital
marketing, credit cards, banking, hospitality, and gaming.
<b>Figure 6.11</b> Association analysis
One very actionable data science technique is to cluster the resulting association
rules into common groups or segments. For example, in Figure 6.12, the data
science team clustered the resulting association rules across tens of millions of
customers in order to create more accurate, relevant customer segments. In this
process, the data science team
Runs the association analysis across the tens of millions of customers to
identify association rules with a high degree of confidence,
Clusters the customers and their resulting association rules into common
groupings or segments (e.g., Chipotle + Starbucks, Virgin America + Marriott),
Uses these new segments as the basis for personalized messaging and direct
marketing.
<b>Figure 6.12</b> Converting association rules into segments
The Parks can leverage market basket analysis to identify the most popular
and least popular “packages of attractions.” The Parks can use this “packages
of attractions” data to
1. Create new pricing and Priority Access packages for the most popular
packages in order to optimize in-park traffic flow and reduce attraction
wait times,
2. Create new pricing and Priority Access packages for the least popular
“packages” in order to drive traffic to under-utilized attractions.
For more information about association analysis in R, check out
/>
Graph analysis is one of the more powerful analysis techniques made popular by
social media analysis. Graph analysis can quickly highlight customer or machine
(think Internet of Things) relationships obscured across millions if not billions of
social and machine interactions.
Graph analysis uses mathematical structures to model pairwise relations between
objects. A “graph” in this context is made up of “vertices” or “nodes” and lines
called edges that connect them. Social network analysis (SNA) is an example of
graph analysis. SNA is used to investigate social structures and relationships
across social networks. SNA characterizes networked structures in terms of nodes
(people or things within the network) and the ties or edges (relationships or
interactions) that connect them (see Figure 6.13).
<b>Figure 6.13</b> Graph analysis
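The node-and-edge structure maps directly onto a small data structure. The Python sketch below builds an undirected graph as an adjacency map and ranks nodes by degree centrality, which is one simple, assumed way to find the best-connected people in a network; the guest names and relationships are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected graph as an adjacency map: node -> set of neighboring nodes."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    others = max(len(graph) - 1, 1)
    return {node: len(neighbors) / others for node, neighbors in graph.items()}

# Hypothetical pairs of guests who visited the park together.
visits_together = [
    ("Ana", "Ben"), ("Ana", "Cal"), ("Ana", "Dee"),
    ("Ben", "Cal"), ("Dee", "Eli"),
]
graph = build_graph(visits_together)
centrality = degree_centrality(graph)
leader = max(centrality, key=centrality.get)
print(leader, centrality[leader])
```

Other centrality measures (for example, betweenness) answer different questions, such as who bridges otherwise separate groups.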
The Parks can employ graph analysis to uncover strength of relationships
among groups of guests (leaders, followers, influencers, cohorts). The Parks
can leverage the graph analysis results to direct promotions (discounts,
restaurant vouchers, travel vouchers) to group leaders in order to encourage
these leaders to bring groups back to the parks more frequently.
For more information about social network analysis in R, check out
-r-using-package-igraph/.
Text mining refers to the process of deriving usable information (metadata) from
text files such as consumer comments, e-mail conversations, physician or
technician notes, work orders, etc. Basically, text mining creates structured data
out of unstructured data.
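The simplest form of turning unstructured text into structured data is counting terms. The Python sketch below tokenizes free-text comments, drops a small hand-picked stop-word list, and produces one structured row of term counts per comment. The comments, the stop-word list, and the function names are all hypothetical, and real text mining would add steps such as stemming and phrase detection.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "to", "of",
              "for", "in", "too", "but", "at", "we", "it"}

def tokenize(text):
    """Lowercase a comment and split it into word tokens, dropping stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def term_document_rows(comments):
    """One structured row per comment: its id and a term -> count mapping."""
    return [{"comment_id": i, "terms": Counter(tokenize(c))}
            for i, c in enumerate(comments)]

def top_terms(comments, n=3):
    """Most frequent terms across all comments."""
    totals = Counter()
    for row in term_document_rows(comments):
        totals.update(row["terms"])
    return totals.most_common(n)

comments = [
    "The wait for Space Adventure was too long.",
    "Long wait at the restaurant, but the food was great!",
    "Great characters and a great parade.",
]
print(top_terms(comments))
```

Once comments are rows of term counts, they can be joined with operational and transactional data like any other structured table.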
Text mining is a very powerful technique to show during an envisioning process,
as many business stakeholders have struggled to understand how they can gain
insights from the wealth of internal customer, product, and operational data. Text
mining is not something that the data warehouse can do, so many business
stakeholders have stopped thinking about how they can derive actionable insights
from text data. Consequently, it is important to leverage envisioning exercises to
help the business stakeholders to imagine the realm of what is possible with text
data, especially when that text data is combined with the organization's
operational and transactional data.
<b>Figure 6.14</b> Text mining analysis
The Parks can mine guest comments, social media posts, and e-mails to flag
For more information about text mining analysis using R, check out
/>
Sentiment analysis can provide a broad and general overview of your customers'
sentiment toward your company and brands. Sentiment analysis can be a
powerful way to glean insights about the customers' feelings about your company,
products, and services out of the ever-growing body of social media sites
(Facebook, LinkedIn, Twitter, Instagram, Yelp, Snapchat, Vine, etc.) (see Figure
6.15).
<b>Figure 6.15</b> Sentiment analysis
In Figure 6.15, the data science team conducted competitive sentiment analysis by
classifying the emotions (e.g., anger, disgust, fear, joy, sadness, surprise) of
key competitor's perceived performance and quality of service is suffering).
Unfortunately, it is sometimes difficult to get the social media data at the level of
the individual, which is required to create more actionable insights and
recommendations at the individual customer level. However, leading
The Parks can establish a sentiment score for each attraction and character
and monitor social media sentiment for the attractions and characters in real-time. The Parks can leverage the real-time sentiment scores to take corrective
actions (placate unhappy guests, open additional lines, open additional
attractions, remove kiosks, move characters).
For more information about sentiment analysis using R, check out
/>
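A sentiment score per attraction can be sketched with a simple word-list approach. This is a deliberately simplified stand-in in Python, not the emotion classification shown in Figure 6.15; the word lists, posts, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical word lists; a real sentiment lexicon would be far larger.
POSITIVE = {"love", "great", "fun", "amazing"}
NEGATIVE = {"long", "broken", "rude", "dirty"}

# Hypothetical (attraction, post text) pairs from social media.
posts = [
    ("coaster", "great ride so fun"),
    ("coaster", "line was long"),
    ("castle", "dirty and broken"),
]

def post_score(text):
    """Positive word count minus negative word count for one post."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def attraction_scores(tagged_posts):
    """Average post score per attraction."""
    by_attraction = defaultdict(list)
    for attraction, text in tagged_posts:
        by_attraction[attraction].append(post_score(text))
    return {a: sum(s) / len(s) for a, s in by_attraction.items()}

scores = attraction_scores(posts)

# Attractions with a negative average are candidates for corrective action.
needs_attention = [a for a, s in scores.items() if s < 0]
```

Recomputing the averages over a sliding window of recent posts would approximate the real-time monitoring described above.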
Traverse pattern analysis is an example of combining a couple of analytic
algorithms to better understand customer, product, or operational usage patterns.
Traverse analysis links customer or product usage patterns and association rules
to a geographical or facility map in order to identify potential purchase, traffic,
flow, fraud, theft, and other usage patterns and relationships.
The process starts by creating association rules from the customer's or product's
usage data, and then maps those association rules to a geographical map (store,
hospital, school, campus, sports arena, casino, airport) to identify potential
problems with performance, usage, staffing, inventory, logistics, traffic flow, etc.
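The two-step process can be sketched as follows. This Python example is a simplification under stated assumptions: it computes only the support of item pairs (a first step toward full association rules with confidence and lift), and the sessions, floor coordinates, thresholds, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of games each player used during one visit.
sessions = [
    {"slots", "blackjack"},
    {"slots", "blackjack", "roulette"},
    {"slots", "roulette"},
    {"blackjack", "slots"},
]

# Hypothetical floor coordinates (in metres) for each game area.
locations = {"slots": (0, 0), "blackjack": (30, 40), "roulette": (3, 4)}

def frequent_pairs(baskets, min_support):
    """Return {(a, b): support} for pairs appearing in at least min_support of baskets."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def distance(a, b):
    """Straight-line distance between two game areas on the floor map."""
    (x1, y1), (x2, y2) = locations[a], locations[b]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

# Step 1: find games that are frequently played in combination.
pairs = frequent_pairs(sessions, min_support=0.5)

# Step 2: map those pairs onto the floor plan; frequently combined games
# that sit far apart are candidates for relocation.
far_apart = {pair: distance(*pair) for pair in pairs if distance(*pair) > 20}
```

Straight-line distance is a rough proxy; an actual facility map would use walking paths and observed traffic counts.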
In Figure 6.16, the data science team created a series of association rules about
slot and table play in a casino, and then used those association rules to identify
potential foot flow problems and game location optimization opportunities. The
data science team
Created player performance association rules about what games the players
tend to play in combination,
<b>Figure 6.16</b> Traverse pattern analysis
The results of this analysis highlight areas of the casino that are sub-optimized
when certain types of game players are in the casino and can lead to
The Parks can employ traverse pattern analysis to understand park and guest
flows with respect to attractions, entertainment, retail outlets, restaurants,
characters, etc. The Parks can use the traverse pattern analysis results to
1. Identify where to place characters and situate portable kiosks in order to
increase revenues,
2. Determine what promotions to offer in order to drive traffic to idle
attractions and restaurants.
Decision tree classifier analysis uses decision trees to identify groupings and
clusters buried in the usage and performance data. Decision tree classifier analysis
uses a decision tree as a predictive model that maps observations about an item to
conclusions about the item's target value.
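The mapping from observations to a target value can be illustrated with the smallest possible tree: a single split, sometimes called a decision stump. The Python sketch below assumes one hypothetical numeric variable (visits per year) and a hypothetical 0/1 target (high-spend guest); the split is chosen by minimizing Gini impurity, which is one common criterion among several.

```python
# Hypothetical observations: (visits_per_year, label), where label 1 marks a high-spend guest.
observations = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (9, 1)]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows):
    """Return the threshold that minimises the weighted Gini impurity of the two branches."""
    values = sorted({x for x, _ in rows})
    best_threshold, best_impurity = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split halfway between neighbouring values
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold

def predict(x, threshold, rows):
    """Predict the majority label of the branch that x falls into."""
    branch = [y for v, y in rows if (v <= threshold) == (x <= threshold)]
    return round(sum(branch) / len(branch))

threshold = best_split(observations)
```

A full decision tree applies the same split search recursively to each branch and across many variables.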
In Figure 6.17, the data science team used the decision tree classifier analysis
technique to identify and group performance and usage variables into similar
<b>Rank  Player             Team  MPG   RPM</b>
1     Stephen Curry, PG  GS    32.7  9.34
2     LeBron James, SF   CLE   36.1  8.78
3     James Harden, SG   HOU   36.8  8.50
4     Anthony Davis, PF  NO    36.1  8.18
5     Kawhi Leonard, SF  SA    31.8  7.57
Source: />
The Parks can employ cohorts analysis to identify specific employees and
characters that increase the overall park, attractions, characters, customer,
and household satisfaction and spend levels. The Parks can use the results of
the cohorts analysis to
1. Decide how many and where to situate specific, popular characters;
2. Reward park associates that drive higher customer satisfaction scores.
For more information about cohorts analysis in R, check out the article
“Cohort Analysis with R – Retention Charts” at
Use the following exercises to apply what you learned in this chapter.
<b>Exercise #1: Review each of the analytic algorithms covered in this chapter</b>
and write down one or two use cases where that particular analytic algorithm
might be useful given your business situations.
<b>Exercise #2: Revisit the key business initiative that you identified in</b> Chapter
2. Write down two or three of the analytic algorithms covered in this chapter
that you think might be appropriate to the decisions that you are trying to
make in support of that key business initiative.
<b>Exercise #3: Write down two or three bullet points about why you think</b>
1<sub>R is a programming language and software environment for statistical</sub>
It is typical that 40 to 60 percent of the data warehouse processing load is
performing ETL work. Off-loading some of the ETL processes to the data
lake can free up considerable data warehouse resources.
Unhandcuff the BI analysts and data science team from being reliant on the
summarized and aggregated data in the data warehouse as the single source of
data for their data analytics (and mitigate the unmanageable proliferation of
“spreadmarts”1 that are being used by business analysts to work around the
analytic limitations of the data warehouse).
The data lake solves a great many problems. However, it can also raise a lot of
questions. In a paper titled “Beware the Data Lake Fallacy,” Gartner raised
cautions about the data lake, specifically around the assumption that all
enterprise audiences are highly skilled at data manipulation and analysis.
Gartner's point was that if a data lake focuses only on storing disparate data and
ignores how or why data is used, governed, defined, and secured or how
descriptive metadata is captured and maintained, the data lake risks turning into
a data swamp. And without an adequate metadata strategy, every subsequent use
of data means the analysts must start from scratch.
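One way to picture a minimal metadata strategy is a catalog entry recorded whenever a data set lands in the lake. The Python sketch below is purely illustrative: the field names, data set names, and helper functions are assumptions, not part of any particular data lake product.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetMetadata:
    """Hypothetical descriptive metadata recorded when a data set lands in the lake."""
    name: str
    source: str
    owner: str
    ingested_on: date
    description: str
    tags: list = field(default_factory=list)

catalog = {}

def register(meta):
    """Add a data set's metadata to the catalog, keyed by data set name."""
    catalog[meta.name] = meta

def find_by_tag(tag):
    """Return the names of catalogued data sets that carry the given tag."""
    return sorted(name for name, m in catalog.items() if tag in m.tags)

register(DatasetMetadata("pos_transactions", "point-of-sale", "finance",
                         date(2015, 1, 5), "Raw register transactions", ["sales", "raw"]))
register(DatasetMetadata("guest_comments", "web forms", "marketing",
                         date(2015, 2, 1), "Free-text guest feedback", ["text", "raw"]))
```

With even this much captured, an analyst can call `find_by_tag("raw")` to discover what is available instead of starting from scratch.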
The ability of an organization to realize business value from big data relies on the
organization's ability to easily and quickly:
Identify the “right and/or best data”
Define the analytics required to extract the value
Bring the data into an analytics environment (sandbox) suited for advanced analytics or data science work
Curate the data to a point where it is “suited” for analysis
Stand up the required infrastructure to support the analytics in accordance with the desired performance and throughput requirements
Execute the analytic models against the curated data to derive business value
Deploy the analytics into the production infrastructure
The data lake is not an incremental enhancement to the data warehouse, and it is
NOT data warehouse 2.0. The data lake enables entirely new capabilities that
allow your organization to address data and analytic challenges that the data
warehouse could not address.
There are five characteristics that differentiate a business-ready data lake from the
data warehouse (see Figure 7.1):
<b>Figure 7.1</b> Characteristics of a data lake
<b>Ingest. Ability to rapidly ingest data from a wide range of internal and</b>
external data sources, including structured and unstructured data sources. The
data lake can accomplish rapid data ingestion because it can load the data
as-is; that is, the data lake does not require any data transformations or pre-building a data schema before loading the data.
<b>Store. A single or central repository for amassing ALL the organization's data</b>
including data from potentially interesting external sources. The data lake can
store data even if the organization has not yet decided how it might use the
data. As the Director of Analytics and Business Intelligence at Starbucks was
quoted: “A full quarter of Starbucks transactions are made via its popular
loyalty cards, and that results in ‘huge amounts’ of data, but the company isn't
sure what to do with [all that data] yet.” The same goes for social media data,
as Starbucks has a team who analyzes social data, but “We haven't figured out
what exactly to do with it yet.”2
<b>Analyze. Provides the foundation for the analytics environment (or analytics</b>
internal and external data sources with the goal of uncovering new customer,
product, and operational insights that can be used to optimize key business
processes and fuel new monetization opportunities.
<b>Surface. Supports the analytic model development and the extracting of the</b>
analytic results (e.g., scores, recommendations, next best offer, business rules)
that are used to empower frontline employees' and business managers'
decision making and influence customer behaviors and actions.
<b>Act. Enables the integration of the analytic results back into the organization's</b>
operational systems (call center, direct marketing, procurement, store
As a data warehouse manager, I hated the analytics team. Why? Because
whenever its members needed data, they always came to my data warehouse for
the data because they were told that the data warehouse was the “single version of
the truth.” And the analytic team's data and query requests usually screwed up my
production SLAs in the process (see Figure 7.2).
<b>Figure 7.2</b> The analytics dilemma
<b>Figure 7.3</b> The data lake line of demarcation
The data lake provides a “line of demarcation” between the production