Big Data MBA: Driving Business Strategies with Data Science

<span class='text_page_counter'>(1)</span><div class='page_container' data-page=1></div>
<span class='text_page_counter'>(2)</span><div class='page_container' data-page=2></div>
<span class='text_page_counter'>(3)</span><div class='page_container' data-page=3></div>
<span class='text_page_counter'>(4)</span><div class='page_container' data-page=4></div>
<span class='text_page_counter'>(5)</span><div class='page_container' data-page=5></div>
<span class='text_page_counter'>(6)</span><div class='page_container' data-page=6>

Chapter 14: Organizational Ramifications
Chief Data Monetization Officer
Privacy, Trust, and Decision Governance
Unleashing Organizational Creativity
Summary
Homework Assignment
Notes

Chapter 15: Stories
Customer and Employee Analytics
Product and Device Analytics
Network and Operational Analytics
Characteristics of a Good Business Story
Summary
Homework Assignment
Notes

</div>
<span class='text_page_counter'>(7)</span><div class='page_container' data-page=7>

<b>List of Illustrations</b>

Chapter 1: The Big Data Business Mandate
Figure 1.1 Big Data Business Model Maturity Index
Figure 1.2 Modern data/analytics environment

Chapter 2: Big Data Business Model Maturity Index
Figure 2.1 Big Data Business Model Maturity Index
Figure 2.2 Crossing the analytics chasm
Figure 2.3 Packaging and selling audience insights
Figure 2.4 Optimize internal processes
Figure 2.5 Create new monetization opportunities

Chapter 3: The Big Data Strategy Document
Figure 3.1 Big data strategy decomposition process
Figure 3.2 Big data strategy document
Figure 3.3 Chipotle's 2012 letter to the shareholders
Figure 3.4 Chipotle's “increase same store sales” business initiative
Figure 3.5 Chipotle key business entities and decisions
Figure 3.6 Completed Chipotle big data strategy document
Figure 3.7 Business value of potential Chipotle data sources
Figure 3.8 Implementation feasibility of potential Chipotle data sources
Figure 3.9 Chipotle prioritization of use cases
Figure 3.10 San Francisco Giants big data strategy document
Figure 3.11 Chipotle's same store sales results

Chapter 4: The Importance of the User Experience
Figure 4.1 Original subscriber e-mail
Figure 4.2 Improved subscriber e-mail
Figure 4.3 Actionable subscriber e-mail
Figure 4.4 App recommendations
Figure 4.5 Traditional Business Intelligence dashboard
Figure 4.6 Actionable store manager dashboard

</div>
<span class='text_page_counter'>(8)</span><div class='page_container' data-page=8>

Figure 4.9 Local events use case
Figure 4.10 Local weather use case
Figure 4.11 Financial advisor dashboard
Figure 4.12 Client personal information
Figure 4.13 Client financial information
Figure 4.14 Client financial goals
Figure 4.15 Financial contributions recommendations
Figure 4.16 Spend analysis and recommendations
Figure 4.17 Asset allocation recommendations
Figure 4.18 Other investment recommendations

Chapter 5: Differences Between Business Intelligence and Data Science
Figure 5.1 Schmarzo TDWI keynote, August 2008
Figure 5.2 Oakland A's versus New York Yankees cost per win
Figure 5.3 Business Intelligence versus data science
Figure 5.4 CRISP: Cross Industry Standard Process for Data Mining
Figure 5.5 Business Intelligence engagement process
Figure 5.6 Typical BI tool graphic options
Figure 5.7 Data scientist engagement process
Figure 5.8 Measuring goodness of fit
Figure 5.9 Dimensional model (star schema)
Figure 5.10 Using flat files to eliminate or reduce joins on Hadoop
Figure 5.11 Sample customer analytic profile
Figure 5.12 Improve customer retention example

Chapter 6: Data Science 101
Figure 6.1 Basic trend analysis
Figure 6.2 Compound trend analysis
Figure 6.3 Trend line analysis
Figure 6.4 Boxplot analysis
Figure 6.5 Geographical (spatial) trend analysis
Figure 6.6 Pairs plot analysis

</div>
<span class='text_page_counter'>(9)</span><div class='page_container' data-page=9>

Figure 6.8 Cluster analysis
Figure 6.9 Normal curve equivalent analysis
Figure 6.10 Normal curve equivalent seller pricing analysis example
Figure 6.11 Association analysis
Figure 6.12 Converting association rules into segments
Figure 6.13 Graph analysis
Figure 6.14 Text mining analysis
Figure 6.15 Sentiment analysis
Figure 6.16 Traverse pattern analysis
Figure 6.17 Decision tree classifier analysis
Figure 6.18 Cohorts analysis

Chapter 7: The Data Lake
Figure 7.1 Characteristics of a data lake
Figure 7.2 The analytics dilemma
Figure 7.3 The data lake line of demarcation
Figure 7.4 Create a Hadoop-based data lake
Figure 7.5 Create an analytic sandbox
Figure 7.6 Move ETL to the data lake
Figure 7.7 Hub and Spoke analytics architecture
Figure 7.8 Data science engagement process
Figure 7.9 What does the future hold?
Figure 7.10 EMC Federation Business Data Lake

Chapter 8: Thinking Like a Data Scientist
Figure 8.1 Foot Locker's key business initiatives
Figure 8.2 Examples of Foot Locker's in-store merchandising
Figure 8.3 Foot Locker's store manager persona
Figure 8.4 Foot Locker's strategic nouns or key business entities
Figure 8.5 Thinking like a data scientist decomposition process
Figure 8.6 Recommendations worksheet template
Figure 8.7 Foot Locker's recommendations worksheet

</div>
<span class='text_page_counter'>(10)</span><div class='page_container' data-page=10>

Figure 8.9 Thinking like a data scientist decomposition process

Chapter 9: “By” Analysis Technique
Figure 9.1 Identifying metrics that may be better predictors of performance
Figure 9.2 NBA shooting effectiveness
Figure 9.3 LeBron James's shooting effectiveness

Chapter 10: Score Development Technique
Figure 10.1 FICO score considerations
Figure 10.2 FICO score decision range
Figure 10.3 Recommendations worksheet
Figure 10.4 Updated recommendations worksheet
Figure 10.5 Completed recommendations worksheet
Figure 10.6 Potential Foot Locker customer scores
Figure 10.7 Foot Locker recommendations worksheet
Figure 10.8 CLTV based on sales
Figure 10.9 More predictive CLTV score

Chapter 11: Monetization Exercise
Figure 11.1 “A day in the life” customer persona
Figure 11.2 Fitness tracker prioritization
Figure 11.3 Monetization road map

Chapter 12: Metamorphosis Exercise
Figure 12.1 Big Data Business Model Maturity Index
Figure 12.2 Patient actionable analytic profile

Chapter 13: Power of Envisioning
Figure 13.1 Big Data Vision Workshop process and timeline
Figure 13.2 Big Data Vision Workshop illustrative analytics
Figure 13.3 Big Data Vision Workshop user experience mock-up
Figure 13.4 Prioritize Healthcare Systems's use cases
Figure 13.5 Prioritization matrix template
Figure 13.6 Prioritization matrix process

Chapter 14: Organizational Ramifications

</div>
<span class='text_page_counter'>(11)</span><div class='page_container' data-page=11></div>
<span class='text_page_counter'>(12)</span><div class='page_container' data-page=12>

<b>List of Tables</b>

Chapter 1: The Big Data Business Mandate
Table 1.1 Exploiting Technology Innovation to Create Economic-Driven Business Opportunities
Table 1.2 Evolution of the Business Questions

Chapter 2: Big Data Business Model Maturity Index
Table 2.1 Big Data Business Model Maturity Index Summary

Chapter 3: The Big Data Strategy Document
Table 3.1 Mapping Chipotle Use Cases to Analytic Models

Chapter 5: Differences Between Business Intelligence and Data Science
Table 5.1 BI Analyst Versus Data Scientist Characteristics

Chapter 6: Data Science 101
Table 6.1 2014–2015 Top NBA RPM Rankings
Table 6.2 Case Study Summary

Chapter 7: The Data Lake
Table 7.1 Data Lake Data Types

Chapter 8: Thinking Like a Data Scientist
Table 8.1 Evolution of Foot Locker's Business Questions

Chapter 9: “By” Analysis Technique
Table 9.1 LeBron James's Shooting Percentages

Chapter 10: Score Development Technique
Table 10.1 Potential Scores for Other Industries

Chapter 11: Monetization Exercise
Table 11.1 Potential Fitness Tracker Recommendations
Table 11.2 Recommendation Data Requirements
Table 11.3 Recommendations Value Versus Feasibility Assessment

Chapter 12: Metamorphosis Exercise

</div>
<span class='text_page_counter'>(13)</span><div class='page_container' data-page=13></div>
<span class='text_page_counter'>(14)</span><div class='page_container' data-page=14></div>
<span class='text_page_counter'>(15)</span><div class='page_container' data-page=15>

<b>Overview of the Book and Technology</b>

The days when business stakeholders could relinquish control of data and analytics to IT are over. The business stakeholders must be front and center in championing and monetizing the organization's data collection and analysis efforts. Business leaders need to understand where and how to leverage big data, exploiting the collision of new sources of customer, product, and operational data coupled with data science to optimize key business processes, uncover new monetization opportunities, and create new sources of competitive differentiation. And while it's not realistic to convert your business users into data scientists, it's critical that we teach the business users to <i>think like data scientists</i> so they can collaborate with IT and the data scientists on use case identification, requirements definition, business valuation, and ultimately analytics operationalization.

This book provides a business-hardened framework with supporting methodology and hands-on exercises that not only will help business users to identify where and how to leverage big data for business advantage but will also provide

</div>
<span class='text_page_counter'>(16)</span><div class='page_container' data-page=16>

<b>How This Book Is Organized</b>

The book is organized into four sections:

<b>Part I: Business Potential of Big Data.</b> Part I includes Chapters 1 through 4 and sets the business-centric foundation for the book. Here is where I introduce the Big Data Business Model Maturity Index and frame the big data discussion around the perspective that “organizations do not need a big data strategy as much as they need a business strategy that incorporates big data.”

<b>Part II: Data Science.</b> Part II includes Chapters 5 through 7 and covers the principles behind data science. These chapters introduce some data science basics and explore how Business Intelligence and data science complement each other while differing in the problems that they address.

<b>Part III: Data Science for Business Stakeholders.</b> Part III includes Chapters 8 through 12 and seeks to teach business users and business leaders to “think like a data scientist.” This part introduces a methodology and several exercises to reinforce the data science thinking and approach. It has a lot of hands-on work.

</div>
<span class='text_page_counter'>(17)</span><div class='page_container' data-page=17></div>
<span class='text_page_counter'>(18)</span><div class='page_container' data-page=18></div>
<span class='text_page_counter'>(19)</span><div class='page_container' data-page=19>

<b>Who Should Read This Book</b>

This book is targeted toward business users and business management. I wrote this book so that I could use it in teaching my Big Data MBA class, so I included all of the hands-on exercises and templates that my students would need to successfully earn their Big Data MBA graduation certificate.

<i>I think folks would benefit by also reading my first book, Big Data:</i>

</div>
<span class='text_page_counter'>(20)</span><div class='page_container' data-page=20>

<b>Tools You Will Need</b>



</div>
<span class='text_page_counter'>(21)</span><div class='page_container' data-page=21>

<b>What's on the Website</b>

You can download the “Thinking Like a Data Scientist” workbook from the book's website at www.wiley.com/go/bigdatamba. And oh, there might be another surprise

</div>
<span class='text_page_counter'>(22)</span><div class='page_container' data-page=22>

<b>What This Means for You</b>

As students from my class at USF have told me, this material allows them to take a problem or challenge and use a well-thought-out process to drive cross-organizational collaboration to come up with ideas they can turn into actions

</div>
<span class='text_page_counter'>(23)</span><div class='page_container' data-page=23></div>
<span class='text_page_counter'>(24)</span><div class='page_container' data-page=24>

<b>Part I</b>

<b>Business Potential of Big Data</b>

Chapters 1 through 4 set the foundation for driving business strategies with data science. In particular, the Big Data Business Model Maturity Index highlights the realm of what's possible from a business potential perspective by providing a road map that measures how effectively your organization leverages data and analytics to power your business models.

<b>In This Part</b>

Chapter 1: The Big Data Business Mandate
Chapter 2: Big Data Business Model Maturity Index
Chapter 3: The Big Data Strategy Document

</div>
<span class='text_page_counter'>(25)</span><div class='page_container' data-page=25></div>
<span class='text_page_counter'>(26)</span><div class='page_container' data-page=26>

<b>Chapter 1</b>

<b>The Big Data Business Mandate</b>

<i>Having trouble getting your senior management team to understand the business potential of big data? Can't get your management leadership to consider big data to be something other than an IT science experiment? Are your line-of-business leaders unwilling to commit themselves to understanding how data and analytics can power their top initiatives?</i>

<i>If so, then this “Big Data Senior Executive Care Package” is for you!</i>

<i>And for a limited time, you get an unlimited license to share this care package with as many senior executives as you desire. But you must act NOW! Become the life of the company parties with your extensive knowledge of how new customer, product, and operational insights can guide your organization's value creation processes. And maybe, just maybe, get a promotion in the process!!</i>

<b>NOTE</b>

</div>
<span class='text_page_counter'>(27)</span><div class='page_container' data-page=27>

<b>Big Data MBA Introduction</b>

The days when business users and business management can relinquish control of data and analytics to IT are over, at least for organizations that want to survive beyond the immediate term. The big data discussion now needs to focus on how organizations can couple new sources of customer, product, and operational data with advanced analytics (data science) to power their key business processes and elevate their business models. <i>Organizations need to understand that they do not need a big data strategy as much as they need a business strategy that incorporates big data.</i>

The <i>Big Data MBA</i> challenges the thinking that data and analytics are ancillary or a “bolt on” to the business; that data and analytics are someone else's problem. In a growing number of leading organizations, data and analytics are critical to business success and long-term survival. Business leaders and business users reading this book will learn why they must take responsibility for identifying where and how they can apply data and analytics to their businesses; otherwise they put their businesses at risk of being made obsolete by more nimble, data-driven competitors.

The <i>Big Data MBA</i> introduces and describes concepts, techniques, methodologies, and hands-on exercises to guide you as you seek to address the big data business mandate. The book provides hands-on exercises and homework assignments to make these concepts and techniques come to life for your organization. It provides recommendations and actions that enable your organization to start today. And in the process, <i>Big Data MBA</i> teaches you to “think like a data scientist.”

</div>
<span class='text_page_counter'>(28)</span><div class='page_container' data-page=28>

<b>Figure 1.1</b> Big Data Business Model Maturity Index

The Big Data Business Model Maturity Index provides a road map for how organizations can integrate data and analytics into their business models. The Big Data Business Model Maturity Index is composed of the following five phases:

<b>Phase 1: Business Monitoring.</b> In the Business Monitoring phase, organizations are leveraging data warehousing and Business Intelligence to monitor the organization's performance.

<b>Phase 2: Business Insights.</b> The Business Insights phase is about leveraging predictive analytics to uncover customer, product, and operational insights buried in the growing wealth of internal and external data sources. In this phase, organizations aggressively expand their data acquisition efforts by coupling all of their detailed transactional and operational data with internal data such as consumer comments, e-mail conversations, and technician notes, as well as external and publicly available data such as social media, weather, traffic, economic, demographics, home values, and local events data.

<b>Phase 3: Business Optimization.</b> In the Business Optimization phase, organizations apply prescriptive analytics to the customer, product, and operational insights uncovered in the Business Insights phase to deliver actionable insights or recommendations to frontline employees, business managers, and channel partners, as well as customers. The goal of the Business Optimization phase is to enable employees, partners, and customers to optimize their key decisions.

<b>Phase 4: Data Monetization.</b> In the Data Monetization phase, organizations leverage the customer, product, and operational insights to create new sources of revenue. This could include selling data, or insights, into new markets (a cellular phone provider selling customer behavioral data to advertisers), integrating analytics into products and services to create

</div>
<span class='text_page_counter'>(29)</span><div class='page_container' data-page=29>

<b>Phase 5: Business Metamorphosis.</b> The holy grail of the Big Data Business Model Maturity Index is when an organization transitions its business model from selling products to selling “business-as-a-service.” Think GE selling “thrust” instead of jet engines. Think John Deere selling “farming optimization” instead of farming equipment. Think Boeing selling “air miles” instead of airplanes. And in the process, these organizations will create a platform enabling third-party developers to build and market solutions on top of the organization's business-as-a-service business model.

Ultimately, big data only matters if it helps organizations make more money and improve operational effectiveness. Examples include increasing customer acquisition, reducing customer churn, reducing operational and maintenance costs, optimizing prices and yield, reducing risks and errors, improving compliance, improving the customer experience, and more.

</div>
<span class='text_page_counter'>(30)</span><div class='page_container' data-page=30></div>
<span class='text_page_counter'>(31)</span><div class='page_container' data-page=31></div>
<span class='text_page_counter'>(32)</span><div class='page_container' data-page=32>

innovations like Hadoop and Spark, the real discussion should be about the economic impact of big data. New technologies don't disrupt business models; it's what organizations do with these new technologies that disrupts business models and enables new ones. Let's review an example of one such economic-driven business transformation: the steam engine.

The steam engine enabled urbanization, industrialization, and the conquering of new territories. It literally shrank distance and time by reducing the time required to move people and goods from one side of a continent to the other. The steam engine enabled people to leave low-paying agricultural jobs and move into cities for higher-paying manufacturing and clerical jobs that led to a higher standard of living.

For example, cities such as London shot up in terms of population. In 1801, before the advent of George Stephenson's Rocket steam engine, London had 1.1 million residents. After the invention, the population of London more than doubled to 2.7 million residents by 1851. London transformed the nucleus of society from small tight-knit communities where textile production and agriculture were prevalent into big cities with a variety of jobs. The steam locomotive provided quicker transportation and more jobs, which in turn brought more people into the cities and drastically changed the job market. By 1861, only 2.4 percent of London's population was employed in agriculture, while 49.4 percent were in the manufacturing or transportation business. The steam locomotive was a major turning point in history as it transformed society from largely rural and agricultural into urban and industrial.<sup>2</sup>

</div>
<span class='text_page_counter'>(33)</span><div class='page_container' data-page=33></div>
<span class='text_page_counter'>(34)</span><div class='page_container' data-page=34></div>
<span class='text_page_counter'>(35)</span><div class='page_container' data-page=35></div>
<span class='text_page_counter'>(36)</span><div class='page_container' data-page=36>

warehousing) and analytics (data science) environments differently. These two environments have very different characteristics and serve different purposes. The data lake can make both of the BI and data science environments more agile and more productive (Figure 1.2).

<b>Figure 1.2</b> Modern data/analytics environment

<b>CROSS-REFERENCE</b>

Chapter 7 (“The Data Lake”) introduces the concept of a data lake and the role the data lake plays in supporting your existing data warehouse and Business Intelligence investments while providing the foundation for your data science environment. Chapter 7 discusses how the data lake can un-cuff your data scientists from the data warehouse to uncover those variables and metrics that might be better predictors of business performance. It also discusses how the data lake can free up expensive data warehouse resources, especially those resources associated with Extract, Transform, and Load (ETL) data processes.

<b>Don't Think “What Happened,” Think “What Will Happen”</b>

Business users have been trained to contemplate business questions that monitor the current state of the business and to focus on retrospective reporting on what happened. Business users have become conditioned by their BI and data warehouse environments to only consider questions that report on current business performance, such as “How many widgets did I sell last month?” and “What were my gross sales last quarter?”

</div>
<span class='text_page_counter'>(37)</span><div class='page_container' data-page=37></div>
<span class='text_page_counter'>(38)</span><div class='page_container' data-page=38>

Unfortunately, that type of thinking has led to siloed data fiefdoms, siloed decisions, and an un-empowered and frustrated business team. Organizations need to think differently about how they empower all of their employees. Organizations need to find a way to promote and nurture creative thinking and groundbreaking ideas across all levels of the organization. There is no edict that states that the best ideas only come from senior management.

The key to big data success is empowering cross-functional collaboration and exploratory thinking to challenge long-held organizational rules of thumb, heuristics, and “gut” decision making. The business needs an approach that is inclusive of all the key stakeholders: IT, business users, business management, channel partners, and ultimately customers. The business potential of big data is only limited by the creative thinking of the organization.

<b>CROSS-REFERENCE</b>

Chapter 13 (“Power of Envisioning”) discusses how the BI and data science teams can collaborate to brainstorm, test, and refine new variables that might be better predictors of business performance. We will introduce several techniques and concepts that can be used to drive collaboration between the business and IT stakeholders and ultimately help your data science team uncover new customer, product, and operational insights that lead to better business performance. Chapter 14 (“Organizational Ramifications”)

</div>
<span class='text_page_counter'>(39)</span><div class='page_container' data-page=39></div>
<span class='text_page_counter'>(40)</span><div class='page_container' data-page=40></div>
<span class='text_page_counter'>(41)</span><div class='page_container' data-page=41>

<b>Homework Assignment</b>

Use the following exercises to apply what you learned in this chapter.

<b>Exercise #1:</b> Identify a key business initiative for your organization, something the business is trying to accomplish over the next 9 to 12 months. It might be something like improving customer retention, optimizing customer acquisition, reducing customer churn, optimizing predictive maintenance, reducing revenue theft, and so on.

<b>Exercise #2:</b> Brainstorm and write down what (1) customer, (2) product, and (3) operational insights your organization would like to uncover in order to support the targeted business initiative. Start by capturing the different types of descriptive, predictive, and prescriptive questions you'd like to answer about the targeted business initiative. Tip: Don't worry about whether or not you have the data sources you need to derive the insights you want (yet).

<b>Exercise #3:</b> Brainstorm and write down data sources that might be useful in uncovering those key insights. Look both internally and externally for

</div>
<span class='text_page_counter'>(42)</span><div class='page_container' data-page=42>

<b>Notes</b>

1. Hopkins, Brian, Fatemeh Khatibloo with Kyle McNabb, James Staten, Andras Cser, Holger Kisker, Ph.D., Leslie Owens, Jennifer Belissent, Ph.D., Abigail Komlenic, “Reset On Big Data: Embrace Big Data to Engage Customers at Scale,” Forrester Research, 2014.

</div>
<span class='text_page_counter'>(43)</span><div class='page_container' data-page=43></div>
<span class='text_page_counter'>(44)</span><div class='page_container' data-page=44>

<b>Chapter 2</b>

<b>Big Data Business Model Maturity Index</b>

Organizations do not understand how far big data can take them from a business transformation perspective. Organizations don't have a way of understanding what the ultimate big data end state would or could look like or answering questions such as:

Where and how should I start my big data journey?

How can I create new revenue or monetization opportunities?

How do I compare to others with respect to my organization's adoption of big data as a business enabler?

How far can I push big data to power, or even transform, my business models?

To help address these types of questions, I've created the Big Data Business Model Maturity Index. Not only can organizations use this index to understand where they sit with respect to other organizations in exploiting big data and advanced analytics to power their business models, but the index also provides a road map to help organizations accelerate the integration of data and analytics into their business models.

</div>
<span class='text_page_counter'>(45)</span><div class='page_container' data-page=45>

<b>Chapter 2 Objectives</b>

Introduce the Big Data Business Model Maturity Index as a framework for organizations to measure how effective they are at leveraging data and analytics to power their business models

Discuss the objectives and characteristics of each of the five phases of the Big Data Business Model Maturity Index: Business Monitoring, Business Insights, Business Optimization, Data Monetization, and Business Metamorphosis

Discuss how the economics of big data and the four big data value drivers can enable organizations to cross the analytics chasm and advance past the Business Monitoring phase into the Business Insights and Business Optimization phases

</div>
<span class='text_page_counter'>(46)</span><div class='page_container' data-page=46>

<b>Introducing the Big Data Business Model Maturity Index</b>

Organizations are moving at different paces with respect to where and how they are adopting big data and advanced analytics to create business value. Some organizations are moving very cautiously, as they are unclear as to where and how to start and which of the bevy of new technology innovations they need to deploy in order to start their big data journeys. Others are moving at a more aggressive pace by acquiring and assembling a big data technology foundation built on many new big data technologies such as Hadoop, Spark, MapReduce, YARN, Mahout, Hive, HBase, and more.

However, a select few are looking beyond just the technology to identify where and how they should be integrating big data into their existing business processes. These organizations are aggressively looking to identify and exploit opportunities to optimize key business processes. And these organizations are seeking new monetization opportunities; that is, seeking out business opportunities where they can

Package and sell their analytic insights to others

Integrate advanced analytics into their products and services to create “intelligent” products

Create entirely new products and services that help them enter new markets and target new customers

These are the folks who realize that they don't need a big data strategy as much as they need a business strategy that incorporates big data. And when organizations “flip that byte” on the focus of their big data initiatives, the business potential is almost boundless.

Organizations can use the Big Data Business Model Maturity Index as a

</div>
<span class='text_page_counter'>(47)</span><div class='page_container' data-page=47>

<b>Figure 2.1</b> Big Data Business Model Maturity Index

Organizations tend to find themselves in one of five phases on the Big Data Business Model Maturity Index:

<b>Phase 1: Business Monitoring.</b> In the Business Monitoring phase, organizations are applying data warehousing and Business Intelligence techniques and tools to monitor the organization's business performance (also called Business Performance Management).

<b>Phase 2: Business Insights.</b> In the Business Insights phase, organizations <i>aggressively expand their data assets</i> by amassing all of their detailed transactional and operational data and coupling that transactional and operational data with new sources of internal data (e.g., consumer comments, e-mail conversations, technician notes) and external data (e.g., social media, weather, traffic, economic, data.gov). Organizations in the Business Insights phase then use predictive analytics to uncover customer, product, and operational insights buried in and across these data sources.

<b>Phase 3: Business Optimization.</b> In the Business Optimization phase, organizations build on the customer, product, and operational insights uncovered in the Business Insights phase by applying prescriptive analytics to optimize key business processes. Organizations in the Business Optimization phase push the analytic results (e.g., recommendations, scores, rules) to frontline employees and business managers to help them optimize the targeted business process through improved decision making. The Business Optimization phase also provides opportunities for organizations to push analytic insights to their customers in order to influence customer behaviors. An example of the Business Optimization phase is a retailer that delivers analytic-based merchandising recommendations to the store managers to optimize merchandise markdowns based on purchase patterns, inventory, weather conditions, holidays, consumer comments, and social media postings.

<b>Phase 4: Data Monetization.</b> The Data Monetization phase is where

</div>
<span class='text_page_counter'>(48)</span><div class='page_container' data-page=48></div>
<span class='text_page_counter'>(49)</span><div class='page_container' data-page=49></div>
<span class='text_page_counter'>(50)</span><div class='page_container' data-page=50></div>
<span class='text_page_counter'>(51)</span><div class='page_container' data-page=51></div>
<span class='text_page_counter'>(52)</span><div class='page_container' data-page=52>

the average campaign performance in certain markets on certain days of the week

Customers that are reacting two to three standard deviations outside the norm in their purchase patterns for certain product categories in certain weather conditions

Suppliers whose components are operating outside the upper or lower limits of a control chart in extreme cold weather situations
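
The last two checks in the list above (purchase behavior two to three standard deviations from the norm, and component readings outside control-chart limits) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the book: the function names `flag_outliers`, `control_limits`, and `out_of_control`, the sample figures, and the three-sigma band used for the control chart are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return (index, value, z_score) for every value that lies at least
    `threshold` sample standard deviations from the mean of `values`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [
        (i, v, (v - mu) / sigma)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma >= threshold
    ]


def control_limits(baseline, k=3.0):
    """Lower and upper control limits: baseline mean -/+ k standard deviations.
    k=3 is an assumed convention for this sketch."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def out_of_control(readings, lower, upper):
    """Return (index, reading) for readings outside the control limits."""
    return [(i, r) for i, r in enumerate(readings) if r < lower or r > upper]


# Hypothetical weekly spend for one customer in one product category.
weekly_spend = [52, 48, 50, 51, 49, 47, 53, 50, 120]
spend_outliers = flag_outliers(weekly_spend)

# Hypothetical component measurements: limits come from a normal-weather
# baseline and are then applied to readings taken in extreme cold.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
lower, upper = control_limits(baseline)
cold_weather = [10.1, 9.7, 11.2, 9.4]
violations = out_of_control(cold_weather, lower, upper)
```

With these made-up numbers, only the final week of spend is flagged, and the third and fourth cold-weather readings fall outside the limits. The control limits are computed from a separate baseline on purpose: an extreme reading included in the sample inflates the very standard deviation it is measured against, which can hide it.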


<b>CROSS-REFERENCE</b>

For the predictive analytics to be effective, organizations need to build detailed analytic profiles for each individual business entity: customers, patients, students, wind turbines, jet engines, ATMs, etc. The creation and role of analytic profiles is a topic covered in Chapter 5, “Differences Between Business Intelligence and Data Science.”

<i><b>Business Insights Phase Challenge</b></i>

The Business Insights phase is the most difficult stage of the Big Data Business Model Maturity Index because it requires organizations to “think differently” about how they approach data and analytics. The rules, techniques, and approaches that worked in the Business Intelligence and data warehouse worlds do not necessarily apply to the world of big data. This is truly the “crossing the analytics chasm” moment (see Figure 2.2).

<b>Figure 2.2</b> Crossing the analytics chasm

</div>
<span class='text_page_counter'>(53)</span><div class='page_container' data-page=53></div>
<span class='text_page_counter'>(54)</span><div class='page_container' data-page=54></div>
<span class='text_page_counter'>(55)</span><div class='page_container' data-page=55>

<b>Figure 2.3</b> Packaging and selling audience insights

Integrating analytic insights directly into an organization's products and services to create “intelligent” products or services, such as:

Cars that learn a customer's driving patterns and behaviors and adjust driver controls, seats, mirrors, brake pedals, suspension, steering, dashboard displays, etc. to match the customer's driving style

Televisions and DVRs that learn what types of shows and movies a customer likes and search across the different cable and Internet channels to find and automatically record similar shows for that customer

Ovens that learn how a customer likes certain foods prepared and cook them in that manner automatically and also include recommendations for other foods and recipes that “others like you” enjoy

Jet engines that can ingest weather, elevation, wind speed, and other environmental data to make adjustments to blade angles, tilt, yaw, and rotation speeds to minimize fuel consumption during flight

Repackaging insights to create entirely new products and services that help organizations to enter new markets and target new customers or audiences. For example, organizations can capture, analyze, and package customer, product, and operational insights across the overall market in order to help channel partners to more effectively market and sell to their customers, such as:

Online digital marketplaces (Yahoo, Google, eBay, Facebook) could leverage general market trends and other merchant performance data to provide recommendations to small merchants on inventory, ordering, merchandising, marketing, and pricing.

</div>
<span class='text_page_counter'>(56)</span><div class='page_container' data-page=56></div>
<span class='text_page_counter'>(57)</span><div class='page_container' data-page=57>

patterns, local weather, local water quality, and local environmental conditions
such as local water conservation efforts and energy costs

Retailers moving into the “Shopping Optimization” business by recommending
specific products given customers' current buying patterns as compared with
others like them, including recommendations for products that they may not
even sell (think “Miracle on 34th Street”)

Airlines moving into the “Travel Delight” business of not only offering


</div>
<span class='text_page_counter'>(58)</span><div class='page_container' data-page=58>

<b>Big Data Business Model Maturity Index Lessons Learned</b>

There are some interesting lessons that organizations will discover as they
progress through the phases of the Big Data Business Model Maturity Index.
Understanding these lessons ahead of time should help prepare organizations for
their big data journey.

<b>Lesson 1: Focus Initial Big Data Efforts Internally</b>

The first three phases of the Big Data Business Model Maturity Index seek to
extract more financial or business value out of the organization's internal
processes or business initiatives. The first three phases drive business value and a
Return on Investment (ROI) by seeking to integrate new sources of customer,
product, operational, and market data with advanced analytics to improve the
decisions that are made as part of the organization's key internal processes and
business initiatives (see Figure 2.4).

<b>Figure 2.4</b> Optimize internal processes

<i><b>The internal process optimization efforts start by seeking to leverage the</b></i>
organization's Business Intelligence and data warehouse assets. This includes
building on the data warehouse's data sources, data extraction and enrichment
algorithms, dimensions, metrics, key performance indicators, reports, and


</div>
<span class='text_page_counter'>(59)</span><div class='page_container' data-page=59>

<b>The Four Big Data Value Drivers</b>

1. Access to all the organization's detailed transactional and operational data
at the lowest level of granularity (at the individual customer, machine, or
device level).

2. Integration of unstructured data from both internal (consumer comments,
e-mail threads, technician notes) and external sources (social media,
mobile, publicly available) with the detailed transactional and operational
data to provide new metrics and new dimensions against which to
optimize key business processes.

3. Leverage real-time (or right-time) data analysis to accelerate the
organization's ability to identify and act on customer, product, and market
opportunities in a timelier manner.

4. Apply predictive analytics and data mining to uncover customer, product,
and operational insights or areas of “unusualness” buried in the massive
volumes of detailed structured and unstructured data that are worthy of
further business investigation.

Organizations must leverage these four big data value drivers to cross the analytics
chasm by uncovering new customer, product, and operational insights that can be
used to optimize key business processes, whether delivering actionable
recommendations to frontline employees and business managers or delivering
“Next Best Offer” or recommendations to delight customers and business
partners.
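
As a rough illustration of the fourth value driver, the following sketch flags areas of “unusualness” in detailed, entity-level data with a simple z-score rule. The function name, the threshold, and the daily transaction counts are all hypothetical and chosen only for illustration; a real effort would likely rely on richer predictive models.

```python
from statistics import mean, pstdev


def flag_unusual(values, threshold=2.0):
    """Return the positions of values lying more than `threshold`
    standard deviations away from the mean of the series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A perfectly flat series has nothing unusual in it.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily transaction counts for one store (one entity).
daily_transactions = [102, 98, 105, 97, 101, 99, 240, 103]
unusual_days = flag_unusual(daily_transactions)
print(unusual_days)
```

The flagged positions are only candidates that are worthy of further business investigation; the rule says nothing about why a given day was unusual.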


<b>Lesson 2: Leverage Insights to Create New Monetization Opportunities</b>

The last two phases of the Big Data Business Model Maturity Index are focused on
external market opportunities; opportunities to create new monetization or


</div>
<span class='text_page_counter'>(60)</span><div class='page_container' data-page=60>

<b>Figure 2.5</b> Create new monetization opportunities

This is the part of the big data journey that catches most organizations' attention:
the opportunity to leverage the insights gathered through the optimization of their
key business processes to create new revenue or monetization opportunities.
Organizations are eager to leverage new corporate assets (data, analytics, and
business insights) in order to create new sources of revenue. This is the “4Ms”
phase of the Big Data Business Model Maturity Index where organizations focus
on leveraging data and analytics to create new opportunities to <i>“Make Me More
Money!”</i>

<b>Lesson 3: Preparing for Organizational Transformation</b>

To fully exploit the big data opportunity, subtle organizational and cultural
changes will be necessary for the organization to advance along the maturity
index. If organizations are serious about integrating data and analytics into their
business models, then three organizational or cultural transformations will need
to take place:

<b>1. Treat Data as an Asset.</b> Organizations must start to treat data as an asset
to be nurtured and grown, not a cost to be minimized. Organizations must
develop an insatiable appetite for more and more data, even if they are
unclear as to how they will use that data. This is a significant cultural change
from the data warehouse days where we treated data as a cost to be minimized.

<b>2. Legally Protect Your Analytics Intellectual Property.</b> Organizations
must put into place formal processes and procedures to capture, track, refine,
and even legally protect their analytic assets (e.g., analytic models, data
enrichment algorithms, and analytic results such as scores, recommendations,
and association rules) as key organizational intellectual property. While the
underlying technologies may change over time, the resulting data and analytic


</div>
<span class='text_page_counter'>(61)</span><div class='page_container' data-page=61>

<b>3. Get Comfortable Using Data to Guide Decisions.</b> Business
management and business users must gain confidence in using data and
analytics to guide their decision making. Organizations must get comfortable
with making business decisions based on what the data and the analytics tell
them versus defaulting to the “Highest Paid Person's Opinion” (HIPPO). The
organization's investments in data, analytics, people, processes, and
technology will be for naught if the organization isn't prepared to make
decisions based on what the data and the analytics tell them. With that said,
it's important that the analytic insights are positioned as “recommendations”
that business users and business management can accept, reject, or modify. In
that way, organizations can leverage analytics to establish organizational


</div>
<span class='text_page_counter'>(62)</span><div class='page_container' data-page=62></div>
<span class='text_page_counter'>(63)</span><div class='page_container' data-page=63>

through improved decision making (or improved operational effectiveness for
non-profit organizations). Big data holds the potential to both optimize key
business processes and create new monetization or revenue opportunities.
In summary:

The Big Data Business Model Maturity Index provides a framework for
organizations to measure how effective they are at leveraging data and
analytics to power their business models.

The five phases of the Big Data Business Model Maturity Index are Business
Monitoring, Business Insights, Business Optimization, Data Monetization, and
Business Metamorphosis.

The economics of big data and the four big data value drivers can enable
organizations to cross the analytics chasm.


</div>
<span class='text_page_counter'>(64)</span><div class='page_container' data-page=64>

<b>Homework Assignment</b>

Use the following exercises to apply and reinforce the information presented in
this chapter:

<b>Exercise #1:</b> List two or three of your organization's key business processes.
That is, write down two or three business processes that uniquely differentiate
your organization from your competition.

<b>Exercise #2:</b> List the four big data value drivers that are enabled by the
economics of big data and describe how each might impact one of your
organization's key business processes identified in Exercise #1.

<b>Exercise #3:</b> For the selected key business processes identified in Exercise
#1, describe how each key business process might be improved as it transitions
along the five phases of the Big Data Business Model Maturity Index. Identify
the customer, product, and operational ramifications that each of the five
phases might have on the selected key business process.

<b>Exercise #4:</b> List the cultural changes that your organization must address if


</div>
<span class='text_page_counter'>(65)</span><div class='page_container' data-page=65></div>
<span class='text_page_counter'>(66)</span><div class='page_container' data-page=66>

<b>Chapter 3</b>

<b>The Big Data Strategy Document</b>

One of the biggest challenges organizations face with respect to big data is
identifying where and how to start. The <i><b>big data strategy document</b></i>, detailed
in this chapter, provides a framework for linking an organization's business
strategy and supporting business initiatives to the organization's big data efforts.
The big data strategy document guides the organization through the process of
breaking down its business strategy and business initiatives into potential big data
business use cases and the supporting data and analytic requirements.

<b>NOTE</b>

The big data strategy document first appeared in my book <i>Big Data:
Understanding How Data Powers Big Business</i>. Since then and courtesy of
several client engagements, significant improvements have been made to help
users to uncover big data use cases. In particular, the process has been
enhanced to clarify the business value and implementation feasibility
assessments of the different data sources and use case prioritization (see
Figure 3.1).


</div>
<span class='text_page_counter'>(67)</span><div class='page_container' data-page=67>

<b>Chapter 3 Objectives</b>

Establish common terminology for big data.

Examine the concept of a business initiative and provide some examples of
where to find these business initiatives.

Introduce the big data strategy document as a framework for helping
organizations to identify the use cases that guide where and how they can
start their big data journeys.

Provide a hands-on example of the big data strategy document in action
using Chipotle, a chain of organic Mexican food restaurants (and one of
my favorite places to eat!).

Introduce worksheets to help organizations to determine the business
value and implementation feasibility of the data sources that come out of
the big data strategy document process.

Introduce the <i><b>prioritization matrix</b></i> as a tool that can drive business
and IT alignment around prioritizing the use cases based on business value
and implementation feasibility over a 9- to 12-month window.


</div>
<span class='text_page_counter'>(68)</span><div class='page_container' data-page=68>

<b>Establishing Common Business Terminology</b>

Before we launch into the big data strategy document discussion, we need to
define a few critical terms to ensure that we are using consistent terminology
throughout the chapter and the book:

<i><b>Corporate Mission.</b></i> Why the organization exists; defines what an
organization is and the organization's reason for being. For example, The Walt
Disney Company's corporate mission is “to be one of the world's leading
producers and providers of entertainment and information.”1

<i><b>Business Strategy.</b></i> How the organization is going to achieve its mission over
the next two to three years.

<i><b>Strategic Business Initiatives.</b></i> What the organization plans to do to
achieve its business strategy over the next 9 to 12 months; usually includes
business objectives, financial targets, metrics, and time frames.

<b>Business Entities.</b> The physical objects or entities (e.g., customers, patients,
students, doctors, wind turbines, trucks) around which the business initiative
will try to understand, predict, and influence behaviors and performance
(sometimes referred to as the <i><b>strategic nouns</b></i> of the business).

<b>Business Stakeholders.</b> Those business functions (sales, marketing,
finance, store operations, logistics, and so on) that impact or are impacted by
the strategic business initiative.

<b>Business Decisions.</b> The decisions that the business stakeholders need to
make in support of the strategic business initiative.

<i><b>Big Data Use Cases.</b></i> The analytic use cases (decisions and corresponding
actions) that support the strategic business initiative.

<b>Data.</b> The structured and unstructured data sources, both internal and


</div>
<span class='text_page_counter'>(69)</span><div class='page_container' data-page=69>

<b>Introducing the Big Data Strategy Document</b>

The big data strategy document helps organizations address the challenge of
identifying where and how to start their big data journeys. The big data strategy
document uses a single-page format that any organization can use (profit or
non-profit) that links an organization's big data efforts to its business strategy
and key business initiatives. The big data strategy document is effective for the
following reasons:

It's concise. It fits on a single page so that anyone in the organization can
quickly review it to ensure he or she is working on the top priority items.

It's clear. It clearly defines what the organization needs to do in order to
achieve the organization's key business initiatives.

It's business relevant. It starts by focusing on the business strategy and
supporting initiatives before it dives into the data and technology
requirements.

The big data strategy document is composed of the following sections (see Figure
3.2):

Business strategy

Key business initiatives

Key business entities

Key decisions


</div>
<span class='text_page_counter'>(70)</span><div class='page_container' data-page=70>

<b>Figure 3.2</b> Big data strategy document

The rest of the chapter will detail each of these sections and provide guidelines for
how the organization can triage the organization's business strategy into the
financial drivers (or use cases) on which the organization can focus its big data
efforts. We will use a case study around Chipotle Mexican Grill to reinforce the
triage and analysis process.

<b>Identifying the Organization's Key Business Initiatives</b>

The starting point for the big data strategy document process is to identify the
organization's business initiatives over the next 9 to 12 months. That is, what is
the business trying to accomplish over the next 9 to 12 months? This 9- to
12-month time frame is critical, as it

Focuses the organization's big data efforts on something that is of immediate
value and relevance to the business

Creates a sense of urgency for the organization to move quickly and diligently

Gives the big data project a more realistic chance of delivering a positive
Return on Investment (ROI) and a financial payback in 12 months or less

A business initiative supports the business strategy and has the following
characteristics:

Critical to immediate-term business and/or financial performance (usually 9- to
12-month time frame)


</div>
<span class='text_page_counter'>(71)</span><div class='page_container' data-page=71></div>
<span class='text_page_counter'>(72)</span><div class='page_container' data-page=72>

<b>Figure 3.3</b> Chipotle's 2012 letter to the shareholders

From the President's Letter, we can identify at least four key business initiatives
for the coming year:

Improve employee (talent) acquisition, maturation, and retention (which is
especially important for an organization where 90 percent of its management
has come up through the ranks of the store).

Continue double-digit revenue growth (up 20.3 percent in 2012) by opening
new stores (opened 183 over 100 in 2012).

Increase same store sales growth (7.1 percent growth in 2012).

Improve marketing effectiveness on building the Chipotle brand and engaging
with customers in ways that create stronger, deeper bonds.

While any of these four business initiatives is ripe for the big data strategy
document, for the remainder of this exercise, we'll focus on the “increase same
store sales” business initiative because increasing sales of a business entity or
outlet is relevant across a number of different industries (e.g., hospitality, gaming,
banking, insurance, retail, higher education, health care providers).

<b>Identify Key Business Entities and Key Decisions</b>



</div>
<span class='text_page_counter'>(73)</span><div class='page_container' data-page=73>

<b>NOTE</b>

It is around these business entities that we are going to want to capture the
behaviors, tendencies, patterns, trends, preferences, etc. at the individual
entity level. For example, a credit card company would want to capture Bill
Schmarzo's specific travel and buying patterns and tendencies in order to
better detect fraud and improve merchant marketing offers.

Figure 3.4 shows the template that we are going to use to support the big data
strategy document process. We have already captured our targeted “increase same
store sales” business initiative.

<b>Figure 3.4</b> Chipotle's “increase same store sales” business initiative

Take a moment to write down what you think might be the key business entities or
strategic nouns for the “increase same store sales” business initiative:

Here are three business entities that I came up with:

Stores


</div>
<span class='text_page_counter'>(74)</span><div class='page_container' data-page=74></div>
<span class='text_page_counter'>(75)</span><div class='page_container' data-page=75></div>
<span class='text_page_counter'>(76)</span><div class='page_container' data-page=76>

Business entity: Local events

Decision:

Decision:

Business entity: Local competitors

Decision:

Decision:

<b>NOTE</b>

Some of the decisions will be very similar. That's good because it allows the
organization to approach the decisions from multiple perspectives.

Figure 3.5 shows what the Chipotle big data strategy document looks like at this
point in the exercise with the addition of some of the business decisions.

<b>Figure 3.5</b> Chipotle key business entities and decisions

<b>Identify Financial Drivers (Use Cases)</b>



</div>
<span class='text_page_counter'>(77)</span><div class='page_container' data-page=77>

<b>CROSS-REFERENCE</b>

While it is hard to actually do this grouping process in a book, the use of
facilitation techniques to help brainstorm and group these decisions will be
covered in the <i>Facilitation Techniques</i> section of Chapter 13, “Power of
Envisioning.”

For Chipotle's “increase same store sales” business initiative, the following are
likely financial drivers or use cases:

Increase store traffic (acquire new customers, increase frequency of repeat
customers)

Increase shopping bag revenue and margins (cross-sell complementary
products, up-sell)

Increase number of corporate events (catering, repeat catering events)

Improve promotional effectiveness (Halloween Boo-ritto, Christmas gift cards,
graduation, holiday, and special event gift cards)

Improve new product introduction effectiveness (seasonal, holiday)

The entire big data strategy document process has been designed to uncover these
use cases, that is, to identify those financial drivers that support our targeted
business initiative. The use cases and financial drivers are the point of the big data
strategy document where we focus the organization on the <i>“Make Me More
Money”</i> big data opportunities.
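
One way to picture how the pieces of the big data strategy document hang together is as a small nested data structure. The sketch below is only an illustration: the class names, field names, and `summary` method are hypothetical, and the decisions are left empty because the worksheet in this chapter leaves them for the reader to fill in. The initiative, entities, use cases, and data sources are taken from the Chipotle example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessEntity:
    """A 'strategic noun' plus the decisions made about it."""
    name: str
    decisions: List[str] = field(default_factory=list)


@dataclass
class StrategyDocument:
    """Hypothetical container mirroring the single-page document's sections."""
    business_initiative: str
    entities: List[BusinessEntity] = field(default_factory=list)
    use_cases: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the document as indented plain text, one line per item."""
        lines = [f"Business initiative: {self.business_initiative}"]
        lines += [f"  Business entity: {e.name}" for e in self.entities]
        lines += [f"  Use case: {u}" for u in self.use_cases]
        lines += [f"  Data source: {d}" for d in self.data_sources]
        return "\n".join(lines)


chipotle = StrategyDocument(
    business_initiative="Increase same store sales",
    entities=[
        BusinessEntity("Stores"),
        BusinessEntity("Local events"),
        BusinessEntity("Local competitors"),
    ],
    use_cases=[
        "Increase store traffic",
        "Increase shopping bag revenue and margins",
        "Increase number of corporate events",
        "Improve promotional effectiveness",
        "Improve new product introduction effectiveness",
    ],
    data_sources=["Point of Sales Transactions", "Weather"],
)
print(chipotle.summary())
```

Keeping the initiative at the top and hanging everything else beneath it reflects the document's intent: data sources only enter the picture once the use cases they support are known.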


</div>
<span class='text_page_counter'>(78)</span><div class='page_container' data-page=78></div>
<span class='text_page_counter'>(79)</span><div class='page_container' data-page=79>

<b>Figure 3.6</b> Completed Chipotle big data strategy document

<b>Identify and Prioritize Data Sources</b>

With the use cases and financial drivers identified, we are now ready to move into
the data and metrics envisioning process. We want to brainstorm data sources
(regardless of whether or not you currently have access to these data sources) that
<i><b>might yield new insights that support the targeted business initiative. We want to</b></i>
unleash the business and IT teams' creative thinking to brainstorm data sources
that might yield new customer, product, store, campaign, and operational insights
that could improve the effectiveness of the different use cases.

For example, Chipotle data sources that were identified as part of the envisioning
exercises could include:

Point of Sales Transactions

Market Baskets


</div>
<span class='text_page_counter'>(80)</span><div class='page_container' data-page=80>

Weather

Traffic Patterns

Yelp

Zillow/Realtor.com

Twitter/Facebook/Instagram

Twellow/Twellowhood

Zip Code Demographics

EventBrite

MaxPreps

Mobile App

But not all data sources are of equal business value or have equal implementation
feasibility. The data sources need to be evaluated in light of

The business value that the data source could provide in support of the individual
use case

The feasibility (or ease) of acquiring, cleaning, aligning, normalizing,
enriching, and analyzing those data sources

So we want to add two processes (worksheets) to the big data strategy document
process that will evaluate the business value and implementation feasibility of
each of the potential data sources.


</div>
<span class='text_page_counter'>(81)</span><div class='page_container' data-page=81>

<b>Figure 3.7</b> Business value of potential Chipotle data sources

You'd want to go through a group brainstorming process with the business
stakeholders to assess the relative value of each data source with respect to each
<i>use case. The business users own the business value determination because they</i>
are best positioned to be able to understand and quantify the business value that
each data source could provide to the use cases.

<b>NOTE</b>

I like using Harvey Balls in both the data value and the feasibility assessment
charts. The Harvey Balls quickly and easily communicate the relative value of
each data source with respect to each use case.

Reviewing the data value assessment chart in Figure 3.7, you can quickly uncover
some key observations, such as the following:

Detailed point-of-sale data is important to all of the use cases.

Insights from the Store Demographics data are important to four of the five
use cases.

Mining Consumer Comments has a surprisingly strong impact across four of the
five use cases.


</div>
<span class='text_page_counter'>(82)</span><div class='page_container' data-page=82>

promotional effectiveness” use cases but has little impact on the “increase
shopping bag revenue,” “increase number of corporate events,” or “improve
new product introduction effectiveness” use cases.
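
Observations like those above can be read off a scored matrix of data sources against use cases. The sketch below assumes each Harvey Ball is recorded as a score from 0 (empty) to 4 (full). Every score in it is hypothetical and merely chosen to echo the kind of observations listed above; they are not the values from Figure 3.7, and the function names are illustrative only.

```python
USE_CASES = [
    "Increase store traffic",
    "Increase shopping bag revenue",
    "Increase number of corporate events",
    "Improve promotional effectiveness",
    "Improve new product introduction effectiveness",
]

# Hypothetical 0-4 scores (empty to full Harvey Ball), one per use case,
# listed in the same order as USE_CASES.
value_scores = {
    "Point of Sales": [4, 4, 4, 4, 4],
    "Store Demographics": [4, 3, 1, 3, 3],
    "Consumer Comments": [3, 3, 1, 4, 4],
    "Social Media": [3, 1, 1, 3, 1],
}


def important_use_cases(scores, cutoff=3):
    """Use cases for which a data source scores at or above the cutoff."""
    return [uc for uc, s in zip(USE_CASES, scores) if s >= cutoff]


def rank_sources(matrix):
    """Data sources ordered by total score across all use cases, highest first."""
    return sorted(matrix, key=lambda source: sum(matrix[source]), reverse=True)


for source in rank_sources(value_scores):
    hits = important_use_cases(value_scores[source])
    print(f"{source}: important to {len(hits)} of {len(USE_CASES)} use cases")
```

The same structure could hold the feasibility scores, which keeps the two assessments comparable when they are brought together later.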


Next, you want to understand the implementation feasibility for each of the
potential data sources. This part of the exercise is primarily driven by the IT
organization since it is best positioned to understand the implementation
challenges and risks associated with each of the data sources, such as ease of data
acquisition, cleanliness of the data, data accuracy, data granularity, cost of
acquiring the data, organizational skill sets, tool proficiencies, and other risk
factors. The <i><b>implementation feasibility assessment chart</b></i> for Chipotle's
“increase same store sales” business initiative looks like Figure 3.8.

<b>Figure 3.8</b> Implementation feasibility of potential Chipotle data sources

From the Chipotle implementation feasibility assessment chart in Figure 3.8, we
can quickly make the following observations:

Point of Sales, Market Baskets, and Store Manager Demographics data is
readily available and easy to integrate (likely due to the master data
management and data governance efforts necessary to load this data into a
data warehouse).

Consumer Comments data, which was very valuable in the business value
assessment, has several implementation risks. Lack of organizational


</div>
<span class='text_page_counter'>(83)</span><div class='page_container' data-page=83>

Social Media data, which was rated about mid-value in the value assessment
exercise, also looks to be a real challenge. Many of the same cleanliness,
accuracy, and granularity issues exist, with the added issue that this is data
that will need to be “acquired” through some means. Probably not the first data
source you want to deal with in this use case.


</div>
<span class='text_page_counter'>(84)</span><div class='page_container' data-page=84>

<b>Introducing the Prioritization Matrix</b>

The final step in the big data strategy document process is to take the business and
IT stakeholders through a use case prioritization process. While we will cover the
<i><b>prioritization matrix</b></i> in detail in Chapter 13, I want to introduce the concept
here as the natural point of concluding the big data strategy document process.

As part of the big data strategy document, we have now done the work to identify
the use cases that support the organization's key business initiative, brainstormed
additional data sources, and determined the applicability of those data sources
from a business value and implementation feasibility assessment. We are now
ready to prioritize the use cases based on their relative business value and
implementation feasibility over the next 9 to 12 months (see Figure 3.9).

<b>Figure 3.9</b> Chipotle prioritization of use cases
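
The two-axis idea behind the prioritization matrix can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the process covered in Chapter 13: the 0 to 10 scales, the midpoint, the quadrant labels, and every score below are hypothetical, and in practice the placement comes out of a facilitated discussion between business and IT stakeholders rather than a formula.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: float              # 0 (low) to 10 (high); hypothetical scale
    implementation_feasibility: float  # 0 (hard) to 10 (easy); hypothetical scale


def quadrant(use_case, midpoint=5.0):
    """Place a use case in one of four quadrants of the value/feasibility grid."""
    high_value = use_case.business_value >= midpoint
    high_feasibility = use_case.implementation_feasibility >= midpoint
    if high_value and high_feasibility:
        return "pursue first"
    if high_value:
        return "high value, hard to implement"
    if high_feasibility:
        return "easy, lower value"
    return "defer"


def prioritize(use_cases):
    """Order use cases by combined value and feasibility, highest first."""
    return sorted(
        use_cases,
        key=lambda uc: uc.business_value + uc.implementation_feasibility,
        reverse=True,
    )


# Hypothetical scores for the five Chipotle use cases.
use_cases = [
    UseCase("Increase store traffic", 8, 7),
    UseCase("Increase shopping bag revenue and margins", 6, 8),
    UseCase("Increase number of corporate events", 4, 6),
    UseCase("Improve promotional effectiveness", 7, 4),
    UseCase("Improve new product introduction effectiveness", 3, 3),
]

for uc in prioritize(use_cases):
    print(f"{uc.name}: {quadrant(uc)}")
```

Summing the two scores is the simplest possible ranking rule; a team could just as easily weight business value more heavily if near-term payback matters most.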



<b>WARNING</b>



</div>
<span class='text_page_counter'>(85)</span><div class='page_container' data-page=85></div>
<span class='text_page_counter'>(86)</span><div class='page_container' data-page=86></div>
<span class='text_page_counter'>(87)</span><div class='page_container' data-page=87></div>
<span class='text_page_counter'>(88)</span><div class='page_container' data-page=88>

Preserve starting pitching effectiveness throughout the regular season and
playoffs by optimizing pitch counts, pitcher rotations, pitcher rests, etc.

Improve batting and slugging proficiency by optimizing trades, free agent
signings, minor league promotions, and contract extensions

Increase in-game “small ball” runs scored effectiveness through the optimal
combination of batters, hitting, stealing, base running, and sacrifice hitting
strategies

Accelerate minor league player development through player strength and
conditioning training, game situations, and minor league assignments

Optimize in-game pitch selection decisions through improved understanding
of batter and pitcher matchups

Figure 3.10 shows the resulting big data strategy document.

<b>Figure 3.10</b> San Francisco Giants big data strategy document

Next, we would brainstorm the potential data sources to support the use cases,
including:

<b>Personnel Player Health.</b> This should include personal health history
(weight, health, BMI, injuries, therapy, medications), physical performance
metrics (60-foot dash time, long toss distances, fastball velocity), and workout
history (bench press, dead lift, crunches and push-ups in 60 seconds, frequency
and recency of workouts).

<b>Starting Pitcher Performance.</b> This should include a detailed pitching


</div>
<span class='text_page_counter'>(89)</span><div class='page_container' data-page=89></div>
<span class='text_page_counter'>(90)</span><div class='page_container' data-page=90>

<b>Summary</b>

This chapter focused on the big data strategy document and key related topics
including:

Introduced the concept of a business initiative and provided some examples of
where to find these business initiatives

Introduced the big data strategy document as a framework for helping
organizations to identify the use cases that guide where and how they can start
their big data journeys

Provided a hands-on example of the big data strategy document in action using
Chipotle, a chain of organic Mexican food restaurants

Introduced worksheets to help organizations to determine the business value
and implementation feasibility of the data sources that come out of the big
data strategy document process

Introduced the prioritization matrix as a tool to help drive business and IT
alignment around the top priority use cases over a 9- to 12-month window

Had some fun by applying the big data strategy document to the world of
professional baseball

This chapter outlined the big data strategy document as a framework to help an
organization identify where and how to start its big data journey in support of the
organization's 9- to 12-month key business initiatives. The big data strategy
document is a tool to ensure that your big data journey is valuable and relevant
from a business perspective.

To swing back around to the Chipotle case study, Figure 3.11 shows some initial
results of the company's success with its “increase same store revenues” business
initiative. (For more information, see the article at


</div>
<span class='text_page_counter'>(91)</span><div class='page_container' data-page=91>

<b>Figure 3.11</b> Chipotle's same store sales results

It's nice to see that our Chipotle use case actually has a real business story behind
it. But then again, every big data initiative should have a real business story


</div>
<span class='text_page_counter'>(92)</span><div class='page_container' data-page=92>

<b>Homework Assignment</b>

Use the following exercises to apply the big data strategy document to your
organization (or one of your favorite organizations).

<b>Exercise #1:</b> Start by identifying your organization's key business initiatives
over the next 9 to 12 months.

<b>Exercise #2:</b> Select one of your business initiatives, and then brainstorm the
key business entities or strategic nouns that impact that selected business
initiative. As a reminder, it is around the individual business entities that we
want to capture the behaviors, tendencies, patterns, trends, preferences, etc. at
the individual business entity level.

<b>Exercise #3:</b> Next, brainstorm the key decisions that need to be made about
each key business entity with respect to the targeted business initiative.

<b>Exercise #4:</b> Next we want to group the decisions into common use cases;
that is, cluster those decisions that seem similar in their business or financial
objectives.

<b>Exercise #5:</b> Then brainstorm the different data sources that you might need
to support those use cases:

Identify potential internal structured (transactional data sources,
operational data sources) and unstructured (consumer comments, notes,
work orders, purchase requests) data sources

Identify potential external data sources (social media, blogs, publicly
available, data.gov, websites, mobile apps) that you also might want to
consider

<b>Exercise #6:</b> Use the data assessment worksheets to determine the relative
business value and implementation feasibility of each of the identified data
sources with respect to the different use cases.

<b>Exercise #7:</b> Finally, use the prioritization matrix to rank each of the use



</div>
<span class='text_page_counter'>(93)</span><div class='page_container' data-page=93>

<b>Notes</b>



</div>
<span class='text_page_counter'>(94)</span><div class='page_container' data-page=94></div>
<span class='text_page_counter'>(95)</span><div class='page_container' data-page=95>

<b>Chapter 4</b>

<b>The Importance of the User Experience</b>

The user experience is one of the secrets to big data success, and one of my
favorite topics. If organizations cannot deliver insights to their employees,
managers, partners, and customers in a way that is actionable, then why even
bother? One of the keys to success in the <i>Big Data MBA</i> is to “begin with an end in
mind” with respect to understanding how the analytic results are going to be
delivered to frontline employees, business managers, channel partners, and
customers in a way that is actionable. The <i>Big Data MBA</i> seeks to “close the


</div>
<span class='text_page_counter'>(96)</span><div class='page_container' data-page=96></div>
<span class='text_page_counter'>(97)</span><div class='page_container' data-page=97>

<b>Chapter 4 Objectives</b>

Review an example of an “unintelligent” user experience.

Highlight the importance of “thinking differently” with respect to creating
an actionable dashboard versus building a traditional Business Intelligence
dashboard.

Review a sample actionable dashboard targeting frontline store managers.

Review another sample actionable dashboard (financial advisor
dashboard) targeting business-to-business channel partners.

This chapter will challenge the traditional Business Intelligence approaches to
building dashboards by seeking to leverage analytic insights (e.g.,


</div>
<span class='text_page_counter'>(98)</span><div class='page_container' data-page=98>

<b>The Unintelligent User Experience</b>

One of my favorite subjects against which I love to rail is the “unintelligent” user
experience. This is a problem caused by, in my humble opinion, the lack of effort
by organizations to understand their key business stakeholders well enough to be
able to deliver actionable insights in support of the organizations' key business
initiatives. And this user experience problem is often only exacerbated by big data.

Here is a real-world example of how NOT to leverage actionable analytics in your
organization's engagements with your customers. The names have been changed
to protect the guilty.

My daughter Amelia got the e-mail (see Figure 4.1) from our cell phone provider
warning her that she was about to exceed her monthly data usage limit of 2 GB.
She was very upset that she was about to go over her limit, and it would start
costing her (actually, me) an additional $10.00 per GB over the limit. (Note: The
“Monday, August 13, 2012” date in the figure will play an important role in this
story.)

<b>Figure 4.1</b> Original subscriber e-mail

I asked Amelia what information she thought she needed in order to make a
<i><b>decision about altering her Facebook, Pandora, Vine, Snapchat, and Instagram</b></i>
usage (since those are the main data hog culprits in her case) so that she would
not exceed her data plan limits. She thought for a while and then said that she
thought she needed the following information:

How much of her data plan does she have left in the current month?

When does her new month or billing period start?



</div>
<span class='text_page_counter'>(99)</span><div class='page_container' data-page=99></div>
<span class='text_page_counter'>(100)</span><div class='page_container' data-page=100>

experience that organizations should be targeting.

Our cellular provider could have provided a user experience that highlighted the
information and insights necessary to help Amelia make a decision about data
usage. The user experience could have looked something like the e-mail message
shown in Figure 4.2.

<b>Figure 4.2</b> Improved subscriber e-mail

This sample e-mail has all the information that Amelia needs to make a decision
about usage behaviors including:

Actual usage to date (65 percent)

A forecast of usage by the end of the period (67 percent)

The date when the data plan will reset (in 1 day on August 14)
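
The forecast figure in such an e-mail could come from something as simple as a straight-line extrapolation of usage to date. The sketch below is one hedged way to do that. The function names are hypothetical, and the 31-day billing period with 30 days elapsed is an assumption (the text says only that the plan resets in 1 day). The 2 GB limit and $10.00 per GB overage charge come from the story above, while charging fractional gigabytes pro rata is a further assumption.

```python
def forecast_usage_pct(used_pct, days_elapsed, days_in_period):
    """Extrapolate usage linearly to the end of the billing period."""
    if not 0 < days_elapsed <= days_in_period:
        raise ValueError("days_elapsed must be between 1 and days_in_period")
    return used_pct * days_in_period / days_elapsed


def projected_overage_cost(forecast_pct, plan_gb, cost_per_gb):
    """Cost of the forecast overage, assuming fractional GB are billed pro rata."""
    overage_gb = max(0.0, (forecast_pct - 100.0) / 100.0 * plan_gb)
    return overage_gb * cost_per_gb


# The e-mail scenario: 65 percent used, with 1 day left in an assumed 31-day period.
forecast = forecast_usage_pct(65, days_elapsed=30, days_in_period=31)
print(round(forecast))                          # about 67 percent
print(projected_overage_cost(forecast, 2, 10))  # no overage expected

# A heavier, hypothetical pattern: 82 percent used halfway through a 30-day period.
heavy = forecast_usage_pct(82, days_elapsed=15, days_in_period=30)
print(heavy, round(projected_overage_cost(heavy, 2, 10), 2))
```

A straight-line forecast ignores weekly or seasonal usage swings, so a provider with the subscriber's detailed history could likely do better; the point is only that even a crude forecast turns a vague warning into something a customer can act on.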


</div>
<span class='text_page_counter'>(101)</span><div class='page_container' data-page=101>

<b>Consumer Case Study: Improve Customer Engagement</b>



But let's take this case study one step further. Let's say that there actually was going to be a problem with Amelia's usage and her data plan. What if 82 percent of data usage had been consumed with 50 percent of the usage period remaining? How do we make the user experience and the customer engagement useful, relevant, and actionable?


The mock-up shown in Figure 4.3 offers one potential approach based on the same principles discussed earlier: provide enough information to help Amelia change her usage behaviors. However, FutureTelco could also take the user experience and customer engagement one step further and offer her some recommendations to avoid the data plan overage.


For example, FutureTelco could offer prescriptive advice about how to reduce data consumption such as:


Transitioning to apps that are more data usage efficient (e.g., transitioning from Pandora to Rdio or iHeartRadio for streaming radio, assuming that Rdio and iHeartRadio are more efficient in their usage of the data bandwidth)
Turning off apps in the background that are unnecessarily consuming data such as mapping apps (like Apple Maps or Waze) or apps that are using GPS tracking


FutureTelco could even offer Amelia options to avoid paying an overage penalty (see Figure 4.3) such as:


Purchase a 1-month data usage upgrade for $2.00 (which is cheaper than the $10 overage penalty)


</div>
<span class='text_page_counter'>(102)</span><div class='page_container' data-page=102>

<b>Figure 4.3</b> Actionable subscriber e-mail


But wait, there is even more that FutureTelco could do to improve the customer experience. FutureTelco could analyze Amelia's app usage tendencies and recommend new apps based on other apps that users like Amelia use, similar to what Amazon and Netflix do (see Figure 4.4).


<b>Figure 4.4</b> App recommendations


This level of customer intimacy can open up all sorts of new monetization opportunities such as:


</div>
<span class='text_page_counter'>(103)</span><div class='page_container' data-page=103>

Help app developers to be more successful while collecting referral fees, co-marketing fees, and other monetization ideas that align with the app developers' business objectives


Cellular providers are not alone in missing opportunities to leverage customer insights in order to provide a more relevant, more meaningful customer experience. Many organizations are sitting on gold mines of insights about their customers' buying and usage patterns, tendencies, propensities, and areas of


</div>
<span class='text_page_counter'>(104)</span><div class='page_container' data-page=104>

<b>Business Case Study: Enable Frontline Employees</b>


I had the opportunity to run a vision workshop for a grocery retailer. The goal of the session was to identify how the grocery chain could leverage big data and advanced analytics to deliver actionable insights (or recommendations) to store managers in order to help them improve store performance.


Big data can transform the business by enabling a completely new user experience (UEX) built around insight and recommendations versus just traditional Business Intelligence charts and tables. Retailers, like most organizations, can leverage detailed, historical transactional data, coupled with new sources of “right-time” data like local competitors' promotions (e.g., “best food days,” which is the day when grocery stores post their weekly promotions), weather, and events, to uncover new insight about their customers, products, merchandising, competitors, and operations. Big data provides organizations the ability to (1) rapidly ingest these new sources of customer, product, and operational data and then (2) leverage data science to yield real-time, actionable insights.


Let's walk through an example of integrating big data with a traditional BI dashboard to create a more actionable user experience that empowers frontline employees and managers.


<b>Store Manager Dashboard</b>



</div>
<span class='text_page_counter'>(105)</span><div class='page_container' data-page=105>

<b>Figure 4.5</b> Traditional Business Intelligence dashboard


The challenge with these traditional BI dashboards is that unless you are an analyst, it's not clear what action the user is supposed to take. Arrows up, sideways, and down… I can see my performance, but the dashboard doesn't provide any insights to tell the store manager what actions to take.


The other challenge is that the store manager (like most frontline employees and managers) likely does not have a BI or an analytics background (he likely worked his way up the ranks in the grocery store). As a result, UEX and the actionable insights and recommendations are critical because the store manager does not know how to drill into the BI reports and dashboards to uncover insights based on the raw data.


</div>
<span class='text_page_counter'>(106)</span><div class='page_container' data-page=106>

<b>Figure 4.6</b> Actionable store manager dashboard


In Figure 4.6, Section A shows specific product, promotion, placement, and pricing recommendations based on the layout of a specific store. Section B provides specific recommendations concerning pricing, merchandising, inventory, staffing, promotions, etc. for the store manager.


Each recommendation in Section B is presented with Accept [+] or Reject [-] options. If the store manager accepts the recommendation by selecting [+], that recommendation is executed (e.g., raise prices, add promotion, add inventory, etc.). However, if the store manager rejects the recommendation, then the actionable dashboard captures the reason for the rejection so that the supporting analytic models can be constantly fine-tuned (see Figure 4.7).
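

One way to picture the accept/reject capture is as a small feedback log that insists on a reason whenever a recommendation is rejected. The sketch below is a hypothetical illustration, not the dashboard's actual design: the class `RecommendationFeedback`, the helper `rejection_reason_counts`, and the sample recommendation names are all made up for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {"accept", "reject", "modify"}


@dataclass
class RecommendationFeedback:
    """One store manager response to one dashboard recommendation."""
    recommendation_id: str
    action: str
    reject_reason: Optional[str] = None

    def __post_init__(self) -> None:
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action}")
        # A rejection without a reason gives the analytic model nothing to learn from.
        if self.action == "reject" and not self.reject_reason:
            raise ValueError("a rejection must include a reason")


def rejection_reason_counts(log):
    """Tally rejection reasons so the most common ones can guide model tuning."""
    return Counter(f.reject_reason for f in log if f.action == "reject")


# Made-up feedback for illustration only.
feedback_log = [
    RecommendationFeedback("raise-price-soda", "accept"),
    RecommendationFeedback("add-inventory-ice", "reject", "no shelf space"),
    RecommendationFeedback("promo-chips", "reject", "no shelf space"),
    RecommendationFeedback("extra-staff-saturday", "reject", "budget"),
]
reason_counts = rejection_reason_counts(feedback_log)
print(reason_counts.most_common(1))
```

Requiring the reason at capture time is the design choice that matters here: a bare rejection tells the model that it was wrong, while the reason suggests how to correct it.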


Finally, the store manager can select the More option in Section B and modify the recommendation based on his own experience. Allowing the store manager to modify the recommendations based on his personal experiences allows the underlying analytic models to constantly learn what works and what doesn't work and build on the best practices and learnings from the organization's most


</div>
<span class='text_page_counter'>(107)</span><div class='page_container' data-page=107>

<b>Figure 4.7</b> Store manager accept/reject recommendations


<b>Sample Use Case: Competitive Analysis</b>


One use case for the store manager dashboard enables the store manager to monitor local competitive activity and promotions. The grocery industry is very locally competitive. Competitors, for the most part, are within just a few miles or even blocks of each other. In this competitive analysis use case, the dashboard provides a map of the local grocery and beverage competitors (see Section C of Figure 4.7). Hovering over any particular competitor on the map immediately brings up its current marketing flyer. The store manager (or his business analyst) can browse through each of the competitors' flyers and make custom store


</div>
<span class='text_page_counter'>(108)</span><div class='page_container' data-page=108>

<b>Figure 4.8</b> Competitive analysis use case


Like the other recommendations, the store manager's custom recommendations will be monitored for effectiveness so that the analytic models can be constantly updated and refined.


<b>Additional Use Cases</b>



</div>
<span class='text_page_counter'>(109)</span><div class='page_container' data-page=109>

<b>Figure 4.9</b> Local events use case


Another use case is to integrate the local weather forecast into the store manager dashboard. The store manager can analyze the local weather forecasts and make adjustments for inventory, merchandising, and promotions based on whether the weather will be warmer or colder than expected (see Figure 4.10). The dashboard can automatically analyze similar weather conditions and predict the impact on store traffic and product category sales and deliver relevant recommendations to the store manager.
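

A very rough sketch of "analyze similar weather conditions" is to look up past days whose temperature was close to the forecast and average the category sales on those days. Everything in the example below is an assumption made for illustration: the function name `expected_sales_for_forecast`, the 5-degree tolerance, and the made-up history of (temperature in °F, category sales) pairs. A production model would account for many more factors, such as day of week, season, and promotions.

```python
from statistics import mean


def expected_sales_for_forecast(history, forecast_temp_f, tolerance_f=5.0):
    """Average sales on past days whose temperature was near the forecast.

    Returns None when no past day falls within the tolerance.
    """
    similar = [sales for temp, sales in history
               if abs(temp - forecast_temp_f) <= tolerance_f]
    if not similar:
        return None
    return mean(similar)


# Made-up daily history for one product category: (temperature in F, sales in dollars).
history = [(55, 100.0), (58, 110.0), (72, 180.0), (75, 200.0), (90, 320.0), (93, 340.0)]

baseline = mean(sales for _, sales in history)
hot_day = expected_sales_for_forecast(history, forecast_temp_f=92)

if hot_day is not None and hot_day > baseline:
    print("Recommendation: add inventory for this category ahead of the hot day")
```

The comparison against the baseline is what turns the prediction into a recommendation the store manager can accept or reject.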


<b>Figure 4.10</b> Local weather use case


</div>
<span class='text_page_counter'>(110)</span><div class='page_container' data-page=110></div>
<span class='text_page_counter'>(111)</span><div class='page_container' data-page=111></div>
<span class='text_page_counter'>(112)</span><div class='page_container' data-page=112>

Adjust investment strategies (short-term, long-term)
Reallocate financial portfolio
Change investment vehicles (stocks, bonds, mutual funds, etc.)


<b>Figure 4.11</b> Financial advisor dashboard


The goal of the financial advisor dashboard is to uncover insights about the client's investment performance and provide client-specific recommendations that help these clients reach their financial goals. To generate actionable, accurate recommendations, we're going to need to know as much about the client as possible, including:


Current and historical personal background information (e.g., marital status, spouse's financial and employment situation, number and age of children, outstanding mortgage on home(s) and any secondary real estate investments)
Current financial investments and other assets (e.g., stocks, bonds, mutual funds, IRAs, 401-Ks, REITs)
Current and historical income (and expenditures, if possible)
Financial goals with specific timelines


We need to ensure that the financial advisor dashboard provides enough value to both the financial advisor and the advisor's clients in order to incent the clients to share as much of this data as possible.


<b>Informational Sections of Financial Advisor Dashboard</b>



</div>
<span class='text_page_counter'>(113)</span><div class='page_container' data-page=113>

<b>Client Personal Information:</b> The first part of the dashboard presents relevant client personal and financial information. FSI wants to gather as much personal information as is relevant when the client first opens his accounts. But after the client opens his account, there needs to be a concerted effort to keep the data updated and capture new lifestyle, life stage, employment, and family information. Much of that client data can be captured via discussions and interactions that the financial advisor is having with the client (e.g., informational calls, e-mail dialogues, office visits, annual reviews). While this information is gold to FSI, much of this data never gets past the financial advisors' personal contact management and e-mail systems. FSI must provide compelling reasons to persuade the financial advisors and clients to share more of this data with FSI (see Figure 4.12).


Some leading-edge organizations are providing incentives (e.g., discounts, promotions, contests, rewards) for clients to share their social media interactions. Obviously, access to the client's current situation and plans as posted on social media sites is gold when it can be mined to uncover activities that might affect his financial needs (e.g., vacations, buying a new car, upcoming wedding plans, promotions, job changes, children changing schools).


<b>Client Financial Status:</b> The next section of the dashboard provides an overview of the client's current financial status. Again, the more data that can be gathered about the client's financial situation (e.g., investments, home, spending, debt), the more accurate and prescriptive the analytic models will be (see Figure 4.13).


</div>
<span class='text_page_counter'>(114)</span><div class='page_container' data-page=114>

<b>Figure 4.13</b> Client financial information


In this example, we have details on all the client's financial investments with FSI. However, the client might (and likely does) have financial investments with other firms courtesy of his employer's 401k programs, whole life insurance policies, and other stocks, bonds, and funds. And that doesn't even consider substantial investments in nonfinancial instruments like his primary residence, vacation home, antiques, and collectibles.


Incenting clients to share their entire financial portfolio is complicated by how hard it is for a client to pull all that information together in one place. However, Mint.com has figured out how to aggregate financial spending from credit cards and bank checks. The inclusion of the client's expenditure data could be invaluable in building a client profile and developing specific, actionable financial recommendations.


<b>Client Financial Goals:</b> The final informational section of the dashboard contains the client's financial goals. There are likely only a small number of goals, and they probably don't change that often. However, it is difficult to develop meaningful client financial recommendations without up-to-date client financial goals. From a data collection perspective, this is probably the easiest data to


</div>
<span class='text_page_counter'>(115)</span><div class='page_container' data-page=115>

<b>Figure 4.14</b> Client financial goals


However, let's say that the client either won't share his financial goals or hasn't even thought through what his financial goals need to be. This is common when dealing with retirement planning, since many clients aren't clear or realistic about their retirement goals. In these situations, FSI could leverage the information that it has about “similar” clients to make retirement goal recommendations. If FSI has the client's current financial investments and current salary, FSI could make a pretty intelligent guess as to the client's retirement goals.
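

One simple way to turn “similar” clients into a suggested goal is a nearest-neighbor lookup: find the handful of clients closest in salary and investments, then take the median of their stated retirement goals. The sketch below is an illustration only. The function `suggest_retirement_goal`, the field names, and the peer records are hypothetical, and it assumes the client's salary and investments are both greater than zero.

```python
from statistics import median


def suggest_retirement_goal(client, peers, k=3):
    """Return the median retirement goal of the k peers most similar to the client."""
    def distance(peer):
        # Relative differences keep salary and investments on comparable scales.
        salary_gap = (peer["salary"] - client["salary"]) / client["salary"]
        invest_gap = (peer["investments"] - client["investments"]) / client["investments"]
        return (salary_gap ** 2 + invest_gap ** 2) ** 0.5

    nearest = sorted(peers, key=distance)[:k]
    return median(peer["retirement_goal"] for peer in nearest)


# Made-up peer records for illustration.
peers = [
    {"salary": 80_000, "investments": 150_000, "retirement_goal": 1_200_000},
    {"salary": 85_000, "investments": 160_000, "retirement_goal": 1_300_000},
    {"salary": 90_000, "investments": 140_000, "retirement_goal": 1_250_000},
    {"salary": 250_000, "investments": 900_000, "retirement_goal": 4_000_000},
]
client = {"salary": 84_000, "investments": 155_000}

suggested = suggest_retirement_goal(client, peers, k=3)
print(f"Suggested retirement goal: ${suggested:,}")
```

The median is used rather than the mean so that one unusually ambitious peer does not pull the suggestion upward.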


<b>Recommendations Section of Financial Advisor Dashboard</b>


Now let's get into the meat of the financial advisor dashboard. The client information sections of the dashboard were meant to provide an easy and efficient way to capture the client's key lifestyle, demographic, and financial data, as well as his financial goals. Now we can create predictive models to predict the likely results of different financial options and actions, and then create prescriptive models in order to deliver client-specific recommendations that help the client to reach his financial goals. This financial advisor dashboard covers four different areas for delivering client-specific financial recommendations:


Financial contributions
Spending analysis
Asset allocation
Other financial investments


<i><b>Financial Contributions Recommendations</b></i>


The first set of recommendations is focused on helping the client optimize financial contributions (see Figure 4.15). The types of client decisions that could be modeled include:


</div>
<span class='text_page_counter'>(116)</span><div class='page_container' data-page=116>

One-time payments to jump-start lagging financial goals
Reallocate monthly or periodic payments against different financial goals
Change retirement, new car, and new home target dates


<b>Figure 4.15</b> Financial contributions recommendations


We could employ data science to analyze the client's detailed financial data, compare that data with benchmarks across similar clients, and develop client-specific analytic profiles. The financial advisor dashboard could provide a “what if” capability that allows the financial advisor to work with the client to test out different scenarios (e.g., changes to investment amounts, changes to financial goal target dates).
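

A “what if” capability of this kind can be sketched with a basic compound-growth projection. The example below is a minimal illustration, not FSI's model: the function `projected_balance`, the fixed 5 percent annual return, monthly compounding, and the dollar amounts are all assumptions, and real returns are of course not constant.

```python
def projected_balance(current_balance, monthly_contribution, annual_return, years):
    """Project a balance with monthly compounding and end-of-month contributions."""
    monthly_rate = annual_return / 12
    balance = current_balance
    for _ in range(years * 12):
        balance = balance * (1 + monthly_rate) + monthly_contribution
    return balance


# Three scenarios the advisor and client might compare (all inputs are made up).
scenarios = {
    "current plan": projected_balance(50_000, 500, 0.05, 20),
    "contribute $200 more per month": projected_balance(50_000, 700, 0.05, 20),
    "move target date out 5 years": projected_balance(50_000, 500, 0.05, 25),
}

for name, balance in scenarios.items():
    print(f"{name}: ${balance:,.0f}")
```

Running the same function with different contribution amounts or target dates is what lets the advisor and client see the trade-offs side by side.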


<i><b>Spending Analysis Recommendations</b></i>


The second set of recommendations is focused on helping the client optimize spending habits. This is where access to the client's credit card and banking statements (maybe via Mint.com and/or his checking accounts) could yield valuable insights to help the client minimize cash outflow and increase financial investments (see Figure 4.16). The types of spending decisions that would need to be modeled include:


Consolidating expenditures of similar products and services
Flagging expenditures that are abnormally high given the client's family situation, home location, etc.
Integrating customer loyalty program information to find retailers who can provide best prices on food and household staples
Increasing insurance deductibles to lower premiums
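

Flagging abnormally high expenditures can be sketched as a comparison against a peer benchmark, for example by measuring how many standard deviations the client's spending sits above similar clients in each category. The function `flag_high_spending`, the threshold of 2 standard deviations, and the spending figures below are assumptions for illustration, and the peer group is assumed to have already been matched on family situation and location.

```python
from statistics import mean, stdev


def flag_high_spending(client_spend, peer_spend, z_threshold=2.0):
    """Return categories where the client spends far more than comparable peers."""
    flagged = []
    for category, amount in client_spend.items():
        peers = peer_spend.get(category, [])
        if len(peers) < 2:
            continue  # not enough peer data to estimate a spread
        average, spread = mean(peers), stdev(peers)
        if spread > 0 and (amount - average) / spread > z_threshold:
            flagged.append(category)
    return flagged


# Made-up monthly spending in dollars.
peer_spend = {
    "groceries": [600, 650, 700, 620, 680],
    "cell phone": [80, 90, 85, 95, 100],
}
client_spend = {"groceries": 660, "cell phone": 210}

flagged_categories = flag_high_spending(client_spend, peer_spend)
print(flagged_categories)
```

Here the grocery spending is close to the peer average and passes, while the cell phone bill stands far outside the peer range and is flagged for a recommendation.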


</div>
<span class='text_page_counter'>(117)</span><div class='page_container' data-page=117>

<b>Figure 4.16</b> Spend analysis and recommendations


There are lots of opportunities to leverage external data sources and best practices across the FSI client base to find better deals in an attempt to reduce the client's discretionary spending. There are several retail, insurance, travel, hospitality, entertainment, cell phone, and other websites from which data could be gathered. This data could be used to create recommendations to reduce the client's spending and optimize the client's monthly budget, with the savings being used to increase financial contributions against the client's financial goals.


<i><b>Asset Allocation Recommendations</b></i>


The third set of recommendations is focused on helping clients optimize their asset allocation in light of their financial goals. By leveraging best practices across other clients, portfolios, and investment instruments, prescriptive analytics can be developed to make specific asset allocation recommendations that support asset allocation decisions such as (see Figure 4.17):


Which stocks and bonds to sell or buy against specific financial goal portfolios
Portfolio allocation decisions that properly balance the risk-return ratio of the client's portfolio in light of risk tolerance and financial goals
Other financial instruments that can accelerate the client's progress against financial goals or reduce risk for those short-term financial goals


<b>Figure 4.17</b> Asset allocation recommendations


</div>
<span class='text_page_counter'>(118)</span><div class='page_container' data-page=118>

client's desired risk level. To further protect the client's investment assets, an aggregated view of the marketplace could yield more timely insights into stocks and bonds that are suddenly hot or cold. This is also an area where real-time analytics can be leveraged to ensure that no sudden market movements expose the client to unnecessary asset allocation risks. The dashboard could also support an interactive “what if” collaboration directly with the client to glean even more data and insights about the client's investment preferences and tolerance for risk.


<i><b>Other Investment Recommendations</b></i>


The fourth set of recommendations is focused on other assets that clients need to consider as part of their overall financial strategy. Real estate (the client's home and any vacation homes) is probably the most obvious. This is an area where recommendations about other investment options can be delivered to help support client decisions regarding (see Figure 4.18):


Identifying the ideal amount of insurance needed given home valuation changes
Home improvement projects that yield the best ROI for particular house types, budgets, and locations over time
Identifying the right time to buy or sell a home, and even making recommendations as to what price to bid for homes in select areas
Best areas to look for secondary and/or vacation home investments
Most cost-effective locations to live in after retirement


<b>Figure 4.18</b> Other investment recommendations


There is a bevy of external data sources that can be leveraged to help facilitate analytics in this area. For example, Zillow and Realtor.com provide real estate valuations and monthly changes in real estate valuations that could be


</div>
<span class='text_page_counter'>(119)</span><div class='page_container' data-page=119>

<b>Summary</b>


Big data can power a more relevant and more actionable user experience. Instead of overwhelming business users with an endless array of charts, reports, and dashboards and forcing users to “slice and dice” their way to insights, we can instead leverage the wealth of available structured and unstructured data sources, in real-time, coupled with data science to uncover customer, product, and operational insights buried in the data. We can leverage those insights to create frontline employee, manager, and customer recommendations and then measure the effectiveness of those recommendations so that we are continuously refining our analytic models.


</div>
<span class='text_page_counter'>(120)</span><div class='page_container' data-page=120>

<b>Homework Assignment</b>


Use the following exercise to apply what you learned in this chapter.


<b>Exercise #1:</b> Select one of your organization's outward-facing dashboards, websites, or mobile apps. If not something from your organization, then select a website or dashboard that you use regularly. That might include something from your bank, credit card provider, cellular provider, or utility company. Grab a few screen captures of the dashboard or website.


<b>Exercise #2:</b> Think through how you as the user use this dashboard, website, or mobile app to make decisions. Write down those decisions that you try to make from the website. For example, from your utility, you might want to make decisions about energy and water consumption, your waste/garbage plan, and maybe even which of the different appliance rebates you might want to consider.


<b>Exercise #3:</b> Next, add a recommendations panel that has suggestions for each of the decisions that you captured in Exercise #2. For our utility example, one recommendation might be “Only water 3 days a week from 6:00 a.m. to 7:00 a.m. to save approximately $12.50 per month on your monthly water bill.” Or another recommendation might be “Replace your existing dryer with a more efficient model like the Samsung DV457 to save $21.75 on your monthly energy bill.”


<b>Exercise #4:</b> Finally, identify potential external data sources that might


</div>
<span class='text_page_counter'>(121)</span><div class='page_container' data-page=121></div>
<span class='text_page_counter'>(122)</span><div class='page_container' data-page=122>

<b>Part II</b>


<b>Data Science</b>


These three chapters introduce data science as a key business discipline that helps organizations “cross the analytics chasm” from the Business Monitoring to Business Insights and Business Optimization phases. These chapters will introduce the concept of data science and then broaden the discussion to cover what data science techniques to use in which business scenarios.


<b>In This Part</b>


Chapter 5: Differences Between Business Intelligence and Data Science
Chapter 6: Data Science 101


</div>
<span class='text_page_counter'>(123)</span><div class='page_container' data-page=123></div>
<span class='text_page_counter'>(124)</span><div class='page_container' data-page=124>

<b>Chapter 5</b>


<b>Differences Between Business Intelligence and Data Science</b>


I was hired by a large Internet portal company in 2007 to head up efforts to develop its advertiser analytics. The objective of the advertiser analytics project was to help the Internet portal company's advertisers and agencies optimize their advertising spend across the Internet portal's ad network. The internal code name for the project was “Looking Glass” because we wanted to take the advertisers and agencies through an “Alice in Wonderland” type of experience in how we delivered actionable insights to help our key business stakeholders (Media Planners & Buyers and Campaign Managers) successfully optimize their advertising spend on the Internet portal's ad network. But in many ways, it was me who went through the looking glass.


Several months later (August 2008), I had the opportunity to keynote at The Data Warehouse Institute (TDWI) conference in San Diego. I taught a class at TDWI on how to build analytic applications, so I was both familiar with and a big fan of the TDWI conferences (and still am). However, in my keynote, I told the audience that everything that I had taught them about how to build analytic applications was wrong (see Figure 5.1).


<b>Figure 5.1</b> Schmarzo TDWI keynote, August 2008


As with my own personal experience, many organizations and individuals are confused by the differences introduced by big data, especially the differences between Business Intelligence (BI) and data science. Big data is not big BI. Big data is a key enabler of a new discipline called data science that seeks to leverage new sources of structured and unstructured data, coupled with predictive and prescriptive analytics, to uncover new variables and metrics that are better


</div>
<span class='text_page_counter'>(125)</span><div class='page_container' data-page=125>

This chapter discusses the differences between BI and data science:


The questions are different.
The analytic characteristics are different.
The analytic engagement processes are different.
The data models are different.
The business view is different.


</div>
<span class='text_page_counter'>(126)</span><div class='page_container' data-page=126>

<b>What Is Data Science?</b>


Data science is a complicated new discipline that requires advanced skills and competencies in areas such as statistics, computer science, data mining, mathematics, and computer programming. As has been stated countless times, data scientists are the business “rock stars” of the 21st century.


Although what data scientists do can be quite complex, what they are trying to achieve is not. In fact, I find that the very best introductory book to data science is <i>Moneyball: The Art of Winning an Unfair Game</i> by Michael Lewis (W. W. Norton & Company, 2004). The book is about the Oakland A's General Manager Billy Beane's use of sabermetrics to help the small-market Oakland A's professional baseball team outperform competitors with significantly larger bankrolls. The book yields the most accurate description of data science:


<i>Data science is about finding new variables and metrics that are better predictors of performance.</i>


That's it, nothing more, and yes, data science is that simple. But the power of that simple statement is game changing, as can be seen in Figure 5.2 and the success that Billy Beane and the Oakland A's have achieved by making player acquisitions and in-game decisions based on a different, more predictive set of metrics.


<b>Figure 5.2</b> Oakland A's versus New York Yankees cost per win


The book also has another valuable lesson: good ideas can be copied. So


</div>
<span class='text_page_counter'>(127)</span><div class='page_container' data-page=127>

predictive “on-base percentage” metric.


<b>BI Versus Data Science: The Questions Are Different</b>


When clients ask me to explain the difference between a Business Intelligence analyst and a data scientist, I start by explaining that the two disciplines have different objectives and seek to answer different types of questions (see Figure 5.3).


<b>Figure 5.3</b> Business Intelligence versus data science


<b>BI Questions</b>


BI focuses on descriptive analytics: that is, the “What happened?” types of questions. Examples include:


How many widgets did I sell last month?
What were sales by zip code for Christmas last year?
How many units of Product X were returned last month?
What were company revenues and profits for the past quarter?
How many employees did I hire last year?


BI focuses on reporting on the current state of the business, or, as it is now commonly called, Business Performance Management (BPM). BI provides retrospective reports to help business users to monitor the current state of the business and answer questions about historical business performance. These reports and questions are critical to the business, and are sometimes required for regulatory and compliance reasons.


</div>
<span class='text_page_counter'>(128)</span><div class='page_container' data-page=128>

of under- and over-performance. But even these analytics are focused on monitoring what happened to the business.


<b>Data Science Questions</b>


On the other hand, data scientists are in search of variables and metrics that are better predictors of business performance. Consequently, data scientists focus on predictive analytics (“What is likely to happen?”) and prescriptive analytics (“What should I do?”) types of questions. For example:


Predictive Questions (What is likely to happen?)
How many widgets will I sell next month?
What will sales by zip code be over this Christmas season?
How many units of Product X will be returned next month?
What are projected company revenues and profits for next quarter?
How many employees will I need to hire next year?


Prescriptive Questions (What should I do?)
Order [5,000] Component Z to support widget sales for next month.
Hire [Y] new sales reps by these zip codes to handle projected Christmas sales.
Set aside [$125K] in financial reserve to cover Product X returns.
Sell the following product mix to achieve quarterly revenue and margin goals.
Increase hiring pipeline by 35 percent to achieve hiring goals.
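

The three kinds of questions can be lined up against the same data to make the contrast concrete. The sketch below uses made-up monthly widget sales and deliberately naive logic: the function names, the trend-based forecast, and the one-component-per-widget assumption are all illustrative, not a recommended forecasting method.

```python
import math

# Made-up widget sales for the last four months.
monthly_widget_sales = [4200, 4400, 4600, 4800]


def sold_last_month(sales):
    """Descriptive (BI): what happened?"""
    return sales[-1]


def forecast_next_month(sales):
    """Predictive: what is likely to happen? Naive trend = average monthly change."""
    changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
    return sales[-1] + sum(changes) / len(changes)


def components_to_order(forecast_units, components_per_widget=1, on_hand=0):
    """Prescriptive: what should I do? Order enough components to cover the forecast."""
    return max(0, math.ceil(forecast_units * components_per_widget - on_hand))


last_month = sold_last_month(monthly_widget_sales)
next_month = forecast_next_month(monthly_widget_sales)
order_quantity = components_to_order(next_month)

print(f"Sold last month: {last_month}")
print(f"Forecast for next month: {next_month:.0f}")
print(f"Order {order_quantity} Component Z")
```

The descriptive answer only reads the data, the predictive answer extrapolates from it, and the prescriptive answer converts the prediction into a specific action.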


To answer these predictive and prescriptive questions, data scientists build


</div>
<span class='text_page_counter'>(129)</span><div class='page_container' data-page=129>

<b>The Analyst Characteristics Are Different</b>


Another area of difference between BI and data science is in the attitudinal characteristics and work approach of the people who fill those roles (see Table 5.1).


<b>Table 5.1</b> BI Analyst Versus Data Scientist Characteristics


<b>Area</b> | <b>BI Analyst</b> | <b>Data Scientist</b>
Focus | Reports, KPIs, trends | Patterns, correlations, models
Process | Static, comparative | Exploratory, experimentation, visual
Data sources | Pre-planned, added slowly | On the fly, as needed
Transform | Up front, carefully planned | In-database, on demand, enrichment
Data quality | Single version of truth | “Good enough,” probabilities
Data model | Schema on load | Schema on query
Analysis | Retrospective, descriptive | Predictive, prescriptive


</div>
<span class='text_page_counter'>(130)</span><div class='page_container' data-page=130>

<b>Figure 5.4</b> CRISP: Cross Industry Standard Process for Data Mining


Data science takes a very similar approach: establish a business hypothesis or question; explore different combinations of data and analytics to build, test, and refine the analytic model; and wash, rinse, and repeat until the model proves that it can provide the required “analytic lift” while reaching a satisfactory goodness of fit. Finally the analytics are deployed or operationalized including possibly


</div>
<span class='text_page_counter'>(131)</span><div class='page_container' data-page=131>

<b>The Analytic Approaches Are Different</b>


Unfortunately, these explanations are insufficient to answer satisfactorily the question of what's different between Business Intelligence and data science. So let's examine closely the different engagement approaches (including goals, tools, and techniques) that the BI analyst and the data scientist use to do their jobs.


<b>Business Intelligence Analyst Engagement Process</b>


The BI analyst engagement process is a discipline that has been documented, taught, and refined over three decades of building data warehouses and BI environments. Figure 5.5 provides a high-level view of the process that a typical BI analyst uses when engaging with the business users to build out the BI and supporting data warehouse environments.


<b>Figure 5.5</b> Business Intelligence engagement process


<b>Step 1: Pre-build Data Model.</b> The process starts by building the foundational data model. Whether you use a data warehouse or data mart or hub-and-spoke approach, whether you use a star, snowflake, normalized or dimensional schema, the BI analyst must go through a formal requirements gathering process with the business users to identify all (or at least the vast majority of) the questions that the business users want to answer. In this requirements gathering process, the BI analyst must identify the first- and second-level questions the business users want to address in order to build a robust and extensible data model. For example:


</div>
<span class='text_page_counter'>(132)</span><div class='page_container' data-page=132></div>
<span class='text_page_counter'>(133)</span><div class='page_container' data-page=133>

SQL request. The BI analysts can also specify graphical rendering options (bar charts, line charts, pie charts) until they get the exact report and/or graphic that they want (see Figure 5.6).


<b>Figure 5.6</b> Typical BI tool graphic options


The BI tools are very powerful and relatively easy to use if the data model is configured properly. By the way, this is a good example of the power of schema on load. This traditional schema on load approach removes much of the underlying data complexity from the business users, who can then use the BI tools' graphical user interface to more easily query and explore the data (think self-service BI).


In summary, the BI approach relies on a pre-built data model (schema on load), which enables users to quickly and easily query the data, as long as the data that they want to query is already defined and loaded into the data warehouse. If the data is not in the data warehouse, then adding data to an existing warehouse can take months to make happen. Not only does modifying the data warehouse to include a new data source require a significant amount of time, but the process can be very costly, as data schemas have to be updated to include the new data source, new ETL processes have to be constructed to transform and normalize the data to fit into the updated data schemas, and existing reports and dashboards may have to be updated to include the new data.
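

The trade-off between schema on load and schema on query can be shown in miniature. In the sketch below, which is only an illustration with made-up records and hypothetical function names, the schema-on-load path fixes its columns up front and silently drops any field it was not designed for, while the schema-on-query path keeps the raw records and interprets them only when a question is asked.

```python
import json

# Made-up raw records; the second one carries a field nobody planned for.
raw_events = [
    '{"store": "A", "sales": 120.0}',
    '{"store": "B", "sales": 80.0, "weather": "rain"}',
]

# Schema on load: the columns are decided before any data arrives.
WAREHOUSE_COLUMNS = ("store", "sales")


def load_with_schema(lines):
    """Parse each record into the fixed columns; unplanned fields are dropped."""
    return [tuple(json.loads(line)[column] for column in WAREHOUSE_COLUMNS)
            for line in lines]


def query_raw(lines, field):
    """Schema on query: keep the raw records and pull out a field at read time."""
    return [json.loads(line).get(field) for line in lines]


warehouse_rows = load_with_schema(raw_events)
weather_values = query_raw(raw_events, "weather")

print(warehouse_rows)
print(weather_values)
```

Adding the weather field to the schema-on-load path means changing the column definition and reloading, which mirrors the schema and ETL rework described above. The schema-on-query path can answer the new question immediately, at the cost of pushing the interpretation work onto whoever writes the query.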


<b>The Data Scientist Engagement Process</b>


The data science process is significantly different. In fact, there is very little from the BI analyst engagement process that can be reused in the data science


</div>
<span class='text_page_counter'>(134)</span><div class='page_container' data-page=134>

<b>Figure 5.7</b> Data scientist engagement process


<b>Step 1: Define Hypothesis to Test.</b> Step 1 of the data science engagement process starts with the data scientists identifying the prediction they want to make or hypothesis that they want to test. This is a result of collaborating with the business subject matter expert to understand the key sources of business differentiation (e.g., how the organization delivers value) and then construct the associated hypotheses or predictions.


<b>Step 2: Gather Data…and More Data.</b> In step 2 of the data science engagement process, the data scientist gathers relevant or potentially interesting data from a multitude of sources, both internal and external to the organization, and pushes that data into the data lake or analytic sandbox. The data lake is a great foundational capability for this process, as the data scientists can acquire and ingest any data they want (as-is), test the data for its value given the hypothesis or prediction, and then decide whether to include that data in the analytic model. This is where an envisioning exercise can add considerable value in facilitating the collaboration between the business users and the data scientists to identify data sources that may help improve predictive results.


<b>Step 3: Build Data Model.</b> Step 3 is where the data scientists define and


</div>
<span class='text_page_counter'>(135)</span><div class='page_container' data-page=135></div>
<span class='text_page_counter'>(136)</span><div class='page_container' data-page=136></div>
<span class='text_page_counter'>(137)</span><div class='page_container' data-page=137>

<b>The Data Models Are Different</b>


The data models that are used in the data warehouse to support an organization's BI efforts are significantly different from the data models the data scientists prefer to use.


<b>Data Modeling for BI</b>


The world of BI (aka query, reporting, dashboards) requires a data modeling technique that allows business users to create their own reporting and queries. To support this need, Ralph Kimball pioneered dimensional modeling (or star schemas) while at Metaphor Computers back in the 1980s (see Figure 5.9).


<b>Figure 5.9</b> Dimensional model (star schema)


The dimensional model was designed to accommodate the analysis needs of the business users, with two important design concepts:


<b>Fact tables</b> (populated with metrics or measures) correspond to transactional



</div>
<span class='text_page_counter'>(138)</span><div class='page_container' data-page=138>

<b>Dimension tables</b> (populated with attributes about that dimension) represent the “nouns” of that particular transactional system, such as products, markets, stores, employees, customers, and different variations of time. Dimensions are groups of hierarchies and descriptors that describe the facts. It is these dimensional attributes that enable analytic exploration, attributes such as size, weight, location (street, city, state, zip), age, gender, tenure, etc.

Dimensional modeling is ideal for business users because it supports their natural question-and-answer exploration processes. Dimensional modeling supports BI concepts such as drill across (navigating across dimensions) and drill up/drill down (navigating up and down the dimensional hierarchies such as the product dimension hierarchy of product ⇨ brand ⇨ category).

Today, all BI tools use dimensional modeling as the standard way for interacting with the underlying data warehouse.
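As a rough illustration of the two design concepts, the sketch below builds a tiny fact table and a product dimension in plain Python and then drills up the product hierarchy. The table names, keys, and values are made up for illustration; a real warehouse would express this in SQL against a star schema.

```python
from collections import defaultdict

# Dimension table: descriptive attributes keyed by a surrogate key (made-up data).
product_dim = {
    1: {"product": "Chai Latte", "brand": "Brand A", "category": "Tea"},
    2: {"product": "Green Tea", "brand": "Brand B", "category": "Tea"},
    3: {"product": "House Blend", "brand": "Brand C", "category": "Coffee"},
}

# Fact table: one row per transaction, holding a foreign key and a measure.
sales_fact = [
    {"product_key": 1, "sales": 5.0},
    {"product_key": 2, "sales": 3.0},
    {"product_key": 3, "sales": 2.0},
    {"product_key": 1, "sales": 5.0},
]


def roll_up(facts, dimension, level):
    """Sum the sales measure at one level of the product hierarchy."""
    totals = defaultdict(float)
    for row in facts:
        attributes = dimension[row["product_key"]]
        totals[attributes[level]] += row["sales"]
    return dict(totals)


by_product = roll_up(sales_fact, product_dim, "product")    # most detailed level
by_category = roll_up(sales_fact, product_dim, "category")  # drill up to category
```

Changing the `level` argument is the in-memory equivalent of a drill up or drill down: the facts stay the same, and only the dimensional attribute used for grouping changes.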


<b>Data Modeling for Data Science</b>

In the world of data science, Hadoop provides an opportunity to think differently about how we do data modeling. Hadoop, much of whose early development took place at Yahoo, was designed to deal with very long, flat web logs. Hadoop was designed with very large data blocks (the Hadoop default block size is 64 MB to 128 MB, versus relational database block sizes that are typically 32 KB or less). To take advantage of this block size, the data science team wants very long, flat records and long, flat data models.<sup>1</sup>
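A minimal sketch of the “long, flat record” idea: the dimension attributes are copied onto every fact row ahead of time, so later analysis needs no joins. The field names and values are illustrative assumptions, not taken from the book.

```python
# Hypothetical dimension and fact data.
customers = {101: {"age": 42, "home_zip": "94301"}}
products = {1: {"name": "Chai Latte", "category": "Tea"}}
purchases = [
    {"customer_key": 101, "product_key": 1, "price_paid": 5.0, "day_of_week": "Mon"},
]


def flatten(facts, customer_dim, product_dim):
    """Denormalize fact rows into wide records that carry their own attributes."""
    flat = []
    for row in facts:
        # Keep the measures and descriptors, drop the foreign keys.
        record = {k: v for k, v in row.items() if not k.endswith("_key")}
        record["customer_id"] = row["customer_key"]
        for name, value in customer_dim[row["customer_key"]].items():
            record[f"customer_{name}"] = value
        for name, value in product_dim[row["product_key"]].items():
            record[f"product_{name}"] = value
        flat.append(record)
    return flat


flat_records = flatten(purchases, customers, products)
```

The trade-off is deliberate redundancy: each record is wider and repeats attribute values, in exchange for reading one long file sequentially instead of joining several tables.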


</div>
<span class='text_page_counter'>(139)</span><div class='page_container' data-page=139>

<b>Figure 5.10</b> Using flat files to eliminate or reduce joins on Hadoop



As an example in Figure 5.10, instead of three different star schemas with conformed or shared dimensions to link the different star schemas, the data science team wants three long, flat files with the following customer data:

Customer demographics (age, gender, current and previous home addresses, value of current and previous home, history of marital status, kids and their ages and genders, current and previous income, etc.)

Customer purchase history (annual purchases including items purchased, returns, prices paid, discounts, coupons, location, day of week, time of day, weather condition, temperatures)


</div>
<span class='text_page_counter'>(140)</span><div class='page_container' data-page=140></div>
<span class='text_page_counter'>(141)</span><div class='page_container' data-page=141>

visit a store, recency of store visit, frequency of store visits in past week/month/quarter, how long do I stay at which stores (“pass thru” or “linger”), etc.


<b>Classifications.</b> Now we want to create some “classifications” about Bill Schmarzo's life that might have an impact on Starbucks's key business initiatives, such as life stage classification (long marriage, kid in college, kid at home, weight/diet conscious, etc.), lifestyle classification (heavy traveler, heavy chai tea drinker, light exerciser, and so on), or product classification (morning coffee/oatmeal consumer, afternoon frap/cookie consumer, etc.).


<b>Association Rules.</b> We might also want to capture some propensities about Bill's usage patterns that we can use to support Starbucks's key business initiatives, including propensity to buy oatmeal when he buys his venti chai latte when traveling in the morning, propensity to buy a cookie/pastry when traveling in the afternoon, propensity to buy product in the channel, etc.


<b>Scores.</b> We also may want to create scores to support decision-making and process optimization. Scores that we might want to create (again, depending on Starbucks's key business initiatives) could include advocacy score (which measures my likelihood to recommend Starbucks and make positive comments for Starbucks on social media), loyalty score (which measures my likelihood to continue to visit Starbucks stores and buy Starbucks products versus competitors), product usage score (which is a measure of how much Starbucks product I consume, and revenue I generate, when I visit a Starbucks store), etc.

A profile could be made up of hundreds of metrics and scores that, when used in combination against a specific business initiative like customer retention,


</div>
<span class='text_page_counter'>(142)</span><div class='page_container' data-page=142>

<b>Figure 5.11</b> Sample customer analytic profile

Some metrics and scores are more important than others, depending on the business initiative being addressed. For example, for a financial services firm focused on customer acquisition, disposable income, retirement readiness, life stage, age, education level, and number of family members may be the most important predictive metrics. However, for that same financial services firm focused on customer retention, metrics such as advocacy, customer satisfaction, attrition risk, social network associations, and select social media relationships may be the most important predictive metrics.



</div>
<span class='text_page_counter'>(143)</span><div class='page_container' data-page=143>

<b>Figure 5.12</b> Improve customer retention example

The analysis process works like this:

<b>Step 1:</b> Establish a hypothesis that you want to test. In our customer retention example, our test hypothesis is that “Premium gold card members with greater than five days without a purchase or mobile app engagement have 25 to 30 percent higher probability of churn than similar customers.”

<b>Step 2:</b> Identify and quantify the most important metrics or scores to predict a certain business outcome. In our example, the metrics and scores that we're going to use to test our customer attrition hypothesis include Customer Tenure (in months), Customer Satisfaction Score, Average Monthly Purchases, and Customer Loyalty Score. Notice that the metrics do not have the same weight (or confidence level). Some metrics and scores are more important than others in predicting performance given the test hypothesis.

<b>Step 3:</b> Employ the predictive metrics to build detailed profiles for each individual customer with respect to the hypothesis to be tested.

<b>Step 4:</b> Compare an individual's recent activities and current state with his or her profile in order to flag unusual behaviors and actions that may be indicative of a customer retention problem. In our customer retention example, we might want to create a “Customer Attrition” score that quantifies the likelihood that a particular customer is going to leave, and then create specific recommendations as to what actions or “next best offers” can be delivered to retain that customer.

<b>Step 5:</b> Continue to seek out new data sources and new metrics that may be


</div>
<span class='text_page_counter'>(144)</span><div class='page_container' data-page=144>

and scores using sensitivity analysis and simulations such as Monte Carlo experiments.

<b>Step 6:</b> Integrate the analytic insights, scores, and recommendations into the
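The weighting in Step 2 and the profile comparison in Step 4 can be sketched as follows. This is only one plausible way to build a “Customer Attrition” score: the weights, the metric keys, and the shortfall formula are assumptions made for illustration, and only three of the metrics named in Step 2 are used.

```python
# Hypothetical weights: the metrics do not contribute equally (Step 2).
WEIGHTS = {"satisfaction": 0.40, "monthly_purchases": 0.35, "loyalty": 0.25}


def attrition_score(profile, recent, weights=WEIGHTS):
    """Return a 0-100 score; higher means recent behavior fell further below profile."""
    score = 0.0
    for metric, weight in weights.items():
        baseline = profile[metric]
        if baseline:
            # Only a drop below the customer's own baseline adds risk.
            shortfall = max(0.0, (baseline - recent[metric]) / baseline)
            score += weight * shortfall
    return round(100 * score, 1)


# Profile built in Step 3 versus recent activity compared in Step 4.
profile = {"satisfaction": 80, "monthly_purchases": 10, "loyalty": 70}
recent = {"satisfaction": 60, "monthly_purchases": 4, "loyalty": 70}

risk = attrition_score(profile, recent)
stable = attrition_score(profile, profile)
```

A score like `risk` could then be compared against a threshold to decide which customers receive a “next best offer”; choosing that threshold and tuning the weights is where the sensitivity analysis mentioned in Step 5 would come in.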


</div>
<span class='text_page_counter'>(145)</span><div class='page_container' data-page=145>

<b>Summary</b>

Organizations are realizing that data science is very different from BI and that one does not replace the other. Both combine to provide the “dynamic duo” of analytics: one focused on monitoring the current state of the business and the other trying to predict what is likely to happen and then prescribe what actions to take.

Big data is a key enabler of a new discipline called data science. Data science seeks to leverage new sources of structured and unstructured data, coupled with advanced predictive and prescriptive analytics, to uncover new variables and metrics that are better predictors of performance.

As discussed in this chapter, BI is different from data science in the following ways:

The questions are different.

The analytic characteristics are different.

The analytic engagement processes are different.

The data models are different.

The business view is different.

This chapter also introduced the very important data science concept called analytic profiles. Organizations are learning that more important than trying to create a 360-degree profile of the customer is identifying and quantifying those fewer but more important metrics that are better predictors of business or customer performance such as optimizing key business processes, influencing customer behaviors, and uncovering new monetization opportunities.


</div>
<span class='text_page_counter'>(146)</span><div class='page_container' data-page=146>

<b>Homework Assignment</b>

Use the following exercises to apply what you learned in this chapter.

<b>Exercise #1:</b> Describe the key differences between BI and data science and what those differences mean to your organization.

<b>Exercise #2:</b> List sample descriptive (What happened?), predictive (What is likely to happen?), and prescriptive (What actions should I take?) questions that are relevant to the targeted business initiative that you identified in Chapter 2.



</div>
<span class='text_page_counter'>(147)</span><div class='page_container' data-page=147>

<b>Notes</b>

<sup>1</sup> Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of


</div>
<span class='text_page_counter'>(148)</span><div class='page_container' data-page=148></div>
<span class='text_page_counter'>(149)</span><div class='page_container' data-page=149>

<b>Chapter 6</b>

<b>Data Science 101</b>

There are many excellent books and courses focused on teaching people how to become a data scientist. Those books and courses provide detailed material and exercises that teach the key capabilities of data science such as statistical analysis, data mining, text mining, SQL programming, and other computing, mathematical, and analytic techniques. That is not the purpose of this chapter.

The purpose of Chapter 6 is to introduce some different analytic algorithms that business users should be aware of and to discuss when it might be most appropriate to use which types of algorithms. You do not need to be a data scientist to understand when and why to apply these analytic algorithms. A more detailed understanding of these different analytic algorithms will help the


</div>
<span class='text_page_counter'>(150)</span><div class='page_container' data-page=150></div>
<span class='text_page_counter'>(151)</span><div class='page_container' data-page=151>

start viral marketing campaigns.

This chapter reviews a number of different analytic techniques. You are not expected to become an expert in these different analytic algorithms. However, the more you understand what these analytic algorithms can do, the better position you are in to collaborate with your data science team and suggest the art of the possible to your business leadership team.

Fundamental exploratory analytic algorithms that are covered in Chapter 6 are:

Trend analysis

Boxplots

Geography (spatial) analysis

Pairs plot

Time series decomposition

More advanced analytic algorithms that are covered in this chapter are:

Cluster analysis

Normal curve equivalent (NCE) analysis

Association analysis

Graph analysis

Text mining

Sentiment analysis

Traverse pattern analysis

Decision tree classifier analysis

Cohorts analysis

Throughout the chapter, you will contemplate how The Parks could leverage each of these different analytic techniques.

<b>NOTE</b>

Throughout this chapter I provide links to sites that can help you get


</div>
<span class='text_page_counter'>(152)</span><div class='page_container' data-page=152>

<b>NOTE</b>



</div>
<span class='text_page_counter'>(153)</span><div class='page_container' data-page=153>

<b>Fundamental Exploratory Analytics</b>

Let's start by covering some basic statistical analysis that was likely covered in your first statistics course (yes, I realize that you probably sold your stats book the minute the stats class was over). Trend analysis, boxplots, geographical analysis, pairs plot, and time series decomposition are examples of exploratory analytic algorithms that the data scientists use to get a “feel for the data.” These exploratory analytic algorithms help the data science team to better understand the data content and gain a high-level understanding of relationships and patterns in the data.

<b>Trend Analysis</b>

Trend analysis is a fundamental visualization technique to spot patterns, trends, relationships, and outliers across a time series of data. One of the most basic yet very powerful exploratory analytics, trend analysis (applying different plotting techniques and graphic visualizations) can quickly uncover customer, operational, or product trends and events that tend to happen together or happen at some period of regularity (see Figure 6.1).

<b>Figure 6.1</b> Basic trend analysis

In Figure 6.1, the data scientist manually tested a number of different trending options in order to identify the “best fit” trend line (in this example, using Microsoft Excel). Once the data scientist identifies the best trending option, the data scientist can automate the generation of the trend lines using R.
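The “best fit” comparison can be sketched in a few lines. The book automates this in R; the version below uses plain Python, compares only a linear and an exponential trend, and scores each with R². The attendance figures are invented for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, fitted):
    """Share of the variance in ys explained by the fitted values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def compare_trends(xs, ys):
    """Score a linear and an exponential trend line (ys must be positive)."""
    slope, intercept = fit_line(xs, ys)
    linear = [intercept + slope * x for x in xs]
    # An exponential trend is a straight line in log space.
    growth, log_start = fit_line(xs, [math.log(y) for y in ys])
    exponential = [math.exp(log_start + growth * x) for x in xs]
    return {"linear": r_squared(ys, linear), "exponential": r_squared(ys, exponential)}


weeks = list(range(8))
attendance = [100 * 1.5 ** w for w in weeks]  # made-up, exponentially growing series
scores = compare_trends(weeks, attendance)
best_fit = max(scores, key=scores.get)
```

Other trend shapes (logarithmic, polynomial, moving average) could be added to the dictionary in the same way, which is the kind of automation the paragraph above describes.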


</div>
<span class='text_page_counter'>(154)</span><div class='page_container' data-page=154>

different business dimensions (e.g., products, geographies, sales territories, markets) in order to uncover patterns and trends at the next level of granularity. The data scientist can then write a program to juxtapose the detailed trend lines into the same chart so that it is easier to spot trends, patterns, relationships, and outliers buried in the granular data (see Figure 6.2).

<b>Figure 6.2</b> Compound trend analysis

Finally, trend analysis yields mathematical models for each of the trend lines. These mathematical models can be used to quantify recurring patterns or behaviors in the data. The most interesting insights from the trend lines can then be flagged for further investigation by the data science team (see Figure 6.3).


</div>
<span class='text_page_counter'>(155)</span><div class='page_container' data-page=155>

<b>WARNING</b>



</div>
<span class='text_page_counter'>(156)</span><div class='page_container' data-page=156>

<b>The Parks Ramifications</b>

The Parks could use trend analysis to identify the variables (e.g., wait times, social media posts, consumer comments) that are highly correlated with the increase or decrease in guest satisfaction for each attraction, restaurant, retail outlet, and entertainment venue. The Parks could leverage the results from the trend analysis to

1. Flag problem areas and take corrective actions such as opening more lines, promoting less busy attractions, moving kiosks that are blocking traffic flow, and resituating characters at different points in the parks;

2. Identify the location and types of future attractions, restaurants, retail outlets, and entertainment.

For more information about how to make simple plots and graphs (line charts, bar charts, histograms, dot charts) in R, check out


/>


<b>Boxplots</b>

Boxplots are one of the more interesting and visually creative exploratory analytic algorithms. Boxplots quickly visualize variations in the base data and can be used to identify outliers in the data worthy of further investigation. A boxplot is a convenient way of graphically depicting groups of numerical data through their quartiles. Boxplots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms <i>box-and-whisker plot</i> and <i>box-and-whisker diagram</i> (see Figure 6.4).
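The numbers behind a boxplot are easy to compute directly. The sketch below derives the quartiles, whisker ends, and outliers for a made-up list of wait times; the 1.5 × IQR fence is the common Tukey convention and is an assumption here, since the text does not say which rule Figure 6.4 uses.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers extend to the most extreme points that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


wait_minutes = [12, 15, 14, 10, 13, 16, 14, 15, 13, 90]  # hypothetical observations
summary = boxplot_summary(wait_minutes)
```

The single value flagged in `summary["outliers"]` is exactly the kind of observation the text describes as “worthy of further investigation.”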


</div>
<span class='text_page_counter'>(157)</span><div class='page_container' data-page=157></div>
<span class='text_page_counter'>(158)</span><div class='page_container' data-page=158>

<b>The Parks Ramifications</b>

The Parks can employ boxplots to determine its most loyal guests for each of the park's attractions (e.g., Canyon Copter Ride, Monster Mansion, Space Adventure, Ghoulish Gulch). The Parks can use the results of the boxplot analysis to create guest current and maximum lifetime value scores, which can then be used to prioritize which guests to reward with Priority Access passes and other coupons and discounts.

For more information about creating boxplots in R, check out
/>


<b>Geographical (Spatial) Analysis</b>

Geographical or spatial analysis includes techniques for analyzing geographical activities and conditions using a business entity's topological, geometric, or geographic properties. For example, geographical analysis supports the integration of zip code and data.gov economic data with a client's internal data to provide insights about the success of the organization's geographical reach and market penetration (see Figure 6.5).

<b>Figure 6.5</b> Geographical (spatial) trend analysis

In the example in Figure 6.5, geographical analysis is combined with trend


</div>
<span class='text_page_counter'>(159)</span><div class='page_container' data-page=159>

<b>The Parks Ramifications</b>

The Parks can conduct geographical trend analysis to spot any changes (at both the zip+4 and household levels) in the geo-demographics of guests over time and by seasonality and holidays. The Parks can use the results of this geographical plus seasonality analysis to create geo-specific campaigns and promotions with the objective of increasing attendance from under-penetrated geographical areas by day of week, holidays, and seasonality.


<b>Pairs Plot</b>

Pairs plot analysis may be my favorite analytics algorithm. Pairs plot analysis allows the data scientist to spot potential correlations using pairwise comparisons across multiple variables. Pairs plot analysis provides a deep view into the different variables that may be correlated and can form the basis for guiding the data science team in the identification of key variables or metrics to include in the development of predictive models (see Figure 6.6).

<b>Figure 6.6</b> Pairs plot analysis
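A pairs plot is a grid of scatter plots, one per pair of variables. The numeric counterpart is a table of pairwise correlations, sketched below with Pearson's r. The variable names and values are hypothetical, and correlation alone does not show which variable drives the other.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Correlation for every pair of named columns, as a pairs plot would show."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


guest_data = {  # made-up observations, one value per guest visit
    "wait_minutes": [10, 20, 30, 40, 50],
    "satisfaction": [9, 8, 6, 5, 3],
    "spend": [30, 28, 35, 31, 33],
}
correlations = pairwise_correlations(guest_data)
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))
```

Pairs with a large absolute correlation, such as `strongest` here, are the candidates a data science team might carry forward into a predictive model.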


</div>
<span class='text_page_counter'>(160)</span><div class='page_container' data-page=160>

<b>The Parks Ramifications</b>

The Parks can leverage pairs plot analysis to compare a multitude of variables to identify those variables that drive guests to particular attractions, entertainment, retail outlets, and restaurants. The Parks can use the results of the analysis to drive in-park promotional decisions and offers that direct guests to under-utilized attractions, entertainment, retail outlets, and restaurants.

Additional paired plot options in R (e.g., pairs, splom, plotmatrix, ggcorplot, panelcor) can be found at



/>


<b>Time Series Decomposition</b>

Time series decomposition expands on the basic trend analysis by decomposing the traditional trend analysis into three underlying components that can provide valuable customer, product, or operational performance insights. These trend analysis components are

<b>Cyclical component</b> that describes repeated but non-periodic fluctuations,

<b>Seasonal component</b> that reflects seasonality (seasonal variation),

<b>Irregular component</b> (or “noise”) that describes random, irregular influences and represents the residuals of the time series after the other components have been removed.

From the time series decomposition analysis, a business user can spot particular areas of interest in the decomposed trend data that may be worthy of further analysis (see Figure 6.7).
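A minimal sketch of a classical additive decomposition, assuming the series can be written as smooth component + seasonal component + irregular component. Here a centered moving average stands in for the slowly changing (trend or cyclical) part, which is a simplification of the three components listed above. The visit counts are synthetic.

```python
def decompose_additive(series, period):
    """Split a series into smooth, seasonal, and irregular parts (additive model).

    Assumes the series covers several full periods.
    """
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points of the window.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    # Average the detrended values for each position in the seasonal cycle.
    by_phase = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_phase[t % period].append(series[t] - trend[t])
    phase_means = [sum(values) / len(values) for values in by_phase]
    offset = sum(phase_means) / period  # center the seasonal effects on zero
    seasonal = [phase_means[t % period] - offset for t in range(n)]
    irregular = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, irregular


pattern = [5.0, -5.0, 10.0, -10.0]  # synthetic quarterly effect
visits = [100 + 2 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, irregular = decompose_additive(visits, period=4)
```

Because the synthetic series contains no noise, the recovered seasonal values match `pattern` and the irregular component is essentially zero; with real data, large irregular values are the points worth a closer look.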


</div>
<span class='text_page_counter'>(161)</span><div class='page_container' data-page=161></div>
<span class='text_page_counter'>(162)</span><div class='page_container' data-page=162>

<b>The Parks Ramifications</b>

The Parks can deploy time series decomposition analysis to identify and quantify the impact that seasonality and specific events are having on guest visits and associated spend. The Parks can use the results of the analysis to

1. Create season-specific marketing campaigns and promotions to increase guest visits and associated spend,

2. Determine which events outside of the theme parks (concerts, professional sporting events, BCS football games) are worthy of promotional and sponsorship spend.

For more information about time series decomposition in R, check out


</div>
<span class='text_page_counter'>(163)</span><div class='page_container' data-page=163>

<b>Analytic Algorithms and Models</b>

The following analytic algorithms start to move the data scientist beyond the data exploration stage into the more predictive stages of the analysis process. These analytic algorithms by their nature are more actionable, allowing the data scientist to quantify cause and effect and provide the foundation to predict what is likely to happen and recommend specific actions to take.

<b>Cluster Analysis</b>

Cluster analysis is used to uncover insights about how customers and/or products cluster into natural groupings in order to drive specific actions or recommendations (e.g., personalized messaging, target marketing, maintenance scheduling). Cluster analysis or clustering is the exercise of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (clusters).

Clustering analysis can uncover potential actionable insights across massive data volumes of customer and product transactions and events. Cluster analysis can uncover groups of customers and products that share common behavioral tendencies and, consequently, can be targeted with the same marketing treatments (see Figure 6.8).
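The text does not say which clustering algorithm was used for Figure 6.8. As one common choice, here is a bare-bones k-means sketch in plain Python. The guest data (visits per year, average spend per visit) is invented, and the naive initialization is chosen only to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    centroids = [points[i] for i in range(k)]  # naive start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: math.dist(point, centroids[c]))
            clusters[nearest].append(point)
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


# (visits per year, average spend per visit) for six hypothetical guests
guests = [(1, 50), (12, 400), (2, 60), (11, 380), (1, 55), (13, 420)]
centroids, labels = kmeans(guests, k=2)
```

In practice the variables would be scaled first so that one unit of spend does not outweigh one visit, and the number of clusters would be chosen by testing several values of `k`.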


</div>
<span class='text_page_counter'>(164)</span><div class='page_container' data-page=164>

<b>NOTE</b>



</div>
<span class='text_page_counter'>(165)</span><div class='page_container' data-page=165>

<b>The Parks Ramifications</b>

The Parks can leverage cluster analysis to create more actionable profiles of the park's most profitable guest clusters and highest potential guest clusters. The Parks can use the results of the analysis to quantify, prioritize, and focus guest acquisition and guest activation marketing efforts.

For more information about cluster analysis in R, check out


/>


<b>Normal Curve Equivalent (NCE) Analysis</b>

A technique first used in evaluating students' testing performance, normal curve equivalent (NCE) is a data transformation technique that approximately fits a normal distribution between 0 and 100 by normalizing a data set in preparation for percentile rank analysis. For example, an NCE data transformation is a way of standardizing scores received on a test into a 0–100 scale similar to a percentile rank but preserving the valuable equal-interval properties of a <i>z</i>-score (see Figure 6.9).

<b>Figure 6.9</b> Normal curve equivalent analysis
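The conventional NCE formula rescales a z-score as NCE = 50 + 21.06 × z, so that NCE values of 1, 50, and 99 line up with the 1st, 50th, and 99th percentiles of a normal distribution. The constant 21.06 comes from that convention rather than from the text above, and the sample scores are made up.

```python
import statistics

NCE_SCALE = 21.06  # conventional NCE constant; not stated in the text above


def to_nce(scores):
    """Convert raw scores to normal curve equivalents via their z-scores."""
    mean = statistics.mean(scores)
    stdev = statistics.pstdev(scores)  # population standard deviation
    return [50 + NCE_SCALE * (score - mean) / stdev for score in scores]


nce_scores = to_nce([60, 70, 80])
```

A score at the mean maps to 50, and equal gaps in z-score stay equal gaps in NCE, which is the equal-interval property the text refers to. Extreme z-scores can fall outside 0–100, so some implementations clip the result; this sketch does not.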


</div>
<span class='text_page_counter'>(166)</span><div class='page_container' data-page=166></div>
<span class='text_page_counter'>(167)</span><div class='page_container' data-page=167></div>
<span class='text_page_counter'>(168)</span><div class='page_container' data-page=168>

<b>The Parks Ramifications</b>

The Parks can employ the NCE technique to understand price inflection points for packages of attractions and restaurants. The Parks can leverage the price inflection points to optimize pricing (e.g., create a package of attractions and restaurants by seasonality, holiday, day of week, etc.) and create new Priority Access packages.

For more information about how to use <i>z</i>-scores to normalize data using R, check out

For more insights into the NCE data transformation technique, see


/>


<b>Association Analysis</b>

Association analysis is a popular algorithm for discovering and quantifying relationships between variables in large databases. Association analysis shows customer or product events or activities that tend to happen together, which makes this type of analysis very actionable. For example, the association rule {buns, ketchup} → {burger} found in the point-of-sales data of a supermarket would indicate that if a customer buys buns and ketchup together, she is likely to also buy hamburger meat. Such information can be used as the basis for making pricing, product placement, promotion, and other marketing decisions.

Association analysis is the basis for market basket analysis (identifying products and/or services that sell in combination or sell with a predictable time lag) that is used in many industries including retail, telecommunications, insurance, digital marketing, credit cards, banking, hospitality, and gaming.
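The strength of a rule such as {buns, ketchup} → {burger} is usually summarized with support, confidence, and lift. The sketch below computes all three over five invented shopping baskets; the metric definitions are the standard ones, and the baskets are purely illustrative.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for basket in transactions if (antecedent | consequent) <= basket)
    with_antecedent = sum(1 for basket in transactions if antecedent <= basket)
    with_consequent = sum(1 for basket in transactions if consequent <= basket)
    confidence = both / with_antecedent
    return {
        "support": both / n,                        # how often the full rule occurs
        "confidence": confidence,                   # P(consequent | antecedent)
        "lift": confidence / (with_consequent / n)  # > 1 means more often than chance
    }


baskets = [  # hypothetical point-of-sale transactions
    {"buns", "ketchup", "burger"},
    {"buns", "ketchup", "burger", "soda"},
    {"buns", "ketchup"},
    {"buns", "milk"},
    {"burger", "soda"},
]
metrics = rule_metrics(baskets, {"buns", "ketchup"}, {"burger"})
```

Rules are typically kept only if they clear minimum support and confidence thresholds, which is what “a high degree of confidence” means in practice.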


</div>
<span class='text_page_counter'>(169)</span><div class='page_container' data-page=169>

<b>Figure 6.11</b> Association analysis

One very actionable data science technique is to cluster the resulting association rules into common groups or segments. For example, in Figure 6.12, the data science team clustered the resulting association rules across tens of millions of customers in order to create more accurate, relevant customer segments. In this process, the data science team

Runs the association analysis across the tens of millions of customers to identify association rules with a high degree of confidence,

Clusters the customers and their resulting association rules into common groupings or segments (e.g., Chipotle + Starbucks, Virgin America + Marriott),

Uses these new segments as the basis for personalized messaging and direct marketing.

<b>Figure 6.12</b> Converting association rules into segments


</div>
<span class='text_page_counter'>(170)</span><div class='page_container' data-page=170></div>
<span class='text_page_counter'>(171)</span><div class='page_container' data-page=171>

<b>The Parks Ramifications</b>

The Parks can leverage market basket analysis to identify the most popular and least popular “packages of attractions.” The Parks can use this “packages of attractions” data to

1. Create new pricing and Priority Access packages for the most popular packages in order to optimize in-park traffic flow and reduce attraction wait times,

2. Create new pricing and Priority Access packages for the least popular “packages” in order to drive traffic to under-utilized attractions.

For more information about association analysis in R, check out


/>


<b>Graph Analysis</b>

Graph analysis is one of the more powerful analysis techniques made popular by social media analysis. Graph analysis can quickly highlight customer or machine (think Internet of Things) relationships obscured across millions if not billions of social and machine interactions.

Graph analysis uses mathematical structures to model pairwise relations between objects. A “graph” in this context is made up of “vertices” or “nodes” and lines called edges that connect them. Social network analysis (SNA) is an example of graph analysis. SNA is used to investigate social structures and relationships across social networks. SNA characterizes networked structures in terms of nodes (people or things within the network) and the ties or edges (relationships or interactions) that connect them (see Figure 6.13).

<b>Figure 6.13</b> Graph analysis
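A small sketch of the nodes-and-edges idea: the graph is stored as an adjacency map, and degree centrality (the share of other nodes each node is directly tied to) serves as a simple stand-in for influence. The names and ties are invented, and real social network analysis would use richer measures.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected graph as a mapping from each node to the set of its neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to.

    Assumes the graph has at least two nodes.
    """
    others = len(graph) - 1
    return {node: len(neighbors) / others for node, neighbors in graph.items()}


ties = [("Ana", "Ben"), ("Ana", "Cal"), ("Ana", "Dee"), ("Ben", "Cal")]  # hypothetical
graph = build_graph(ties)
centrality = degree_centrality(graph)
leader = max(centrality, key=centrality.get)
```

The node with the highest score is the kind of group leader that a promotion might target first.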


</div>
<span class='text_page_counter'>(172)</span><div class='page_container' data-page=172></div>
<span class='text_page_counter'>(173)</span><div class='page_container' data-page=173>

<b>The Parks Ramifications</b>

The Parks can employ graph analysis to uncover strength of relationships among groups of guests (leaders, followers, influencers, cohorts). The Parks can leverage the graph analysis results to direct promotions (discounts, restaurant vouchers, travel vouchers) to group leaders in order to encourage these leaders to bring groups back to the parks more frequently.

For more information about social network analysis in R, check out



-r-using-package-igraph/.


<b>Text Mining</b>

Text mining refers to the process of deriving usable information (metadata) from text files such as consumer comments, e-mail conversations, physician or technician notes, work orders, etc. Basically, text mining creates structured data out of unstructured data.

Text mining is a very powerful technique to show during an envisioning process, as many business stakeholders have struggled to understand how they can gain insights from the wealth of internal customer, product, and operational data. Text mining is not something that the data warehouse can do, so many business stakeholders have stopped thinking about how they can derive actionable insights from text data. Consequently, it is important to leverage envisioning exercises to help the business stakeholders to imagine the realm of what is possible with text data, especially when that text data is combined with the organization's operational and transactional data.
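To make “structured data out of unstructured data” concrete, the sketch below turns free-text comments into a term-frequency table. The stop-word list and the comments are made-up assumptions; production text mining adds steps such as stemming, phrase detection, and entity extraction.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are far longer.
STOP_WORDS = {"the", "for", "was", "too", "but", "and", "a"}


def extract_terms(text):
    """Lowercase the text, split it into words, and drop common filler words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def term_frequencies(comments):
    """Count how often each remaining term appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(extract_terms(comment))
    return counts


guest_comments = [  # hypothetical guest feedback
    "The wait for Space Adventure was too long",
    "Long wait, but the ride was great",
    "Food was cold and the wait was long",
]
frequencies = term_frequencies(guest_comments)
top_terms = [term for term, _ in frequencies.most_common(2)]
```

The resulting counts are ordinary structured data, so they can be joined with operational data (for example, attraction wait times) in the way the paragraph above suggests.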


</div>
<span class='text_page_counter'>(174)</span><div class='page_container' data-page=174>

<b>Figure 6.14</b> Text mining analysis


</div>
<span class='text_page_counter'>(175)</span><div class='page_container' data-page=175></div>
<span class='text_page_counter'>(176)</span><div class='page_container' data-page=176>

<b>The Parks Ramifications</b>

The Parks can mine guest comments, social media posts, and e-mails to flag and rank areas of concern and problem situations. The Parks can leverage the text mining results to locate unsatisfied guests in order to drive personal (face-to-face) guest intervention efforts.

For more information about text mining analysis using R, check out



/>


<b>Sentiment Analysis</b>

Sentiment analysis can provide a broad and general overview of your customers' sentiment toward your company and brands. Sentiment analysis can be a powerful way to glean insights about the customers' feelings about your company, products, and services out of the ever-growing body of social media sites (Facebook, LinkedIn, Twitter, Instagram, Yelp, Snapchat, Vine, etc.) (see Figure 6.15).

<b>Figure 6.15</b> Sentiment analysis
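The simplest form of sentiment analysis scores text against word lists. The sketch below is a polarity scorer, which is simpler than classifying individual emotions, and both word lists are invented for illustration rather than taken from any published lexicon.

```python
import re

# Hypothetical mini-lexicons; real ones contain thousands of scored terms.
POSITIVE = {"great", "love", "fun", "amazing"}
NEGATIVE = {"long", "cold", "rude", "broken"}


def sentiment_score(text):
    """Return a polarity between -1 (all negative) and +1 (all positive)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    matched = positives + negatives
    if matched == 0:
        return 0.0  # no opinion words found
    return (positives - negatives) / matched


happy = sentiment_score("Great ride, amazing staff")
unhappy = sentiment_score("Long lines and rude staff")
neutral = sentiment_score("We arrived at noon")
mixed = sentiment_score("Fun ride but the food was cold")
```

Averaging such scores per attraction over time would give a rough sentiment trend, though word counting misses sarcasm, negation, and context.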


In Figure 6.15, the data science team conducted competitive sentiment analysis by classifying the emotions (e.g., anger, disgust, fear, joy, sadness, surprise) of


</div>
<span class='text_page_counter'>(177)</span><div class='page_container' data-page=177>

key competitor's perceived performance and quality of service is suffering).

Unfortunately, it is sometimes difficult to get the social media data at the level of the individual, which is required to create more actionable insights and recommendations at the individual customer level. However, leading



</div>
<span class='text_page_counter'>(178)</span><div class='page_container' data-page=178>

<b>The Parks Ramifications</b>

The Parks can establish a sentiment score for each attraction and character and monitor social media sentiment for the attractions and characters in real time. The Parks can leverage the real-time sentiment scores to take corrective actions (placate unhappy guests, open additional lines, open additional attractions, remove kiosks, move characters).

For more information about sentiment analysis using R, check out


/>


<b>Traverse Pattern Analysis</b>

Traverse pattern analysis is an example of combining a couple of analytic algorithms to better understand customer, product, or operational usage patterns. Traverse analysis links customer or product usage patterns and association rules to a geographical or facility map in order to identify potential purchase, traffic, flow, fraud, theft, and other usage patterns and relationships.

The process starts by creating association rules from the customer's or product's usage data, and then maps those association rules to a geographical map (store, hospital, school, campus, sports arena, casino, airport) to identify potential performance, usage, staffing, inventory, logistics, traffic flow, and other problems.

In Figure 6.16, the data science team created a series of association rules about slot and table play in a casino, and then used those association rules to identify potential foot flow problems and game location optimization opportunities. The data science team

Created player performance association rules about what games the players tend to play in combination,


</div>
<span class='text_page_counter'>(179)</span><div class='page_container' data-page=179>

<b>Figure 6.16</b> Traverse pattern analysis

The results of this analysis highlight areas of the casino that are sub-optimized when certain types of game players are in the casino and can lead to


</div>
<span class='text_page_counter'>(180)</span><div class='page_container' data-page=180></div>
<span class='text_page_counter'>(181)</span><div class='page_container' data-page=181>

<b>The Parks Ramifications</b>

The Parks can employ traverse pattern analysis to understand park and guest flows with respect to attractions, entertainment, retail outlets, restaurants, characters, etc. The Parks can use the traverse pattern analysis results to

1. Identify where to place characters and situate portable kiosks in order to increase revenues,

2. Determine what promotions to offer in order to drive traffic to idle attractions and restaurants.


<b>Decision Tree Classifier Analysis</b>

Decision tree classifier analysis uses decision trees to identify groupings and clusters buried in the usage and performance data. Decision tree classifier analysis uses a decision tree as a predictive model that maps observations about an item to conclusions about the item's target value.

In Figure 6.17, the data science team used the decision tree classifier analysis technique to identify and group performance and usage variables into similar clusters. The data science team uncovered product performance clusters that, when occurring in certain combinations, were indicative of potential product performance or maintenance problems.
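A decision tree is grown by repeatedly choosing the single question that best separates the target values. The sketch below performs just the first such step (a one-level tree, sometimes called a stump), using Gini impurity to pick the split. The sensor readings and labels are invented, and a full tree would apply `best_split` recursively to each side.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(value) / n) ** 2 for value in set(labels))


def best_split(rows, labels):
    """Find the (weighted impurity, feature index, threshold) of the best single split."""
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put items on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical machine readings: (temperature, vibration) and observed outcome.
readings = [(70, 0.2), (72, 0.3), (71, 0.9), (90, 0.8), (68, 0.25), (88, 1.0)]
outcomes = ["ok", "ok", "fail", "fail", "ok", "fail"]
impurity, feature, threshold = best_split(readings, outcomes)
```

Here the chosen question is effectively “is vibration at most 0.3?”, which separates the failures from the healthy readings in this toy data set.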


</div>
<span class='text_page_counter'>(182)</span><div class='page_container' data-page=182>

<b>The Parks Ramifications</b>

The Parks can use decision tree classifier analysis to quantify the variables that drive guest satisfaction and increase spend by guest clusters. The Parks can leverage the decision tree classifier analysis results to determine which variables to manipulate in order to drive guest satisfaction and associated guest spend.

For more information about building decision trees using R, check out “Tree-Based Models” at />


<b>Cohorts Analysis</b>

Cohorts analysis is used to identify and quantify the impact that an individual or machine has on the larger group.

Cohorts analysis is commonly used by sports teams to ascertain the relative value of a player with respect to his or her influence on the success of the overall team. National Basketball Association (NBA) analysts use the real plus-minus (RPM) metric, published by ESPN, to measure a player's impact on the game, based on the difference between the team's total scoring and its opponent's while that player is on the court. Table 6.1 shows top RPM players from the 2014–2015 NBA season.
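<b>Table 6.1</b> 2014–2015 Top NBA RPM Rankings

<table>
<tr><th>Rank</th><th>Player</th><th>Team</th><th>MPG</th><th>RPM</th></tr>
<tr><td>1</td><td>Stephen Curry, PG</td><td>GS</td><td>32.7</td><td>9.34</td></tr>
<tr><td>2</td><td>LeBron James, SF</td><td>CLE</td><td>36.1</td><td>8.78</td></tr>
<tr><td>3</td><td>James Harden, SG</td><td>HOU</td><td>36.8</td><td>8.50</td></tr>
<tr><td>4</td><td>Anthony Davis, PF</td><td>NO</td><td>36.1</td><td>8.18</td></tr>
<tr><td>5</td><td>Kawhi Leonard, SF</td><td>SA</td><td>31.8</td><td>7.57</td></tr>
<tr><td>6</td><td>Russell Westbrook, PG</td><td>OKC</td><td>34.4</td><td>7.08</td></tr>
<tr><td>7</td><td>Chris Paul, PG</td><td>LAC</td><td>34.8</td><td>6.92</td></tr>
<tr><td>8</td><td>Draymond Green, SF</td><td>GS</td><td>31.5</td><td>6.80</td></tr>
<tr><td>9</td><td>DeMarcus Cousins, C</td><td>SAC</td><td>34.1</td><td>6.12</td></tr>
<tr><td>10</td><td>Khris Middleton, SG</td><td>MIL</td><td>30.1</td><td>6.06</td></tr>
</table>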


Source: />
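The idea behind a plus-minus metric can be sketched with a raw, unadjusted version: for each stretch of play, credit every player on the court with the team's scoring margin during that stretch. Published RPM figures such as those in Table 6.1 also adjust for teammates and opponents, which this sketch does not attempt. The players and scores are invented.

```python
from collections import defaultdict


def plus_minus(stints):
    """Sum, per player, the team's scoring margin while that player was on court."""
    totals = defaultdict(int)
    for stint in stints:
        margin = stint["team_points"] - stint["opponent_points"]
        for player in stint["on_court"]:
            totals[player] += margin
    return dict(totals)


stints = [  # hypothetical stretches of play for one team
    {"on_court": {"Player A", "Player B"}, "team_points": 12, "opponent_points": 8},
    {"on_court": {"Player A", "Player C"}, "team_points": 10, "opponent_points": 9},
    {"on_court": {"Player B", "Player C"}, "team_points": 6, "opponent_points": 11},
]
ratings = plus_minus(stints)
most_valuable = max(ratings, key=ratings.get)
```

The same bookkeeping applies to any cohort: replace players with employees or machines, and the scoring margin with a group-level outcome such as satisfaction or throughput.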


</div>
<span class='text_page_counter'>(183)</span><div class='page_container' data-page=183></div>
<span class='text_page_counter'>(184)</span><div class='page_container' data-page=184></div>
<span class='text_page_counter'>(185)</span><div class='page_container' data-page=185>

<b>The Parks Ramifications</b>



The Parks can employ cohorts analysis to identify specific employees and
characters that increase the overall park, attractions, characters, customer,
and household satisfaction and spend levels. The Parks can use the results of
the cohorts analysis to:

1. Decide how many and where to situate specific, popular characters;
2. Reward park associates who drive higher customer satisfaction scores.

For more information about cohorts analysis in R, check out the article
“Cohort Analysis with R – Retention Charts.”


</div>
<span class='text_page_counter'>(186)</span><div class='page_container' data-page=186>

<b>Summary</b>


The objective of Chapter 6 is to give you a taste for the different types of analytic
algorithms a data science team can bring to bear on the business problems or
opportunities that the organization is trying to address. This chapter better
acquainted you with the different algorithms that the data science team can use to
accelerate the business user and data science team collaboration process. While it
is not the expectation of this book or chapter to turn business users into data
scientists, it is my hope that Chapter 6 will set the foundation that helps business
<i>users and business leaders to “think like a data scientist.”</i>
This chapter introduced a wide variety of analytic algorithms that the data science
team might use, depending on the problem being addressed and the types and
varieties of data available. It also introduced a fictitious company (Fairy-Tale
Theme Parks) against which you applied the different analytic techniques to see
the potential business actions (see Table 6.2).
<b>Table 6.2</b> Case Study Summary


</div>
<span class='text_page_counter'>(187)</span><div class='page_container' data-page=187></div>
<span class='text_page_counter'>(188)</span><div class='page_container' data-page=188></div>
<span class='text_page_counter'>(189)</span><div class='page_container' data-page=189>

<b>Homework Assignment</b>



Use the following exercises to apply what you learned in this chapter.


<b>Exercise #1:</b> Review each of the analytic algorithms covered in this chapter
and write down one or two use cases where that particular analytic algorithm
might be useful given your business situations.


<b>Exercise #2:</b> Revisit the key business initiative that you identified in Chapter
2. Write down two or three of the analytic algorithms covered in this chapter
that you think might be appropriate to the decisions that you are trying to
make in support of that key business initiative.


<b>Exercise #3:</b> Write down two or three bullet points about why you think


</div>
<span class='text_page_counter'>(190)</span><div class='page_container' data-page=190>

<b>Notes</b>



1<sub>R is a programming language and software environment for statistical</sub>


</div>
<span class='text_page_counter'>(191)</span><div class='page_container' data-page=191></div>
<span class='text_page_counter'>(192)</span><div class='page_container' data-page=192></div>
<span class='text_page_counter'>(193)</span><div class='page_container' data-page=193></div>
<span class='text_page_counter'>(194)</span><div class='page_container' data-page=194>

<b>NOTE</b>



It is typical that 40 to 60 percent of the data warehouse processing load is
performing ETL work. Off-loading some of the ETL processes to the data
lake can free up considerable data warehouse resources.



Unhandcuff the BI analysts and data science team from being reliant on the
summarized and aggregated data in the data warehouse as the single source of
data for their data analytics (and mitigate the unmanageable proliferation of
“spreadmarts”1 that are being used by business analysts to work around the
analytic limitations of the data warehouse).


The data lake solves a great many problems. However, it can also raise a lot of
questions. In a paper titled “Beware the Data Lake Fallacy,” Gartner raised
cautions about the data lake, specifically around the assumption that all
enterprise audiences are highly skilled at data manipulation and analysis.
Gartner's point was that if a data lake focuses only on storing disparate data
and ignores how or why data is used, governed, defined, and secured or how
descriptive metadata is captured and maintained, the data lake risks turning
into a data swamp. And without an adequate metadata strategy, every subsequent
use of data means the analysts must start from scratch.


The ability of an organization to realize business value from big data relies on the
organization's ability to easily and quickly:

Identify the “right and/or best data”

Define the analytics required to extract the value

Bring the data into an analytics environment (sandbox) suited for advanced
analytics or data science work

Curate the data to a point where it is “suited” for analysis

Stand up the required infrastructure to support the analytics in accordance
with the desired performance and throughput requirements

Execute the analytic models against the curated data to derive business value

Deploy the analytics into the production infrastructure


</div>
<span class='text_page_counter'>(195)</span><div class='page_container' data-page=195>

<b>NOTE</b>



</div>
<span class='text_page_counter'>(196)</span><div class='page_container' data-page=196>

<b>Characteristics of a Business-Ready Data Lake</b>



The data lake is not an incremental enhancement to the data warehouse, and it is
NOT data warehouse 2.0. The data lake enables entirely new capabilities that
allow your organization to address data and analytic challenges that the data
warehouse could not address.


There are five characteristics that differentiate a business-ready data lake from the
data warehouse (see Figure 7.1):


<b>Figure 7.1</b> Characteristics of a data lake


<b>Ingest.</b> Ability to rapidly ingest data from a wide range of internal and
external data sources, including structured and unstructured data sources. The
data lake can accomplish rapid data ingestion because it can load the data
as-is; that is, the data lake does not require any data transformations or
pre-building a data schema before loading the data.


<b>Store.</b> A single or central repository for amassing ALL the organization's data
including data from potentially interesting external sources. The data lake can
store data even if the organization has not yet decided how it might use the
data. As the Director of Analytics and Business Intelligence at Starbucks was
quoted: “A full quarter of Starbucks transactions are made via its popular
loyalty cards, and that results in ‘huge amounts’ of data, but the company isn't
sure what to do with [all that data] yet.” The same goes for social media data,
as Starbucks has a team who analyzes social data, but “We haven't figured out
what exactly to do with it yet.”2


<b>Analyze.</b> Provides the foundation for the analytics environment (or analytics


</div>
<span class='text_page_counter'>(197)</span><div class='page_container' data-page=197>

internal and external data sources with the goal of uncovering new customer,
product, and operational insights that can be used to optimize key business
processes and fuel new monetization opportunities.


<b>Surface.</b> Supports the analytic model development and the extracting of the
analytic results (e.g., scores, recommendations, next best offer, business rules)
that are used to empower frontline employees' and business managers'
decision making and influence customer behaviors and actions.


<b>Act.</b> Enables the integration of the analytic results back into the organization's
operational systems (call center, direct marketing, procurement, store


</div>
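
The Ingest characteristic above, loading data as-is and deferring any schema until the data is used, is often called schema-on-read. The following is a minimal sketch of that idea using only the Python standard library. The in-memory `raw_zone` list, the `ingest` and `read_with_schema` helpers, and the sample records are all hypothetical stand-ins for illustration, not any particular data lake product's API.

```python
import json

raw_zone = []  # stands in for the data lake's raw storage

def ingest(record_text, source):
    """Store the record exactly as received: no parsing, validation, or schema."""
    raw_zone.append({"source": source, "payload": record_text})

def read_with_schema(source, fields):
    """Apply a schema at read time: parse JSON payloads and keep only the named fields."""
    rows = []
    for item in raw_zone:
        if item["source"] != source:
            continue
        try:
            doc = json.loads(item["payload"])
        except json.JSONDecodeError:
            continue  # not JSON; the raw record still remains in the lake
        if isinstance(doc, dict):
            rows.append({name: doc.get(name) for name in fields})
    return rows

# Structured and unstructured records land side by side, untouched.
ingest('{"guest_id": 1, "spend": 42.5}', "pos")
ingest('{"guest_id": 2, "spend": 17.0, "ride": "coaster"}', "pos")
ingest("free-text comment: loved the parade", "survey")

spend_rows = read_with_schema("pos", ["guest_id", "spend"])
print(spend_rows)
print(len(raw_zone))
```

The design choice to notice is that `ingest` never rejects a record; the decision about which fields matter is made later, by whoever reads the data, which is also why the free-text survey comment can be stored before anyone has decided how to analyze it.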
<span class='text_page_counter'>(198)</span><div class='page_container' data-page=198></div>
<span class='text_page_counter'>(199)</span><div class='page_container' data-page=199>

As a data warehouse manager, I hated the analytics team. Why? Because
whenever its members needed data, they always came to my data warehouse for
the data because they were told that the data warehouse was the “single version of
the truth.” And the analytics team's data and query requests usually screwed up my
production SLAs in the process (see Figure 7.2).


<b>Figure 7.2</b> The analytics dilemma


</div>
<span class='text_page_counter'>(200)</span><div class='page_container' data-page=200>

<b>Figure 7.3</b> The data lake line of demarcation


The data lake provides a “line of demarcation” between the production


</div>
