
MIT SMR CONNECTIONS
MANAGER’S GUIDE

What Leaders Must Know About
Data for Machine Learning

ON BEHALF OF:


MANAGER’S GUIDE — WHAT LEADERS MUST KNOW ABOUT DATA FOR MACHINE LEARNING

CONTENTS

What Leaders Must Know About Data to Drive Success With Machine Learning
1. Align machine learning initiatives with business priorities.
2. Create and maintain a comprehensive view of all data assets.
3. Lay the groundwork for data governance.
4. Identify the specific roles required to build a strong data foundation for machine learning.

Data Management Strategy Checklist

Sponsor’s Viewpoint: Your Data Strategy Is Key to Machine Learning; a Data Lake Can Help

MIT SMR Connections develops content in collaboration with our sponsors.
It operates independently of the MIT Sloan Management Review editorial group.
Copyright © Massachusetts Institute of Technology, 2020. All rights reserved.



What Leaders Must Know About Data to
Drive Success With Machine Learning

Machine learning is taking predictive analytics to the next level to drive tangible business value for a wide array of industries. Algorithms allow credit card companies to detect fraud in real time and help retailers direct offers to the customers most likely to respond. In health care, tools powered by machine learning help doctors transcribe notes more easily so they can focus on patient care. Manufacturers can take in data from sensors on plant equipment and recommend maintenance before malfunctions cause production delays.

But machine learning models are only as good as the data they ingest. “If data is not clean, if it’s not accessible, if it isn’t stitched together to form a strong foundation, the machine learning and artificial intelligence capabilities built on top of it will have problems,” warns Ashok Srivastava, senior vice president and chief data officer at financial software provider Intuit. This can lead to difficulties such as inaccurate insights or inherent bias, factors that can hamper intelligent business decision-making.

Fortunately, businesses can avoid these perils by designing a data management strategy that develops new capabilities, initiatives, and roles around machine learning. This guide aims to share lessons from business leaders and industry experts on how, with the right policies and frameworks in place, data can serve as a strategic corporate asset.

1. Align machine learning initiatives with business priorities.

The first step in creating an enterprise data management strategy is understanding the business’s goal for machine learning. For example, Intuit’s machine learning initiatives aim to improve customer service by providing personalized recommendations to subscribers of its accounting and tax software programs. An online retailer may plan to use machine learning to create more-effective targeted marketing campaigns, while an automotive manufacturer may be building machine learning systems to predict equipment failures.

Establishing which of a business’s strategic priorities have the best potential to be advanced via machine learning provides clarity around which data sets are most important to collect, store, and prepare for analysis.

“Being focused on knowing what data is truly driving your business and matters most is the first piece to a data strategy,” says Juan Tello, chief data officer at Deloitte Consulting and principal in its Strategy & Analytics practice. “So, for example, if business priorities are to win more customers and provide more-competitive pricing based on the products a company sells, that requires three critical data domains: customer data, pricing data, and product data. Prioritizing the data strategy on those areas as a starting point will maximize business outcomes. Organizations should also reevaluate and adjust as their business priorities change.”

This focus is essential, given the vast volumes of data generated by enterprise applications, connected devices, and customer interactions via the web or social media platforms, to name just a few sources. However, by narrowing the scope for data management to three or four key sources, businesses can focus on those data sets that will deliver the most value.



2. Create and maintain a comprehensive view of all data assets.

For data to be useful, a business must know it exists. Unfortunately, legacy systems, mergers and acquisitions, and poor data onboarding practices can create silos of unidentified and untagged information.

At Intuit, data management experts “meet with the teams that own data systems or data pipelines, and we start to build a catalog of that information. That means understanding what data they have and how it is stored.” The result, says Srivastava, is “a robust list of data assets that we have within the company.”

But data troves are constantly evolving as businesses deploy new systems. GE Healthcare offers a perfect example of how to stay ahead of the curve. The manufacturer of diagnostic imaging equipment, which uses machine learning algorithms to improve traditional imaging technologies like CT scanning and X-ray, continuously works with collaborators and partners to inventory and onboard de-identified data. A dedicated team of data specialists receives, processes, and properly catalogs contractually de-identified data sets and then uploads them for use in AI development. This process leads to greater data transparency and availability.

Business leaders must also be held accountable for maintaining a comprehensive view of data assets. At GE Healthcare, chief data officer Derek Danois says, broad communication and transparency are key to building trust: Business units now collaborate so that the company knows the moment a new data set becomes available.

3. Lay the groundwork for data governance.

At the core of every data management strategy is data governance: a set of rules and systems that ensures that data is secure, handled in compliance with applicable regulations, accessible, and usable.

Data security and compliance with privacy laws are table stakes and as such have been the primary drivers of data governance for most enterprises. In addition to guarding against intruders via cybersecurity measures that protect the IT perimeter, businesses must also establish controls that limit how data is accessed, used, and managed by employees. This typically means granting different access levels depending on variables such as role, tenure, and function. Compliance with regulations such as the European Union’s GDPR (General Data Protection Regulation) and similar requirements in other jurisdictions means that companies must also be prepared to explain to consumers how their data is being used to make decisions that affect them.

Another key component of data governance is quality: A machine learning model’s output depends on the quality of its training data.
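As a rough illustration of what measuring training-data quality can involve, the minimal Python sketch below scores a batch of records on completeness, validity, timeliness, and uniqueness, then applies pass/fail thresholds. The field names (`customer_id`, `price`, `updated_at`), the metric definitions, and the threshold values are hypothetical assumptions chosen for the example, not metrics taken from any company mentioned in this guide.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: every record should carry these fields.
REQUIRED_FIELDS = ("customer_id", "price", "updated_at")


def quality_report(records, now, max_age=timedelta(days=30)):
    """Score a batch of dict records on four assumed quality dimensions (0.0 to 1.0).

    completeness: share of records with every required field present (not None)
    validity:     share of records whose price is a non-negative number
    timeliness:   share of records updated no more than `max_age` before `now`
    uniqueness:   share of records whose customer_id has not appeared earlier
    """
    n = len(records)
    if n == 0:
        return {"completeness": 0.0, "validity": 0.0, "timeliness": 0.0, "uniqueness": 0.0}

    complete = sum(all(r.get(f) is not None for f in REQUIRED_FIELDS) for r in records)
    valid = sum(
        isinstance(r.get("price"), (int, float)) and r["price"] >= 0 for r in records
    )
    timely = sum(
        r.get("updated_at") is not None and now - r["updated_at"] <= max_age
        for r in records
    )

    seen, unique = set(), 0
    for r in records:
        key = r.get("customer_id")
        if key not in seen:  # first occurrence counts; repeats do not
            seen.add(key)
            unique += 1

    return {
        "completeness": complete / n,
        "validity": valid / n,
        "timeliness": timely / n,
        "uniqueness": unique / n,
    }


def passes(report, thresholds):
    """True only if every metric named in `thresholds` meets its minimum."""
    return all(report[name] >= minimum for name, minimum in thresholds.items())


# Usage with a small made-up batch.
now = datetime(2020, 6, 1, tzinfo=timezone.utc)
batch = [
    {"customer_id": 1, "price": 9.99, "updated_at": now - timedelta(days=2)},
    {"customer_id": 2, "price": -5.0, "updated_at": now - timedelta(days=90)},
    {"customer_id": 2, "price": 4.50, "updated_at": None},
    {"customer_id": 3, "price": 12.0, "updated_at": now - timedelta(days=10)},
]
report = quality_report(batch, now)
print(report)  # {'completeness': 0.75, 'validity': 0.75, 'timeliness': 0.5, 'uniqueness': 0.75}
print(passes(report, {"completeness": 0.95, "validity": 0.99}))  # False
```

A pipeline might run a check like this on every incoming batch and hold back batches that miss their thresholds before they reach model training. Which metrics matter, and where the cut-offs sit, would depend on the data and the model.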


At GE Healthcare, for instance, a team of data architects and data scientists evaluates data quality based on a variety of metrics. A medical imaging study might be vetted for standard-of-care parameters (such as slice thickness or scan geometry), field of view (the area of a scanned object), and metadata content requirements. If quality standards are met, GE Healthcare de-identifies or anonymizes the data and establishes a chain of custody that chronicles the data’s control, transfer, and analysis, before it’s uploaded for use in AI development.

Maintaining consistently high levels of data quality calls for continuous monitoring of metrics and key performance indicators such as accuracy, timeliness, consistency, and integrity, a process that can become overwhelming, according to Tello. Using AI-powered data quality tools can accelerate the ability to manage and govern data, he says. Enterprise master data management software can also ease the burden by creating a single master reference source for all critical business data, thereby reducing redundancies and the likelihood of errors.

4. Identify the specific roles required to build a strong data foundation for machine learning.

An explosion of new data science job titles has raised questions regarding who is responsible for which tasks within a machine learning practice. A well-thought-out organizational structure can make sense of this landscape by clarifying roles and delineating responsibilities.

According to Peter Nichol, director of IT portfolio management for research and development at Regeneron Pharmaceuticals, some of the key roles required to execute a data management strategy include the following:

• Chief digital/data officer: Oversees all digital functions, provides support and leadership, and articulates a strategy for data governance that’s consistent across the company.
• Data scientist: Creates tools or processes based on machine learning and applies them to well-defined business problems.
• Decision scientist: Uses expertise in technology, math, and statistics, along with business domain knowledge, to enable informed decision-making.
• Compliance/legal team member: Handles privacy, compliance, data rights, and regulatory aspects impacting a business.

Ancillary positions include data management specialist, business intelligence specialist, and data architect.

But there’s also a place for sales executives, HR managers, and chief marketing officers in machine learning initiatives. “The business owners who are making decisions on a daily basis are some of the most important contributors to our overall data strategy,” says Intuit’s Srivastava.

That’s because business leaders possess domain knowledge: an in-depth understanding of the relevant data within the enterprise, the processes that generate useful data, what data might be useful for a model, and how different variables might impact a model’s output. Without this guidance, businesses risk creating machine learning applications that don’t deliver useful results.
Looking Forward
Machine learning has the potential to improve results in nearly
every aspect of business. But to harness it, businesses need a
data management strategy that will continuously improve the
quality, integrity, access, and security of data.


DATA MANAGEMENT STRATEGY CHECKLIST
Keep the following practices in mind to successfully design and execute a data management strategy in support of machine learning:

✓ Establish rules and processes around how data is sourced, managed, accessed, and used across the business.
✓ Ascertain which data sets are driving the business and how they can be used to help solve problems, generate revenue, and deliver customer benefits.
✓ Inventory known data assets, classify them, and organize them in a data catalog.
✓ Meet with the teams that own and operate data systems to better understand what data they have and how it is stored.
✓ Understand where your data comes from, who has access, and how it can be used.
✓ Establish internal security precautions (such as provisioning user access), as well as external safeguards (such as anonymizing data), to protect sensitive data.
✓ Create access controls that set limitations around how data is accessed and how it might be used.
✓ Design processes and systems to ensure that data created is accurate and useful.
✓ Identify specific roles required to build a strong data foundation, including chief digital officer, data scientist, decision scientist, and compliance team member.
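Two of the checklist items, cataloging data assets and setting access controls, can be combined in one small sketch. The Python below keeps an in-memory inventory of assets, each recorded with its owning team, a note on where it is stored, and a sensitivity class, and decides read access from role, tenure, and function, the variables the guide names for access levels. The sensitivity levels, role clearances, one-year tenure rule, and asset names are all hypothetical; a real catalog would more likely live in a dedicated catalog or governance tool than in a Python dictionary.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3  # e.g. personal data subject to privacy regulation


@dataclass(frozen=True)
class DataAsset:
    name: str
    owner_team: str    # team that owns the system or pipeline
    storage: str       # where and how the data is stored
    sensitivity: Sensitivity
    allowed_functions: frozenset = frozenset()  # empty means any function may use it


@dataclass(frozen=True)
class Employee:
    role: str
    function: str
    tenure_years: float


# Assumed policy values, for illustration only.
ROLE_CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "data_scientist": Sensitivity.CONFIDENTIAL,
    "chief_data_officer": Sensitivity.RESTRICTED,
}
MIN_TENURE_FOR_RESTRICTED = 1.0  # years


class DataCatalog:
    """In-memory catalog: an inventory of assets plus a read-access check."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.name] = asset

    def assets_owned_by(self, team):
        return [a for a in self._assets.values() if a.owner_team == team]

    def can_read(self, employee, asset_name):
        asset = self._assets[asset_name]  # raises KeyError for uncataloged data
        # Unknown roles only get PUBLIC clearance.
        clearance = ROLE_CLEARANCE.get(employee.role, Sensitivity.PUBLIC)
        if asset.sensitivity > clearance:
            return False
        if (asset.sensitivity == Sensitivity.RESTRICTED
                and employee.tenure_years < MIN_TENURE_FOR_RESTRICTED):
            return False
        if asset.allowed_functions and employee.function not in asset.allowed_functions:
            return False
        return True


catalog = DataCatalog()
catalog.register(DataAsset("customer_profiles", "crm-team", "CRM database, nightly snapshot",
                           Sensitivity.RESTRICTED, frozenset({"marketing", "analytics"})))
catalog.register(DataAsset("product_list", "merchandising", "daily CSV export",
                           Sensitivity.INTERNAL))

analyst = Employee(role="analyst", function="finance", tenure_years=2)
new_cdo = Employee(role="chief_data_officer", function="analytics", tenure_years=0.5)
cdo = Employee(role="chief_data_officer", function="analytics", tenure_years=4)

print(catalog.can_read(analyst, "product_list"))       # True
print(catalog.can_read(analyst, "customer_profiles"))  # False: clearance too low
print(catalog.can_read(new_cdo, "customer_profiles"))  # False: tenure under one year
print(catalog.can_read(cdo, "customer_profiles"))      # True
```

Each rule (clearance by role, minimum tenure for restricted data, permitted business functions) is a separate condition, so a policy change touches one check rather than the whole function.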



SPONSOR’S VIEWPOINT

Your Data Strategy Is Key to Machine Learning; a Data Lake Can Help

Machine learning success is highly dependent on having relevant and high-quality data. Without a proper data strategy in place, machine learning initiatives fail to scale. Worse yet, if the machine learning models are informed by bad data, the results they generate may be misleading, or even incorrect.

The right data strategy for machine learning should aim to break down silos, enabling your IT teams to easily, quickly, and securely access and collect the data they need. While modern data strategies take many forms, data lakes are becoming an increasingly popular core component of the most efficient models. Data lakes offer more agility and flexibility than traditional data management systems, allowing organizations to manage multiple data types from a wide variety of sources and to store the data, whether structured or unstructured, in a centralized repository. Once stored, the data can be leveraged by many types of analytics and machine learning services faster and more efficiently than traditional, siloed approaches. Data lake architectures also enable multiple groups within the organization to benefit from analyzing a consistent pool of data that spans the entire business. For help developing a more holistic data strategy that includes data lakes, interact with the AWS Data Flywheel.

Amazon’s ML Solutions Lab program can also help you build the right data strategy. The Amazon ML Solutions Lab pairs your team with Amazon machine learning experts to prepare data, build and train models, and put models into production. It combines hands-on educational workshops with brainstorming sessions and advisory professional services to help you work backward from business challenges and then go step-by-step through the process of developing solutions based on machine learning. Moreover, one of our machine learning partners can also help you build the right data strategy for your machine learning initiatives. Our AWS Machine Learning Competency Partners have demonstrated relevant expertise and offer a range of services and solutions to help you create intelligent applications for your business, from enabling data science workflows to enhancing applications with AI services. Learn more at aws.ai.

About Amazon Web Services

AWS offers the broadest and deepest set of machine learning and AI services. On behalf of our customers, we are focused on solving some of the toughest challenges that hold back machine learning from being in the hands of every developer. Tens of thousands of customers are already using AWS for their machine learning efforts. You can choose from fully managed AI services for computer vision, language, recommendations, forecasting, fraud detection, and search; or Amazon SageMaker to quickly build, train, and deploy machine learning models at scale. Amazon SageMaker Studio offers the first fully integrated development environment for machine learning. You can also build custom models with support for all of the popular open-source frameworks. Our capabilities are built on the most comprehensive cloud platform, optimized for machine learning with high-performance computing and no compromises on security and analytics. Learn more at aws.ai.
