MATLAB Applications in Bioinformatics
Developing and Deploying Bioinformatics
Applications with MATLAB
MATLAB for Bioinformatics
Kristen Amuzzini
Biotech, Pharmaceutical, & Medical Industry
The MathWorks, Inc.
Presentation Layout
MATLAB applications in Bioinformatics
Customer success stories
MATLAB & The Bioinformatics Toolbox
Sequence analysis
Microarray analysis
Integrating MATLAB with other tools
MATLAB as computational engine for Excel
Questions/Answers & Wrap-up
Bioinformatics Applications
• Sequence analysis
  – Base calling algorithm design, sequence alignment, sequence building algorithms
• Microarray analysis
  – Image processing, QA/QC, data normalization, data analysis
• Proteomics
  – Mass spectrometry signal processing, protein marker identification and classification, peptide sequence identification, 2D-gel image analysis
• Systems Biology
  – Interaction network identification, simulation of metabolic pathways, flux analysis
Bioinformatics teams support multiple constituencies with multiple tools.
• C/C++, Java, Perl
• VB, Excel macros
• SQL
• GUI-based tools
• Freeware
• S-PLUS, R, SAS, Mathematica
• Web-based tools
Using MATLAB, bioinformatics teams can support
multiple constituencies.
User example: Genetic Sequence Base Calling
“Having one integrated package is a big advantage. Using MATLAB and the MATLAB Compiler reduced my development time by a factor of 4 or 5.”
“MATLAB has always been ideal as an algorithm prototyping tool,” Labrenz concludes, “but the MATLAB Compiler and C/C++ Math and Graphics Libraries add a whole new dimension, allowing rapid delivery of sophisticated solutions.”
User example: Breast Cancer Prognosis
“Since MATLAB and the Image Processing Toolbox are fully integrated and the MATLAB platform is very good for matrix calculation, we did not have to spend time writing the low level image processing and the basic data analysis routines like vector and matrix calculations”
“Our research scientists are happy with the quick feedback,” Dr. Dai says. “Using MathWorks tools, we can respond to their requests very fast, and it’s easy for the scientists to use these tools. Using the GUIs that we develop in MATLAB, they can access functions without having to remember the underlying code.”
Academic users
• Bioinformatics teaching
  – MIT, Stanford, Cornell, Carnegie Mellon, …
• Research
  – Sequencing: base calling algorithm design
  – Sequence analysis: computational biolinguistics
  – Microarray analysis: statistical modeling of microarrays
  – Proteomics: statistical modeling of protein-protein interaction
  – Systems biology: flux analysis
More than 600 textbooks for education and professional use, in 19 languages
– Biosciences
– Controls
– Signal Processing
– Image Processing
– Mechanical Engineering
– Mathematics
– Natural Sciences
– Environmental Sciences
Thousands of universities teach students using
MathWorks products.
Industry Issues & Solutions
Issues:
• Integrating tools from various programming languages is difficult, closed source tools are not customizable, and freeware is often not supported.
• There is no standard biological data format.
• Applications must be easily deployable within organizations.
Solutions:
• MATLAB is a supported, open architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment.
• MATLAB and the Bioinformatics Toolbox provide file format support for common data sources (web-based, sequences, microarray, etc.).
• MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB-based applications.
The Bioinformatics Toolbox
Robert Henson
The MathWorks, Inc.
MATLAB & The Bioinformatics Toolbox
The MathWorks Product Family
Integrated for:
• technical computing, data analysis and visualization
• system modeling and simulation
• implementation of real-time embedded software
[Diagram labels: Toolboxes, Blocksets, Stateflow, Code Generation, PC-based real-time systems, DAQ cards, Instruments, Databases and files, Financial Datafeeds, Desktop Applications, Automated Reports]
Bioinformatics Toolbox 1.0
• File I/O
  – FASTA, PDB, SCF, GPR, GAL
• Web Connectivity
  – GenBank, EMBL, PIR, PDB
• Sequence Analysis & Alignment
  – Needleman-Wunsch, Smith-Waterman
  – DNA/RNA/AA conversions, pattern searching
• Microarray Normalization & Visualization
  – Lowess, global mean, MAD (median absolute deviation)
• Protein Visualization
  – Atomic composition, molecular weight, hydrophobicity profile
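As a rough illustration of two of the normalization methods named for the toolbox (global mean and MAD), here is a minimal plain-Python sketch. It is not the toolbox's code, and the toolbox's exact definitions may differ. The function names and the sample values are hypothetical, and "MAD normalization" is assumed here to mean centering on the median and dividing by the median absolute deviation.

```python
from statistics import mean, median

def global_mean_normalize(values):
    """Scale one array's intensities so that their mean becomes 1.0."""
    m = mean(values)
    return [v / m for v in values]

def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    med = median(values)
    return median(abs(v - med) for v in values)

def mad_scale(values):
    """Center on the median and divide by the MAD (assumed definition)."""
    med = median(values)
    spread = mad(values)
    if spread == 0:
        raise ValueError("MAD is zero; values cannot be scaled")
    return [(v - med) / spread for v in values]

# Hypothetical values from one array, with one outlier (8.0).
log_ratios = [0.5, 1.0, 1.5, 2.0, 8.0]

print(global_mean_normalize([2.0, 4.0, 6.0]))  # [0.5, 1.0, 1.5]
print(mad(log_ratios))                         # 0.5
print(mad_scale(log_ratios))                   # [-2.0, -1.0, 0.0, 1.0, 13.0]
```

Because the median and the MAD are both robust statistics, the outlier does not distort the scaling of the other four values, which is the usual motivation for MAD over a standard deviation.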
MATLAB Desktop Tools
• Command Window
• Command History
• Launchpad: Start other tools and demos
• Workspace Browser: See your data
Sequence Alignment Tutorial Example
• Get human and mouse genes from GenBank
• Look for open reading frames (ORFs)
• Convert DNA sequences to amino acid sequences
• Create a dotplot of the two sequences
• Perform global alignment
• Perform local alignment
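The ORF search, translation, and alignment steps of this tutorial can be sketched in plain Python as follows. This is an illustrative sketch, not the toolbox's implementation: the function names, the short test sequences, and the simple match/mismatch/linear-gap scoring are all assumptions (real alignments normally use a scoring matrix), and the GenBank download and dotplot steps are left out. Global alignment uses the Needleman-Wunsch recurrence and local alignment the Smith-Waterman recurrence; only the score is returned, not the aligned strings.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def find_orfs(dna):
    """Forward-strand (start, end) spans from an ATG to the first in-frame stop."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE[codon] == "*":
                orfs.append((start, i + 3))
                start = None
    return orfs

def align_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch (global). local=True: Smith-Waterman (local).
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)
            cur.append(cell)
        prev = cur
    return best if local else prev[-1]

dna = "GGATGGCCTAATT"                    # hypothetical sequence with one ORF
orfs = find_orfs(dna)
print(orfs)                              # [(2, 11)]
print(translate(dna[2:11]))              # MA*
print(align_score("TTGATTACATT", "CCGATTACACC"))              # 3
print(align_score("TTGATTACATT", "CCGATTACACC", local=True))  # 7
```

The two scores differ because the global alignment must pay for the mismatched flanks, while the local alignment keeps only the shared GATTACA core.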
Microarray Data Analysis Tutorial Example
• Plot expression profiles for genes
• Filter genes based on information content of profile
• Perform hierarchical clustering
• Perform K-means clustering
• Perform Principal Component Analysis
Reference:
DeRisi, JL, Iyer, VR, Brown, PO. "Exploring the metabolic and genetic control of gene expression on a genomic scale." Science. 1997 Oct 24;278(5338):680-6.
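The filtering and K-means steps of this tutorial might look like the following plain-Python sketch. It is not the toolbox's code. The gene names and expression values are made up, a simple variance threshold stands in for the tutorial's information-content filter, and the K-means routine (Lloyd's algorithm) is seeded with the first k profiles so that the result is deterministic. Hierarchical clustering and PCA are not shown.

```python
from statistics import pvariance

def filter_by_variance(profiles, min_var):
    """Keep genes whose expression profile has variance of at least min_var."""
    return {g: p for g, p in profiles.items() if pvariance(p) >= min_var}

def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, seeded with the first k points (demo assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:          # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                   # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical expression profiles over four time steps.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],   # rising
    "geneB": [3.0, 2.0, 1.0, 0.0],   # falling
    "geneC": [0.2, 1.1, 2.1, 2.9],   # rising
    "geneD": [1.0, 1.0, 1.1, 1.0],   # flat, removed by the filter
    "geneE": [2.9, 2.1, 0.9, 0.1],   # falling
}

kept = filter_by_variance(profiles, 0.5)
labels, centroids = kmeans(list(kept.values()), k=2)
print(dict(zip(kept, labels)))  # {'geneA': 0, 'geneB': 1, 'geneC': 0, 'geneE': 1}
```

Filtering first removes flat profiles that carry little signal, so the clustering groups genes by the shape of their response and not by noise.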
Integrating and Deploying Bioinformatics Tools with
MATLAB
Robert Henson
The MathWorks, Inc.
Connecting to MATLAB
[Diagram labels: Web, Instrument Control, Data Acquisition, Image Acquisition, Excel / COM, File I/O, Database Toolbox, C/C++, Java, Perl]
Deploying with MATLAB
[Diagram labels: C/C++, Web, Stand-alone, Excel, COM]
Push Data into MATLAB
Data I/O
• Import Excel ranges into MATLAB
• Export MATLAB data into Excel ranges
• Evaluate MATLAB statements in Excel
Computational Engine for Excel
Spreadsheet Applications
• MATLAB Excel Link can be the computational engine behind your Excel applications
• Fast, scalable solution
MLPutMatrix("data",B2:H43)
MLPutMatrix("Genes",A2:A43)
MLPutMatrix("TimeSteps",B1:H1)
MLEvalString("clustergram(data,'RowLabels',…
Genes,'ColLabels',TimeSteps)")
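The clustergram call evaluated by the formulas above draws a heat map with a dendrogram produced by hierarchical clustering. As a rough illustration of that clustering step only, here is a minimal plain-Python sketch of agglomerative clustering. It is not what the toolbox runs: the average-linkage rule, the Euclidean distance, the function names, and the sample rows are all assumptions made for the example.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges

# Hypothetical rows of an expression matrix (one row per gene).
rows = [
    [0.0, 1.0, 2.0],
    [0.1, 1.0, 2.0],
    [2.0, 1.0, 0.0],
    [2.2, 1.0, 0.2],
]

for left, right in agglomerate(rows):
    print(left, "+", right)
# (0,) + (1,)
# (2,) + (3,)
# (0, 1) + (2, 3)
```

The merge history is the information a dendrogram draws: each merge becomes one branch point, ordered from the most similar pair to the final join.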
What else could you do?
• Bioinformatics
• Image Processing
• Signal Processing
• Neural Networks
• Optimization
• Statistics
Summary
Industry Issues & Solutions
Issues:
• Integrating tools from various programming languages is difficult, closed source tools are not customizable, and freeware is often not supported.
• There is no standard biological data format.
• Applications must be easily deployable within organizations.
Solutions:
• MATLAB is a supported, open architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment.
• MATLAB and the Bioinformatics Toolbox provide file format support for common data sources (web-based, sequences, microarray, etc.).
• MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB-based applications.
Further Information
• Bioinformatics Toolbox product page
  – Demos, technical literature, trial information
  – www.mathworks.com/products/bioinfo
• MATLAB Central
  – File exchange and newsgroup access for the MATLAB & Simulink user community
  – www.mathworks.com/matlabcentral
  – Access to comp.soft-sys.matlab