Analysing the Behaviour of Neural Networks

A Thesis by
Stephan Breutel
Dipl.-Inf.

In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy

Queensland University of Technology, Brisbane
Center for Information Technology Innovation
March 2004


 

Copyright © Stephan Breutel, MMIV. All rights reserved.
The author hereby grants permission to the Queensland University of Technology, Brisbane to
reproduce and distribute publicly paper and electronic copies of this thesis document in whole
or in part.


Keywords
Artificial Neural Network, Annotated Artificial Neural Network, Rule Extraction, Validation of Neural Networks, Polyhedra, Forward-propagation, Backward-propagation, Refinement Process, Non-linear Optimization, Polyhedral Computation, Polyhedral Projection Techniques



Analysing the Behaviour of Neural Networks


by
Stephan Breutel
Abstract
A new method is developed to determine a set of informative and refined interface assertions satisfied by functions that are represented by feed-forward neural networks.
Neural networks have often been criticized for their low degree of comprehensibility.
It is difficult to have confidence in software components if they have no clear and valid
interface description. Precise and understandable interface assertions for a neural network based software component are required for safety-critical applications and for the integration into larger software systems.
The interface assertions we are considering are of the form “if the input x of the neural network is in a region P of the input space then the output f(x) of the neural network will be in the region Q of the output space”, and vice versa. We are interested in computing refined interface assertions, which can be viewed as the computation of the strongest pre- and postconditions a feed-forward neural network fulfills. Unions of polyhedra (polyhedra are the generalization of convex polygons to higher dimensional spaces) are well suited for describing arbitrary regions of higher dimensional vector spaces. Additionally, polyhedra are closed under affine transformations.
Given a feed-forward neural network, our method produces an annotated neural network, where each layer is annotated with a set of valid linear inequality predicates.
The main challenges in the computation of these assertions are the solution of a non-linear optimization problem and the projection of a polyhedron onto a lower-dimensional subspace.



Contents

List of Figures   vi
List of Tables   vii
List of Listings   ix

1 Introduction   1
  1.1 Motivation and Significance   1
  1.2 Notations and Definitions   4
  1.3 Software Verification and Neural Network Validation   6
  1.4 Annotated Artificial Neural Networks   9
  1.5 Highlights and Organization of this Dissertation   13
  1.6 Summary of this Chapter   15

2 Analysis of Neural Networks   17
  2.1 Neural Networks   17
  2.2 Validation of Neural Network Components   22
    2.2.1 Propositional Rule Extraction   28
    2.2.2 Fuzzy Rule Extraction   35
    2.2.3 Region-based Analysis   44
  2.3 Overview of Discussed Neural Network Validation Techniques and Validity Polyhedral Analysis   62

3 Polyhedra and Deformations of Polyhedral Facets under Sigmoidal Transformations   65
  3.1 Polyhedra and their Representation   65
  3.2 Operations on Polyhedra and Important Properties   69
  3.3 Deformations of Polyhedral Facets under Sigmoidal Transformations   70
  3.4 Summary of this Chapter   82

4 Nonlinear Transformation Phase   83
  4.1 Mathematical Analysis of Non-Axis-parallel Splits of a Polyhedron   83
  4.2 Mathematical Analysis of a Polyhedral Wrapping of a Region   86
    4.2.1 Sequential Quadratic Programming   89
    4.2.2 Maximum Slice Approach   90
    4.2.3 Branch and Bound Approach   95
    4.2.4 Binary Search Approach   98
  4.3 Complexity Analysis of the Branch and Bound and the Binary Search Method   109
  4.4 Summary of this Chapter   111

5 Affine Transformation Phase   113
  5.1 Introduction to the Problem   113
  5.2 Backward Propagation Phase   114
  5.3 Forward Propagation Phase   117
  5.4 Projection of a Polyhedron onto a Subspace   118
    5.4.1 Fourier-Motzkin   120
      5.4.1.1 A Variation of Fourier-Motzkin   123
    5.4.2 Block Elimination   126
    5.4.3 The S-Box Approximation   131
      5.4.3.1 Projection of a face   133
      5.4.3.2 Determination of Facets of …   134
      5.4.3.3 Further Improvements of the S-Box method   137
    5.4.4 Experiments   138
  5.5 Further Considerations about the Approximation of the Image   140
  5.6 Summary of this Chapter   140

6 Implementation Issues and Numerical Problems   143
  6.1 The Framework   143
  6.2 Numerical Problems   147
  6.3 Summary of this Chapter   150

7 Evaluation of Validity Polyhedral Analysis   151
  7.1 Overview and General Procedure   151
  7.2 Circle Neural Network   153
  7.3 Benchmark Data Sets   156
    7.3.1 Iris Neural Network   156
    7.3.2 Pima Neural Network   158
  7.4 SP500 Neural Network   161
  7.5 Summary of this Chapter   163

8 Conclusion and Future Work   165
  8.1 Contributions of this Thesis   165
  8.2 Fine Tuning of VPA   167
  8.3 Future Directions and Validation Methods for Kernel Based Machines   168

A Overview of used symbols   171

B Linear Algebra Background   175

Bibliography   185



List of Figures

1.1  Annotated version of a neural network.   10
2.1  Single neuron of a multilayer perceptron.   19
2.2  Sigmoid and threshold activation functions and the graph of the function computed by a single neuron with a two-dimensional input.   20
2.3  Two-layer feed-forward neural network.   21
2.4  Overview of different validation methods for neural networks.   26
2.5  Example for the KT-Method.   30
2.6  Example for the m-of-n method.   34
2.7  DIBA: recursive projection of hyperplanes onto each other.   51
2.8  DIBA: a decision region and the traversing of a line.   52
2.9  Annotation of a neural network with validity intervals.   56
2.10 Piece-wise linear approximation of the sigmoid function.   61
3.1  Combination of two vectors in a two-dimensional space, from left to right: linear, non-negative, affine and convex combination.   66
3.2  The back-propagation of a polyhedron through the transfer function layer. Given the polyhedral description {y : By ≤ b} in the output space of a transfer function layer, the reciprocal image of this polyhedron under the non-linear transfer function is {x : Bσ(x) ≤ b}.   71
3.3  Approximation of the non-linear region.   73
3.4  Subdivision into cells.   74
3.5  Example for subdivision into cells.   74
3.6  Point sampling approach.   75
3.7  Eigenvalue and eigenvector analysis of the facet manifold.   79
3.8  Example for convex curvature.   80
3.9  Example for concave curvature.   81
4.1  Non axis-parallel split of a polyhedron.   84
4.2  Polyhedral wrapping of the non-linear region.   87
4.3  Application of branch and bound.   95
4.4  Two dimensional example for the binary search method.   103
5.1  The forward and backward-propagation of a polyhedron through the weight layer.   114
5.2  The projection of a polyhedron onto a two-dimensional subspace.   121
5.3  Relevant hinge.   124
5.4  Possible …   135
5.5  The projected polyhedron is contained in the approximation.   139
6.1  Overview of the framework.   144
7.1  Visualisation for the behaviour of the circle neural network.   156
7.2  Projection onto the subspace … .   158
7.3  Projection onto the subspace … .   158
7.4  The computed output regions for the Pima Neural Network.   160
7.5  The computed input region for the SP500 Neural Network.   163


List of Tables

2.1 Overview of neural network validation techniques   63
3.1 Concave and convex curvature in the neighborhood of a point.   75
4.1 Comparison of branch and bound and binary search.   109
5.1 Computation times for the projection of a polyhedron onto a lower-dimensional subspace.   139



List of Listings

5.1 projExample   138
6.1 net-struct   144
6.2 mainLoop   145
6.3 forwardStep   146
6.4 mainVIA   147
6.5 mainVPA   147
6.6 numExample   148


Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a degree or
diploma at any other higher education institution. To the best of my knowledge and
belief, the thesis contains no material previously published or written by another person except where due reference is made.

Stephan Breutel
March 2004



Acknowledgments
I would like to thank my principal supervisor Frédéric Maire, without whose unbounded energy, idealism, motivation and immense enthusiasm this research would not have been possible. It was an honor to receive wisdom and guidance from all of my supervisory team. A special thanks also to my associate supervisor Ross Hayward, who provided excellent support whenever I needed it.

Apart from my supervisors, I would like to thank the other panel members of my oral defense, Joaquin Sitte and Arthur ter Hofstede, for their valuable comments about this thesis.
I would also like to thank all of my colleagues in my research center for their help
and encouragement during my time here, and in particular all the people of the Smart
Devices Laboratory. It would take too many pages to list them all.
I was fortunate to have many friends with whom I enjoyed fantastic climbing sessions at Kangaroo Point and had many great times at the Press Club whilst listening to superb jazz performances, all of which helped to keep me relatively sane during my time at QUT.
I would also like to thank the Coffee Coffee Coffee crew for keeping me awake by
serving me literally over a thousand coffees (to be precise 1084).
Finally, I really would like to thank all my friends and my family back home in Bavaria for supporting my endeavour. A special thanks to my parents for their moral and financial support.
This work was supported in part by an IPRS Scholarship and a QUT Faculty of Information Technology Scholarship.



Chapter 1

Introduction
The first section provides examples of neural networks in industrial applications that
demonstrate the need for neural network validation methods.
In section 1.2 notation conventions are explained. Required terminology, concepts of
software verification and neural network validation are introduced in section 1.3.
In section 1.4 annotated artificial neural networks are defined and the central ideas of the novel Validity Polyhedral Analysis (VPA) are explained in a nutshell. The organization of this thesis is outlined in section 1.5.

1.1 Motivation and Significance
A conclusion of the report “Industrial use of safety-related artificial neural networks” by Lisboa [Lis01] is that one of the keys to a successful transfer of neural networks to the marketplace is the integration with other systems (e.g. standard software systems, fuzzy systems, rule-based systems). This requires an analysis of the behaviour of neural network based components. Examples of products using neural networks in safety-critical areas are [Lis01]:


Explosive detection, 1987. SNOOPE from SAIC is an explosive detector. It was motivated by the need to detect the plastic explosive Semtex. The detector irradiated suitcases with low energy neutrons and collected an emission gamma-ray spectrum. A standard feed-forward neural network was used to classify between bulk explosive, sheet explosive and no explosive. However, there were several practical problems. For example, the 4% false positive rate of the MLP resulted in a large number of items that had to be checked. This is not practical, especially for highly frequented airports such as Heathrow and Los Angeles airport, where the system was tested.
Financial risk management. PRISM by Nestor relied on a recursive adaptive model. HNC’s Falcon is based on a regularised multi-layer perceptron (MLP). Both systems are still market leaders for credit card fraud detection [LVE00].
Siemens applied neural networks to the control of steel rolling mills. A prototype neural network based model was used for strip temperature and rolling force at the hot strip mill of Hoesch in Dortmund in 1993. Later, Siemens applied this technology at 40 rolling mills world-wide. Siemens’ experience indicates that neural networks always complement, and never replace, physical models. Additionally, domain expertise is essential in the validation process. A third observation was that data requirements are severe.
NASA and Boeing are testing a neural network based damage recovery control system for military and commercial aircraft. This system aims to add a significant margin of safety to fly-by-wire control when the aircraft sustains major equipment or system failure, ranging from the inability to use flaps to encountering extreme icing.
Vibration analysis monitoring in jet engines is a joint research project by Rolls-Royce and the Department of Engineering at Oxford University. The diagnostic system QUINCE combines the outputs from neural networks with template matching, statistical processing and signal processing methods. The software is designed for the pass-off test of jet engines. It includes a facility for tracking the most likely fault. Another project is a real-time in-flight monitoring system for the Trent 900 Rolls-Royce engine. The project combines different techniques, for example Kalman filters with signal processing methods and neural networks.
In a European collaborative project involving leading car manufacturers, different control systems ranging from engine management models to physical speed control have been implemented. These control systems combined engineering expertise with non-linear interpolation by neural network architectures. The project included rule-based systems, fuzzy systems and neural networks.
Siemens produces the FP-11 intelligent fire detector. This detector was trained from fire tests carried out over many years. According to [Lis01], this fire detector triggered one-thirtieth of the false alarms of conventional detectors. The detector is based on a digital implementation of fuzzy logic, with rules discovered by a neural network but validated by human experts.
These examples show that neural networks need to be integrated with other systems, that it is relevant to extract rules learnt by neural networks (for example for the fire detector FP-11 or for credit card fraud detection) and that it is important to provide valid statements of the neural network behaviour in safety-critical applications.

Therefore, it is necessary to describe the neural network behaviour, e.g. in the form of valid and refined rules. Additionally, it is interesting to obtain explanations for the neural network behaviour.

However, our main motivation is to compute valid statements about the neural network behaviour and thereby help to prevent software faults. Software errors can cause serious problems and risks, especially in safety-critical environments. Several software errors and their consequences are collected in [Huc99]. The explosion of the Ariane 5 rocket and the loss of the Mars Climate Orbiter [Huc99] are recent examples of consequences of software errors.
To summarize: it is important to describe the behaviour of neural network components in order to:

- overcome the low degree of comprehensibility of (trained) neural networks,
- validate neural networks,
- integrate neural network components in (large) software environments,
- apply neural network components even in safety-critical applications,
- detect interesting/new knowledge in statistical data which was not previously obvious,
- prevent un-learning: given a polyhedral interface description of a neural network component, one can define a set of invariants, i.e. adapting the neural network to new environments must not violate these invariants,
- control the generalization of a trained neural network: generalization expresses the ability of the neural network to produce correct output for previously unseen input data, and a description of the neural network behaviour will provide a better insight into the neural network generalization capability,
- visualize corresponding regions in the input and the output space of a neural network.


1.2 Notations and Definitions
The following notation conventions are inspired by the book by Fritzke [Fri98], and most of the conventions follow the Matlab [Mat00c] notation.




- Sets are denoted by calligraphic upper case letters.
- Matrices are denoted by single upper case bold letters (e.g. A). To denote a column or row vector or any arbitrary element within the matrix, we use the Matlab notation [Mat00c], e.g. A(:, j) is the j-th column vector of matrix A, and A([i k], :) extracts the i-th and the k-th row vector.
- Vectors are denoted by single lower case bold letters (e.g. v). By default a vector is a column vector. v^T represents the transpose of v. To represent an element within the vector we use Matlab notation, e.g. v(j) denotes the j-th element of the vector.
- Let I and J denote index sets. Then we denote with A(I, J) the selection of the rows I and columns J of the matrix A.
- A scalar variable is denoted with a simple lower case letter (e.g. n).
- Let A and B be two matrices with the same number of columns. With [A; B] we denote the vertical concatenation of the two matrices. Similarly, [A B] denotes the horizontal concatenation of two matrices with the same number of rows.
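To make these conventions concrete, the following short Matlab fragment illustrates the indexing and concatenation notation just listed; the matrix and vector values are invented purely for illustration and are not taken from the thesis.

A = [1 2 3; 4 5 6; 7 8 9];   % a matrix (bold upper case in the text)
v = [10; 20; 30];            % a column vector (bold lower case)

c  = A(:, 2);                % the 2nd column vector of A
r  = A([1 3], :);            % selection of rows 1 and 3 of A
e  = v(3);                   % the 3rd element of the vector v
vt = v';                     % the transpose of v (a row vector)

V = [A; A];                  % vertical concatenation (same number of columns)
H = [A A];                   % horizontal concatenation (same number of rows)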
Additionally, we use the convention that definitions and expressions which are explained in the glossary¹ are written in emphasized style when used for the first time. Function names are also written in emphasized style.
An overview of all symbols and special operators is provided in appendix A. However, the following table contains symbols which are already relevant for this chapter and for the literature review of neural network analysis techniques in Chapter 2.

…   polyhedron, i.e. the intersection of a finite number of half-spaces
…   arbitrary region
…   a box, i.e. an axis-parallel hyper-rectangle (we also use the expression hypercube)
…   activation vector of the output neurons
…   net input vector (input vector for the transfer function layer)
…   activation vector of the input neurons
…   weight matrix
…   single bias value
…   bias vector
…   sigmoid transfer function

The next table defines a special function symbol, one operator and the interval notation.

…        box function; applied to a region, it denotes the smallest axis-parallel hypercube containing that region
…        boolean operator to compare whether two expressions are equivalent
[a, b]   denotes the interval {t : a ≤ t ≤ b}

To refer to sigmoidal functions we often use the Matlab terms logsig and tansig.
¹ A glossary is not provided yet, but it will be included in the final version. We apologise for any inconvenience.



1.3 Software Verification and Neural Network Validation
We will divide software components, depending on the task and its implementation, into two classes, namely trainable software components and non-trainable software components.

Definition 1.1 Trainable Software Components

Software components for classification and non-linear regression, where the task is implicitly specified via a set of examples and a set of parameters is optimized according to these examples, are called trainable software components.

Typically, we use trainable software components where it is not easy or not possible to define a clear algorithm. For example, in tasks like speech recognition, image recognition or robotic control, statistical learners like neural networks or support vector machines are often applied.

Definition 1.2 Non-trainable Software Components
Software components where the task is precisely specified, an algorithm can be defined and the task is implemented with a programming language are called non-trainable software components.

We also refer to non-trainable software components as standard software.

Software Verification and Validation of Neural Network Components
Standard software program verification methods take the source code as input and prove its correctness against the (formal) specification. Among others, important concepts of software verification are pre- and postconditions.

Definition 1.3 Precondition and Postcondition
Given a software component C and two boolean expressions {P} and {Q} about the input and output data, the statement {P} C {Q} indicates that for every input status which fulfills {P} before the execution of C, the output status {Q} is true after the termination of C. The assertions {P} and {Q} are also named precondition and postcondition.


We can view pre- and postconditions as specifications of the component properties. However, using pre- and postconditions does not assure that the software component fulfills these specifications. A formal proof is required.
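As a simple illustration (the example is mine, not taken from the thesis), a component that increments a non-negative number satisfies the pre/postcondition pair

{ x ≥ 0 }   y := x + 1   { y ≥ 1 }

The interface assertions computed later in this thesis have the same shape, with the boolean expressions replaced by polyhedral regions of the input and output space of a neural network.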

In the context of artificial neural networks we talk about validation techniques. In the following we discuss the central ideas of neural network validation techniques in a nutshell. The validation approaches for neural networks are propositional rule extraction, fuzzy rule extraction and region based analysis. Propositional rule extraction methods take a (trained) neural network component as input, extract symbolic rules and test them against the network behaviour itself and against the test data. These methods are helpful to test neural networks.

Fuzzy rule extraction methods try to extract a set of fuzzy rules which mimics the behaviour of neural networks. The advantage of fuzzy rule extraction, compared to propositional rule extraction, is that generally fewer rules are needed to explain the neural network behaviour. In addition, the use of linguistic expressions yields easily understandable characterizations of the neural network behaviour.
Region based analysis methods take a (trained) neural network as input and compute related regions in the input and output space. Region based analysis techniques differ from the above methods because they have a geometric origin, are usable for a broader range of neural networks and have the ability to compute more accurate interface descriptions of a neural network component. These methods compute a region mapping of the form: if the input is in region R_I then the output is in region R_O, where R_I describes a set of points in the input space and R_O a set of points in the output space. For some methods, these region based rules agree exactly with the behaviour of the neural network. The more refined those regions are, the more information we obtain about the neural network. Validity Interval Analysis (VIA), developed by Thrun [Thr93], for example, is able to find provably correct axis-parallel rules, i.e. rules of the form: “if the input x is in the hypercube B_I then the output f(x) is in the hypercube B_O”. Hence, region based analysis approaches are suitable to validate the behaviour of neural networks.
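To make the form of such axis-parallel rules concrete, the following Matlab sketch propagates an input hypercube through one weight layer and a logsig transfer layer using plain interval arithmetic. This is only an illustration of the rule format, not Thrun's VIA algorithm (which iteratively refines intervals with forward and backward passes); the weights, biases and input intervals are invented for the example.

% Minimal sketch: interval (hypercube) propagation through one layer.
W  = [1.0 -2.0; 0.5 0.3];          % weight matrix (illustrative values)
b  = [0.1; -0.2];                  % bias vector (illustrative values)
lo = [0; 0];                       % lower corner of the input hypercube
hi = [1; 1];                       % upper corner of the input hypercube

Wp = max(W, 0);  Wn = min(W, 0);   % split positive and negative weights
netLo = Wp*lo + Wn*hi + b;         % smallest possible net input
netHi = Wp*hi + Wn*lo + b;         % largest possible net input

outLo = 1 ./ (1 + exp(-netLo));    % logsig is monotone, so the output
outHi = 1 ./ (1 + exp(-netHi));    % hypercube is spanned by these images

fprintf('if x in [%g,%g]x[%g,%g] then f(x) in [%g,%g]x[%g,%g]\n', ...
        lo(1), hi(1), lo(2), hi(2), outLo(1), outHi(1), outLo(2), outHi(2));

Because the logsig transfer function is monotone, the image of each net-input interval is exactly the interval between the images of its endpoints, which is why the output hypercube can be read off directly; a full region-based method has to work much harder once general polyhedra, rather than boxes, are propagated.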