VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
BUI PHI DIEP
AVOIDING STATE-SPACE EXPLOSION
IN MODEL-CHECKER
MASTER THESIS OF INFORMATION TECHNOLOGY
Hanoi - 2014
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
BUI PHI DIEP
AVOIDING STATE-SPACE EXPLOSION
IN MODEL-CHECKER
Major: Computer science
Code: 60480101
MASTER THESIS OF INFORMATION TECHNOLOGY
SUPERVISOR: Assoc. Prof. Nguyen Viet Ha
Dr. Mohamed Faouzi Atig
Hanoi - 2014
Declaration of Authorship
I hereby declare that this submission is my own work and to the best of my knowledge it
contains no materials previously published or written by another person, or substantial
proportions of material which have been accepted for the award of any other degree
or diploma at University of Engineering and Technology (UET/Coltech) or any other
educational institution, except where due acknowledgement is made in the thesis. Any
contribution made to the research by others, with whom I have worked at UET/Coltech
or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual
content of this thesis is the product of my own work, except to the extent that assistance
from others in the project’s design and conception or in style, presentation and linguistic
expression is acknowledged.
Signed:
Date:
i
ABSTRACT
Model-checking is a well-known technique for the program verification problem (i.e.,
checking that the program satisfies a given property). However, Model-checking su↵ers
the state-space explosion problem. This is more visible in the the case of concurrent /
parallel programs. Therefore, developing new efficient techniques to address the statespace explosion problem (such as slicing) is a crucial and difficult challenge in Modelchecking.
In this thesis, we present a new slicing method for handling the standard state explosion problem. Our slicing method consists of three steps: (1) creating an abstraction
with respect to a subset of program variables, which leads to an over-approximation
of the input program, (2) reconstructing a program with another subset of variables
from a counterexample of abstracted program, and (3) refining the abstraction if the
counterexample is a spurious one. The process stops when either there are no more
counterexamples remaining or there exists a counterexample after investigating all variables. In the former case, the program is correct; in the latter case, the program contains
an error. We have implemented a prototype tool and run it successfully on standard
benchmarks, together with several challenging examples. The experimental results show
the efficiency of our method.
ii
Acknowledgements
First and foremost, I would like to express my deepest gratitude to my supervisor, Assoc.Prof. Nguyen Viet Ha, for his patient guidance and continuous support throughout
the years. I would like to give my honest appreciation to my co-supervisor Dr. Mohamed
Faouzi Atig and Prof. Parosh Aziz Abdulla for their great support. They always appear
when I need help, and respond to queries so helpfully and promptly. The encouragement from my family, my friends in UET-VNUH and Uppsala University, and my girl
friend, Diu Cap, is also very important for me. When reading this thesis, if you find any
mistakes, sending to me at is appreciated.
iii
Contents
Declaration of Authorship
i
Acknowledgements
iii
Contents
iv
List of Figures
vi
List of Tables
vii
1 Introduction
1.1 Motivation . . . . . . . . . .
1.2 Related work . . . . . . . . .
1.2.1 Program Slicing . . .
1.2.2 Predicate Abstraction
1.3 Thesis structure . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
4
4
5
7
2 Slicing Sequential Programs
2.1 Program Syntax . . . . . . . . . . . . . . . . .
2.2 Statements, Variables . . . . . . . . . . . . . .
2.3 Program Control-Flow Graph . . . . . . . . . .
2.4 Program Transition System . . . . . . . . . . .
2.5 Variable Slicing for Sequential Programs . . . .
2.5.0.1 Creating an initial abstraction
2.5.1 Reconstructing Counterexample . . . .
2.5.2 Refining the abstraction . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
8
8
9
10
11
13
15
15
19
.
.
.
.
.
21
22
22
22
23
23
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Slicing Concurrent Programs
3.1 Program Syntax . . . . . . . . . . . . . .
3.2 Statements . . . . . . . . . . . . . . . . .
3.3 Program Control-Flow Graph . . . . . . .
3.4 Variable Slicing for Concurrent Programs
3.4.1 Reconstructing the counterexample
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4 Experiment
27
4.1 Sequential Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Concurrent Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
iv
Contents
v
5 Conclusions and Future Work
33
Bibliography
34
List of Figures
1.1
1.2
1.3
Structure of SPIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Our method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CEGAR framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1
2.2
2.3
An example of program and its control flow graph . . . .
An example of program abstraction, counterexample and
program . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Refined abstraction . . . . . . . . . . . . . . . . . . . . . .
3.1
3.2
3.3
CFG of concurrent program and their abstraction . . . . . . . . . . . . . . 21
A concurrent counterexample . . . . . . . . . . . . . . . . . . . . . . . . . 24
Simulating the concurrent counterexample . . . . . . . . . . . . . . . . . . 25
4.1
A program with its initial abstraction . . . . . . . . . . . . . . . . . . . . 28
vi
3
4
6
. . . . . . . . . 11
reconstructed
. . . . . . . . . 18
. . . . . . . . . 20
List of Tables
2.1
2.2
The syntax of program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Conditions on state transitions hv1 , ⌦1 i !P hv2 , ⌦2 i for each vertex type. 12
4.1
Experimental results of verifying concurrent programs in comparison with
SPIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Experimental results of verifying concurrent programs in comparison with
tools in SV-COMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2
4.3
vii
Chapter 1
Introduction
In this chapter, we describe the motivation of our work, and it is important. Initially, we
start with the necessaries of program verification and Model-Checking. Then we state
the state explosion problem, which are unavoidable by applying Model-Checking. Next,
we summarize our solutions and results. Finally, we describe related work and thesis’s
structure.
1.1
Motivation
Software is everywhere. The appearance of software is in diverse areas namely education,
healthcare system, transportation. Besides, the development of multicore architecture
makes software to be designed in many cores. It helps software run faster, but unfortunately software becomes larger and more complex. Software developers now need more
e↵ort to not only make software but also guarantee that it works correctly. Errors in
software are difficult to find and fix. Therefore, we need efficient techniques, for example
program verification and testing, to help us handle complex program errors.
Program verification is a technique that considers a program with given properties of
interest. It then concludes that whether the properties are satisfied in the program. The
considered properties of program can be valuations of variables at a program location,
or no out of memory errors. The result of program is either the properties are hold,
which means all executions of program do not violate the properties, or the properties
are not hold because of a specific execution. It is notable that program verification is
undecidable. There are no a program verification tool which can handle every type of
program with every type of properties. In practice, each program verification tool only
focuses on a small types of program with a restricted set of properties.
1
Chapter 1. Introduction
2
Program testing is another technique to find program errors. Program testing considers
a program with a pair of input and output, called test case. If the program executes with
the given input and returns the given output, the program is definitely correct in that
case. Otherwise, the program has an error related to the given input and output. There
are a number of limitations of program testing. Initially, program testing requires a large
number of test cases to cover all possible program execution. More importantly, program
testing cannot handle nondeterministic programs, for example a program with many
threads and interleaving between threads. Such bugs like Heisenbugs [1] are difficult to
figure out by program testing. So in the scope of this thesis, we only focus on using
program verification.
Model checking is a popular verification technique. The key idea of model checking is
that it represents the program by a model, i.e. Kripke structure [2], and verifies the
model instead of the original program. The work in [3] shows a number of advantages of
model checking in comparison with other verification techniques, i.e. automated theorem
proving, such as faster, providing counterexample - which is one of the biggest advantage
of model checking, and using temporal logics to describe properties of program.
SPIN [4] is an efficient and a famous model checker. SPIN provides an intuitive notation
for design specification, and a concise notation for correctness claims, and the consistency
between the notations. In particular, the design specifications are written in Promela
language [5], and correctness claims are written by Linear Temporal Logic [6]. The
structure of SPIN is shown in Figure 1.1. It takes input from the front end XSPIN.
The input is the program specification, including the program design and its correctness
claims, both are written in Promela language. If the specification has no syntax errors,
it then generates a verifier, optimizes and executes the verifier. A counterexample of
program is detected, it is then sent back to the simulation to to inspect the error in
detail.
However, SPIN faces the state explosion problem. Ideally, SPIN visits program states
and stores the valuation each state. The state spaces that SPIN travels may contain
billions of reachable states. Therefore, SPIN requires much time when verifying large
programs, especially concurrent programs.
One of efficient method to handle the state explosion program is slicing [7]. There are
a number of slicing methods that have been developed, i.e. static slicing [7], dynamic
slicing [8] and conditional slicing [9]. In general, instead of considering the whole program, slicing focus on a number of statements in the program with respect to the slicing
criterion. The slicing criterion often is a pair of a program location and a subset of
program variables. Slicing is proved to be useful in testing [10], debugging [8].
Chapter 1. Introduction
3
XSPIN
Front -End
Syntax Error
Reports
Promela Parser
LTL Parser
and Translator
Verifier
Generator
Simulation
Optimization
Counterexample
Executable
On-The-Fly
Verifier
Figure 1.1: Structure of SPIN
We now propose a new slicing method to handle some basic cases of the explosion
problem. The di↵erence of our method in comparison with other slicing methods that is
we do not concentrate on any specified subset of program variables. All variables in the
program are considered. For each step in our method, a subset of program variables is
selected. A new executable program, which consists of statements in the original with
respect to selected variables, is generated and verified.
The Figure 1.2 describes our method. Our method aims to check the program safety.
The process of our method is describes as follows.
• Our method takes a program as input. It then creates an initial abstraction of
program with respect to a subset of program variables. The abstraction is an
over-approximation of the original program.
• The abstraction then is verified by a model checker. If the program is safe, the
process stops, the program is concluded to be safe. Otherwise, a counterexample
is generated. The counterexample execution trace shows why there exists a bug
in the program. The counterexample is used to reconstruct a simulation program.
In general, the simulation program is another abstraction of the original program,
which not only follows statements in counterexample execution trace but also contains statements with respect to a new subset of variables of the original program.
The model checker then verifies the new abstract program.
Chapter 1. Introduction
4
P
Create Initial
Abstraction
P1
Model Check
Cex
Refine
No counterexample
Reconstruct
Counterexample
P2
No counterexample
Correct
Cex, V ars(P2 ) 6= ;
Model Check
Cex, V ars(P2 ) = ;
Error
Figure 1.2: Our method
• If the reconstructed program is safe, that means the counterexample is spurious.
Our method then refines the initial abstraction and runs model checker again.
Otherwise, a counterexample is generated and our method reconstructs it again
with another subset of variables.
• Our method stops when either there are no more variables remaining in the program or no more counterexamples are detected. In the former case, the program
is error. In the latter case, the program is safe.
1.2
Related work
In this section we describe related work, including work about program slicing and
predicate abstraction.
1.2.1
Program Slicing
Program slicing is a static analysis technique proposed by Weiser [7]. The technique uses
control and data flow to find a subset of statements in the program which are suitable
for a particular criterion. That criterion defines how the given program is sliced is called
Chapter 1. Introduction
5
slicing criterion. The applications of program slicing are in several tasks, including
testing [10], debugging [8], maintenance [11], etc.
The slicing method proposed by Weiser is called static slicing. Slicing criterion of static
slicing is a tuple (l, V ) where l is a program location and V is a subset of program
variables. A slice is any subset of program that maintains the e↵ect of program on
the slicing criterion. In particular, a static slice is obtained by removing a number of
statement in the original program and it preserves the program behavior of the original
program with respect to the slicing criterion.
The main issue of static slicing is that a static slice may contain many statements that do
not a↵ect the value of variables of interest. To overcome the issue, dynamic slicing [8] is
proposed to reduce the number of considered statements. In particular, dynamic slicing
preservers the behavior of program for a specific input. So that instead of involving all
potential statements a↵ecting the slicing criterion, dynamic slicing reduces the search
space. Therefore, slicing criterion of dynamic slicing is a triple (l, V, I) where l, V are
program location and a subset of variables in the program respectively, and I is the
program input.
Conditioned slicing [9] is another slicing method that bridges the gap between static
slicing and dynamic slicing. While static slicing does not care about input, dynamic
slicing specifies input in detail, that triggers a large number of input needed to cover the
whole program. Conditioned slicing provides information about input without being so
specific as to give the precise values, i.e., using boolean expression to relate the possible
value of inputs. Slicing criterion of dynamic slicing is a triple (l, V, F (V )) where l, V
are program location and a subset of variables in the program respectively, and F (V ) is
the first order logic formula on variables in V .
1.2.2
Predicate Abstraction
Predicate abstraction is a technique proposed by Graf and Saidi [12] and Colon and
Uribe [13]. The technique is widely applied in software verification. Instead of tracking
the specific value of data of variables in the program, it tracks predicates of the data.
When verifying a program, an abstracted program is created to represent it, using
Existential Abstraction [14]. Variables in the abstracted program are Boolean variables,
which represents predicates. So each abstract state in the abstracted program models a
number of state in the original program. There exists a transition from an abstract state
if at least one corresponding concrete state in the original program has the transition,
that is over-approximation. The main challenge of predicate abstraction that is finding
a set of predicates for a program.
Chapter 1. Introduction
6
P, '
Create Initial
Abstraction
A(P ), '
Model Check
A(P ) 2 '
A(P ) ✏ '
Generate
Counterexample
Refine
Cex is spurious
Cex
Stop
Cex is not spurious
Check Spurious
Counterexample
Figure 1.3: CEGAR framework
Counterexample guide abstract refinement (CEGAR) is a technique that automates
searching for predicates. The technique consists of four basic steps as shown in Figure
1.3. In the figure, we use some notations including P, A(P ), ', Cex to denote the original
program, abstracted program, program specification and counterexample, respectively.
• Create Initial Abstraction: Construct abstracted program of the original program
by over-approximation the program behavior of the original one. The abstracted
program is a finite-state program whose variables are Boolean.
• Verify abstraction: The abstracted program is finite state so a model checker for
programs with Boolean variables is used for verifying whether the abstracted program satisfies a given specification '. If the model checker returns the abstracted
program is safe then the process finishes. Otherwise, the program does not satisfy
the specification and then a counterexample is returned.
• Check the counterexample: The counterexample of abstracted program may be
a spurious counterexample because of over-approximation, that means the counterexample is not corresponds a valid execution of the original program. If the
counterexample turns out to be an actual counterexample, the original program
does not satisfy the specification. Otherwise, the abstracted program needs a
refinement.
• Refine the abstraction: Because of the existence of the spurious counterexample,
it is necessary to eliminate the counterexample from the abstracted program by
adding additional predicates, which represents behaviors in the abstracted program
Chapter 1. Introduction
7
which does not correspond to original program. The model checker then resumes
with the refined abstraction.
1.3
Thesis structure
The organization of this thesis is as follows. Chapter 2 describes our slicing method for
sequential programs. It also includes the program syntax, transition system of sequential
programs. Chapter 3 defines our slicing method for concurrent programs. We only focus
on the di↵erence of our method compared with the method for sequential programs.
Experimental results contains many tests with both sequential and concurrent programs
in Chapter 4. Most of them are well-known algorithms, for example Dekker [15], Lamport
[16], Peterson [17], etc. Finally, conclusions and future work are in Chapter 5.
Chapter 2
Slicing Sequential Programs
In this chapter, we describe our slicing method for sequential programs. We aim to check
the program safety. The form of safety properties are assertions that specify invariants
of the program. In Promela, assertions are assert statements: assert(e), where e is
an expression. If the expression e evaluates to 0 at run time, the program aborts. Safety
checking is so reduced to check whether assert statements are reachable.
We first consider Promela language with the standard syntax, statements, variables in
a Promela program as well as the transition system of Promela. We then describes our
method in details for sequential Promela programs.
Definition 1 (Directed Graph). A directed graph is a tuple G = (V, E) where V is a
set of vertices and E ✓ V ⇥ V is a set of edges, together with functions start and end
that associate a start vertex and an end vertex with each edge. A path of length k from
node n to node m is a sequence of edges he1 , . . . , ek i such that end(ei ) = start(ei+1 ) for
i = 1, . . . , k.
2.1
Program Syntax
The program syntax is presented in Table 2.1. Each program P declares a set of variables
Vars and a procedure init. Variable type is either integer or boolean. All variables
are global. Variables are statically scoped as in C. The variable identifier is a C-style
identifier. Procedure init is constructed by a sequence of statements. Statements
may be labelled and are built inductively by composition with control-flow statement.
Expressions are built in the usual way from the constants, variables and the standard
logical connectives. There are three statements that can a↵ect the control flow of a
8
Chapter 2. Slicing Sequential Programs
9
Table 2.1: The syntax of program
Syntax
prog
::=
decl⇤
decl
btype
::=
::=
btype id+ ;
int
id
|
::=
bool
[a-zA-Z ][a-zA-Z0-9 ]
prog
::=
init() body
body
::=
{ sseq }
prog ⇤
::=
::=
|
stmt
::=
|
|
|
|
|
option ::=
lstmt+
stmt
id : stmt
skip ;
goto id ;
id = expr+ ;
if option+ fi ;
do option+ od ;
assert expr ;
:: expr -> sseq
expr
expr binop expr
( expr )
id
const
&| == | > | < . . .
sseq
lstmt
binop
::=
|
|
|
::=
Description
A program consists of a list
of global variable declarations
followed by a list of procedure
definitions
Global variable declarations
Variables may has either int or
bool type
An identifier is a regular Cstyle identifier
Procedure definition, followed
by program body
Program body is a sequence of
statements
Labelled statement
Assignment statement
Conditional statement
Loop statement
Assert statement
Option in either conditional
or loop statement
Logical connectives
program, including: if, while and assert. The statement if and do are a nondeterministic, guarded choice. For each if and do, there are possibly two or more
options. When an option is selected to execute, all other options are discarded.
2.2
Statements, Variables
The term statement denotes and instance that can be derived from the nonterminal stmt
in Table 2.1. Let P be a program with n statements, Stmt be the set of statements in
P . Let T ype : Stmt ! {Skip, Goto, Assignment, Condition, Assertion} be the function
indicating the type of statements in Stmt.
Chapter 2. Slicing Sequential Programs
10
Because the sequential program only has one procedure, we assume that all variables and
statement labels are globally unique. Let V ars(P ) be the set of variables in P , V ars(si )
be the set of variables appearing in statement si . If si is an assignment statement, let
V arsr (si ), V arsl (si ) be the set of variables in the right side and left side of si respectively,
V ars(si ) = V arsr (si ) [ V arsl (si ).
2.3
Program Control-Flow Graph
This section defines the control-flow graph of a program of sequential programs. The
program has a procedure, which is modeled by a directed graph.
The control flow graph of a sequential program P is a directed graph GP = (VP , EP )
where VP is a set of vertices, and EP ✓ VP ⇥ VP is the set of edges. The set VP
contains one vertex for each statement in P and one additional vertex Exit for the
program, which indicates where the program stops. Furthermore, VP has one vertex
Err to indicate the failure of assert statements. Each edge connects two vertices,
for example an edge from v1 to v2 , denoted by hv1 , v2 i. Edges are constructed by the
successor function SuccP : VP ! VP . The successor function is defined in term of
function N extP : VP ! VP . N extP has a recursive definition based on the program
syntax in the Table 2.1. In the table, each statement has an sseq node as its parent.
If statement si is not the last statement in its parent sseq, N extP (si ) is the statement
immediately following si in the sequence. Otherwise, let a be the closest ancestor of
si in the syntax tree such that a is a stmt node and a is the not last statement in its
sequence. If a exists, N extP (si ) is the statement immediately following a. Otherwise,
N extP (si ) is Exit.
So we define SuccP basing on N ext function:
• If si is goto L, SuccP (si ) = {sj } where sj is the statement labelled L.
• If si is either an if or a do statement, then SuccP (si ) = {sj1 , . . . , sjn } where
sjk , 1 k n are options of the if or do respectively.
• If si is the last statement in an option of do statement, then SuccP (si ) = {sj1 , . . . , sjn }
where sjk , 1 k n are options of the do.
• If si is an assertstatement, then SuccP (si ) = {N extP (si ), Exit}.
Example 1. Consider the example in Figure 2.1. The left figure is the program source.
The program has one function named init. The control flow graph of init is shown in
the right figure. There is an if statement at line 5 with two options x2 3 and x2 > 3
Chapter 2. Slicing Sequential Programs
11
x1 = 1
x2 = x1 + 1
x2 > 3
x2 <= 3
skip
assert(x2 == 2)
Exit
(a) An example of Promela program
Err
(b) Program control flow graph
Figure 2.1: An example of program and its control flow graph
at line 6 and line 8 respectively, so two edges (l4, l6) and (l4, l8) are added. Similarly,
two edge (l7, Exit) and (l9, Exit) are to illustrate the program flow when it finishes if
statement. The program also has an assert statement at line 7, so we have an edge
(l7, Err) to indicate the program transition when the assertion fails.
2.4
Program Transition System
For a set V ars0 ✓ V ars, a valuation ⌦ to V ars0 is a function that associates every
variable in V ars0 with a specific value. ⌦(var), var 2 V ars0 is the valuation of variable
var. For any function f : D ! R, d 2 D, r 2 r, f [d/r] : D ! R is defined as f [d/r](d0 ) =
r if d = d0 and f (d0 ) otherwise. For example, if V ars0 = {x, y} and ⌦ = {(x, 1), (y, 2)}
then ⌦[x/0] = {(x, 0), (y, 2)}. For any function F : V ! V, v 2 V, f [v/⌦] : V ! V is
defined as F [v|⌦](v 0 ) = v 00 |8var 2 V ars(v 0 ) : var/⌦(var). For example, if statement
v : x = y + 1 and ⌦ = {(y, 1)} then v|⌦ : x = 0 + 1.
A state ⌘ of program is a pair hsi , ⌦i where si is a statement, and ⌦ is the valuation
at the statement. Intuitively, a state is represented by a statement and the valuation of
variables at that location. Let States(P ) be the set of states in program P .
We use ⌘1 !P ⌘2 to denote that there exists a transition from state ⌘1 to state ⌘2 .
Formally, ⌘1 !P ⌘2 hold if ⌘1 = hi, ⌦i i and ⌘2 = hj, ⌦j i and ⌘1 , ⌘2 2 StatesP . The table
2.2 shows the state transition for each vertex type.