Tải bản đầy đủ (.pdf) (17 trang)

STROKE EXTRACTION FOR CHINESE CHARACTERS USING A TRENDFOLLOWED TRANSCRIBING TECHNIQUE

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (1013.48 KB, 17 trang )

Pattern Recognition, VoL29, No. 11, pp. 1789 1805, 1996
Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd
Printed in Great Britain.
0031 3203/96 $15.00+.00

Pergamon

PII: S0031-3203(96)00039-8

STROKE EXTRACTION FOR CHINESE CHARACTERS USING
A TREND-FOLLOWED TRANSCRIBING TECHNIQUE
J I - R O N G L I N and C H A N G - F U U C H E N
Department of Electrical Engineering, Tatung Institute of Technology, 40, Chung-Shan N. Road, 3rd Sec.,
Taipei, 10451, Taiwan, R.O.C.

(Received 1 February 1994; in revisedform 15 December 1995; receivedfor publication 4 April 1996)
Abstract--The merit of the stroke extraction algorithms which utilize the thinning process is the ease of the
feature abstracting from the skeleton of a character. The two main tasks for this kind of algorithms are to find
the certain adjacent segmental strokes for being merged into a complete stroke, and to search the corner
point to divide the bend segmental stroke into two or more individual strokes. This paper proposes an
intuitive and effective stroke extraction method that passes through the distorted region and gets the reliable
information of global features by applying the trend-followed transcribing technique to correctly accomplish
the tasks. In our experiments, the most frequently used 1500 Chinese characters printed in both the Ming font
and the Fang Sung font with the size of 64 x 64 points are tested. The results of the experiments show that
the rate for correctly extracting all strokes of a character is 97.8% for the Ming font and 98.4% for the
Fang-Sung font. That is, the proposed stroke extraction algorithm is useful and reliable. Copyright © 1996
Pattern Recognition Society. Published by Elsevier Science Ltd.
Chinese character stroke extraction

Thinning process


1. INTRODUCTION
The recognition of a Chinese character is considered
a very difficult problem. Considering the practical
conditions of a Chinese character recognition system,
the recognition rate of the conventional methods, such
as shape matching, position matching, projection
profiles, transform recognition, etc., is low. (~-s) Some
popular ideas for increasing the recognition rate have
been developed, and an appreciative one is the structure analysis method. (6-i°) The Chinese character is
constructed from some rules that are dependent on the
printed font, the writing style, the size, and the thickness. However, the spatial relations and geometric
configurations of the elemental strokes are usually well
maintained. Therefore, these properties can be regarded as invariant features and be applied to the
recognition of Chinese characters.
In the structure analysis methods, the success of
recognition is greatly influenced by how correctly the
elemental strokes can be extracted. There are two
essentialrules for stroke extraction. (i) A representative
stroke that is a dot (-), a horizontal stroke (--), a vertical stroke ([), or a leaning stroke (/or \) is extracted to
be an individual stroke. They should be exactly extracted without being split or being merged. (ii) A bent
stroke, such as -3, should be split up into two representative strokes at its corner point. To follow these rules,
the stroke extraction has been studied by m a n y
authors and the stroke extraction algorithms can be
generally divided into two groups according to the
utilization of the thinning process or not. The algo-

Travel algorithm

Trend-followed


rithms without thinning process (1 l-13) extract strokes
by the knowledge about the width-variation of strokes.
Theoretically a stroke can be extracted by means of the
width-variation, but to measure the width of a stroke
which is related to the trend of it is technically difficult.
Thus, owing to the illness of width-measure, the rules
for stroke extraction based on knowledge about the
width-variation of strokes are more complex and unexpected. F o r instance, the knowledge is converted
into about 20 parameters in Tseng's algorithm. (12) F o r
the algorithms with thinning process, (a4-~6) the two
main tasks are to handle fork points and to find corner
points. The violent distortion on fork points produced
by the thinning process, such as the split of one fork
point into two or more fork points, makes the handle
of fork points difficult. The general method (i4'i6) devotes to finding and merging the split fork points. The
problem of these methods is that they do not take care
of both the local and global considerations such that
they are likely to fall into local traps. The same problem happens in the search of corner points.
In this paper, an intuitive and effective stroke extraction method, which uses the thinning process and is
called the trend-followed stroke transcribing algorithm, is proposed. In order to handle the problem
mentioned above, the algorithm intends to pass
through the distorted region to get the reliable information of the global features. By the same manner, the
corner-searching algorithm is flexible for both the
smooth transition corner and the abrupt transition
corner. This stroke extraction technique consists of
three stages: (i) thin the character and label the feature

1789



1790

J.-R. L I N and C.-F. C H E N

points; (ii) search and label the corner points; and (iii)
transcribe each stroke from its one end point and
record the traces until finding the other end point of
this stroke or the fork point that is judged to be the end
of this stroke by calculating the trends of the branches.
Section 2 will give these proposed procedures and
algorithms in detail. The experimental results are described in Section 3 while the algorithm is summarized
in Section 4.
2. TREND-FOLLOWED STROKE EXTRACTION

F o r easily describing the system, some definitions
are given in the following:
Character point: a character point is a point of the
skeleton of a thinned character.
Neighbor: as shown in Fig. 1, point n~, i = 0, 1,..., 7,
is named neighbor of character point P. A neighbor
that is a character point is named character neighbor,
and a neighbor that is not a character point is named
non-character neighbor.
Arm: an isolated character neighboring or a group
of consecutive character neighbors of character point
P is named arm of character point P. For example, as

n~

no


n7

n4

P

no

n3

n~

n,

shown in Fig. 2, P1, Pz,-.-, P8 has 1, 1, 2, 2, 3, 3, 4, and
4 arms, respectively.
End point: an end point is a character point which
has only one arm. In Fig. 2, P1 and P2 are end points.
Middle point: a middle point is a character point
that has two arms. In Fig. 2, P3 and P4 are middle
points.
Fork point: a fork point is a character point that
has three or four arms. If it has 3 (4) arms, it is named
3-fork (4-fork) point. In Fig. 2, Ps, P6, P7 and Ps are
fork points.
Corner point: a corner point is a connection point
of two strokes whose trends construct a sharp angle
less than C_ANGLE, a constant angle that varies with
the font of the input character.

Branch point: if an arm of character point P is an
isolated character neighbor P,, then P, is denoted as
the branch point of this arm. If an arm of P is a group of
character neighbors P , ~ P~,,+ k)mod S, and the numbers
of arms of P, ~ P(,+k) mod8 are a o ~ ak, respectively,
then the branch point of this arm is P(, + 0 moa 8 which
ai=max{a o ~ a k } and i < j for ai=aj, where

O<_i,j<_k.
F-segmental branch: an f-segmental branch is defined as a segmental branch which consists of a n u m b e r
of consecutive middle points and has a fork point as
the start point.
Trend: let the consecutive points of a segmental
branch be P1,P2,... ,Pk. The trend of the segmental
branch can be calculated as following:
[k/2] P ( k - i +

X=

~

-

-

1 )x - -

-

Pix;


[k/2] 1D

y=

V ~(k-i+l)y

[[P~P(k-~+1~ll

i=1

trend=tan-l(Y/X)

~1

-n
IIPiPj [I is the length of vector (P~, Pj).

J

O'-'-t'---O

p,



P:




.•

.-

• ........ O

.P~

B ........ •

. -"

"-..

]-..
..-

P5

.-

i

* ........ P~".

.-

",..


..-"

.e7

Fig. 2. Examples for the arms of character point P.

.P~

~i~.

IIP~P(k-~+x)[I '

where (P~x,Piy) is the position of P~ and
Fig. 1. The neighbors of character point P.

__ P


Stroke extraction for Chinese characters

1791

//V

V2

t4

Vi
V3


.
4 °"

.....
,."

"•

V"

3

:," V4.':"
::7, .."

i,

.' V2
Vp

-...~.
end

vl
Fig. 3. The calculation of the trend of a segmental branch; where vti = ~-=,,
i = 1,..., 4.
Itv~ll

[ Data~::~ Data~:~Data ~ --+


~_~

Data

Fig. 4. The doubly linked list.

Figure 3 gives an example for the calculation of the
trend• In this paper, the angle of two trends is set in the
range [0, 7~].
Bud: a bud is a segmental branch whose length is
short in comparison with the general strokes and is
produced in the thinning process. In this paper, the
stroke whose length is not longer than BUD, a constant length, is considered as a bud.
Doubly linked list (DLL): a doubly linked list is the
data structure, which each node has two pointers, one
pointing to the next node and the other pointing to the
previous node in the list, and is used to record the chain
code of the extracted stroke. Figure 4 shows its configuration.
The most natural way to extract strokes is to transcribe them one by one. A stroke can be transcribed
from its one end point to the other. While a fork point
is encountered, by calculating the trends of the coming
and the forward branches, the forward branch whose
trend is near to the trend of the coming branch is
selected to be transcribed forward. If there is not any
such branch, then the transcription of this stroke is
finished since this fork point is the other end point of
this stroke. The transcription is repeated until all
strokes are extracted. This idea can be carried out by
the trend-followed stroke extraction algorithm which

consists of three stages: pre-processing, corner point

searching, and stroke extraction. In the pre-processing
procedure, the input binary character is thinned and
all its character points are labeled. Then the corner
points are searched and labeled. In the stroke extraction procedure, the transcribing algorithm, and trend
calculation algorithm, and the stroke extraction strategy are used.
2.1. Pre-processing
2.1.1. Thinning procedure. The selection of the
thinning algorithm (17-19) does not matter. Once the
thinning algorithm is selected, its behavior on the fork
points must be thoroughly known. The thinning algorithm that is developed by Xia (17) is utilized in this
paper.
The behavior on the fork points of the Xia's thinning
algorithm describes that a character is thinned out
until the least points that maintain the connection and
produce no holes. Thus, there are four types of fork
points:
Type I: a fork point has three arms that are all
isolated character neighbors of it.
Type II: a fork point has two arms that are both
isolated character neighbors of it and one arm that
consists of the consecutive character neighbors of it.
Type III: a fork point has one arm which is the
isolated character neighbor of it and two arms which


1792

J.-R. LIN and C.-F. CHEN


i1'

"-i~/
1

Type I

TypeI

.•.-



......... O ........



Type II

I

j"

.0

, ........ p .

/



"•

Type II

Type III

TypeIV

Fig. 5. Examples for four types of fork point P.

P

Fig. 6. Edges of the neighbors of character point P.

both consist of the consecutive character neighbors of
it.
Type IV: a fork point has four arms that are all
isolated character neighbors of it. Figure 5 gives some
examples for each type.
2.1.2. Labelin9 character points. Every character
point is labeled the number of its arms. Labels 1, 2, 3,
and 4 indicate the end point, the middle point, the
3-fork point, and the 4-fork point, respectively, while
label 5 indicates the corner point which is a special
kind of middle point and is searched by the algorithm
described in Section 2.2. To give the label of the
character point, the number of edges of the character
neighbors is counted and is divided by two since each
arm of character point P has two edges. F o r example,

as shown in Fig. 6, character point P has six edges in its
character neighbors, thus point P is labeled 3. After all
the character points are labeled with the number of
their arms, the corner points are searched and labeled.
2.2. Searchin9 corner points
A corner point is a connection point of two strokes
whose trends construct a sharp angle. Hence, a corner

point is the middle point whose arms are isolated
character neighbors. In order to efficiently search corner points, all middle points are scanned one by one
but only the possible candidates are further checked.
There are two types of candidates for the corner points.
Type ! is the middle point whose two branch points are
neighbors n i and n(i + 2) rnod 8 or n(i_ 2) mod 8, where i = 0,
1, 2,..., 7. Type II is the middle point whose branch
points are neighbors n~ and n(i + 3) mod8 or n(i_ 3) modS"
Figure 7 gives the examples of both types. In general, to
judge whether a candidate is a corner point or not, we
transcribe CL consecutive middle points for each of its
two branches and calculate its corresponding trend• If
the angle of these two trends is not greater than
C_ANGLE, then it is judged to be a corner point.
Occasionally, as shown in Fig. 8, a corner point is
hidden because the transition of its two connected
strokes is too smooth to be correctly judged by the
above method. The way to resolve this problem is to
j u m p over the smooth transition part and then find the
trends of the branches. The smooth transition part
may be on one side of the candidate or across it.
Therefore, for each branch of the candidate, we transcribe the branch forward by two segments with the

adequate length such that if there is no smooth transition part on the branch, then the trend of the first
segment represents this branch; else the first segment
contains the smooth transition part and the trend of
the second segment represents this branch. If the angle
of the trends of the two first segments is not greater
than C_ANGLE, then they form a corner, otherwise
there is a smooth transition part or the candidate is not
a corner point• F o r processing the smooth transition
part, a j u d g m e n t algorithm is proposed. In the following, the detailed algorithm for transcribing both
branches of a candidate to get two segments with the


Stroke extraction for Chinese characters

1793

o.
• •.

...p

...•

P

p

........




.........



.°..°

(a)

(b)

(e)

P

p

p ........ •
/
°.•

/.-

(d)

(e)

(f)

Fig. 7. Examples for two types of candidates for corner points. (a) (c) are the candidates for Type I and

(d)-(f) for Type II.

......................... i ......................... ."..... ~............
..: ........................ ~......................... .t..... i............
:

: •

...................................... , ...... :...... ! ..... !............
......

~ ..... " ...... " ..... , ...........

i ...... i......

i

:

:

•i

:

i

.:




iO!

!

~

!

!

,

iO:
:

:

~ ..... i ..... - ......

...... !...... i...... i...... !..... !..... :'...... !...... i ..... !..... -:......
0 ~ !...... !~..... !i..... ,~...... ~i...... .!~..... ~!..... !!......
...... !:..... ~~......
!

!ol

!

!


i

iol

.....................................................

i

i

i

i

i

!

i

i

!

!

~ ...........

i


!
~ ......

i i i ioioioioioi.i
...... i ..................

i .......... i ............ i .......... i ......

Fig. 8. A smooth corner of character ~ .

a d e q u a t e l e n g t h for e a c h b r a n c h is first d e s c r i b e d .
T h e n , a n a l g o r i t h m f o r m a k i n g t h e j u d g m e n t will be
g i v e n in detail.
2.2.1. A19orithm for transcribin9 the segments. I n
t h e b e g i n n i n g , we s e a r c h t h e t w o b r a n c h p o i n t s o f t h e
candidate, which are the isolated character neighbors,
a n d s t a r t t r a n s c r i b i n g its t w o b r a n c h e s , r e s p e c t i v e l y
f r o m t h e f o u n d b r a n c h points• F o r t r a n s c r i b i n g forw a r d f r o m a m i d d l e p o i n t , its n e i g h b o r s a r e i n s p e c t e d
c l o c k w i s e f r o m t h e n e i g h b o r n e x t to t h e p r e v i o u s
t r a n s c r i p t i o n p o i n t for f i n d i n g t h e o t h e r b r a n c h p o i n t

as t h e n e x t t r a n s c r i p t i o n p o i n t . F o r g e t t i n g t w o segm e n t a l b r a n c h e s for e a c h b r a n c h , t h e f o l l o w i n g t h r e e
stages are executed:
S t a g e 1: t r a n s c r i b e t h e b r a n c h f o r w a r d u n t i l t h e
t r a n s c r i b i n g d i s t a n c e e q u a l s CL o r a n o n - m i d d l e p o i n t
is e n c o u n t e r e d .
S t a g e 2: if t h e e n c o u n t e r e d n o n - m i d d l e p o i n t in s t a g e
1 is a f o r k p o i n t or a n e n d p o i n t , t h e s e c o n d s e g m e n t a l
b r a n c h is g i v e n a n e m p t y set. O t h e r w i s e t h e s e c o n d

s e g m e n t a l b r a n c h is s e a r c h e d as following. W e s t a r t
again transcribing the branch forward until the
t r a n s c r i b i n g d i s t a n c e f r o m t h e t e r m i n a l p o i n t of
t h e s e g m e n t a l b r a n c h g o t in s t a g e 1 to t h e c u r r e n t
t r a n s c r i p t i o n p o i n t e q u a l s CL o r a n o n - m i d d l e p o i n t is
e n c o u n t e r e d . If t h e n e w t e r m i n a l p o i n t is a c o r n e r p o i n t
a n d t h e t e r m i n a l p o i n t o f s t a g e 1 is a m i d d l e p o i n t , t h e n
we m e r g e t h e n e w s e g m e n t a l b r a n c h i n t o t h e o n e g o t in
s t a g e 1 a n d t r a n s c r i b e a g a i n for g e t t i n g a n e w s e g m e n tal b r a n c h .
S t a g e 3: if t h e l e n g t h o f a s e g m e n t a l b r a n c h is n o t
g r e a t e r t h a n B U D , t h e n t h e s e g m e n t a l b r a n c h is m o d i fied as a n e m p t y set.
2.2.2. A19orithm for judgin 9 the corner point• I n this
s u b s e c t i o n , we d e c i d e w h e t h e r t h e c a n d i d a t e is a c o r n e r
o r n o t w i t h t r e n d s o f t h e s e g m e n t a l b r a n c h e s g o t in t h e
last s u b s e c t i o n . If t h e c a n d i d a t e is a c o r n e r p o i n t , t h e n
we c o n s i d e r w h e t h e r this c a n d i d a t e a n d t h e o t h e r
corner points on the segmental branches which are
f o u n d b e f o r e this c a n d i d a t e s h o u l d be m e r g e d o r not.
L e t t h e t w o b r a n c h e s o f t h e c a n d i d a t e be i n d e x e d
A a n d B, a n d t h e t w o s e g m e n t a l b r a n c h e s for e a c h
b r a n c h g o t in last s u b s e c t i o n be i n d e x e d 1 a n d 2,
respectively. F o r e x a m p l e , S e g m e n t A1 i n d i c a t e s t h e


1794

J.-R. LIN and C.-F. CHEN
Segment A2

Segment A2


i

Segment A1

Segment A 1 I

Segment B2

Segment B2

Segment B 1

Segment B 1

(a)

(b)

Segment A2

Segment A 2 /

A

Segment A ~
Segment A1

Segment B2


Segment B
Segment B2

Segment B 1

(c)

(d)

sog

Segment A2

~Nj~

egment A1

Segment ~ /

"~

Segment A1

Segment B1 /

v

Segment B2

(e)


Segment ]32

(0

Fig. 9. Examples for different relations of near corner points. [~: Candidate; 0: Corner point; and ©:
Redundant corner point.

first segment of branch A, which is got in stage 1 of the
last subsection; Trend (B2) indicates the trend of the
second segment of branch B, which is got in stage 2 of
the last subsection. Thus, the procedure for cornerjudgment can be given in detail as follows:
Stage 1: if at least one of the angles of Trend (A1)
and Trend (B1), Trend (A1) and Trend (B2), Trend (B 1)
and Trend (A2), and Trend (A2) and Trend (B2) is not

greater than C_ANGLE, then the candidate is a corner
point and is labeled 5.
Stage 2: as shown in Fig. 9(a) and (b), if only the
terminal point of Segment A1 (B1) is a corner point,
then we check whether Segment A2 (B2) is parallel to
the Segment B1 (A1). If they are parallel that the angle
of them is not greater than L_ANGLE or greater than
C_ANGLE, a constant angle of two trends below


Stroke extraction for Chinese characters
which they are regarded as an identical orientation,
then the candidate and the terminal corner point are
distinct corners. Otherwise, they indicate the same

corner and are replaced by the center point of the
segment between them as shown in Fig. 9(c). Similarly,
if both the terminal points of Segment A 1 and Segment
B 1 are corner points, then we check whether Segment
A2 and Segment B2 are parallel. If they are parallel,
this two corner points indicate distinct corners and
the candidate is a redundant corner point and is
relabeled 2 as shown in Fig. 9(d) and (e). Otherwise,
the three corner points indicate the same corner as
shown in Fig. 9(f) and only the candidate is set to be
a corner point and others are reset to be the middle
points.
2.3

Extracting strokes

The strokes are extracted one-by-one through the
transcription of them. Figure 10 shows the procedure
of the stroke extraction. When we transcribe a stroke,
the transcribed trace of the stroke is recorded in the
individual doubly linked list (DLL) which is defined in
the beginning of the section. As shown in Fig. 10, there
are four main steps: (1) finding the starting point;
(2) transcribing the stroke forward; (3) going through
the fork point; and finally (4) finding the hidden starting points. The following four subsections will give the
detailed description respectively.

2.3.1. Finding the starting point. The starting point
is either an end point or a corner point. Therefore, at
first, a character point with label 1 or 5 is found and its

position is recorded in an empty DLL. After its neighbors are inspected to search out one branch point,
which is to be the next transcription point, we change
its label into 0 to show this character point having been
transcribed if its label is 1, or, change its label into 1 to
indicate this character point to be a starting point of
another stroke if its label is 5. If there is no more end
point or corner point, then we try to search the hidden
starting points as described in Subsection 2.3.4.
2.3.2. Transcribing the stroke forward. After the
starting character point of a stroke is found, we transcribe it forward until we encounter a non-middle
point. The method for searching the next transcription
point is the same as that described in Subsection 2.2.1.
Every time when the next transcription point is
searched out, it is set to be the new current transcription point and is recorded in the DLL. If its label is
2 (middle point), then change it into 0 to show this
middle point transcribed and repeat transcribing the
stroke forward. If its label is 1 (end point) or 5 (corner
point), then change it into 0 or 1, respectively, and the
transcription of this stroke is finished. If the label is 3 or
4 (fork point), we need to decide whether the stroke is
continued to be transcribed as described in the next
subsection.

1795

2.3.3. Going through the fork point. When the current transcription point is a 3-fork or 4-fork point,
there are 2 or 3 forward branch points, respectively.
Sometimes, additional fork points occur in a forward
branch such that they will branch more forward
branches. In order to effectively and economically

determine which forward branch to be transcribed
forward or to finish the transcription of this stroke, and
travel algorithm, which is similar to traveling over
a tree structure that the root is the original fork point
while the nodes are the fork points on the forward
branches, is required to find the feasible forward
branches which are likely to be the extension of the
coming branch. Since the thinning process makes violent distortion on a fork point, it should be avoided to
use the distorted segmental branch to calculate the
trend of a forward branch. Especially, when a fork is
split into several fork points by the thinning process, it
appears some short and bend segmental branches
around the fork points. Therefore, the travel algorithm
must pass through the distorted short f-segmental
branches until finding a nondistorted f-segmental
branch, whose length is not shorter than the constant
length SL, where the f-segmental branch consists of
a number of consecutive middle points which are
started from a fork point and SL is dependent on
the size of the input character. Under the travel
algorithm, there are some exceptions to the above
rule such that two additional rules are also required.
The first one is that if the travel of a forward branch
passes through three f-segmental branches whose
lengths are all not shorter than 0.7SL, it is considered that these three f-segmental branches are nondistorted. The second one is that if the total travel
distance is longer than 4 SL, it is considered that the
longest one of the traveled f-segmental branches is
nondistorted.
Let the forward branch points of each fork point be
tagged 1 and 2 for a 3-fork point and tagged 1, 2, and

3 for a 4-fork point in clockwise order. While a forward
branch is traveled, a counter and three arrays are used.
The counter is used to count the number of the traveled
character points. The first array is used to record the
positions of the traveled character points in order.
Whenever a fork point is encountered, the second
array is used to record the distance (number of character points) between the original fork point and the
encountered fork point. The third array is used to
record the tag of the forward branch point which is
now selected to be traveled forward. Figure 11 shows
the flowchart for traveling the feasible forward
branches. In the beginning, the forward branch point
tagged 1 of the original fork point is selected to be
traveled forward. Then the feasible forward branches
are traveled as follows:
(1) The travel of a forward branch is finished when
one of the conditions listed below takes place.
(i) The current travel point is an end point or a corner point.


1796

J.-R. LIN and C.-F. CHEN

( end point or corner point )

Search the next transcription point ]
and transcribe forward

Search the hidden

starting points

]

I

Yes

No

( end/corner point

~

Yes

Travel the feasible forward branches ]
and compare the trends of the comming ]
and the feasible forward branches I
J

!

Yes

[

Transcribe forward

]


I
Fig. 10. The flowchart of stroke transcription.

(ii) The current travel point is a fork point and the
total traveling distance is longer than 4SL.
(iii) The distance from the current travel point to the
fork point traveled last equals SL.
(iv) There are three f-segmental branches whose
lengths are all not shorter than 0.7SL.
(2) If the current travel point is a middle point, then
its forward branch point is traveled forward.
(3) Whenever a fork point is encountered, its forward branch point tagged 1 is traveled forward.

At the moment of finishing the travel of a forward
branch, the traveled fork points of the forward branch
are inspected in the opposite order of being traveled to
find the first one of which has forward branch points
having not been traveled. It should be noted that the
choice of this fork point can be decided by comparing
the number of the forward branch points of the fork
point with the tag of its last traveled forward branch
point which is able to be got from the data of the third
array. If such a fork point exists, then its forward


Stroke extraction for Chinese characters

1797


I Travel to the forward branch point tagged 1 I

Search the next travel point and travel forward I

Yes

Total travel di
> 4SL ?
,w

o f the current "
.egmental=sL?b r ~ , h

J
_~
No x,,,x

JAre

forward branch
backward to find the
fork point which has
forward branch po'mts

"-,,,

there three"'..

_f - s e g m e n t a l branches
w h o s e length


>07SL

I ns

9 J
Travel to the forward branch
point of the found fork point
w h o s e tag is greater by one than
the tag of the last traveled
forward branch point

Yos

l

]No
( fork point )
Fig. 11. The flowchart of the travel algorithm.

branch point, whose tag is greater by one than the tag
of the last traveled forward branch point, is selected to
be traveled forward. The items of the arrays from the
original fork point to the found fork point are duplicated for the arrays of the new current travel forward branch, and the last item of the new third array
should be increased by one. The counter is set to be as
the distance from the original fork point to this found
fork point. And the travel of the new current travel

forward branch is continued by setting the selected
forward branch point to be the current travel point.

However, if it fails to find such a fork point, then the
travel of all the feasible forward branches of the original fork point is finished.
To decide whether the coming branch and the feasible forward f-segmental branch constitute the segment of a stroke, we need to find the difference between
the trends of the coming branch and the feasible


1798

J-R LIN and C - F CHEN

forward f-segmental branches To calculate the trend
of the coming branch, the coming branch is traveled
backwards by the data of the DLI_ until one of the four
conditions for finishing the travel listed in the second
paragraph of this subsection takes place. The nearest,
in backward order, coming f-segmental branch whose
length isn't shorter than 0 7 S L is selected to calculate
its trend as the trend of the coming branch Thereupon
we calculate the trends of the feasible forward f-segmental branches and determine which forward branch
to be transcribed forward or to finish the transcription
of this stroke. The determination is to check whether

00

the coming branch and one of the feasible forward
f-segmental branches constitute, the segment of
a stroke, i e , the angle between their trends is less than
L _ A N G L E The determination procedure is repeated
by scanning the farthest f-segmental branch for every
feasible forward branch which hasn't been checked

until the object is found or all of the feasible segmental
branches are checked If it fails to find such a forward
f-segmental branch, then the extraction of this stroke is
finished Otherwise, we directly transcribe forward to
the further end of the found f-segmental branch and
store the recorded data of arrays in the D L L While

0

. . . . . . . . .

I . . . . . . . . .

2

. . . . . . . . .

3

. . . . . . . . .

4

. . . . . . . . .

5

. . . . . . . . .

0


. . . .

0

0

. . . .

0

. . . .

0

. . . .

0

. . . .

.

.

.

.

.


5

.

.

. . . .

.

.

.

.

.

. . . .

.

.

.

.

5

.

. . . .

.

.

.

.

.

.

.

.

.

5
.

.

. . . .

.


.

.

.

.

.

5

.

.

. . . .

.

.

.

.

.

.


.

.

5

.

.

. . . .

.

.

.

.

.

.

.

.

5


.

.

.

6...

. . . .

.

.

.

O . . .

.

.

.

.

01
02


.

03

.

.

04

.

.

.

.

.

.

.

.

.

.


05

.

.

.

.

.

.

.

.

.

.

06

. . . . . . . . . . . . . . . . . . . . . . .

07

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

a

.

.

.

.

.

.

.

.

.

a

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

a

.

.

.

.

.

.

.

.

.

.

.

.


.

.

a

.

.

.

.

.

.

.

.

.

.

. . . . . . . . . . . . . . . . . . . . . .
. . . . . . .

10


. . . . . . . . . . . . . . . . . . . . . .

11

. . . . . . . . . . . .

12

. . . . . . . . .

13

~

. . . . . . . . . . . . .

a

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

. . . . .

a

. . . . . . . . . . . . . . . . . . . . . .

08

09

.

.

;

.

. . . . . . .

f*

.
.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

b


. . . . . . . . . . . . . . . . . . . . . . . . . .

.

.

. . . . . . .

b

a

. . . . . . . . . . . . .

b

a

. . . . . . . . . . . . .

b

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.
.


.

.

.
.

.

.

.
.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

. . . . . . . . . . . . . . . . . . . . . . . . . . .
b

*f


b

d

.

.

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .
. . . . .

.

.

.

. . . . . . . . . . . . .

. . . . . . .

f f f f f f f . . * . . f f f f f . d
f f f

.
.


. . . . . . . . . . .

ecc

. . . . . . . . . . . .

. . . .

e e e e e e e . ,

c

. . . . . . . . . . . .

. . . . . . .

* e e e

. . . . . . . . .

c

. . . . . . . . . . . .

. . . . . . . . .

d

. . . . . . .


b

. . . . . . . . . . .

c

. . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

a

. . . . . . . . .

d

. . . . . . .

b

. . . . . . . . . . .

c

. . . . . . . . . . . . .

14

. . . . . . . . . . . . . . . . . . . .


a

. . . . . . . . .

d

. . . . . . .

b

. . . . . . . . . . .

c

. . . . . . . . . . . . .

15

. . . . . . . . . . . . . . . . . . . .

a

. . . . . . . . .

d

. . . . . . .

b


. . . . . . . . . . .

c

. . . . . . . . . . . . .

16

. . . . . . . . . . . . . . . . . . .

a

. . . . . . . . .

d

. . . . . . . .

b

. . . . . . . . . . .

c

. . . . . . . . . . . . .

17

. . . . . . . . . . . . . . . . . . .


a

. . . . . . . . .

d

. . . . . . . .

b

. . . . . . . . . .

c

. . . . . . . . . . . . . .

18

. . . . . . . . . . . . . . . . . . .

a

. . . . . . . . .

d

. . . . . . . . . .

*


. . . . . . . . . . . . . .

19

. . . . . . . . . . . . . . . . . .

20

. . . . . . . . . . . . . . . . .

a

a

. . . .

21

. . . . . . . . . . . . . . . . .

a

. . . . .

g

. . . . . .
g

22


. . . . . . . . . . . . . . . .

23

. . . . . . . . . . . . . . .

a

. . . . . . . . .

24

. . . . . . . . . . . . . . .

a

. . . . . . . . . .

25

. . . . . . . . . . . . . .

26

. . . . . . . . . . . . .

a

. . . . . . . . . .


a
a
aa

. . . . . . .

. . . .

i

i

27

. . . . . . . . . . .
. . . . . . . . . .
. . . . . . . .

30

. . . . . . .

31

. . . . . . . . . . . . . . . . . . . . .

i

32


. . . . . . . . . . . . . . . . . . . . .

i

33

. . . . . . . . . . . . . . . . . . . . .

i

34

. . .

35

. . . . . . . . . . . . . . . . . . . . . . .

36

. . . . . . . . . . . . . . . . . . . . . .

37

. . . . . . . . . . . . . . . . . . . . .

38

. . . . . . . . . . . . . . . . . . . . .


39

.

a

. . . . . .

i

.

.

.

.

.

.

i
i

. . . . . . . . . . . . .

.


.

.

.

.

.

.

.

.

.

.

. . . . . . . . .

. . . . . .

h h h h . h h

. . . . . . . . . . . .

* h h h h h


. . . . . . .

hh

b

. . . . . . . .

b

. . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . .

b

. . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . .

b

. . . . . . . . . . . . . . . . . . . . . . . . . .

d

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .


k k k k j

. . . . . . . . . . . . . . . . . . .

. . . . . .

k k k k k k k k k k k k k

. . . . .

j

* k k k k k

. . . . . . . . . . . . . . . . . .

.

40

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

42

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

j

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

j

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

j

. . . . . . . . . . . . . . . . . .

]

. . . . . . . . . . . . . . . . . .

j


. . . . . . . . . . . . . . . . . .

. . . . . . . . .

t t l t t t t l l

* L L L L L L L L

. . . . . . . . . . . . .

. . . .

. . . . . . . . . . . . . . . . . . . . .

j

. . . . . . . . . . . . . . . . . . .

i

. . . . . . . . . . . . . . . . . . . . . .

j

. . . . . . . . . . . . . . . . . . .

i

. . . . . . . . . . . . . . . . . . . . . .


j

. . . . . . . . . . . . . . . . . . .

i

. . . . . . . . . . . . . . .
i

.

m m m . m . . j

.

.

.

.

.

.

.

.

.


.

j

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

i


. . . . . . . . . . . . . . . . . . . . . .

j

i

. . . . . . . . . . . . . . . . . . . . . .

j

.

.

.

.

.

.

.

.

.

i


. . . . . . . . . . . . . .

n n n n n n n *

.

. . . . . . . .

46

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

47

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

48

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

49

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

i

. . . . . .

o . . . . . . . . . . . . . . . .

50

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

i

. . . . . .

o

. . . . . . . .

.


.

.

.

.

.

J

. . . . . .

0

. . . . . . . . . .

.

.

54

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

. . . . . . . . . . . . . . .

j

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

pp

.

.

p

.

.


j

.

.

.

.
.

.
.

.

.

.

.

.

.

.

. . . . . . . . . .


p

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

. . . . . . . . . . . . . . . . . .

57

.

.

.

.

.

.

.

.

58

. . . . . . . . . . . . . . . . .

59


. . . . . . . . . . . . . . .

60

. . . . . . . . . . . . . .

o

.

61

. . . . . . . . . . . . . .

o

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

62

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

63

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

63

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.
.


.
.

.

.

.

.

.

.

.

.

.

.

.
.

.

.


.

.

.

.

.

.

.

.

.
.

.

.
.

.

.

.


.

.

.

.

.

.

p

.

.

.

.

.

.

.

.


.

pp

.

.

.

.

.

.

.

oo

oo

. . . . . . . . . . . . . . . . . . . . . . . . . . .

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.
.

p

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.

.

.

.

.


.

.

.

.

. . .

. . . . . . . . . . . . . . .

.

.

.

.

. . . . . . . . . . . . . . . . . . . . . .

.

.

.

.


.

. . . . . . . . . . . . . . . . . . . . . . . .
pp

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


. . . . . . . . . . . .

o

.

.

.

. . . . . . . . . . . . . .

o

.

.

.

.

oo

0

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

o
oo

.

.

.

.

.

.


*n

.

.

j

.

.

.

. . . . .

.

j

. . . . . . . . . . . . . .

.

.

.

n n n n n n


.

.

.

. . . .

. . . . . . . . . . . . . . . . . . .
.

55

.

.

nn

. . . . . . . . . . . . . . . . . . .

.

56

.

.

.


* n . . n . . n

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

. . . . . . . . . . . . . . . . . . . . . . . . . .

.

.

.


. . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . .

.

.

.

.

53

.

.

.

. . . . . . . . . . . . . . . . . . . . .

52

.

.

j


45

i

.

j

. . . . . . . . . . . . .

44

i

.

m m m m m m . . . m . . ,

i

.

.

. . . . . . . .
*mmmmmmm

43


.

. . . . . . . . . . . . . . . . . .

j

i

41

51

. . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . .

*

g . d

i

.

b

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . .


. . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

.

b

. . . . . . . .
. . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

29

a

d

g . . . d

. . . . . .

. . . . . .

28

aa

. . . .


. . . . . . . .

d
d

.

.

.

.

.
.

.

.

.

p

.

.

.


.

.

.

.

p

. . . . . . . . . . . . . . . . .

p

. . . . . . . . . . . . . . . . .

.

Fig 12 The stroke extraction result of the input character~~'

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.

.

.

.

.

.

.

.

.

.
.

.


.

.

.
.

.

.

.

.

.

.

.

.

.

.

.

.


.

.

.

.
.

.

.
.

.

.

.

.

.

.

~.
. . . .


. . . .
I . . . .

. . . . . . . . .
.

.

.

.

.

.

.

.

.

.


Stroke extraction for Chinese characters

1799

end point if it links to another candidate within BUD

points; else it is hidden end point. We search out and
check all candidates and change the labels of the
hidden end points into l's. When there is no such
3-fork points, the extraction of strokes is finished. It
should be noted that the 4-fork point is a cross of two
strokes. Therefore, there is no hidden end point for the
4-fork point.
Figure 12 gives the result for the stroke extraction of
the input c h a r a c t e r ~~".

transcribing an f-segmental branch forward, if the next
point of the terminal point of this f-segmental branch is
a fork point, then the transcribed middle points of this
forward f-segmental branch are relabeled as - 2, where
the negative sign indicates that they have been transcribed and the absolute value 2 is used to preserve
this segmental branch for other strokes which
also share them. If the point is not a fork point, then
the transcribed middle points of this forward f-segmental branch are relabeled 0. An end point and
a corner point are relabeled as described in Subsection
2.3.1. It should be noted that the labels of fork points
are never changed. After transcribing to the found
forward f-segmental branch, we continue the transcription of this stroke as described in the last
subsection.

3, EXPERIMENT RESULTS
The merit of the stroke extraction algorithms which
utilize the thinning process is the ease of the feature
abstracting from the skeleton of a character• Because
that the skeleton of a character is the collection of the
segmental strokes, there are two main tasks for this

kind of stroke extraction algorithms. One is to find the
certain adjacent segmental strokes for being merged
into a complete stroke and the other is to search the

2.3.4. Find the hidden end points of strokes. The
3-fork point which leaves only one branch point that
has not been transcribed is a candidate for the end
point of a certain stroke. A candidate is not a hidden

"A

" ' A ' '
• A



• "A

"

'A

"

-A
"A

.

"A


.

. . . . . . .

" "B
.
.

A

. . . . .
. .

.
.

.

.

.

....

.

..FF

........


"B

.

.

• " 'B

'B
'B

" B
"

"

"B

B B

'

"

II1
'

.


.

D D D D '

D D D D D D D
.

.

.

.

.

.

. . . .

DD

.

.::oG0o

. . . . . . . . . . . . . .

' •

G - "


"G

' B 3 ' "
'
" "

"G

H

• "

"G"
'
'

G

"HG"

"
.

.

....

f2
'


'3B"

.

_f3

"~X3 . . . .

3 B
'3
" '



' • "
'

.~',,,,

" 3 "

"BB
BB
.

.

. . . .
DDDD


.3..

•B

"
' "
• •

.

" B
"DD
" " "3
. . . .

" "

"B

.~3.

'A

X "
"E

B

XB

B
'B

A

FF.FFFFFFF~VFFFF~FFFF"

B B "

C C C C C '

C X C C C C
A
•A

F

B

.

. . . . . . . . . .

C C C C C A
.

'B
" B

"

C C C C C C C C C
. . . . . .

B
B
'

.

.

X "
"H



'H

.

H
I IXI

.

.......
IIIlllIll
l l l l l l l
. . . . . . . .
"! . . . . . . . .


.

.

" "R

.

.
I

.

. . .
" "11

.

.

. . .
I IlI

II .....

.

.


.

I11111
I I I

.......

H
H "

'H

B

"

H
"H
H
H
'H
'H

'

"

N

" H H "

"

'H

H

H

"

H

'

"

"

"H

H

"
"
"

"

Fig. 13. The stroke extraction result o f ~ u s i n g the Ogawa's algorithm symbol X shows the fork point.



1800

J.-R. LIN and C.-F. CHEN

"A
'A
'A





' ' A '
' 'A
'
• "A

• A •
"A
"
'A
.
"A
.
' +A
C C C C C ' A

. . . . . . .
C C C C C C C C C

. . . . .

. . . . .

"
"
.
.
.

.

.
. . . .

"

.

' E E E E E E E E E E E E E X B B

''

. . . . . .
. . . .

BB




XB

'B

'B

B

"B
"B
B

' X

.

.

"

'

+






"


'



'
"

.

.

.

.

.

B B
" ' B ' "
B B '
B B
" "
BB
.
.
.
• •

.


.



I

.

.

.

.

.

.

.

.

.

.

D D D D "

H

H

.

.

.

.

'

F
F

.

. . . .

.

F

.

"

.......
llIllII


Ill

~

"X
.
.
.
.
.
B G
. . . . . . . . . .
B ' " " G G G G G G G G G G ' F
. . . . . .
X

.

.

.

' B '

B B

"

B
'


B
B

B

B
B
"

"B

"

'
" '
• •

' B B ' "
'B

"B

.
.
.
.
.
. . . . .
C C C C C


C X C C C C
A
'A
A
A
• A

"

B
B
'B
'B
B
B
B "

"

.

F

"
"

"
"


"

F "
X
"N
~
"H
.
.
H
.
.
H . . . .

IIIIIIIIIIIXIIII
. . . . . . .

F F F F
F

.
.

.
.

.
.

IIIIIIIII

.....

.
I

.
I II
I I
. . . . . . .

"

H"

. . . . . . .

H

H
H
H
H
H
H

B



J


'

H

"

H

~sa
.

j

.

~

,'

.J

.' j N .

.

Fig. 14. The stroke extraction result of :~using the Liao's algorithm symbol X shows the fork point.

corner point to divide the bend segmental strokes into
two or more individual strokes. The essential difficulty

for the former is to process the violent distortions on
the neighborhood of the split fork points. For solving
this problem, the general methods include two phases.
For phase one, we merge the distorted split fork points
according to their distances and peripheral conditions•
For phase two, we conclude which adjacent segmental
strokes can be composed a complete stroke based on
the orientation of them. In the algorithm of Ogawa
et al., (16~ they specify a threshold of distance, which
depends on the width of strokes and some conditions
for the angles of the peripheral segmental strokes, to
decide whether two fork points should be merged or
not. The drawback of the algorithm is the lack of
adaptability. As Fig. 13 shows, it fails to merge fl and
f2 owing to violating the peripheral conditions and it
fails to merge f2 and f3 owing to violating the distance
limitation. In the algorithm of Liao et al., (+) they propose assistant variable-radius circles to provide the
adaptable threshold of the distance. The exception of

this algorithm will arise when a fork point is located
near to the boundary of the original stroke. The reason
is that it makes the radius of the assistant circle too
small to bring a correct conclusion. AsFig. 14 shows, it
succeeds to merge fl and f2 into f12, but it fails to
merge f12 and f3 owing to the above reason• Since
both of the two algorithms do not take the shared
segmental stroke of two intersected strokes into consideration, the failure of phase one causes the failure of
phase two. However, the proposed algorithm described in Subsection 2.3.3 handles the fork points by
adaptedly traveling through the distorted area to get
reliable information, instead of the work of phase one

and takes the sharing of segmental strokes into consideration, so that the intersected strokes are correctly
extracted. Figure 15 shows the result of our proposed
algorithm.
The observation of our experiments shows the excellent adaptability of the proposed algorithm. Likewise,
the corner-searching algorithm stated in Section 2.2 is
adaptable for both the smooth transition corner and


Stroke extraction for Chinese characters

"A
A
"A
" •
" ' A '
' A

• "A

"A
'A
"A

"
.
.

"A

.


•A
C C C C C X

. . . . . . .
C C C C C C C C C
.
.
.
.
.


'
.
.
.

.

.

.
.
.

.

A


.
.

.

.

.

.

X C C C C
A

A
A
'A

'
"

"A

"

"X"

"
'B


'X

B

.

.

.

F F

* *



"X

"B

.

.

.

" " F F F F F F F F F F

.


.

.

.

.

.

.

.

.

.

G

"

E

"B
I J B
B




"



"

'G

"

. . . . . . .
H H H H H H H H H H
.

.

. . . .

"

DD

E

" •

"

E


"

"

.
G

. . . .

'H

.
D D D D



'G

'



.

" G E "
"X

' B B
'


.
.

E
'

" ' B ' "
BB

B B

.
.

X'

' ' B ' "

BB


.

E
" E

.

B


B

.

.

D D D D D D D
. . . . . . .

-::iEEE
. . . . . . . . . .

"

.

'

.

.

"

.x-.
' * F
+


"


'

+ D D
. . . .
D ' D
' • "+
. . . .
DDDD
" " ' D D D D D D D D D D "
" "D'+
. . . .
DD
" D D D D D D D D
. . . . . . . . . .
X
" ' D D
.
.
.
.
.
.
.
.
.
.
B ' '
'"
" B B "

B
.
.
.
.

. . . . . .

"



" • •

'
'

"


'

"B
'
B'
.
.

.


"

"B
B
B

B
'B

XB
"B

B
"B
"B

B B "

. . . . . .
C C C C C '

. . . . .

. . . . . .
.
.

1801

.


HX

H H H H H H H H H "
. . . . . . . . . .
.

.

.

.

"XHHHH
G

.

.

.
.

.

.
.

.


.
.

.
.

H H H H H H H H H

.
H H H H H H
. . . . . . .

. . . . .

"G
'G
"G
"G
"G
"G

B

"1
I

.
.

. . . . .


G
"G"

" '
•I I "

"G"

1

G

' ' I ' G "
'



'I

'

"

Fig. 15. The stroke extraction result of ~ u s i n g the proposed algorithm. Symbol X shows the fork point and
symbol + shows the shared point•

Table 1. The sets of parameters used in the experiments
Font
Parameter

SL
CL
C ANGLE
L_ANGLE
BUD

Fang-Sung

Ming

8
6

8
6

100

110

25
4

32
6

the abrupt transition corner such that the correct rate
of about 99% is obtained. That is, only for few strokes,
such as the right leaning stroke of ]~, some middle
points of them are mistaken as corner points. In the

experiments, we test the most frequently used 1500
Chinese characters printed in both Ming font and
F a n g - S u n g font with the size of 64 x 64 points. Since
the shapes of strokes are different according to their

fonts, the parameters for the stroke extraction procedure are adjusted to make suitable for different fonts.
The sets of parameters are listed in Table 1. In these
experiments, the following essential rules are used to
determine the correctness of stroke extraction results.
(i) A representative stroke that is a dot (-), a horizontal
stroke (--), a vertical stroke (i), or a leaning stroke
( / o r ".,) is extracted to be an individual stroke. If some
of these strokes make contact with each other, they
should be exactly extracted without being split up or
being merged. (ii) A bent stroke, such as 7 , should be
split up into two representative strokes at its corner
point• (iii) Two connective strokes which have similar
trends and construct a representative stroke are extracted as one stroke. F r o m the results of the experiment, the rates for correctly extracting all strokes of
a character are 97.8% for Ming font and 98.4% for
Fang Sung font for the 1500 Chinese characters. This
shows that the proposed stroke extraction algorithm is
useful and reliable. Some results of the stroke extraction are shown in Figs 16 19.


1802

/

Fig. 16. The result for the stroke extraction of the input character,~.


/

/

-"7

e'
/

Fig. 17. The result for the stroke extraction of the input character 2 ~ .


Stroke extraction for Chinese characters

1803

/

\

Fig. 18. The result for the stroke extraction of the input character'l~.

4. CONCLUSION
The two main tasks of the Chinese character stroke
extraction algorithms with thinning process are to find
corner points and to handle fork points. The difficulty
for finding the corner point is that the smooth transition corner and the abrupt transition corner exist in
a Chinese character. And the difficulty for handling the
fork point is the violent distortion on the fork point
produced by the thinning process, such as the split of

one fork point into two or more fork points. In this
paper, we propose an intuitive and effective stroke
extraction algorithm that applies the trend-followed
transcribing technique to correctly accomplish these
two tasks. The proposed stroke extraction algorithm
consists of three stages: (i) thin the character and label

the feature points; (ii) search and label the corner
points; and (iii) transcribe each stroke from its one end
point and record the trace until reaching the other end
point of this stroke or reaching the fork point that is
judged to be the end of this stroke by calculating the
trends of the branches. In the experiments, we test the
most frequently used 1500 Chinese characters printed
in both the Ming font and the Fang-Sung font with the
size of 64 x 64 points. The experiment results show the
excellent adaptability of the proposed algorithm. The
corner-searching algorithm stated in Section 2.2 is
adaptable for both the smooth transition corner and
the abrupt transition corner and gives the correct rate
of about 99%. The experiments also give the rates for
correctly extracting all strokes of a character 97.8 % for
the Ming font and 98.4% for the Fang Sung font for


1804

J.-R. LIN and C.-F. CHEN

gO

-j
j

Fig. 19. The result for the stroke extraction of the input character ~ J .

the 1500 Chinese characters. This shows that the proposed stroke extraction algorithm is useful and reliable.
Acknowledgements The authors would like to thank the
anonymous reviewers for their valuable suggestions. This
work was supported in part by the National Science Council,
Taiwan, under Grant NSC 82-0408-E-036-069, and by the
Tatung Company under Contract 81-1207-39. The authors
are grateful for their financial supports.

REFERENCES

1. P. P. Wang and R. C. Shiau, Machine recognition of
printed Chinese characters via Transformation algorithm, Pattern Recognition 5, 303 321 (1973).

2. S. Mori, K. Yamamoto and M. Yasuda, Research on
machine recognition of handprinted characters, IEEE
Trans. Pattern Anal. Mach. lntell. 6, 386 405 (1984).
3. Y. X. Gu, Q. R. Wang and C. 3". Sunen, Application of
a multilayer decision tree in computer recognition of
Chinese characters, IEEE Trans. Pattern Anal. Mach.
Intell. 5, 83-89 (1983).
4. C. W. Liao and J. S. Huang, A transformation invariant matching algorithm for handwritten Chinese
character recognition, Pattern Recognition 23, 1167
1188 (1990).
5. J. S. Huang and M.-L. Chung, Separating similar complex Chinese characters by Walsh Transform, Pattern
Recognition 20, 425 428 (1987).

6. C.H. Leung, Y. S. Cheung and Y. L. Wong, A knowledgebased stroke-matching method for Chinese character
recognition, IEEE Trans. Systems, Man Cybern. 17, 9931003 (1987).


Stroke extraction for Chinese characters

7. K. P. Chan and Y. S. Cheung, Correction to Fuzzyattribute graph with application to Chinese character
recognition, IEEE Trans. Systems, Man Cybern. 22, 402410 (1992).
8. L.-H. Chen and J.-R. Lieh, Handwritten character
recognition using a 2-layer random graph model by
relaxation matching, Pattern Recognition 23, 1189-1205
(1990).
9. S.W. Lu, Y. Ren and C. Y. Suen, Hierarchical attributed
graph representation and recognition of handwritten
Chinese characters, Pattern Recognition 24, 617-632
(1991).
10. F.H. Cheng and W. H. Hsu, Research on Chinese OCR in
Taiwan, International d. Pattern Recognition Artif. Intell.
5, 139-164 (1991).
11. N. Babaguchi, Y. Kitamura, M. Shiono, H. Sanada and
Y. Tezuka, A method of direction segments extraction
from character pattern without thinning process, Trans.
Electr. Commun. Engin. Japan J65-D, 874-881 (1982).
12. L. Y. Tseng and C. T. Chuang, An efficient knowledgebased stroke extraction method for multi-font Chinese
characters, Pattern Recognition 25, 1445 1458 (1992).

1805

13. F. Chang, H.-S. Don, Y.-C. Chen and W.-L. Hsu, A stroke
segmentation algorithm with application to Chinese

character recognition, Proc. '92 Second National Workshop on Character Recognition R.O.C. 44-57 (1992).
14. C. W. Liao and J. S. Huang, Stroke segmentation by
Bernstein-Bezier curve fitting, Pattern Recognition 23,
475-484 (1990).
15. K. W. Gan and K. T. Lua, A new approach to stroke and
feature point extraction in Chinese character recognition,
Pattern Recognition Lett. 12, 381-387 (1991).
16. H. Ogawa and K. Taniguchi, Thinning and stroke segmentation for handwritten Chinese character recognition, Pattern Recognition 15, 299-308 (1982).
17. Y. Xia, Skeletonization via the realization of the fire
front's propagation and extinction in digital binary
shapes, IEEE Trans. Pattern Anal. Mach. Intell. 11, 10761086 (1989).
18. B.-K. Jang and R. T. Chin, Analysis of thinning algorithms using mathematical morphology, IEEE Trans.
Pattern Anal. Math. Intell. 12, 541-551 (1990).
19. B. Li and C. Y. Suen, A knowledge-based thinning algorithm, Pattern Recognition 24, 1211-1221 (1991).

About the Author--JI-RONG LIN was born in Kee-Lung, Taiwan, on December 24, 1964. He graduated

with a B.S. degree in Electrical Engineering from Tatung Institute of Technology, Taipei, Taiwan, in 1989. Mr
Lin is now a Ph.D. candidate in the Department of Electrical Engineering, Tatung Institute of Technology.
His research interests are pattern recognition, image processing, and signal processing.

About the A u t h o r - - C H A N G - F U U CHEN was born in Ping-Tung, Taiwan, on March 26, 1947. He received

the B.S. degree (with great distinction) from Tatung Institute of Technology, Taipei, Taiwan, in 1969 and the
M.S., Engineer, and Ph.D. degrees from Stanford University, Stanford, CA, in 1974, 1975 and 1982,
respectively, all in Electrical Engineering.
After being an Electrical Engineering Officer of the Combined Service Forces, he joined Tatung Institute of
Technology in 1970. He became an Associate Professor in the Department of Electrical Engineering in 1977,
and has been a Professor since 1984 (1973-1975, 1979-1983, on leave). He also was Chairman of the
department from 1983 to 1991. From February 1982 to August 1983 he was a Senior Technology Staffofthe

Advanced Marketing and Technology Department of Granger Associates Co., Santa Clara, CA, where he
was a major contributor to the digital signal processing functions of the digital echo canceller system for the
company. From December 1984 to June 1986 he was General Manager of the Information Products Division
and the Computer Plant, Tatung Company, Taiwan, during which time he turned the plant over from huge
losses to profit and computerized the company. Since 1986, he has also been a consultant of President Room
for the company. He was a recipient of the Industrial Education Award sponsored by the Hsieh-Chih
Industrial Foundation, Taiwan, in 1989, and the Distinguished Teaching Award sponsored by the Ministry
of Education of the Republic of China in 1992. His current research interests include digital systems and
digital signal processing.



×