
Data Mining
Classification: Alternative Techniques

Imbalanced Class Problem


Class Imbalance Problem

• Many classification problems have skewed classes (far more records from one class than from another):
  – Credit card fraud
  – Intrusion detection
  – Defective products on a manufacturing assembly line



Challenges

• Evaluation measures such as accuracy are not well-suited for imbalanced classes
• Detecting the rare class is like finding a needle in a haystack


Confusion Matrix

                         PREDICTED CLASS
                         Class=Yes   Class=No
ACTUAL     Class=Yes         a           b
CLASS      Class=No          c           d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
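A minimal sketch of how these four counts can be tallied from paired actual/predicted labels (pure Python; the data and function name are illustrative, not from the slides):

    # Tally the four confusion-matrix cells for a binary problem.
    # "Yes" is the positive class, matching the slide's convention.
    def confusion_counts(actual, predicted, pos="Yes"):
        a = b = c = d = 0
        for y, p in zip(actual, predicted):
            if y == pos and p == pos:
                a += 1      # TP: positive correctly predicted
            elif y == pos:
                b += 1      # FN: positive predicted as negative
            elif p == pos:
                c += 1      # FP: negative predicted as positive
            else:
                d += 1      # TN: negative correctly predicted
        return a, b, c, d

    actual    = ["Yes", "Yes", "No", "No",  "Yes", "No"]
    predicted = ["Yes", "No",  "No", "Yes", "Yes", "No"]
    print(confusion_counts(actual, predicted))   # (2, 1, 1, 2)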


Accuracy

                         PREDICTED CLASS
                         Class=Yes   Class=No
ACTUAL     Class=Yes       a (TP)      b (FN)
CLASS      Class=No        c (FP)      d (TN)

• Most widely-used metric:

    Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)

Problem with Accuracy

• Consider a 2-class problem
  – Number of Class 0 examples = 9990
  – Number of Class 1 examples = 10
• If a model predicts everything to be Class 0, its accuracy is 9990/10000 = 99.9%
  – This is misleading because the model does not detect any Class 1 example
  – Detecting the rare class is usually more interesting (e.g., frauds, intrusions, defects, etc.)
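The arithmetic above is easy to verify; a minimal sketch of the trivial always-majority model (pure Python, using the slide's class counts):

    # 9990 Class 0 examples and 10 Class 1 examples, as on the slide.
    actual = [0] * 9990 + [1] * 10

    # A model that predicts everything to be Class 0.
    predicted = [0] * len(actual)

    correct = sum(y == p for y, p in zip(actual, predicted))
    print(correct / len(actual))     # 0.999 -- accuracy looks excellent

    # ...yet not a single Class 1 example is detected:
    detected = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)
    print(detected)                  # 0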


Alternative Measures

                         PREDICTED CLASS
                         Class=Yes   Class=No
ACTUAL     Class=Yes         a           b
CLASS      Class=No          c           d

    Precision (p) = a / (a + c)

    Recall (r) = a / (a + b)

    F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
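A minimal sketch of these three measures as functions of the cells a, b, c (pure Python; the example counts are illustrative, not from the slides):

    # a = TP, b = FN, c = FP, following the matrix above.
    def precision(a, c):
        return a / (a + c)

    def recall(a, b):
        return a / (a + b)

    def f_measure(a, b, c):
        # Harmonic mean of precision and recall: 2rp/(r+p) = 2a/(2a+b+c)
        return 2 * a / (2 * a + b + c)

    a, b, c = 40, 10, 20
    print(precision(a, c))       # 0.666...
    print(recall(a, b))          # 0.8
    print(f_measure(a, b, c))    # 0.727...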


ROC (Receiver Operating Characteristic)

• A graphical approach for displaying the trade-off between detection rate and false alarm rate
• Developed in the 1950s for signal detection theory, to analyze noisy signals
• An ROC curve plots TPR against FPR
  – Performance of a model is represented as a point on the ROC curve
  – Changing the threshold parameter of the classifier changes the location of the point


ROC Curve

(TPR, FPR):
• (0,0): declare everything to be negative class
• (1,1): declare everything to be positive class
• (1,0): ideal

• Diagonal line:
  – Random guessing
  – Below the diagonal line: prediction is opposite of the true class

ROC (Receiver Operating Characteristic)

• To draw an ROC curve, the classifier must produce a continuous-valued output
  – The outputs are used to rank the test records, from the most likely positive-class record to the least likely positive-class record
• Many classifiers produce only discrete outputs (i.e., the predicted class)
  – How to get continuous-valued outputs?
    · Decision trees, rule-based classifiers, neural networks, Bayesian classifiers, k-nearest neighbors, SVM

Example: Decision Trees

[Figure: "Decision Tree" and "Continuous-valued outputs" panels]
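One common way to obtain such continuous-valued outputs is to use the class fractions at a decision tree's leaves as probability estimates. A minimal sketch using scikit-learn's predict_proba (the library choice and toy data are assumptions, not from the slides):

    from sklearn.tree import DecisionTreeClassifier

    # Toy one-feature training set (illustrative only).
    X_train = [[0.1], [0.2], [0.3], [0.6], [0.7], [0.9]]
    y_train = [0, 0, 1, 0, 1, 1]
    tree = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

    X_test = [[0.15], [0.5], [0.8]]
    # predict() yields only a discrete class label...
    print(tree.predict(X_test))
    # ...while predict_proba() yields the positive-class fraction of the
    # leaf each record lands in, usable as a continuous score for ranking.
    print(tree.predict_proba(X_test)[:, 1])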


ROC Curve Example

- 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive

[Figure: the two class distributions with threshold t, and the resulting ROC curve]

At threshold t: TPR = 0.5, FNR = 0.5, FPR = 0.12, TNR = 0.88

Using ROC for Model Comparison

[Figure: ROC curves of two models, M1 and M2]

• Neither model consistently outperforms the other
  – M1 is better for small FPR
  – M2 is better for large FPR
• Area Under the ROC Curve (AUC)
  – Ideal: area = 1
  – Random guess: area = 0.5
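A minimal sketch of computing the area under an ROC curve with the trapezoidal rule, given (FPR, TPR) points (pure Python; the two point lists just reproduce the ideal and random-guess cases from the slide):

    def auc(points):
        # points: (FPR, TPR) pairs sorted by increasing FPR.
        area = 0.0
        for (x1, y1), (x2, y2) in zip(points, points[1:]):
            area += (x2 - x1) * (y1 + y2) / 2.0   # trapezoid between points
        return area

    ideal        = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    random_guess = [(0.0, 0.0), (1.0, 1.0)]
    print(auc(ideal))          # 1.0
    print(auc(random_guess))   # 0.5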


How to Construct an ROC Curve

• Use a classifier that produces a continuous-valued output score(+|A) for each test instance A
• Sort the instances according to score(+|A) in decreasing order
• Apply a threshold at each unique value of score(+|A)
• Count the number of TP, FP, TN, FN at each threshold
  – TPR = TP / (TP + FN)
  – FPR = FP / (FP + TN)

Instance   score(+|A)   True Class
    1         0.95          +
    2         0.93          +
    3         0.87          -
    4         0.85          -
    5         0.85          -
    6         0.85          +
    7         0.76          -
    8         0.53          +
    9         0.43          -
   10         0.25          +

How to Construct an ROC Curve

Class            +     -     +     -     -     -     +     -     +     +
Threshold >=   0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP               5     4     4     3     3     3     3     2     2     1     0
FP               5     5     4     4     3     2     1     1     0     0     0
TN               0     0     1     1     2     3     4     4     5     5     5
FN               0     1     1     2     2     2     2     3     3     4     5
TPR              1     0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2   0
FPR              1     1     0.8   0.8   0.6   0.4   0.2   0.2   0     0     0

ROC Curve:

[Figure: the ROC curve traced through the (FPR, TPR) pairs above]
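A minimal sketch that reproduces the counts above from the ten scored instances (pure Python). One detail: the table steps through the tied score 0.85 one instance at a time, whereas sweeping only unique thresholds, as below, keeps a single column for 0.85 (TP = 3, FP = 3):

    scores = [(0.95, '+'), (0.93, '+'), (0.87, '-'), (0.85, '-'), (0.85, '-'),
              (0.85, '+'), (0.76, '-'), (0.53, '+'), (0.43, '-'), (0.25, '+')]

    P = sum(1 for _, c in scores if c == '+')   # 5 positives
    N = len(scores) - P                         # 5 negatives

    # One threshold per unique score, plus 1.00, where every record
    # is classified as negative (TPR = FPR = 0).
    for t in sorted({s for s, _ in scores}) + [1.00]:
        tp = sum(1 for s, c in scores if s >= t and c == '+')
        fp = sum(1 for s, c in scores if s >= t and c == '-')
        fn, tn = P - tp, N - fp
        print(t, tp, fp, tn, fn, tp / (tp + fn), fp / (fp + tn))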


Handling the Class Imbalance Problem

• Class-based ordering (e.g., RIPPER)
  – Rules for the rare class have higher priority
• Cost-sensitive classification
  – Misclassifying a rare-class example as majority class is more expensive than misclassifying a majority-class example as rare class
• Sampling-based approaches


Cost Matrix

Counts f(i, j):

                         PREDICTED CLASS
                         Class=Yes     Class=No
ACTUAL     Class=Yes    f(Yes, Yes)   f(Yes, No)
CLASS      Class=No     f(No, Yes)    f(No, No)

Cost matrix C(i, j), the cost of misclassifying a class i example as class j:

                         PREDICTED CLASS
                         Class=Yes     Class=No
ACTUAL     Class=Yes    C(Yes, Yes)   C(Yes, No)
CLASS      Class=No     C(No, Yes)    C(No, No)

    Cost = Σ_i Σ_j C(i, j) × f(i, j)


Computing Cost of Classification

Cost matrix C(i, j):
                     PREDICTED CLASS
                       +      -
ACTUAL        +       -1    100
CLASS         -        1      0

Confusion matrix of model M1:
                     PREDICTED CLASS
                       +      -
ACTUAL        +      150     40
CLASS         -       60    250
Accuracy = 80%, Cost = 3910

Confusion matrix of model M2:
                     PREDICTED CLASS
                       +      -
ACTUAL        +      250     45
CLASS         -        5    200
Accuracy = 90%, Cost = 4255

M2 has the higher accuracy, but M1 has the lower cost.
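Both the accuracy and cost figures can be reproduced directly from the matrices; a minimal sketch (pure Python):

    # Total cost = sum over all cells of C(i, j) * f(i, j).
    def total_cost(C, f):
        return sum(C[i][j] * f[i][j] for i in range(2) for j in range(2))

    def accuracy(f):
        return (f[0][0] + f[1][1]) / sum(sum(row) for row in f)

    C  = [[-1, 100],    # rows: actual +, actual -
          [ 1,   0]]    # columns: predicted +, predicted -
    M1 = [[150,  40],
          [ 60, 250]]
    M2 = [[250,  45],
          [  5, 200]]

    print(accuracy(M1), total_cost(C, M1))   # 0.8 3910
    print(accuracy(M2), total_cost(C, M2))   # 0.9 4255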

Cost-Sensitive Classification

• Example: Bayesian classifier
  – Given a test record x:
    · Compute p(i|x) for each class i
    · Decision rule: classify x as class k if k = argmax_i p(i|x)
  – For the 2-class case, classify x as + if p(+|x) > p(-|x)
    · This decision rule implicitly assumes that C(+,+) = C(-,-) = 0 and C(+,-) = C(-,+)


Cost-Sensitive Classification

• General decision rule:
  – Classify test record x as class k if

      k = argmin_j Σ_i p(i|x) × C(i, j)

• 2-class case:
  – Cost(+) = p(+|x) C(+,+) + p(-|x) C(-,+)
  – Cost(-) = p(+|x) C(+,-) + p(-|x) C(-,-)
  – Decision rule: classify x as + if Cost(+) < Cost(-)
    · If C(+,+) = C(-,-) = 0, this reduces to: classify x as + if

        p(+|x) > C(-,+) / (C(-,+) + C(+,-))
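A minimal sketch of this 2-class decision rule (pure Python; the cost values are illustrative, not from the slides):

    # C maps (actual, predicted) -> cost. Correct predictions are free;
    # missing a + costs 100, a false alarm costs 1 (illustrative numbers).
    C = {('+', '+'): 0, ('-', '-'): 0, ('+', '-'): 100, ('-', '+'): 1}

    def classify(p_pos):
        p_neg = 1.0 - p_pos
        cost_pos = p_pos * C[('+', '+')] + p_neg * C[('-', '+')]
        cost_neg = p_pos * C[('+', '-')] + p_neg * C[('-', '-')]
        return '+' if cost_pos < cost_neg else '-'

    # Since C(+,+) = C(-,-) = 0, the rule reduces to
    # p(+|x) > C(-,+) / (C(-,+) + C(+,-)) = 1/101, roughly 0.0099.
    print(classify(0.02))    # '+', even though p(+|x) is far below 0.5
    print(classify(0.005))   # '-'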

Sampling-Based Approaches

• Modify the distribution of the training data so that the rare class is well-represented in the training set
  – Undersample the majority class
  – Oversample the rare class
• Each approach has its own advantages and disadvantages
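A minimal sketch of both sampling strategies using only the standard library (pure Python; class sizes and record format are illustrative):

    import random
    random.seed(0)   # reproducible sampling

    # Illustrative imbalanced training set: 1000 majority, 20 rare records.
    majority = [('majority', i) for i in range(1000)]
    rare     = [('rare', i) for i in range(20)]

    # Undersample the majority class down to the size of the rare class
    # (without replacement; discards most majority records).
    undersampled = random.sample(majority, len(rare)) + rare

    # Oversample the rare class up to the majority size
    # (with replacement; duplicates rare records).
    oversampled = majority + random.choices(rare, k=len(majority))

    print(len(undersampled), len(oversampled))   # 40 2000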


