
[kaggle][transcription] Credit Card Fraud Detection (3)

bisi 2021. 2. 3. 17:10

 

This post covers identifying whether a credit card transaction is fraudulent or legitimate.

 

The goal is for the credit card company to recognize fraudulent transactions so that customers are not charged for items they did not purchase.

 

The dataset contains transactions that occurred over two days, with 492 frauds out of 284,807 transactions.

The dataset is highly imbalanced: the positive class (Fraud) accounts for only 0.172% of all transactions.

 

Due to confidentiality issues, the original feature values and additional background information are not provided.

The features are named V1 through V28 and are principal components produced by a PCA transformation.

The only features that have not been transformed are 'Time' and 'Amount'.

The target class is the response variable: 1 means fraud and 0 means a legitimate transaction.
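
For reference, here is a minimal sketch of loading the dataset and checking the class imbalance, assuming the standard creditcard.csv file from the Kaggle competition (the file path is an assumption, adjust it to your environment):

import pandas as pd

# Load the dataset; adjust the path to where creditcard.csv is stored
df = pd.read_csv('creditcard.csv')

# Class distribution: 0 = No Fraud (~99.83%), 1 = Fraud (~0.17%)
print(df['Class'].value_counts())
print(df['Class'].value_counts(normalize=True) * 100)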

 

The transcribed code is based on Janio Martinez Bachmann's notebook Credit Fraud || Dealing with Imbalanced Datasets.

 

The work is divided into five parts, in the following order:

 

1) Understanding the data

2) Data preprocessing

3) Random UnderSampling and OverSampling

4) Correlation matrices

5) Testing

 


 

 

 

 

5. Testing

5.1 Logistic Regression Test

 
  • Random undersampling: we will run the final performance evaluation of the classification models on the random undersampling subset. Keep in mind, however, that this is not data from the original dataframe.
  • Classification models: the best-performing models were logistic regression and the SVC (Support Vector Classifier). (A sketch of the assumed setup for the estimators used below follows.)
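
The fitted estimators and data splits referenced in the next cells (log_reg_sm, knears_neighbors, svc, tree_clf, X_test, y_test, original_Xtest) were created in the earlier parts of this series. The sketch below shows roughly what that setup might look like; the hyperparameters are left at their defaults as an assumption, not the notebook's actual settings, and original_Xtrain/original_ytrain are assumed names for the original (imbalanced) training split.

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline

# X_train / y_train: the balanced (random-undersampled) training split from part 3
log_reg = LogisticRegression().fit(X_train, y_train)
knears_neighbors = KNeighborsClassifier().fit(X_train, y_train)
svc = SVC().fit(X_train, y_train)
tree_clf = DecisionTreeClassifier().fit(X_train, y_train)

# log_reg_sm: logistic regression with SMOTE applied inside an imblearn pipeline,
# fit on the original (imbalanced) training split
log_reg_sm = make_pipeline(SMOTE(sampling_strategy='minority'),
                           LogisticRegression()).fit(original_Xtrain, original_ytrain)
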
In [62]:
from sklearn.metrics import confusion_matrix

# Predict with the logistic regression that was fit using the SMOTE technique
y_pred_log_reg = log_reg_sm.predict(X_test)
In [63]:
y_pred_knear = knears_neighbors.predict(X_test)
y_pred_svc = svc.predict(X_test)
y_pred_tree = tree_clf.predict(X_test)
In [65]:
log_reg_cf = confusion_matrix(y_test, y_pred_log_reg)
kneighbors_cf = confusion_matrix(y_test, y_pred_knear)
svc_cf = confusion_matrix(y_test, y_pred_svc)
tree_cf = confusion_matrix(y_test, y_pred_tree)

fig, ax = plt.subplots(2,2,figsize=(22,12))

sns.heatmap(log_reg_cf, ax=ax[0][0], annot=True, cmap=plt.cm.copper)
ax[0][0].set_title("Logistic Regression \n Confusion Matrix", fontsize=14)
ax[0][0].set_xticklabels(['',''], fontsize=14, rotation=90)
ax[0][0].set_yticklabels(['',''], fontsize=14, rotation=360)

sns.heatmap(kneighbors_cf, ax=ax[0][1], annot=True, cmap=plt.cm.copper)
ax[0][1].set_title("KNearsNeighbors \n Confusion Matrix", fontsize=14)
ax[0][1].set_xticklabels(['',''], fontsize=14, rotation=90)
ax[0][1].set_yticklabels(['',''], fontsize=14, rotation=360)

sns.heatmap(svc_cf, ax=ax[1][0], annot=True, cmap=plt.cm.copper)
ax[1][0].set_title("Support Vector Classifier \n Confusion Matrix", fontsize=14)
ax[1][0].set_xticklabels(['',''], fontsize=14, rotation=90)
ax[1][0].set_yticklabels(['',''], fontsize=14, rotation=360)

sns.heatmap(tree_cf, ax=ax[1][1], annot=True, cmap=plt.cm.copper)
ax[1][1].set_title("DecisionTree Classifier \n Confusion Matrix", fontsize=14)
ax[1][1].set_xticklabels(['',''], fontsize=14, rotation=90)
ax[1][1].set_yticklabels(['',''], fontsize=14, rotation=360)
Out[65]:
[Text(0, 0.5, ''), Text(0, 1.5, '')]
 
In [66]:
from sklearn.metrics import classification_report

print('Logistic Regression:')
print(classification_report(y_test, y_pred_log_reg))

print('KNears Neighbors:')
print(classification_report(y_test, y_pred_knear))

print('Support Vector Classifier:')
print(classification_report(y_test, y_pred_svc))

print('Tree Classifier:')
print(classification_report(y_test, y_pred_tree))
 
Logistic Regression:
              precision    recall  f1-score   support

           0       0.89      0.98      0.93        89
           1       0.98      0.89      0.93       100

    accuracy                           0.93       189
   macro avg       0.93      0.93      0.93       189
weighted avg       0.94      0.93      0.93       189

KNears Neighbors:
              precision    recall  f1-score   support

           0       0.87      0.99      0.93        89
           1       0.99      0.87      0.93       100

    accuracy                           0.93       189
   macro avg       0.93      0.93      0.93       189
weighted avg       0.93      0.93      0.93       189

Support Vector Classifier:
              precision    recall  f1-score   support

           0       0.89      0.94      0.92        89
           1       0.95      0.90      0.92       100

    accuracy                           0.92       189
   macro avg       0.92      0.92      0.92       189
weighted avg       0.92      0.92      0.92       189

Tree Classifier:
              precision    recall  f1-score   support

           0       0.85      0.97      0.91        89
           1       0.97      0.85      0.90       100

    accuracy                           0.90       189
   macro avg       0.91      0.91      0.90       189
weighted avg       0.91      0.90      0.90       189

In [67]:
# Final score of logistic regression on the (undersampled) test set
from sklearn.metrics import accuracy_score

y_pred = log_reg.predict(X_test)
undersample_score = accuracy_score(y_test, y_pred)
In [71]:
# Logistic Regression with the SMOTE technique (evaluated on the original test set)

y_pred_sm = best_est.predict(original_Xtest)
oversample_score = accuracy_score(original_ytest, y_pred_sm)

d = {'Technique': ['Random UnderSampling', 'Oversampling (SMOTE)'], 'Score':[undersample_score, oversample_score]}
final_df = pd.DataFrame(data=d)


# Move Column
score = final_df['Score']
final_df.drop('Score', axis = 1, inplace=True)
final_df.insert(1, 'Score', score)

final_df
Out[71]:
              Technique     Score
0  Random UnderSampling  0.931217
1  Oversampling (SMOTE)  0.988273
 

Comparing the scores, oversampling with SMOTE produced the higher score. (Note that the undersampling score is measured on the small balanced test split, while the SMOTE score is measured on the original test set, so the two figures come from different test data.)

 

5.2 Neural Network Test

  • In this section we build a simple neural network to see which of the two Logistic Regression approaches, UnderSampling or OverSampling (SMOTE), is better at detecting Fraud and Non-Fraud transactions.
  • We will not focus only on the Fraud cases but also pay attention to Non-Fraud transactions, because a cardholder who has just bought something should not be told that their card was blocked by the bank's algorithm.
  • Neural network structure
    • one hidden layer with 32 nodes
    • 2 output nodes (0 or 1)
    • learning rate: 0.001
    • optimizer: AdamOptimizer
    • activation function: ReLU
    • the final output uses sparse categorical cross entropy
      • this yields the probability of Fraud vs. Non-Fraud.
 
  • Keras, RandomUnderSampling
In [72]:
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation
from keras.layers.core import Dense
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy

n_inputs = X_train.shape[1]

undersample_model = Sequential([
    Dense(n_inputs, input_shape=(n_inputs, ), activation='relu'),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax')
])
In [73]:
undersample_model.summary()
 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 30)                930       
_________________________________________________________________
dense_1 (Dense)              (None, 32)                992       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 66        
=================================================================
Total params: 1,988
Trainable params: 1,988
Non-trainable params: 0
_________________________________________________________________
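
As a sanity check, the parameter counts in the summary above can be reproduced by hand: each Dense layer has inputs × units weights plus units biases (here n_inputs = 30).

# dense:   30 inputs * 30 units + 30 biases = 930
# dense_1: 30 * 32 + 32                     = 992
# dense_2: 32 * 2  + 2                      =  66
print(30*30 + 30, 30*32 + 32, 32*2 + 2)   # 930 992 66 -> 1,988 total
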
In [76]:
undersample_model.compile(Adam(lr=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
In [77]:
undersample_model.fit(X_train, y_train, validation_split=0.2, batch_size=25, epochs=20, shuffle=True, verbose=2)
 
Epoch 1/20
25/25 - 0s - loss: 1.0006 - accuracy: 0.6523 - val_loss: 0.3953 - val_accuracy: 0.8026
Epoch 2/20
25/25 - 0s - loss: 0.3127 - accuracy: 0.8874 - val_loss: 0.3401 - val_accuracy: 0.8684
Epoch 3/20
25/25 - 0s - loss: 0.2481 - accuracy: 0.9189 - val_loss: 0.3205 - val_accuracy: 0.8816
Epoch 4/20
25/25 - 0s - loss: 0.2121 - accuracy: 0.9238 - val_loss: 0.3121 - val_accuracy: 0.8816
Epoch 5/20
25/25 - 0s - loss: 0.1821 - accuracy: 0.9354 - val_loss: 0.2986 - val_accuracy: 0.8816
Epoch 6/20
25/25 - 0s - loss: 0.1615 - accuracy: 0.9421 - val_loss: 0.2941 - val_accuracy: 0.8882
Epoch 7/20
25/25 - 0s - loss: 0.1436 - accuracy: 0.9421 - val_loss: 0.2999 - val_accuracy: 0.9079
Epoch 8/20
25/25 - 0s - loss: 0.1314 - accuracy: 0.9520 - val_loss: 0.3027 - val_accuracy: 0.9013
Epoch 9/20
25/25 - 0s - loss: 0.1219 - accuracy: 0.9503 - val_loss: 0.3076 - val_accuracy: 0.9079
Epoch 10/20
25/25 - 0s - loss: 0.1138 - accuracy: 0.9536 - val_loss: 0.3133 - val_accuracy: 0.9211
Epoch 11/20
25/25 - 0s - loss: 0.1062 - accuracy: 0.9570 - val_loss: 0.3280 - val_accuracy: 0.9276
Epoch 12/20
25/25 - 0s - loss: 0.0999 - accuracy: 0.9603 - val_loss: 0.3240 - val_accuracy: 0.9276
Epoch 13/20
25/25 - 0s - loss: 0.0951 - accuracy: 0.9603 - val_loss: 0.3347 - val_accuracy: 0.9276
Epoch 14/20
25/25 - 0s - loss: 0.0929 - accuracy: 0.9636 - val_loss: 0.3394 - val_accuracy: 0.9211
Epoch 15/20
25/25 - 0s - loss: 0.0872 - accuracy: 0.9636 - val_loss: 0.3558 - val_accuracy: 0.9211
Epoch 16/20
25/25 - 0s - loss: 0.0835 - accuracy: 0.9685 - val_loss: 0.3670 - val_accuracy: 0.9211
Epoch 17/20
25/25 - 0s - loss: 0.0798 - accuracy: 0.9702 - val_loss: 0.3746 - val_accuracy: 0.9145
Epoch 18/20
25/25 - 0s - loss: 0.0772 - accuracy: 0.9719 - val_loss: 0.3783 - val_accuracy: 0.9145
Epoch 19/20
25/25 - 0s - loss: 0.0731 - accuracy: 0.9702 - val_loss: 0.3945 - val_accuracy: 0.9145
Epoch 20/20
25/25 - 0s - loss: 0.0689 - accuracy: 0.9735 - val_loss: 0.4000 - val_accuracy: 0.9145
Out[77]:
<tensorflow.python.keras.callbacks.History at 0x1264fc7a3c8>
In [78]:
undersample_prediction = undersample_model.predict(original_Xtest, batch_size=200, verbose=0)
In [79]:
undersample_fraud_predictions = undersample_model.predict_classes(original_Xtest, batch_size=200, verbose=0)
 
WARNING:tensorflow:From <ipython-input-79-4a211ea67a85>:1: Sequential.predict_classes (from tensorflow.python.keras.engine.sequential) is deprecated and will be removed after 2021-01-01.
Instructions for updating:
Please use instead:
* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).
* `(model.predict(x) > 0.5).astype("int32")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).
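
Following the warning above, the deprecated predict_classes call can be replaced with an argmax over the softmax probabilities, roughly:

import numpy as np

# Equivalent to Sequential.predict_classes for a softmax output layer
undersample_fraud_predictions = np.argmax(
    undersample_model.predict(original_Xtest, batch_size=200, verbose=0), axis=-1)
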
In [81]:
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    # Optionally normalize each row so values become per-class proportions
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title, fontsize=14)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.

    # Write each cell's count at the center of the cell
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    # Labels and layout only need to be applied once, outside the loop
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
In [82]:
undersample_cm = confusion_matrix(original_ytest, undersample_fraud_predictions)
actual_cm = confusion_matrix(original_ytest, original_ytest)
labels = ['No Fraud', 'Fraud']
In [83]:
fig = plt.figure(figsize=(16,8))

fig.add_subplot(221)
plot_confusion_matrix(undersample_cm, labels, title="Random UnderSample \n Confusion Matrix",
                     cmap=plt.cm.Reds)
fig.add_subplot(222)
plot_confusion_matrix(actual_cm, labels, title="Confusion Matrix \n (with 100% accuracy)", 
                      cmap=plt.cm.Greens)
 
Confusion matrix, without normalization
[[55158  1705]
 [    9    89]]
Confusion matrix, without normalization
[[56863     0]
 [    0    98]]
 
 
  • Keras, OverSampling(SMOTE)
In [84]:
n_inputs = Xsm_train.shape[1]
oversample_model = Sequential([
    Dense(n_inputs, input_shape=(n_inputs, ), activation = 'relu'),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax')
])
In [85]:
oversample_model.compile(Adam(lr=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
In [86]:
oversample_model.fit(Xsm_train, ysm_train, validation_split=0.2, batch_size = 300,
                    epochs=20, shuffle=True, verbose=2)
 
Epoch 1/20
971/971 - 1s - loss: 0.0770 - accuracy: 0.9715 - val_loss: 0.0788 - val_accuracy: 0.9685
Epoch 2/20
971/971 - 1s - loss: 0.0178 - accuracy: 0.9945 - val_loss: 0.0148 - val_accuracy: 0.9975
Epoch 3/20
971/971 - 1s - loss: 0.0099 - accuracy: 0.9977 - val_loss: 0.0076 - val_accuracy: 0.9993
Epoch 4/20
971/971 - 1s - loss: 0.0070 - accuracy: 0.9985 - val_loss: 0.0049 - val_accuracy: 0.9999
Epoch 5/20
971/971 - 1s - loss: 0.0054 - accuracy: 0.9989 - val_loss: 0.0065 - val_accuracy: 0.9997
Epoch 6/20
971/971 - 1s - loss: 0.0042 - accuracy: 0.9991 - val_loss: 0.0093 - val_accuracy: 0.9994
Epoch 7/20
971/971 - 1s - loss: 0.0037 - accuracy: 0.9993 - val_loss: 0.0062 - val_accuracy: 0.9997
Epoch 8/20
971/971 - 1s - loss: 0.0034 - accuracy: 0.9993 - val_loss: 0.0022 - val_accuracy: 1.0000
Epoch 9/20
971/971 - 1s - loss: 0.0029 - accuracy: 0.9994 - val_loss: 0.0016 - val_accuracy: 1.0000
Epoch 10/20
971/971 - 1s - loss: 0.0023 - accuracy: 0.9995 - val_loss: 0.0018 - val_accuracy: 1.0000
Epoch 11/20
971/971 - 1s - loss: 0.0025 - accuracy: 0.9995 - val_loss: 0.0014 - val_accuracy: 0.9999
Epoch 12/20
971/971 - 1s - loss: 0.0018 - accuracy: 0.9996 - val_loss: 0.0035 - val_accuracy: 0.9994
Epoch 13/20
971/971 - 1s - loss: 0.0021 - accuracy: 0.9995 - val_loss: 0.0015 - val_accuracy: 1.0000
Epoch 14/20
971/971 - 1s - loss: 0.0014 - accuracy: 0.9996 - val_loss: 0.0016 - val_accuracy: 1.0000
Epoch 15/20
971/971 - 1s - loss: 0.0017 - accuracy: 0.9996 - val_loss: 5.8743e-04 - val_accuracy: 1.0000
Epoch 16/20
971/971 - 1s - loss: 0.0016 - accuracy: 0.9996 - val_loss: 0.0022 - val_accuracy: 0.9996
Epoch 17/20
971/971 - 1s - loss: 0.0015 - accuracy: 0.9997 - val_loss: 5.7087e-04 - val_accuracy: 1.0000
Epoch 18/20
971/971 - 1s - loss: 0.0013 - accuracy: 0.9997 - val_loss: 0.0086 - val_accuracy: 0.9975
Epoch 19/20
971/971 - 1s - loss: 0.0010 - accuracy: 0.9997 - val_loss: 5.7633e-04 - val_accuracy: 1.0000
Epoch 20/20
971/971 - 1s - loss: 0.0017 - accuracy: 0.9996 - val_loss: 0.0011 - val_accuracy: 1.0000
Out[86]:
<tensorflow.python.keras.callbacks.History at 0x1265221cec8>
In [90]:
oversample_fraud_predictions = oversample_model.predict_classes(original_Xtest, batch_size=200, verbose=0)
In [92]:
oversample_smote = confusion_matrix(original_ytest, oversample_fraud_predictions)
actual_cm = confusion_matrix(original_ytest, original_ytest)
labels = ['No Fraud', 'Fraud']

fig = plt.figure(figsize=(16,8))

fig.add_subplot(221)
plot_confusion_matrix(oversample_smote, labels, title="OverSample (SMOTE) \n Confusion Matrix", cmap=plt.cm.Oranges)

fig.add_subplot(222)
plot_confusion_matrix(actual_cm, labels, title="Confusion Matrix \n (with 100% accuracy)", cmap=plt.cm.Greens)
 
Confusion matrix, without normalization
[[56842    21]
 [   29    69]]
Confusion matrix, without normalization
[[56863     0]
 [    0    98]]
 
 
  • Conclusion
    • Implementing SMOTE on the imbalanced dataset helped resolve the label imbalance.
    • In some cases, the neural network trained on the OverSampled dataset predicts Fraud more accurately than the model trained on the UnderSampled dataset.
    • However, outlier removal should only be performed on the Random UnderSampled data, not on the OverSampled dataset.
    • Also, the model trained on the UnderSampled data fails to correctly detect a large number of Non-Fraud transactions: Non-Fraud transactions can be misclassified as Fraud (see the quick check below).
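
Using the confusion matrices printed above, this can be quantified with a quick check (the numbers are taken directly from the outputs):

# UnderSampling NN: [[55158, 1705], [9, 89]]  -> 1705 Non-Fraud transactions flagged as Fraud
# SMOTE NN:         [[56842,   21], [29, 69]] ->   21 Non-Fraud transactions flagged as Fraud
print(1705 / 56863, 21 / 56863)   # false-positive rate: ~3.0% vs ~0.04%
print(89 / 98, 69 / 98)           # fraud recall: ~0.91 (UnderSampling) vs ~0.70 (SMOTE)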

Go to the source code on GitHub