Here is a sample of my data set:

   Pat_ID  Flare_Up  Demo1     Demo2     Demo3     Demo4  Demo5     Demo6  DisHis1  DisHis1Times  DisHis2    ...     Dis6Treat  Dis7  RespQues1  ResQues1a  ResQues1b  ResQues1c  ResQues2a  SmokHis1  SmokHis2  SmokHis3  SmokHis4
0       1         0      1  0.246004  0.391931  0.237792      0  0.443526        0      0.000000        0    ...             1     0    0.12623     0.1032     0.2439     0.0597        0.0  0.411765  0.263620  0.482759    0.1875
1       2         1      1  0.225851  0.268012  0.268481      0  0.286501        0      0.000000        1    ...             1     0    0.60707     0.3808     0.8637     0.4949        0.1  0.117647  0.098418  0.624138    0.0000
2       3         0      0  0.342599  0.476945  0.296468      1  0.159780        1      0.166667        1    ...             0     0    0.77541     0.6318     1.0000     0.6570        0.3  0.035294  0.020211  0.510345    0.0000
[3 rows x 62 columns]  

My code to traverse through that data set and print ROC is:

import pandas as pd 
import matplotlib.pyplot as plt 
import numpy as np 
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
import itertools
def plot_confusion_matrix(cm, classes, normalize=True, title='Confusion matrix', cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
#    else:
#        print('Confusion matrix, without normalization')
#    print(cm)
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()
def show_data(cm, print_res = 0):
    tp = cm[1,1]
    fn = cm[1,0]
    fp = cm[0,1]
    tn = cm[0,0]
    if print_res == 1:
        print('Precision =     {:.3f}'.format(tp/(tp+fp)))
        print('Recall (TPR) =  {:.3f}'.format(tp/(tp+fn)))
        print('Fallout (FPR) = {:.3e}'.format(fp/(fp+tn)))
    return tp/(tp+fp), tp/(tp+fn), fp/(fp+tn)
df = pd.read_csv("datasource/DevelopmentData.csv")
print(df.head(3))
y = np.array(df.Class.tolist())     #classes: 1..fraud, 0..no fraud
df = df.drop('Class', 1)
df = df.drop('Time', 1)     # optional
df['Amount'] = StandardScaler().fit_transform(df['Amount'].values.reshape(-1,1))    #optionally rescale non-normalized column
X = np.array(df.as_matrix())   # features  

A class of 0 means that the transaction was in order, and a class of 1 means that the transaction was fraudulent.
When I run my code, I get this error:

Traceback (most recent call last):
  File "finalindex.py", line 54, in <module>
    y = np.array(df.Class.tolist())     #classes: 1..fraud, 0..no fraud
  File "C:\Users\kulkaa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Class'  

How can I fix that error? Do I need to change the column names in the code to match my data set?

I don't know anything about roc or auc, but is DataFrame supposed to have a Class attribute? If it doesn't, you will always get that error. – Ctrl S Sep 25, 2018 at 13:14

Ummm... yes, I guess Dataframe has Class attribute. The author of this link (kaggle.com/dstuerzer/optimized-logistic-regression) has used it and it is working fine with his code. – Ajay Kulkarni Sep 25, 2018 at 13:17

It is good that you mentioned this! I noticed his database has a column named "Class". See my answer below... – Ctrl S Sep 25, 2018 at 15:56

[...] The author of this link (kaggle.com/dstuerzer/optimized-logistic-regression) has used it and it is working fine with his code.

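In the example you linked, the author's data set has a column named "Class", but the data set you have shown does not. pandas exposes each column as an attribute of the DataFrame, so df.Class only works when a column with that exact name exists; without it, the lookup raises the AttributeError you are seeing. Use the name of the column in your own data that holds the label instead.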

Dominik Stuerzer:

   Time        V1        V2        V3        V4        V5        V6        V7  \
0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   
1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   
2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   
         V8        V9  ...         V21       V22       V23       V24  \
0  0.098698  0.363787  ...   -0.018307  0.277838 -0.110474  0.066928   
1  0.085102 -0.255425  ...   -0.225775 -0.638672  0.101288 -0.339846   
2  0.247676 -1.514654  ...    0.247998  0.771679  0.909412 -0.689281   
        V25       V26       V27       V28  Amount  Class  
0  0.128539 -0.189115  0.133558 -0.021053  149.62      0  
1  0.167170  0.125895 -0.008983  0.014724    2.69      0  
2 -0.327642 -0.139097 -0.055353 -0.059752  378.66      0  
[3 rows x 31 columns]
  

A class of 0 means that the transaction was in order, and a class of 1 means that the transaction was fraudulent. From personal experience we expect frauds to make up only a tiny fraction of all transactions. Indeed, in this dataset, for every fraud there are almost 600 non-fraudulent transactions: [...]
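A minimal sketch of how the label and features could be split while checking that the label column exists first. It assumes that Flare_Up is the label in the asker's data and that Pat_ID is only an identifier; the question does not confirm either, and the helper name split_features_and_label is made up for illustration. The small frame below stands in for DevelopmentData.csv and reuses a few values from the sample shown in the question.

```python
import numpy as np
import pandas as pd


def split_features_and_label(df, label_col, drop_cols=()):
    """Return (X, y) as NumPy arrays, failing early if label_col is absent.

    Hypothetical helper, not part of pandas or scikit-learn.
    """
    if label_col not in df.columns:
        raise KeyError(
            "No column named {!r}; available columns: {}".format(
                label_col, list(df.columns)
            )
        )
    y = df[label_col].to_numpy()
    # errors="ignore" skips any name in drop_cols that is not present
    features = df.drop(columns=[label_col, *drop_cols], errors="ignore")
    return features.to_numpy(), y


# Stand-in for pd.read_csv("datasource/DevelopmentData.csv")
sample = pd.DataFrame({
    "Pat_ID": [1, 2, 3],
    "Flare_Up": [0, 1, 0],
    "Demo1": [1, 1, 0],
    "Demo2": [0.246004, 0.225851, 0.342599],
})

# Inspect the real column names before referring to any of them
print(sample.columns.tolist())

# Assumption: Flare_Up is the label, Pat_ID is an identifier to discard
X, y = split_features_and_label(sample, "Flare_Up", drop_cols=["Pat_ID"])
```

Bracket access (df["Flare_Up"]) is used here instead of attribute access (df.Flare_Up) because it also works for column names that contain spaces or clash with DataFrame method names. to_numpy() is used because as_matrix() has been removed from recent pandas releases.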

when I added your code and ran it, it threw this error: Invalid syntax. So I used df = df.drop('Class', axis=1) and df = df.drop('Time', axis=1). But I got same error as mentioned in question – Ajay Kulkarni Sep 25, 2018 at 13:12

I modified like this: df = df.drop(['Demo1'],axis=1) df = df.drop(['Demo2'],axis=1). But after running updated code, I'm still getting DataFrame has no attribute Class error – Ajay Kulkarni Sep 25, 2018 at 13:15

Output of print(df.columns) is given in this link: pastebin.com/embed_js/kkCei96z. There is no column named as Class. – Ajay Kulkarni Sep 25, 2018 at 13:23
