Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I started playing with Decision Trees lately and I wanted to train my own simple model with some manufactured data. I wanted to use this model to predict some further mock data, just to get a feel of how it works, but then I got stuck. Once your model is trained, how do you pass data to predict()?

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

Docs state: clf.predict(X)

Parameters: X : array-like or sparse matrix of shape = [n_samples, n_features]

But when trying to pass np.array, np.ndarray, list, tuple or DataFrame it just throws an error. Can you help me understand why please?

Code below:

from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
import graphviz
import pandas as pd
import numpy as np
import random
from sklearn import tree
pd.options.display.max_seq_items=5000
pd.options.display.max_rows=20
pd.options.display.max_columns=150
lenght = 50000
miles_commuting = [random.choice([2,3,4,5,7,10,20,25,30]) for x in range(lenght)]
salary = [random.choice([1300,1600,1800,1900,2300,2500,2700,3300,4000]) for x in range(lenght)]
full_time = [random.choice([1,0,1,1,0,1]) for x in range(lenght)]
DataFrame = pd.DataFrame({'CommuteInMiles':miles_commuting,'Salary':salary,'FullTimeEmployee':full_time})
DataFrame['Moving'] = np.where((DataFrame.CommuteInMiles > 20) & (DataFrame.Salary > 2000) & (DataFrame.FullTimeEmployee == 1),1,0)
DataFrame['TargetLabel'] = np.where((DataFrame.Moving == 1),'Considering move','Not moving')
target = DataFrame.loc[:,'Moving']
data = DataFrame.loc[:,['CommuteInMiles','Salary','FullTimeEmployee']]
target_names = DataFrame.TargetLabel
features = data.columns.values
clf = tree.DecisionTreeClassifier()
clf = clf.fit(data, target)
clf.predict(?????) #### <===== What should go here?
clf.predict([30,4000,1])

ValueError: Expected 2D array, got 1D array instead: array=[3.e+01 4.e+03 1.e+00]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

clf.predict(np.array(30,4000,1))

ValueError: only 2 non-keyword arguments accepted

Where is your "mock data" that you want to predict?

Your data should be of the same shape that you used when calling fit(). From the code above, I see that your X has three columns ['CommuteInMiles','Salary','FullTimeEmployee']. You need to have those many columns in your prediction data, number of rows can be arbitrary.

Now when you do

clf.predict([30,4000,1])

The model is not able to understand that these are columns of a same row or data of different rows.

So you need to convert that into 2-d array, where inner array represents the single row.

Do this:

clf.predict([[30,4000,1]])     #<== Observe the two square brackets

You can have multiple rows to be predicted, each in inner list. Something like this:

X_test = [[30,4000,1],
          [35,15000,0],
          [40,2000,1],]
clf.predict(X_test)

Now as for your last error clf.predict(np.array(30,4000,1)), this has nothing to do with predict(). You are using the np.array() wrong.

According to the documentation, the signature of np.array is:

(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Leaving the first (object) all others are keyword arguments, so they need to be used as such. But when you do this: np.array(30,4000,1), each value is considered as input to separate param here: object=30, dtype=4000, copy=1. This is not allowed and hence error. If you want to make a numpy array from list, you need to pass a list.

Like this: np.array([30,4000,1]) Now this will be considered correctly as input to object param.

Thanks for contributing an answer to Stack Overflow!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.