num_classes = len(np.unique(y_train))
y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
kf=StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
# splitting data into different folds
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
keras.utils.to_categorical produces a one-hot encoded class vector, i.e. the multilabel-indicator mentioned in the error message. StratifiedKFold is not designed to work with such input; from the split method docs:
split(X, y, groups=None)
[...]
y : array-like, shape (n_samples,)
The target variable for supervised learning problems. Stratification is done based on the y labels.
i.e. your y must be a 1-D array of your class labels.
Essentially, all you have to do is invert the order of the operations: split first (using your initial y_train), and convert with to_categorical afterwards.
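As a minimal sketch of that reordering: the data below is synthetic, and `one_hot` is a NumPy stand-in I wrote for `keras.utils.to_categorical` so the snippet runs without Keras (both are assumptions, not part of the question). The point is only that `split` receives the 1-D labels and the encoding happens per fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption): 20 samples, 3 features, 4 classes.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(20, 3))
y_train = np.repeat(np.arange(4), 5)  # 1-D integer class labels
num_classes = len(np.unique(y_train))


def one_hot(labels, n):
    """Hypothetical NumPy stand-in for keras.utils.to_categorical."""
    return np.eye(n, dtype="float32")[labels]


kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
fold_shapes = []

# Stratify on the 1-D labels first, then one-hot encode inside each fold.
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf = one_hot(y_train[train_index], num_classes)
    y_val_kf = one_hot(y_train[val_index], num_classes)
    fold_shapes.append((y_train_kf.shape, y_val_kf.shape))
```

In real code you would presumably swap `one_hot` for `keras.utils.to_categorical(y_train[train_index], num_classes)`.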
Call split() on the class indices recovered with argmax, like this:
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical.argmax(1))):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
If your target variable is continuous then use simple KFold cross validation instead of StratifiedKFold.
from sklearn.model_selection import KFold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
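For completeness, a small sketch of how that KFold object might be used with a continuous target. The arrays here are made up for illustration (an assumption); `KFold.split` does not need `y` at all, since it does no stratification.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic regression data (assumption): 10 samples with a continuous target.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.linspace(0.0, 1.0, 10)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
val_sizes = []

# KFold ignores the target, so only X is passed to split().
for train_index, val_index in kfold.split(X):
    X_train_kf, X_val_kf = X[train_index], X[val_index]
    y_train_kf, y_val_kf = y[train_index], y[val_index]
    val_sizes.append(len(val_index))
```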
I bumped into the same problem and found out that you can check the type of the target with this util function:
from sklearn.utils.multiclass import type_of_target
type_of_target(y)
'multilabel-indicator'
From its docstring:
'binary': y contains <= 2 discrete values and is 1d or a column vector.
'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.
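To see those categories side by side, here is a short sketch on toy arrays (the arrays and variable names are mine, chosen only for illustration). It compares the same labels in 1-D form and in one-hot form, plus a binary and a continuous target.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

labels = np.array([0, 1, 2, 1, 0, 2])   # 1-D class labels
one_hot = np.eye(3, dtype=int)[labels]  # the same labels, one-hot encoded

kind_labels = type_of_target(labels)
kind_one_hot = type_of_target(one_hot)
kind_binary = type_of_target(np.array([0, 1, 1, 0]))
kind_continuous = type_of_target(np.array([0.1, 2.5, 3.7]))

print(kind_labels, kind_one_hot, kind_binary, kind_continuous)
```

Only the 'binary' and 'multiclass' results are accepted by StratifiedKFold, which matches the error message in the question.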
With LabelEncoder you can transform your classes into a 1-D array of integers (provided your target labels are in a 1-D array of categoricals/objects):
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)
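A slightly fuller sketch of the same idea, assuming a hypothetical `target_labels` array of strings and a dummy feature matrix. It encodes the labels, feeds the 1-D result to StratifiedKFold, and maps the integers back with `inverse_transform`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels and dummy features, for illustration only.
target_labels = np.array(["cat", "dog", "bird"] * 4)
X = np.arange(24).reshape(12, 2)

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)  # 1-D integer codes

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
folds = list(skf.split(X, y))

# The encoder keeps the mapping, so the original labels can be recovered.
restored = label_encoder.inverse_transform(y)
```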
In my case, x was a 2-D matrix and y was also a 2-D matrix, i.e. indeed a multi-class multi-output case. I just passed a dummy np.zeros(shape=(n,1)) as the y, and the x as usual. Note that with a constant dummy y the folds are no longer stratified by the real targets. Full code example:
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 7], [9, 4]])
# y = np.array([0, 0, 1, 1, 0, 1]) # <<< works
y = X # does not work if passed into `.split`
rskf = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=36851234)
for train_index, test_index in rskf.split(X, np.zeros(shape=(X.shape[0], 1))):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
Complementing what @desertnaut said, to convert your one-hot encoding back to a 1-D array, all you need to do is the following (here y_train is the one-hot encoded array):
class_labels = np.argmax(y_train, axis=1)
This will convert back to the initial representation of your classes.
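A quick round-trip check, sketched on toy labels that I made up (an assumption). Note that argmax returns column indices, so it reproduces the original labels only when those were the integers 0..n-1 to begin with.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy integer labels (assumption) and their one-hot encoding.
original_labels = np.array([2, 0, 1, 1, 0, 2, 0, 1, 2])
y_one_hot = np.eye(3)[original_labels]

# argmax along axis 1 picks the column holding the 1 in each row.
class_labels = np.argmax(y_one_hot, axis=1)
round_trip_ok = np.array_equal(class_labels, original_labels)

# The recovered 1-D labels are accepted by StratifiedKFold.split.
X = np.zeros((len(class_labels), 1))
kf = StratifiedKFold(n_splits=3, shuffle=True, random_state=999)
n_folds = sum(1 for _ in kf.split(X, class_labels))
```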