Why Does Scikit-learn Demand Different Data Shapes For Different Regressors?
Solution 1:
When you do y = np.random.rand(10), y is a one-dimensional array of shape (10,). It doesn't matter whether it's a row vector or a column vector; it's just a vector with a single dimension. Take a look at this answer and this one too to understand the philosophy behind it.
It's part of the "numpy philosophy", and sklearn depends on numpy.
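A quick sketch of that point: a 1-D numpy array has no row/column orientation (transposing it is a no-op), and you need an explicit reshape to get a 2-D column or row vector.

```python
import numpy as np

y = np.random.rand(10)
print(y.shape)      # (10,) -- one dimension, neither row nor column
print(y.T.shape)    # (10,) -- transposing a 1-D array changes nothing

# To get an explicit column or row vector, reshape:
col = y.reshape(-1, 1)   # shape (10, 1)
row = y.reshape(1, -1)   # shape (1, 10)
print(col.shape, row.shape)
```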
As for your comment:
why sklearn doesn't automatically understand that if I pass it something of the shape (n,) that n_samples=n and n_features=1
sklearn cannot infer from the X data alone whether a shape of (n,) means n_samples=n and n_features=1 or the other way around (n_samples=1 and n_features=n). It might be possible to resolve this if y were passed as well, since that would make n_samples clear.
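To illustrate the ambiguity, here is a minimal sketch (using LinearRegression as an arbitrary example regressor): a 1-D X is rejected by sklearn, and reshaping it resolves the ambiguity by stating explicitly that there are n samples with 1 feature each.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.rand(10)   # shape (10,): 10 samples of 1 feature,
                         # or 1 sample of 10 features? Ambiguous.
y = np.random.rand(10)

# Passing the 1-D X directly to fit() raises a ValueError asking
# you to reshape. Make the intent explicit: 10 samples, 1 feature.
X_2d = X.reshape(-1, 1)  # shape (10, 1)

model = LinearRegression().fit(X_2d, y)
print(model.predict(X_2d).shape)   # (10,)
```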
But that would mean changing all the code that relies on the current semantics, which could break many things, because sklearn depends heavily on numpy operations.
You may also want to check the following links where similar issues are discussed.