
AttributeError: 'list' object has no attribute 'lower': Clustering

I'm trying to do clustering with pandas and sklearn. import pandas import pprint import pandas as pd from sklearn.cluster import KMeans from sklearn.metrics import adj

Solution 1:

The error is in this line:

dataset_list = dataset.values.tolist()

Here, dataset is a pandas DataFrame, so dataset.values is a 2-D NumPy array of shape (n_rows, n_columns). With a single column that is (n_rows, 1), not a 1-D array. Calling tolist() on it therefore returns a list of lists, something like this:

print(dataset_list)

[['hello wish to cancel order thank you confirmation'],
 ['hello would liketo cancel order made today store house world'],
 ['dimensions bed not compatible would liketo know how to pass cancellation refund send today cordially'],
 ...]

As you can see, there are two levels of square brackets: each sentence sits inside its own inner list.
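To see where the extra brackets come from, here is a minimal sketch using a made-up one-column DataFrame. The column name `text` and the two sentences are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical one-column DataFrame standing in for `dataset`
dataset = pd.DataFrame(
    {"text": ["hello wish to cancel order", "hello would like to cancel order made today"]}
)

values = dataset.values          # 2-D NumPy array, one row per sentence
print(values.shape)              # (2, 1)

nested = values.tolist()         # list of one-element lists
flat = values.ravel().tolist()   # flat list of strings

print(nested)
print(flat)
```

Each element of `nested` is itself a list holding one string, while each element of `flat` is the string itself.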

TfidfVectorizer expects a flat list of sentences (strings), not a list of lists. It treats each element as one document and, with its default settings, calls lower() on it. Here each element is a list rather than a string, hence the error.

So you just need to do this:

# Use ravel to convert the 2-D array to a 1-D array
dataset_list = dataset.values.ravel().tolist()

OR

# Replace `column_name` with your actual column header;
# selecting a single column gives a Series instead of a DataFrame
dataset_list = dataset['column_name'].values.tolist()
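For completeness, here is a small sketch that reproduces the error and then applies the fix. The DataFrame contents, the column name `text`, and the KMeans settings (`n_clusters=2`, `n_init=10`, `random_state=0`) are assumptions chosen for illustration, since the original code is not shown in full.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical data standing in for the asker's `dataset`
dataset = pd.DataFrame(
    {
        "text": [
            "hello wish to cancel order thank you",
            "hello would like to cancel order made today",
            "dimensions bed not compatible would like refund",
        ]
    }
)

nested = dataset.values.tolist()   # list of lists: triggers the error
flat = dataset["text"].tolist()    # flat list of strings: what the vectorizer expects

# Reproduce the failure: each "document" is a list, which has no .lower()
try:
    TfidfVectorizer().fit_transform(nested)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# The flat list works; the result has one row per sentence
matrix = TfidfVectorizer().fit_transform(flat)
print(matrix.shape[0])

# Clustering then proceeds as usual (cluster count is an arbitrary choice here)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(matrix)
print(len(labels))
```

Selecting the column by name also makes the code robust if more columns are added to the DataFrame later, whereas ravel() would then mix values from every column into one list.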
