
How Do I Use .loc With Groupby So That Creating A New Column Based On Grouped Data Won't Be Considered A Copy?

I have a CSV file with groups of data, and am using the groupby() method to segregate them. Each group is processed by a bit of simple math that includes the use of min() and max()

Solution 1:

So you are asking for:

  1. How to stop setting values to copies.
  2. How to create a plot with a subplot for each group in matplotlib.

The "SettingWithCopyWarning" happens because you are creating a column and setting values on each group, which is itself a copy of some rows of the DataFrame. Instead of setting the values on each loop iteration, I would store the 'Test Point Error' values in a list of Series, call pd.concat(list) after exiting the for-loop, and then add the result to the DataFrame.
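Since the question asks about .loc specifically, here is a minimal sketch of that alternative: write back to the original DataFrame using the group's index labels rather than assigning to the group itself. The data and the 'Reading Range' column are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical data; only the 'Test' and 'Test Reading' names echo the question.
df = pd.DataFrame({
    "Test": ["A", "B", "A", "B"],
    "Test Reading": [1.0, 2.0, 3.0, 6.0],
})

# Create the target column up front so every row has a slot.
df["Reading Range"] = float("nan")

for name, group in df.groupby("Test"):
    # group.index holds the labels of this group's rows in df,
    # so df.loc writes into the original frame, not into the copy.
    spread = group["Test Reading"].max() - group["Test Reading"].min()
    df.loc[group.index, "Reading Range"] = spread

print(df)
```

This keeps the loop but never assigns to group, so nothing is set on a copy.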

---Edit--- Try replacing:

group['Test Point Error']=100*(group['Test Reading']- (group['Test Point']*R+x1))

with

error_list.append(100 * (group['Test Reading']- (group['Test Point']*R+x1)))

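This will append a Series for each group (error_list must be created as an empty list before the loop), and each Series keeps the index labels of its rows in df. When the loop is done, the list holds exactly one error value for each row in df. Therefore, after you exit the for-loop:

df = df.assign(test_point_error=pd.concat(error_list))

will match each row exactly, regardless of any sorting on df, because assign aligns the values on the index (this assumes df has a unique index). Note that assign returns a new DataFrame, so the result has to be assigned back to df.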

---end of edit---
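Put together, the collect-then-concat approach might look like the sketch below. The data is invented, and the way R and x1 are computed here (a straight line through each group's min and max) is only a stand-in, since the question's actual math is not shown.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "Test": ["A", "B", "A", "B", "A", "B"],
    "Test Point": [0.0, 0.0, 5.0, 5.0, 10.0, 10.0],
    "Test Reading": [0.1, 0.0, 5.3, 5.2, 10.1, 10.0],
})

error_list = []
for name, group in df.groupby("Test"):
    point = group["Test Point"]
    reading = group["Test Reading"]
    # Stand-in for the question's min()/max() math (an assumption):
    # slope R and offset x1 of the line through the group's end points.
    R = (reading.max() - reading.min()) / (point.max() - point.min())
    x1 = reading.min() - R * point.min()
    # Each appended Series keeps the original row labels of its group.
    error_list.append(100 * (reading - (point * R + x1)))

# pd.concat stacks the per-group Series; assign aligns them on the index.
df = df.assign(**{"Test Point Error": pd.concat(error_list)})
print(df)
```

The dictionary unpacking is only needed because the column name contains spaces; with a name like test_point_error the plain keyword form works.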

The subplotting issue is similar: you are plotting each group separately while looping. If you plot after exiting the for-loop instead, then

df.groupby('Test').plot(subplots=True)

should give you what you want.
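If you want explicit control over one subplot per group, an alternative sketch is to create the axes first and hand one to each group. This is not the one-liner above but a manual variant; the data and layout are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; assumes the error column has already been added to df.
df = pd.DataFrame({
    "Test": ["A", "A", "B", "B"],
    "Test Point": [0.0, 10.0, 0.0, 10.0],
    "Test Point Error": [0.0, 20.0, 5.0, -5.0],
})

grouped = df.groupby("Test")
# One row of axes per group; squeeze=False always returns a 2-D array.
fig, axes = plt.subplots(nrows=grouped.ngroups, ncols=1, squeeze=False)

for ax, (name, group) in zip(axes[:, 0], grouped):
    # Draw this group's curve on its own axes instead of a new figure.
    group.plot(x="Test Point", y="Test Point Error", ax=ax, legend=False)
    ax.set_title(f"Test {name}")

fig.tight_layout()
```

Call fig.savefig(...) or switch to an interactive backend and plt.show() to view the result.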

On a separate topic, I would do away with the string concatenation for 'Test' and do:

df.groupby(['Model No', 'Serial No', 'Test Time'])

This might make your code a lot faster if there are many rows.
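As a small illustration of grouping on several columns at once, the sketch below uses invented values for the three columns. Each group key comes back as a tuple, so there is no need to build a combined string first.

```python
import pandas as pd

# Hypothetical rows; the column names follow the answer above.
df = pd.DataFrame({
    "Model No": ["M1", "M1", "M2"],
    "Serial No": ["S1", "S1", "S2"],
    "Test Time": ["09:00", "09:00", "10:30"],
    "Test Reading": [1.0, 2.0, 3.0],
})

keys = ["Model No", "Serial No", "Test Time"]

# Iterating yields (key_tuple, sub-DataFrame) pairs, one per unique combination.
group_sizes = {name: len(group) for name, group in df.groupby(keys)}
print(group_sizes)
```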
