How Do You Write A Django Model That Can Automatically Normalize Data?
Solution 1:
There are two ways to trigger an action when a model is saved: override the model's save() method, or write a post_save signal listener. I'll show the override since it's a little simpler and fits this use case nicely.
To get the maximum and minimum tfidf values, you can use Django's queryset aggregation functions:
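```python
from django.db import models
from django.db.models import Max, Min

class Party(models.Model):
    ...
    def save(self, *args, **kwargs):
        # Fetch both extremes of the stored tfidf values in a single query
        stats = Party.objects.aggregate(lo=Min('tfidf'), hi=Max('tfidf'))
        lo, hi = stats['lo'], stats['hi']
        if lo is None or hi == lo:
            # Empty table, or all values equal: avoid dividing by None or zero
            self.normalized_tfidf = 0.0
        else:
            self.normalized_tfidf = (self.tfidf - lo) / (hi - lo)
        super().save(*args, **kwargs)
```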
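The formula in save() is ordinary min-max scaling. Written out, with x_min and x_max the smallest and largest stored tfidf values, plus a small worked example using illustrative numbers:

```latex
\tilde{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},
\qquad \text{e.g. } x_{\min} = 0,\ x_{\max} = 2,\ x = 0.5
\;\Rightarrow\; \tilde{x} = \frac{0.5 - 0}{2 - 0} = 0.25
```

The result lies in [0, 1] as long as x is within the stored range, and it is undefined when all values are equal (x_max = x_min), which is why that case needs a guard.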
Overriding predefined model methods like save() is straightforward; the Django documentation on overriding model methods has more details if you're interested.
Note that if you ever update Party.tfidf in bulk (for example with QuerySet.update() or bulk_create()), save() won't be called and no post_save signals are sent, so the normalized values won't be refreshed. You'd have to process all of the rows manually, which means a lot of extra DB writes and largely defeats the purpose of doing bulk updates.
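If bulk updates are unavoidable, one way to keep the stored column consistent is to recompute it for every row in a single UPDATE statement rather than calling save() row by row. The sketch below shows the idea in plain SQL through Python's built-in sqlite3 module; the table name party and its columns are hypothetical stand-ins for the Party model's table. In Django, a rough equivalent would be computing lo and hi with aggregate() and then calling Party.objects.update(normalized_tfidf=(F('tfidf') - lo) / (hi - lo)), but treat that as an assumption to check against your own schema.

```python
import sqlite3


def renormalize_all(conn: sqlite3.Connection) -> None:
    """Recompute normalized_tfidf for every row with a single UPDATE statement."""
    # The uncorrelated subqueries read the current min/max of tfidf, so every
    # row is rescaled against the same, up-to-date extremes in one statement.
    conn.execute(
        """
        UPDATE party
        SET normalized_tfidf =
            (tfidf - (SELECT MIN(tfidf) FROM party))
            / ((SELECT MAX(tfidf) FROM party) - (SELECT MIN(tfidf) FROM party))
        """
    )


# Usage with a hypothetical "party" table mirroring the Party model's columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE party (id INTEGER PRIMARY KEY, tfidf REAL, normalized_tfidf REAL)"
)
# A bulk insert like this bypasses any per-row save() logic...
conn.executemany("INSERT INTO party (tfidf) VALUES (?)", [(0.0,), (0.5,), (2.0,)])
# ...so renormalize everything afterwards in one statement.
renormalize_all(conn)
rows = conn.execute(
    "SELECT tfidf, normalized_tfidf FROM party ORDER BY id"
).fetchall()
print(rows)  # [(0.0, 0.0), (0.5, 0.25), (2.0, 1.0)]
```

This still writes every row, but as one statement instead of one query per object.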
Solution 2:
As @klaws pointed out in the comments, calculating the normalized value at the time a new song is added can leave stale data: once a new minimum or maximum arrives, the values already stored for existing rows are no longer correct.
Instead, you could use a query that lets the database calculate the normalized value whenever it is needed.
You'll need to import a few names from Django's expressions and aggregates:
```python
from django.db.models import F, Max, Min, Window
```
Here's a simple example, applied to the OP's problem, assuming no grouping is needed:
```python
def query_normalized_tfidf(party_queryset):
    w_min = Window(expression=Min('tfidf'))
    w_max = Window(expression=Max('tfidf'))
    return party_queryset.annotate(
        normalized_tfidf=(F('tfidf') - w_min) / (w_max - w_min))
```
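The Window class lets us keep annotating the individual objects with the minimum and maximum, instead of collapsing the queryset into a single result the way aggregate() does, as explained in Django's documentation on window functions.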
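To see what the window annotation computes, here is roughly the kind of SQL it corresponds to, run directly against SQLite (window functions require SQLite 3.25 or later) through Python's sqlite3 module. The table and column names are hypothetical, and the exact SQL Django generates may differ by backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE party (id INTEGER PRIMARY KEY, tfidf REAL)")
conn.executemany("INSERT INTO party (tfidf) VALUES (?)", [(0.0,), (0.5,), (2.0,)])

# MIN/MAX ... OVER () attach the table-wide extremes to every row instead of
# collapsing the result to a single row, so each row can be normalized on the fly.
rows = conn.execute(
    """
    SELECT id,
           (tfidf - MIN(tfidf) OVER ())
           / (MAX(tfidf) OVER () - MIN(tfidf) OVER ()) AS normalized_tfidf
    FROM party
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 0.0), (2, 0.25), (3, 1.0)]
```

Nothing is stored here: the normalized values are computed at query time, so they always reflect the current extremes.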
Instead of using a separate query function, we could also add this to a custom model manager.
If you need the normalized values to be calculated with respect to certain groups (e.g. if each song had a genre), the above can be extended and generalized as follows:
```python
def query_normalized_values(queryset, value_lookup, group_lookups=None):
    """
    A generalized version that normalizes data with respect to the
    extreme values within each group.
    """
    partitions = None
    if group_lookups:
        partitions = [F(group_lookup) for group_lookup in group_lookups]
    w_min = Window(expression=Min(value_lookup), partition_by=partitions)
    w_max = Window(expression=Max(value_lookup), partition_by=partitions)
    return queryset.annotate(
        normalized=(F(value_lookup) - w_min) / (w_max - w_min))
```
This could be used as follows, assuming there is a Party.genre field:
```python
annotated_parties = query_normalized_values(
    queryset=Party.objects.all(),
    value_lookup='tfidf',
    group_lookups=['genre'],
)
```
This would normalize the tfidf values with respect to the extreme tfidf values within each genre.
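The per-group version corresponds roughly to adding PARTITION BY to the window. The SQLite sketch below (hypothetical table and sample data, run through Python's sqlite3 module) computes the extremes separately for each genre; the single-song "folk" genre also shows the zero-range case, which SQLite turns into NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE party (id INTEGER PRIMARY KEY, genre TEXT, tfidf REAL)")
conn.executemany(
    "INSERT INTO party (genre, tfidf) VALUES (?, ?)",
    [("rock", 0.0), ("rock", 0.5), ("rock", 2.0),
     ("jazz", 1.0), ("jazz", 3.0), ("folk", 0.7)],
)

# PARTITION BY genre computes MIN/MAX per genre, mirroring
# partition_by=[F('genre')] in the Django version.
rows = conn.execute(
    """
    SELECT id, genre,
           (tfidf - MIN(tfidf) OVER (PARTITION BY genre))
           / (MAX(tfidf) OVER (PARTITION BY genre)
              - MIN(tfidf) OVER (PARTITION BY genre)) AS normalized
    FROM party
    ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)  # the lone "folk" row gets None (division by zero in SQLite)
```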
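Note: When w_min equals w_max, the expression divides by zero. On SQLite the resulting "normalized value" is then None, but other backends may behave differently (PostgreSQL, for example, raises a division-by-zero error), so you may want to guard the denominator, e.g. with Django's NullIf function.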