
What Is The Most Memory Efficient Way To Combine Read_sorted And Expr In Pytables?

I am looking for the most memory-efficient way to combine reading a PyTables table (columns: x, y, z) in sorted order (the z column has a CSI) with evaluating an expression like x+a*y+b*

Solution 1:

There are two basic options, depending on whether you need to iterate in sorted order or not.

If you need to iterate over the table in sorted order, then reading the data in will be much more expensive than computing the expression. You should therefore read it in efficiently using Table.read_sorted() and compute the expression in a list comprehension, or similar:

# Named res so the result does not overwrite the coefficient a
res = [row['x'] + a*row['y'] + b*row['z'] for row in
       tab.read_sorted('z', checkCSI=True)]
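If the resulting list is itself too large to hold in memory, the same expression can be evaluated lazily, one row at a time. The following is a minimal sketch, assuming only that rows support access by column name, as PyTables rows do. The helper name `eval_rows` is hypothetical, and the dicts are made-up stand-ins for table rows already ordered by z:

```python
def eval_rows(rows, a, b):
    """Yield x + a*y + b*z for each row, one value at a time."""
    for row in rows:
        yield row['x'] + a * row['y'] + b * row['z']


# Stand-in rows, already ordered by z (illustrative values only).
# With PyTables, an iterator such as tab.itersorted('z', checkCSI=True)
# could be passed here instead of a list.
rows = [
    {'x': 1.0, 'y': 2.0, 'z': 0.5},
    {'x': 3.0, 'y': 1.0, 'z': 1.5},
]

results = list(eval_rows(rows, a=2.0, b=4.0))
```

Because the generator yields each value as soon as it is computed, the caller can write results out or aggregate them without ever holding the whole result in memory.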

If you don't need to iterate in sorted order (which it doesn't look like you do), you should set up and evaluate the expression using the Expr class, read the CSI from the column, and use it to reorder the expression results. This would look something like:

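import tables as tb

x = tab.cols.x
y = tab.cols.y
z = tab.cols.z
# Expr looks up x, y, z, a and b by name in the calling scope
expr = tb.Expr('x+a*y+b*z')
unsorted_res = expr.eval()
# Assumption: the sort permutation is read through the column's Index object
idx = z.index.read_indices()
sorted_res = unsorted_res[idx]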
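The reordering step may be easier to see with plain Python lists. In the sketch below, the lists are stand-ins for the table columns, and the sort permutation is computed by hand, which is the role the CSI on z plays in PyTables. All values are made up for illustration:

```python
# Stand-ins for the x, y and z columns, in storage (unsorted) order.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
z = [3.0, 1.0, 2.0]
a, b = 2.0, 10.0

# Evaluate the expression in storage order.
unsorted_res = [xi + a * yi + b * zi for xi, yi, zi in zip(x, y, z)]

# Permutation that sorts z: idx[k] is the position of the k-th smallest z.
idx = sorted(range(len(z)), key=z.__getitem__)

# Apply the permutation to get the results in z order.
sorted_res = [unsorted_res[i] for i in idx]
```

The expression is evaluated once over the data as stored, and only the result is permuted, so no sorted copy of the input columns is ever built.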
