What Is The Most Memory Efficient Way To Combine read_sorted And Expr In PyTables?

August 21, 2024

I am looking for the most memory-efficient way to combine reading a PyTables table (columns: x, y, z) in sorted order (the z column has a CSI) and evaluating an expression like x+a*y+b*

Solution 1:

There are two basic options, depending on whether you need to iterate in sorted order or not.

If you need to iterate over the table in sorted order, then reading the rows will be much more expensive than computing the expression. In that case, read the rows efficiently with Table.read_sorted() and compute the expression in a list comprehension, or similar:

# Store the result under its own name so the coefficient `a` is not overwritten
res = [row['x'] + a*row['y'] + b*row['z']
       for row in tab.read_sorted('z', checkCSI=True)]

If you don't need to iterate in a sorted manner (which it doesn't look like you do), you should set up and evaluate the expression using the Expr class, read the sorted row indices from the column's CSI, and use them to reorder the expression results. This would look something like:

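import tables as tb

x = tab.cols.x
y = tab.cols.y
z = tab.cols.z
# Expr resolves x, y, z, a and b from the calling scope
expr = tb.Expr('x+a*y+b*z')
unsorted_res = expr.eval()  # results in on-disk (unsorted) row order
# Row coordinates in z-sorted order, read from the column's index
idx = z.index.read_indices()
sorted_res = unsorted_res[idx]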
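
To check that the two options agree, here is a minimal end-to-end sketch. The table name `points`, the coefficient values for `a` and `b`, the row count, and the random data are all assumptions made up for the demo. It also assumes the sorted indices can be read through `z.index.read_indices()` and passes the variables to `Expr` explicitly via `uservars` instead of relying on the calling scope.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical coefficients and data, chosen only for this demo
a, b = 2.0, 3.0
n = 1000
rng = np.random.default_rng(0)
rows = np.empty(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])
for name in ("x", "y", "z"):
    rows[name] = rng.random(n)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    with tb.open_file(path, mode="w") as h5:
        tab = h5.create_table("/", "points", obj=rows)
        tab.cols.z.create_csindex()  # completely sorted index (CSI) on z

        # Option 1: read rows in z-sorted order, evaluate row by row
        by_row = np.array([r["x"] + a * r["y"] + b * r["z"]
                           for r in tab.read_sorted("z", checkCSI=True)])

        # Option 2: evaluate once in on-disk order, then reorder via the CSI
        x, y, z = tab.cols.x, tab.cols.y, tab.cols.z
        expr = tb.Expr("x + a*y + b*z",
                       uservars={"x": x, "y": y, "z": z, "a": a, "b": b})
        unsorted_res = expr.eval()
        # Cast to a signed integer type so NumPy accepts it as an index array
        idx = np.asarray(z.index.read_indices(), dtype=np.intp)
        by_expr = unsorted_res[idx]

# Pure-NumPy reference computed from the in-memory copy of the data
order = np.argsort(rows["z"], kind="stable")
expected = rows["x"][order] + a * rows["y"][order] + b * rows["z"][order]
```

Both `by_row` and `by_expr` should match `expected` up to floating-point rounding. Note that option 2 holds the full result array and the full index array in memory at once, so for very large tables it may be worth evaluating and reordering in chunks.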
