How To Specify The From Tables In Sqlalchemy Subqueries?

November 30, 2023 Post a Comment

I am trying to fetch in a single query a fixed set of rows, plus some other rows found by a subquery. My problem is that the query generated by my SQLAlchemy code is incorrect. The

Solution 1:

What you are missing is a correlation between the innermost sub-query and the next level up; without the correlation, SQLAlchemy will include the t1 alias in the innermost sub-query:

>>> print str(q1)
SELECT t3.id AS t3_id 
FROM tbl AS t3, tbl AS t1 
WHERE t3.id < t1.id ORDERBY t3.id DESC
 LIMIT ? OFFSET ?
>>> print str(q1.correlate(t1))
SELECT t3.id AS t3_id 
FROM tbl AS t3 
WHERE t3.id < t1.id ORDERBY t3.id DESC
 LIMIT ? OFFSET ?

Note that tbl AS t1 is now missing from the query. From the .correlate() method documentation:

Return a Query construct which will correlate the given FROM clauses to that of an enclosing Query or select().

Thus, t1 is assumed to be part of the enclosing query, and isn't listed in the query itself.

Now your query works:

>>>q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\...             limit(1).correlate(t1)>>>q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))>>>q3 = session.query(table).filter(...                               or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))>>>printlist(q3)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine SELECT tbl.id AS tbl_id 
FROM tbl 
WHERE tbl.id IN (SELECT t2.id AS t2_id 
FROM tbl AS t2, tbl AS t1 
WHERE t2.id = (SELECT t3.id AS t3_id 
FROM tbl AS t3 
WHERE t3.id < t1.id ORDER BY t3.id DESC
 LIMIT ? OFFSET ?) AND t1.id IN (?, ?)) OR tbl.id IN (?, ?)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine (1, 0, 4, 8, 0, 8)
[(0,), (2,), (6,), (8,)]

Solution 2:

I'm only kinda sure I understand the query you're asking for. Lets break it down, though:

the goal from this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8.

It looks like you want to query for two kinds of things, and then combine them. The proper operator for that is union. Do the simple queries and add them up at the end. I'll start with the second bit, "ids just before X".

To start with; lets look at the all the ids that are before some given value. For this, we'll join the table on itself with a <:

# select t1.id t1_id, t2.id t2_id from tbl t1 join tbl t2 on t1.id < t2.id;
 t1_id | t2_id 
-------+-------0|20|40|60|82|42|62|84|64|86|8
(10rows)

That certainly gives us all of the pairs of rows where the left is less than the right. Of all of them, we want the rows for a given t2_id that is as high as possible; We'll group by t2_id and select the maximum t1_id

# selectmax(t1.id), t2.id from tbl t1 join tbl t2 on t1.id < t2.id groupby t2.id;
 max | id 
-----+-------0|22|44|66|8
(4rows)

Your query, using a limit, could achieve this, but its usually a good idea to avoid using this technique when alternatives exist because partitioning does not have good, portable support across Database implementations. Sqlite can use this technique, but postgresql doesn't like it, it uses a technique called "analytic queries" (which are both standardised and more general). MySQL can do neither. The above query, though, works consistently across all sql database engines.

the rest of the work is just using in or other equivalent filtering queries and are not difficult to express in sqlalchemy. The boilerplate...

>>>import sqlalchemy as sa>>>from sqlalchemy.orm import Query>>>engine = sa.create_engine('sqlite:///:memory:')>>>meta = sa.MetaData(bind=engine)>>>table = sa.Table('tbl', meta, sa.Column('id', sa.Integer))>>>meta.create_all()>>>table.insert().execute([{'id':i} for i inrange(0, 10, 2)])>>>t1 = table.alias()>>>t2 = table.alias()>>>before_filter = [4, 8]

First interesting bit is we give the 'max(id)' expression a name. this is needed so that we can refer to it more than once, and to lift it out of a subquery.

>>>c1 = sa.func.max(t1.c.id).label('max_id')>>>#                                ^^^^^^

The 'heavy lifting' portion of the query, join the above aliases, group and select the max

>>>q1 = Query([c1, t2.c.id]) \...     .join((t2, t1.c.id < t2.c.id)) \...     .group_by(t2.c.id) \...     .filter(t2.c.id.in_(before_filter))

Because we'll be using a union, we need this to produce the right number of fields: we wrap it in a subquery and project down to the only column we're interested in. This will have the name we gave it in the above label() call.

>>>q2 = Query(q1.subquery().c.max_id)>>>#                          ^^^^^^

The other half of the union is much simpler:

>>>t3 = table.alias()>>>exact_filter = [0, 8]>>>q3 = Query(t3).filter(t3.c.id.in_(exact_filter))

All that's left is to combine them:

>>>q4 = q2.union(q3)>>>engine.execute(q4.statement).fetchall()
[(0,), (2,), (6,), (8,)]

Solution 3:

The responses here helped me fix my issue but in my case I had to use both correlate() and subquery():

# ...
subquery = subquery.correlate(OuterCorrelationTable).subquery()
filter_query = db.session.query(func.sum(subquery.c.some_count_column))
filter = filter_query.as_scalar() == as_many_as_some_param
# ...
final_query = db.session.query(OuterCorrelationTable).filter(filter)

Python Developer

How To Specify The From Tables In Sqlalchemy Subqueries?

Solution 1:

Solution 2:

Solution 3:

Post a Comment for "How To Specify The From Tables In Sqlalchemy Subqueries?"