Question

I want to use array_agg in a subquery, then make use of the aggregated data by it's array index in my main query, however, after trying many different ways, I'm really at a loss as to how it should be done; could someone explain why in the example below I get a series of None values instead of the first category in the array?

I understand that the following, simplified example can be done without doing a SELECT on array[i], but it will explain the nature of the problem:

from sqlalchemy import Integer
from sqlalchemy.dialects.postgres import ARRAY

prods = (
    session.query(
        Product.id.label('id'),
        func.array_agg(ProductCategory.id, type_=ARRAY(Integer)).label('cats'))
    .outerjoin(
        ProductCategory,
        ProductCategory.product_id == Product.id)
    .group_by(Product.id).subquery()
)

# Confirm that there's categories:
[x for x in session.query(prods) if len(x[1]) > 1][:10]
"""
Out[48]: 
[(2428, [1633667, 1633665, 1633666]),
 (2462, [1162046, 1162043, 2543783, 1162045]),
 (2573, [1633697, 1633696]),
 (2598, [2546824, 922288, 922289]),
 (2645, [2544843, 338411]),
 (2660, [1633713, 1633714, 1633712, 1633711]),
 (2686, [2547480, 466995, 466996]),
 (2748, [2546706, 2879]),
 (2785, [467074, 467073, 2545804]),
 (2806, [2545326, 686295, 686298, 686297])]
"""

# Ok now try to query to get the first category of each array:
[x for x in session.query(prods.c.cats[0].label('first_cat'))]

"""
 (None),
 (None),
 (None),
 (None),
 (None),
 (None),
 (None),
 (None),
 (None),
 (None),
 (None),
"""
Was it helpful?

Solution

You are probably doing everything right, but getting empty arrays first. In your previous query you used in-python filtering (len(x[1]) > 1). You can print Query expression before executing it to make sure.

You should probably add a having clause to you base query:

from sqlalchemy import Integer
from sqlalchemy.dialects.postgresql import ARRAY

cats_agg = func.array_agg(ProductCategory.id, type_=ARRAY(Integer)).label('cats')
prods = (
    session.query(
        Product.id.label('id'),
        cats_agg,
    .outerjoin(
        ProductCategory,
        ProductCategory.product_id == Product.id)
    .group_by(Product.id)
    .having(func.array_length(cats_agg, 1) > 1)
    .subquery()
)

Then you won't need in-python filtering either.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top