python - SQLAlchemy - filtering func.count within query

Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

Let's say that I have a table with a column, that has some integer values and I want to calculate the percentage of values that are over 200 for that column.

Here's the kicker, I would prefer if I could do it inside one query that I could use group_by on.

results = db.session.query(
        ClassA.some_variable,
        label('entries', func.count(ClassA.some_variable)),
        label('percent', *no clue*)
  ).filter(ClassA.value.isnot(None)).group_by(ClassA.some_variable)
Alternately it would be okay thought not prefered to do the percentage calculation on the client side, something like this.
results = db.session.query(
        ClassA.some_variable,
        label('entries', func.count(ClassA.some_variable)),
        label('total_count', func.count(ClassA.value)),
        label('over_200_count', func.count(ClassA.value > 200)),
  ).filter(ClassA.value.isnot(None)).group_by(ClassA.some_variable)
But I obviously can't filter within the count statemenet, and I can't apply the filter at the end of the query, since if I apply the > 200 constraint at the end, total_count wouldn't work.
Using RAW SQL is an option too, it doesn't have to be Sqlalchemy
MariaDB unfortunately does not support the aggregate FILTER clause, but you can work around that using a CASE expression or NULLIF, since COUNT returns the count of non-null values of given expression:
from sqlalchemy import case
func.count(case([(ClassA.value > 200, 1)])).label('over_200_count')
With that in mind you can calculate the percentage simply as
(func.count(case([(ClassA.value > 200, 1)])) * 1.0 /
 func.count(ClassA.value)).label('percent')
though there's that one edge: what if func.count(ClassA.value) is 0? Depending on whether you'd consider 0 or NULL a valid return value you could either use yet another CASE expression or NULLIF:
dividend = func.count(case([(ClassA.value > 200, 1)])) * 1.0
divisor = func.count(ClassA.value)
# Zero
case([(divisor == 0, 0)],
     else_=dividend / divisor).label('percent')
# NULL
(dividend / func.nullif(divisor, 0)).label('percent')
Finally, you could create a compilation extension for mysql dialect that rewrites a FILTER clause to a suitable CASE expression:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionFilter
from sqlalchemy.sql.functions import Function
from sqlalchemy import case
@compiles(FunctionFilter, 'mysql')
def compile_functionfilter_mysql(element, compiler, **kwgs):
    # Support unary functions only
    arg0, = element.func.clauses
    new_func = Function(
        element.func.name,
        case([(element.criterion, arg0)]),
        packagenames=element.func.packagenames,
        type_=element.func.type,
        bind=element.func._bind)
    return new_func._compiler_dispatch(compiler, **kwgs)
With that in place you could express the dividend as
dividend = func.count(1).filter(ClassA.value > 200) * 1.0
which compiles to
In [28]: print(dividend.compile(dialect=mysql.dialect()))
count(CASE WHEN (class_a.value > %s) THEN %s END) * %s
        Thanks for contributing an answer to Stack Overflow!
Please be sure to answer the question. Provide details and share your research!
But avoid …
Asking for help, clarification, or responding to other answers.
Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.