This browser is no longer supported.

Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.

Download Microsoft Edge More info about Internet Explorer and Microsoft Edge

The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries. This operator is useful when you need to perform a subquery only on a subset of rows that belongs to the same partition key, and not query the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.

The partition operator supports several strategies of subquery operation:

  • Native - use with an implicit data source with thousands of key partition values.
  • Shuffle - use with an implicit source with millions of key partition values.
  • Legacy - use with an implicit or explicit source for 64 or less key partition values.
  • Native strategy

    This subquery is a tabular transformation that doesn't specify a tabular source. The source is implicit and is assigned according to the subtable partitions. It should be applied when the number of distinct values of the partition key isn't large, roughly in the thousand. Use hint.strategy=native for this strategy. There's no restriction on the number of partitions.

    Shuffle strategy

    This subquery is a tabular transformation that doesn't specify a tabular source. The source is implicit and will be assigned according to the subtable partitions. The strategy applies when the number of distinct values of the partition key is large, in the millions. Use hint.strategy=shuffle for this strategy. There's no restriction on the number of partitions. For more information about shuffle strategy and performance, see shuffle .

    Native and shuffle strategy operators

    The difference between hint.strategy=native and hint.strategy=shuffle is mainly to allow the caller to indicate the cardinality and execution strategy of the subquery, and can affect the execution time. There's no other semantic difference between the two.

    For native and shuffle strategy, the source of the subquery is implicit, and can't be referenced by the subquery. This strategy supports a limited set of operators: project , sort , summarize , take , top , order , mv-expand , mv-apply , make-series , limit , extend , distinct , count , project-away , project-keep , project-rename , project-reorder , parse , parse-where , reduce , sample , sample-distinct , scan , search , serialize , top-nested , top-hitters and where .

    Operators like join , union , external_data , plugins , or any other operator that involves table source that isn't the subtable partitions, aren't allowed.

    Legacy strategy

    Legacy subqueries can use the following sources:

  • Implicit - The source is a tabular transformation that doesn't specify a tabular source. The source is implicit and will be assigned according to the subtable partitions. This scenario applies when there are 64 or less key values.
  • Explicit - The subquery must include a tabular source explicitly. Only the key column of the input table is available in the subquery, and referenced by using its name in the toscalar() function.
  • For both implicit and explicit sources, the subquery type is used for legacy purposes only, and indicated by the use of hint.strategy=legacy , or by not including any strategy indication.

    Any other reference to the source is taken to mean the entire input table, for example, by using the as operator and calling up the value again.

    It is recommended to use the native or shuffle strategies rather than the legacy strategy, since the legacy strategy is limited to 64 partitions and is less efficient. The legacy partition operator is currently limited by the number of partitions. The operator will yield an error if the partition column ( Column ) has more than 64 distinct values.

    All strategies

    For native, shuffle and legacy subqueries, the result must be a single tabular result. Multiple tabular results and the use of the fork operator aren't supported. A subquery can't include other statements, for example, it can't have a let statement.

    Syntax

    T | partition [ hint.strategy= strategy ] [ PartitionParameters ] by Column ( TransformationSubQuery )

    T | partition [ PartitionParameters ] by Column { ContextFreeSubQuery }

    Parameters

    Required Description The partition strategy, native , shuffle or legacy . native strategy is used with an implicit source with thousands of key partition values. shuffle strategy is used with an implicit source with millions of key partition values. legacy strategy is used with an explicit or implicit source with 64 or less key partition values. Column The name of a column in T whose values determine how the input table is to be partitioned. TransformationSubQuery A tabular transformation expression, whose source is implicitly the subtables produced by partitioning the records of T , each subtable being homogenous on the value of Column . ContextFreeSubQuery A tabular expression that includes its own tabular source, such as a table reference. The expression can reference a single column from T , being the key column Column using the syntax toscalar( Column ) . PartitionParameters Zero or more space-separated parameters in the form of: HintName = Value that control the behavior of the operator. See the supported hints .

    Supported hints

    HintName Description Native/Shuffle/Legacy strategy hint.strategy string The value legacy , shuffle , or native . This hint defines the execution strategy of the partition operator. Native, Shuffle, Legacy hint.shufflekey string The partition key. Runs the partition operator in shuffle strategy where the shuffle key is the specified partition key. Shuffle hint.materialized If set to true , will materialize the source of the partition operator. The default value is false . Legacy hint.concurrency Hints the system how many partitions to run in parallel. The default value is 16. Legacy hint.spread Hints the system how to distribute the partitions among cluster nodes. For example, if there are N partitions and the spread hint is set to P, then the N partitions will be processed by P different cluster nodes equally in parallel/sequentially depending on the concurrency hint. The default value is 1. Legacy

    Returns

    The operator returns a union of the results of the individual subqueries.

    Examples

    Native strategy examples

    Use hint.strategy=native for this strategy. See the following examples:

    This query returns foreach InjuriesDirect, the count of events and total injuries in each State that starts with 'W'.

    Run the query

    StormEvents
    | where State startswith 'W'
    | partition hint.strategy=native by InjuriesDirect (summarize Events=count(), Injuries=sum(InjuriesDirect) by State);
    

    Output

    State Events Injuries

    This query returns the top 2 EventType by total injuries for each State that starts with 'W':

    Run the query

    StormEvents
    | where State startswith 'W'
    | partition hint.strategy = native by State
        summarize TotalInjueries = sum(InjuriesDirect) by EventType
        | top 2 by TotalInjueries
    

    Output

    EventType TotalInjueries

    Shuffle strategy example

    Use hint.strategy=shuffle for this strategy. See the following example:

    This query will return the top 3 DamagedProperty foreach EpisodeId, it returns also the columns EpisodeId and State.

    Run the query

    StormEvents
    | partition hint.strategy=shuffle by EpisodeId
        top 3 by DamageProperty
        | project EpisodeId, State, DamageProperty
    | count
    

    Output

    Count

    Legacy strategy with explicit source

    This strategy is for legacy purposes only, and indicated by the use of hint.strategy=legacy or by not including a strategy indication at all. See the following example:

    This query will run two subqueries:

  • When x == 1, the query will return all rows from StormEvents that have InjuriesIndirect == 1.
  • When x == 2, the query will return all rows from StormEvents that have InjuriesIndirect == 2.
  • the final result is the union of these 2 subqueries.

    Run the query

    range x from 1 to 2 step 1
    | partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
    | count 
    

    Output

    Count

    Partition operator

    In some cases, it's more performant and easier to write a query using the partition operator than using the top-nested operator. The following example runs a subquery calculating summarize and top for each of States starting with W: (WYOMING, WASHINGTON, WEST VIRGINIA, WISCONSIN)

    Run the query

    StormEvents
    | where State startswith 'W'
    | partition hint.strategy=native by State 
        summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
        | top 3 by Events 
    

    Output

    EventType State Events Injuries

    Partition reference

    The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy strategy.

    | partition by Dim as Partition | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))