SELECT Query
SELECT queries perform data retrieval. By default, the requested data is returned to the client, while in conjunction with INSERT INTO it can be forwarded to a different table.
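As a minimal sketch of the two behaviours, the statements below first return rows to the client and then forward the same selection into another table. The table names source_events and archived_events, their columns, and the Memory engine are assumptions made for illustration only.

```sql
-- Hypothetical tables used only for this illustration.
CREATE TABLE IF NOT EXISTS source_events (id UInt64, name String) ENGINE = Memory;
CREATE TABLE IF NOT EXISTS archived_events (id UInt64, name String) ENGINE = Memory;

INSERT INTO source_events VALUES (1, 'a'), (2, 'b');

-- Default behaviour: the result is returned to the client.
SELECT id, name FROM source_events;

-- Combined with INSERT INTO: the result is written to another table instead.
INSERT INTO archived_events
SELECT id, name FROM source_events WHERE id > 1;
```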
Syntax
All clauses are optional, except for the required list of expressions immediately after SELECT, which is covered in more detail below.
Specifics of each optional clause are covered in separate sections, which are listed in the same order as they are executed:
SELECT Clause
Expressions specified in the SELECT clause are calculated after all the operations in the clauses described above are finished. These expressions work as if they apply to separate rows in the result. If expressions in the SELECT clause contain aggregate functions, then ClickHouse processes aggregate functions and expressions used as their arguments during the GROUP BY aggregation.
If you want to include all columns in the result, use the asterisk (*) symbol, for example SELECT * FROM ...
Dynamic column selection
Dynamic column selection (also known as a COLUMNS expression) allows you to match some columns in a result with a re2 regular expression.
For example, consider the table:
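A table matching the discussion that follows might be created as sketched here. Only the column names aa, ab and bc come from the text; the table name col_names, the Int8 types, the Memory engine and the inserted values are assumptions.

```sql
-- Hypothetical table: name, types, engine and data are assumptions;
-- only the column names aa, ab, bc are taken from the text.
CREATE TABLE IF NOT EXISTS col_names
(
    aa Int8,
    ab Int8,
    bc Int8
)
ENGINE = Memory;

INSERT INTO col_names VALUES (1, 2, 3);
```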
The following query selects data from all the columns containing the symbol 'a' in their name.
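Such a query could look like the sketch below. An inline subquery with assumed values stands in for a table with columns aa, ab and bc, so the statement runs on its own.

```sql
-- The inline subquery stands in for a table with columns aa, ab and bc.
SELECT COLUMNS('a')
FROM (SELECT 1 AS aa, 2 AS ab, 3 AS bc);
-- Returns the columns aa and ab; bc has no 'a' in its name and is left out.
```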
The selected columns are not returned in alphabetical order.
You can use multiple COLUMNS expressions in a query and apply functions to them.
For example:
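One possible illustration combines two COLUMNS expressions and wraps one of them in a function. The inline subquery and its values are assumptions standing in for a table with columns aa, ab and bc.

```sql
SELECT
    COLUMNS('a'),              -- expands to aa, ab
    COLUMNS('c'),              -- expands to bc
    toTypeName(COLUMNS('c'))   -- the function receives bc as its single argument
FROM (SELECT 1 AS aa, 2 AS ab, 3 AS bc);
```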
Each column returned by the COLUMNS expression is passed to the function as a separate argument. You can also pass other arguments to the function if it supports them. Be careful when using functions: if a function does not support the number of arguments you have passed to it, ClickHouse throws an exception.
For example:
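A query of the following shape is expected to fail, which is the point of the illustration. The inline subquery is again an assumed stand-in for a table with columns aa, ab and bc.

```sql
-- Expected to raise an exception: COLUMNS('a') expands to two columns and
-- COLUMNS('c') to one, so the binary + operator receives three arguments.
SELECT COLUMNS('a') + COLUMNS('c')
FROM (SELECT 1 AS aa, 2 AS ab, 3 AS bc);
```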
In this example, COLUMNS('a') returns two columns: aa and ab. COLUMNS('c') returns the bc column. The + operator cannot be applied to 3 arguments, so ClickHouse throws an exception with the relevant message.
Columns that matched the COLUMNS expression can have different data types. If COLUMNS does not match any columns and is the only expression in SELECT, ClickHouse throws an exception.
Asterisk
You can put an asterisk in any part of a query instead of an expression. When the query is analyzed, the asterisk is expanded to a list of all table columns (excluding the MATERIALIZED and ALIAS columns). There are only a few cases when using an asterisk is justified:
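For getting information about what columns are in a table. In this case, set LIMIT 1. But it is better to use the DESC TABLE query.
When there is strong filtration on a small number of columns using PREWHERE.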
In all other cases, we do not recommend using the asterisk, since it only gives you the drawbacks of a columnar DBMS instead of the advantages.
Extreme Values
In addition to results, you can also get minimum and maximum values for the result columns. To do this, set the extremes setting to 1. Minimums and maximums are calculated for numeric types, dates, and dates with times. For other columns, the default values are output.
Two extra rows are calculated: the minimums and the maximums, respectively. These two extra rows are output in XML, JSON*, TabSeparated*, CSV*, Vertical, Template and Pretty* formats, separate from the other rows. They are not output for other formats.
In JSON* and XML formats, the extreme values are output in a separate 'extremes' field. In TabSeparated*, CSV* and Vertical formats, the rows come after the main result, and after 'totals' if present. They are preceded by an empty row (after the other data). In Pretty* formats, the rows are output as a separate table after the main result, and after 'totals' if present. In Template format, the extreme values are output according to the specified template.
Extreme values are calculated for rows before LIMIT, but after LIMIT BY. However, when using LIMIT offset, size, the rows before offset are included in extremes. In stream requests, the result may also include a small number of rows that passed through LIMIT.
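A small sketch of requesting extremes follows. The numbers table function and the alias n are choices made for this illustration, not part of the original text.

```sql
-- Enable extremes for the session, then pick a format that reports them.
SET extremes = 1;

SELECT number AS n
FROM numbers(5)
FORMAT JSON;
-- The JSON output carries an "extremes" field in addition to "data";
-- for this query the minimum of n is 0 and the maximum is 4.
```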
Notes
You can use synonyms (AS aliases) in any part of a query.
The GROUP BY, ORDER BY, and LIMIT BY clauses can support positional arguments. To enable this, switch on the enable_positional_arguments setting. Then, for example, ORDER BY 1,2 sorts the rows by the first and then the second column.
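As an illustration of positional arguments, the sketch below sorts by the first and then the second expression of the SELECT list. The numbers table function and the aliases are assumptions made for the example.

```sql
SELECT number % 3 AS remainder, number AS n
FROM numbers(6)
ORDER BY 1, 2                               -- same as ORDER BY remainder, n
SETTINGS enable_positional_arguments = 1;
```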
Implementation Details
If the query omits the DISTINCT, GROUP BY and ORDER BY clauses and the IN and JOIN subqueries, the query will be completely stream processed, using an O(1) amount of RAM. Otherwise, the query might consume a lot of RAM if the appropriate restrictions are not specified:
max_memory_usage
max_rows_to_group_by
max_rows_to_sort
max_rows_in_distinct
max_bytes_in_distinct
max_rows_in_set
max_bytes_in_set
max_rows_in_join
max_bytes_in_join
max_bytes_before_external_sort
max_bytes_ratio_before_external_sort
max_bytes_before_external_group_by
max_bytes_ratio_before_external_group_by
For more information, see the section "Settings". It is possible to use external sorting (saving temporary tables to a disk) and external aggregation.
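The restrictions listed above are ordinary settings, so one way to apply them is per session before running a heavy query. The byte values and the query below are illustrative assumptions, not recommendations.

```sql
-- Illustrative limits only; suitable values depend on the server and workload.
SET max_memory_usage = 10000000000;
SET max_bytes_before_external_group_by = 5000000000;
SET max_bytes_before_external_sort = 5000000000;

-- A query with GROUP BY and ORDER BY, which cannot be fully stream processed.
SELECT number % 1000 AS bucket, count() AS cnt
FROM numbers(10000000)
GROUP BY bucket
ORDER BY bucket
LIMIT 5;
```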
SELECT modifiers
You can use the following modifiers in SELECT queries.
APPLY
Allows you to invoke a function on each column returned by an outer table expression of a query.
Syntax:
Example:
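The sketch below shows the general shape of the modifier in a comment, followed by a concrete query. The values table function, the column names i, j, k and the data are assumptions chosen so the statement runs on its own.

```sql
-- General shape: SELECT <expr> APPLY(<func>) FROM [db.]table_name
SELECT * APPLY(sum)
FROM values('i Int64, j Int16, k Int64', (100, 10, 324), (120, 8, 23));
-- Computes sum(i), sum(j), sum(k): 220, 18, 347
```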
EXCEPT
Specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.
Syntax:
Example:
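A possible illustration follows, with the general shape given in a comment. The values table function, the column names and the data are assumptions made for the example.

```sql
-- General shape: SELECT <expr> EXCEPT (col_name1 [, col_name2, ...]) FROM [db.]table_name
SELECT * EXCEPT (i)
FROM values('i Int64, j Int16, k Int64', (100, 10, 324), (120, 8, 23));
-- Returns only the columns j and k: (10, 324) and (8, 23)
```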
REPLACE
Specifies one or more expression aliases. Each alias must match a column name from the SELECT * statement. In the output column list, the column that matches the alias is replaced by the expression in that REPLACE.
This modifier does not change the names or order of columns. However, it can change the value and the value type.
Syntax:
Example:
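The following sketch replaces one column with an expression while keeping its name and position. The values table function, the column names and the data are assumptions made for the example.

```sql
-- General shape: SELECT <expr> REPLACE(<expr> AS col_name) FROM [db.]table_name
SELECT * REPLACE(i + 1 AS i)
FROM values('i Int64, j Int16, k Int64', (100, 10, 324), (120, 8, 23));
-- Column i now holds i + 1: (101, 10, 324) and (121, 8, 23)
```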
Modifier Combinations
You can use each modifier separately or combine them.
Examples:
Using the same modifier multiple times.
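For instance, APPLY can be chained so that each function receives the output of the previous one. The values table function, the column names and the data are assumptions made for this sketch.

```sql
-- For columns j and k: convert to String, take the length, then the maximum length.
SELECT COLUMNS('[jk]') APPLY(toString) APPLY(length) APPLY(max)
FROM values('i Int64, j Int16, k Int64', (100, 10, 324), (120, 8, 23));
-- j: lengths 2 and 1, maximum 2; k: lengths 3 and 2, maximum 3
```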
Using multiple modifiers in a single query.
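A sketch that combines all three modifiers in one query is shown below. The values table function, the column names and the data are assumptions made for illustration.

```sql
-- Replace i with i + 1, drop j, then sum every remaining column.
SELECT * REPLACE(i + 1 AS i) EXCEPT (j) APPLY(sum)
FROM values('i Int64, j Int16, k Int64', (100, 10, 324), (120, 8, 23));
-- Sums: 222 for the replaced i column and 347 for k
```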