相关文章推荐
有胆有识的人字拖  ·  matplotlib ...·  6 天前    · 
要出家的海豚  ·  傣族·  1 月前    · 
时尚的砖头  ·  吉林省能源局·  2 年前    · 
乖乖的橙子  ·  盘点 2020 | OTA 之下 ...·  2 年前    · 
Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop. Which I fully understand is not the optimum use of Postgres, but it is the tool I have. This is what I have so far:

for i in 1..25 LOOP
   insert into playtime.meta_random_sample
   select i, ID
   from   tbl
   order  by random() limit 15000
end loop

Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.

Example with plpgsql:

BEGIN FOR i IN 1..25 LOOP INSERT INTO playtime.meta_random_sample (col_i, col_id) -- declare target columns! SELECT i, id FROM tbl ORDER BY random() LIMIT 15000; END LOOP; $do$;

For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:

INSERT INTO playtime.meta_random_sample (col_i, col_id)
SELECT t.*
FROM   generate_series(1,25) i
CROSS  JOIN LATERAL (
   SELECT i, id
   FROM   tbl
   ORDER  BY random()
   LIMIT  15000

About generate_series():

  • What is the expected behaviour for multiple set-returning functions in SELECT clause?
  • About optimizing performance of random selections:

  • Best way to select random rows PostgreSQL
  • +1 I know this is almost 9 years old, but thanks so much for the set based example. I struggle with non trivial set based queries, and this example was exactly what i needed for my data generation task. Sql is pretty elegant. – 6footunder Jan 13, 2022 at 20:58 @KyleMcClellan: That's not "for the loop", that's the syntax of any PL/pgSQL code block. Read the manual here. – Erwin Brandstetter Apr 13, 2022 at 18:21 for i in 1..100000 insert into test2 values (random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random()); end loop; $do$;

    I just ran into this question and, while it is old, I figured I'd add an answer for the archives. The OP asked about for loops, but their goal was to gather a random sample of rows from the table. For that task, Postgres 9.5+ offers the TABLESAMPLE clause on WHERE. Here's a good rundown:

    https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/

    I tend to use Bernoulli as it's row-based rather than page-based, but the original question is about a specific row count. For that, there's a built-in extension:

    https://www.postgresql.org/docs/current/tsm-system-rows.html

    CREATE EXTENSION tsm_system_rows;
    

    Then you can grab whatever number of rows you want:

    select * from playtime tablesample system_rows (15);
    

    I find it more convenient to make a connection using a procedural programming language (like Python) and do these types of queries.

    import psycopg2
    connection_psql = psycopg2.connect( user="admin_user"
                                      , password="***"
                                      , port="5432"
                                      , database="myDB"
                                      , host="[ENDPOINT]")
    cursor_psql = connection_psql.cursor()
    myList = [...]
    for item in myList:
      cursor_psql.execute('''
        -- The query goes here
    connection_psql.commit()
    cursor_psql.close()
                    I was thinking the same thing but then it can become a problem when the number of calls to the db you need to make are large.
    – shwifty chill
                    Jun 2, 2021 at 14:32
    

    Here is the one complex postgres function involving UUID Array, For loop, Case condition and Enum data update. This function parses each row and checks for the condition and updates the individual row.

    CREATE OR REPLACE FUNCTION order_status_update() RETURNS void AS $$
    DECLARE
      oid_list uuid[];
      oid uuid;
    BEGIN
      SELECT array_agg(order_id) FROM order INTO oid_list;
      FOREACH uid IN ARRAY uid_list
        WITH status_cmp AS (select COUNT(sku)=0 AS empty, 
                        COUNT(sku)<COUNT(sku_order_id) AS partial, 
                        COUNT(sku)=COUNT(sku_order_id) AS full 
                        FROM fulfillment 
                        WHERE order_id=oid)
        UPDATE order
        SET status=CASE WHEN status_cmp.empty THEN 'EMPTY'::orderstatus
        WHEN status_cmp.full THEN 'FULL'::orderstatus
        WHEN status_cmp.partial THEN 'PARTIAL'::orderstatus
        ELSE null
        FROM status_cmp
        WHERE order_id=uid;
      END LOOP;
    $$ LANGUAGE plpgsql;
    

    To run the above function

    SELECT order_status_update();
    INSERT INTO meta_random_sample(col_serial, parent_id)
    SELECT t.*
    FROM   generate_series(1,25) i
    CROSS  JOIN LATERAL (
       SELECT i, parent_id
       FROM    parent_tree order by random() limit 2
    

    Call the procedure.

    call pg_temp_3.insert_data();
    

    PostgreSQL manual: https://www.postgresql.org/docs/current/sql-createprocedure.html

    Thanks for contributing an answer to Stack Overflow!

    • Please be sure to answer the question. Provide details and share your research!

    But avoid

    • Asking for help, clarification, or responding to other answers.
    • Making statements based on opinion; back them up with references or personal experience.

    To learn more, see our tips on writing great answers.