
I'm trying to use the MySQL SET type in PostgreSQL, but I found only arrays, which have quite similar functionality but don't meet my requirements.

Does PostgreSQL have a similar datatype?

You can use the following workarounds:

1. BIT strings

You can define your set of at most N elements simply as BIT(N). It is a little awkward to populate and retrieve - you will have to use bit masks as set members. But bit strings really shine for set operations: intersection is simply &, union is |.

This type is stored very efficiently - one bit per element, with a small overhead for the length.

Also, it is nice that the length is not really limited (but you have to decide on it upfront).
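For illustration, here is a minimal sketch of the idea (the table name and the three-member set are made up; the leftmost bit stands for 'one', the middle bit for 'two', the rightmost bit for 'three'):

 CREATE TABLE bit_set_demo (
     flags BIT(3) NOT NULL DEFAULT B'000'
 );

 -- store the set {'one','two'}
 INSERT INTO bit_set_demo (flags) VALUES (B'110');

 -- membership test: is 'two' in the set?
 SELECT (flags & B'010') <> B'000' AS has_two FROM bit_set_demo;

 -- set operations against another set {'two','three'} = B'011'
 SELECT flags & B'011' AS intersection,
        flags | B'011' AS union_result
   FROM bit_set_demo;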

2. HSTORE

The HSTORE type is an extension, but very easy to install. Simply executing

CREATE EXTENSION hstore

for most installations (9.1+) will make it available. Rumor has it that PostgreSQL 9.3 will have HSTORE as a standard type.

It is not really a set type, but more like a Perl hash or a Python dictionary: it keeps an arbitrary set of key => value pairs.

With that, it is not very efficient (certainly not BIT string efficient), but it does provide the functions essential for sets: || for union. Intersection is a little bit awkward; use

slice(a,akeys(b)) || slice(b,akeys(a))

You can read more about HSTORE in the PostgreSQL documentation.
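To make the union/intersection idea concrete, here is a minimal sketch (the values are made up; only the keys matter, the values are just placeholders):

 -- a "set" is simply the keys of an hstore
 SELECT 'one=>1, two=>1'::hstore || 'two=>1, three=>1'::hstore AS union_result;

 -- intersection via the slice() trick above
 SELECT slice(a, akeys(b)) || slice(b, akeys(a)) AS intersection
   FROM (VALUES ('one=>1, two=>1'::hstore, 'two=>1, three=>1'::hstore)) AS t(a, b);

 -- membership test
 SELECT 'one=>1, two=>1'::hstore ? 'two' AS has_two;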

In general, I completely agree. But bit strings could save your day if you want to store really long (like 1000s of elements) bit sets - because they are extremely efficient. – mvp Feb 4, 2013 at 8:35

In general I would agree with @a_horse_with_no_name here. A table with a unique constraint or primary key is a set. Of course, if you are migrating from another platform you often want something to get up and running with first... – Chris Travers Mar 2, 2013 at 0:35

Again, if you want to keep a 1 million bit bitmap, by using VARBIT you will spend only 1000000/8 = 125 KB (and even that will probably be compressed by Postgres TOAST, by a factor of 5x-10x, down to about 20 KB). If you use a table with a unique constraint and such, you will spend about 1000000*(32+8) = 40 MB (row overhead is about 32 bytes, plus 4-8 bytes for your unique keys and data), and it will NOT be compressed. So the difference can be a factor of 1000x or more. – mvp Mar 2, 2013 at 2:14

You could also use a text array with a check constraint, for example:

 constraint check_set
    check ( array_length(myset,1) <= 2
            and (myset = array[''] or 'one' = ANY(myset) or 'two' = ANY(myset)) )

This would match the definition of SET('one', 'two') as explained in the MySQL manual.
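Put together, the table behind the insert examples below might look like this (a sketch; the table name foobar and the two allowed members are just the values used in this example):

 create table foobar
 (
    myset text[] not null,
    constraint check_set
       check ( array_length(myset,1) <= 2
               and (myset = array[''] or 'one' = ANY(myset) or 'two' = ANY(myset)) )
 );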

The only thing this would not do is "normalize" the array. So

insert into foobar values (array['one', 'two']);
insert into foobar values (array['two', 'one']);

would be displayed differently than in MySQL (where both would wind up as 'one','two').

The check constraint will however get messy with more than 3 or 4 elements.

Building on a_horse_with_no_name's answer above, I would suggest something just a little more complex:

 CREATE FUNCTION set_check(in_value anyarray, in_check anyarray)
 RETURNS BOOL LANGUAGE SQL IMMUTABLE AS
 $$
    WITH basic_check AS (
         -- for every distinct element: is it in the allowed list, and how often does it occur?
         select bool_and(v = any($2)) as condition, count(*) as ct
              FROM unnest($1) v
          GROUP BY v
    ), length_check AS (
         -- an empty array is a valid (empty) set
         SELECT count(*) = 0 as test FROM unnest($1)
    )
    SELECT bool_and(condition AND ct = 1)
      FROM basic_check
     UNION
    SELECT test from length_check where test;
 $$;

Then you should be able to do something like:

 CREATE TABLE set_test (
       my_set text[] CHECK (set_check(my_set, array['one'::text,'two']))
 );

This works:

postgres=# insert into set_test values ('{}');
INSERT 0 1
postgres=# insert into set_test values ('{one}');
INSERT 0 1
postgres=# insert into set_test values ('{one,two}');
INSERT 0 1
postgres=# insert into set_test values ('{one,three}');
ERROR:  new row for relation "set_test" violates check constraint "set_test_my_set_check"
postgres=# insert into set_test values ('{one,one}');
ERROR:  new row for relation "set_test" violates check constraint "set_test_my_set_check"

Note this assumes that for your set, every value must be unique (we are talking sets here). The function should perform very well and should meet your needs. It also has the advantage of handling sets of any size.

Storage-wise it is completely different from MySQL's implementation. It will take up more space on disk, but it should handle sets with as many members as you like, provided that you aren't running up against storage limits... So this should offer a superset of MySQL's functionality. One significant difference, though, is that this does not collapse the array into distinct values; it just prohibits duplicates. If you need that too, look at a trigger.

This solution also leaves the ordering of the input data intact, so '{one,two}' is distinct from '{two,one}'. If you need that behavior changed, you may want to look into exclusion constraints on PostgreSQL 9.2.
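If you want that collapsing/normalizing behavior, a trigger along these lines might do it (a sketch only; the names set_test_normalize and set_test_normalize_trg are made up, and it assumes the set_test table from above):

 CREATE FUNCTION set_test_normalize() RETURNS trigger LANGUAGE plpgsql AS
 $$
 BEGIN
     -- collapse duplicates and sort, so '{two,one,one}' becomes '{one,two}'
     NEW.my_set := COALESCE(
         (SELECT array_agg(DISTINCT e ORDER BY e) FROM unnest(NEW.my_set) AS e),
         '{}'::text[]);
     RETURN NEW;
 END;
 $$;

 CREATE TRIGGER set_test_normalize_trg
     BEFORE INSERT OR UPDATE ON set_test
     FOR EACH ROW EXECUTE PROCEDURE set_test_normalize();

Since BEFORE triggers fire before check constraints are evaluated, the normalized array is what gets checked and stored.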

While that is the "best" match in Postgres, neither solution (SET or ENUM) is really good. I consider both to be workarounds for a proper 1:n relationship model. – a_horse_with_no_name Feb 3, 2013 at 16:34

But it is not complete.

Something more complete and closer to MySQL is the functionality from pltoolbox:

http://okbob.blogspot.cz/2010/12/bitmapset-for-plpgsql.html
http://pgfoundry.org/frs/download.php/3203/pltoolbox-1.0.2.tar.gz
http://postgres.cz/wiki/PL_toolbox_%28en%29

The function find_in_set can be emulated via arrays:

http://okbob.blogspot.cz/2009/08/mysql-functions-for-postgresql.html
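The blog post defines its own helpers; as a rough sketch of the idea (not the blog's code), MySQL's FIND_IN_SET could be approximated like this on PostgreSQL 9.5 or later, where array_position is available:

 CREATE FUNCTION find_in_set(needle text, haystack text)
 RETURNS int LANGUAGE SQL IMMUTABLE AS
 $$
     -- MySQL returns the 1-based position, or 0 when the element is missing
     SELECT COALESCE(array_position(string_to_array(haystack, ','), needle), 0);
 $$;

 SELECT find_in_set('two', 'one,two,three');   -- 2
 SELECT find_in_set('four', 'one,two,three');  -- 0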

From reading the page referenced in the question, it seems like a SET is a way of storing up to 64 named boolean values in one column. PostgreSQL does not provide a way to do this. You could use independent boolean columns, or some size of integer and twiddle the bits directly. Adding two new tables (one for the valid names, and the other to join names to detail rows) could make sense, especially if there is the possibility of needing to associate any other data to individual values.
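As a rough sketch of that normalized approach (all names here are made up for illustration):

 -- the valid set member names
 CREATE TABLE set_member (
     name text PRIMARY KEY
 );

 -- the detail rows the set values belong to
 CREATE TABLE detail (
     id serial PRIMARY KEY,
     description text
 );

 -- which members are "set" for which detail row; the primary key makes it a true set
 CREATE TABLE detail_set_member (
     detail_id int  NOT NULL REFERENCES detail(id),
     name      text NOT NULL REFERENCES set_member(name),
     PRIMARY KEY (detail_id, name)
 );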