
I am considering using MongoDB for my next project. One of the core requirements for this application is faceted search. Has anyone tried using MongoDB to implement faceted search?

I have a product model with various attributes like size, color, brand, etc. When a user searches for a product, this Rails application should show facet filters in the sidebar. The facet filters will look something like this:

Size:
XXS (34)
XS (22)
S (23)
M (37)
L (19)
XL (29)
Color:
Black (32)
Blue (87)
Green (14)
Red (21)
White (43)
Brand:
Brand 1 (43)
Brand 2 (27)

I think Apache Solr or ElasticSearch gives you more flexibility and performance, but faceting is also possible with MongoDB's Aggregation Framework.

The main problem with MongoDB is that you have to query it N times: first to get the matching results, and then once per facet group. With a full-text search engine you get everything in one query.

Example

// The 'tags' filter simulates the search
// This query gets the products
db.products.find({tags: {$all: ["tag1", "tag2"]}})
// This query gets the size facet
db.products.aggregate([
    {$match: {tags: {$all: ["tag1", "tag2"]}}},
    {$group: {_id: "$size", count: {$sum: 1}}},
    {$sort: {count: -1}}
])
// This query gets the color facet
db.products.aggregate([
    {$match: {tags: {$all: ["tag1", "tag2"]}}},
    {$group: {_id: "$color", count: {$sum: 1}}},
    {$sort: {count: -1}}
])
// This query gets the brand facet
db.products.aggregate([
    {$match: {tags: {$all: ["tag1", "tag2"]}}},
    {$group: {_id: "$brand", count: {$sum: 1}}},
    {$sort: {count: -1}}
])
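The three facet queries differ only in the grouped field, so an application would normally generate them. A minimal sketch in plain JavaScript, assuming a hypothetical helper named `buildFacetPipelines`: it only builds the pipeline documents, and you still run one aggregation per facet, which is exactly the N-query cost mentioned above.

```javascript
// Hypothetical helper: builds one aggregation pipeline per facet field,
// mirroring the separate db.products.aggregate(...) calls above.
// The same base filter (the search) is reused for every facet.
function buildFacetPipelines(baseFilter, facetFields) {
  const pipelines = {};
  for (const field of facetFields) {
    pipelines[field] = [
      { $match: baseFilter },
      // Group by the field value and count matching products per value
      { $group: { _id: "$" + field, count: { $sum: 1 } } },
      // Most frequent values first
      { $sort: { count: -1 } },
    ];
  }
  return pipelines;
}

// Usage in the mongo shell (assumes a "products" collection):
//   const base = { tags: { $all: ["tag1", "tag2"] } };
//   const pipelines = buildFacetPipelines(base, ["size", "color", "brand"]);
//   for (const [field, p] of Object.entries(pipelines)) {
//     printjson({ facet: field, values: db.products.aggregate(p).toArray() });
//   }
```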

Once the user narrows the search using facets, you have to add that filter to both the find predicate and the $match predicate, as follows:

// User clicks on the "Brand 1" facet
db.products.find({tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"})
db.products.aggregate([
    {$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
    {$group: {_id: "$size", count: {$sum: 1}}},
    {$sort: {count: -1}}
])
db.products.aggregate([
    {$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
    {$group: {_id: "$color", count: {$sum: 1}}},
    {$sort: {count: -1}}
])
db.products.aggregate([
    {$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
    {$group: {_id: "$brand", count: {$sum: 1}}},
    {$sort: {count: -1}}
])
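Merging the clicked facets into the base predicate is mechanical, so it can be done in one place and reused for both the find and every $match. A sketch under stated assumptions: `withFacetSelections` is a hypothetical name, and mapping several selected values of one facet to `$in` is an assumption (the example above only shows a single selected value).

```javascript
// Hypothetical helper: adds the user's facet selections (e.g. a click on
// "Brand 1") to the base search predicate.
// One selected value -> equality match; several values -> $in (assumption).
function withFacetSelections(baseFilter, selections) {
  const filter = { ...baseFilter }; // shallow copy; base filter is not mutated
  for (const [field, values] of Object.entries(selections)) {
    if (values.length === 1) {
      filter[field] = values[0];
    } else if (values.length > 1) {
      filter[field] = { $in: values };
    }
    // An empty selection adds no condition for that facet
  }
  return filter;
}

// Usage: reproduces the "Brand 1" predicate used above
const base = { tags: { $all: ["tag1", "tag2"] } };
const brandFilter = withFacetSelections(base, { brand: ["Brand 1"] });
// brandFilter can be passed to db.products.find(...) and to each $match
```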
Aggregation Framework seems promising. I don't see any issue with executing additional queries per facet group. Let me create a POC application to validate this implementation.
– Firoz Ansari, Sep 17, 2012 at 15:27
Yes, it's really powerful and it gives us a lot of possibilities. The main problem with this framework is query optimization: with sharding, queries are not well optimized. I'm working on patching these issues and submitting the fixes on GitHub.
– Samuel García, Sep 18, 2012 at 11:24

The $facet stage allows you to create multi-faceted aggregations which characterize data across multiple dimensions, or facets, within a single aggregation stage. Multi-faceted aggregations provide multiple filters and categorizations to guide data browsing and analysis.

Input documents are passed to the $facet stage only once.

Since MongoDB 3.4 introduced $facet, you don't need to query N times to retrieve aggregations on N groups.

$facet enables various aggregations on the same set of input documents, without needing to retrieve the input documents multiple times.

A sample query for the OP's use case would be something like:

db.products.aggregate([
  {
    $facet: {
      "categorizedByColor": [
        { $match: { color: { $exists: true } } },
        // $bucket requires explicit "boundaries"; for categorical values,
        // $sortByCount groups by value and counts documents per group
        { $sortByCount: "$color" }
      ],
      "categorizedBySize": [
        { $match: { size: { $exists: true } } },
        { $sortByCount: "$size" }
      ],
      "categorizedByBrand": [
        { $match: { brand: { $exists: true } } },
        { $sortByCount: "$brand" }
      ]
    }
  }
])
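The $facet stage follows the same pattern for every field, so it can be generated from a list of facet fields. A minimal sketch, assuming a hypothetical helper `buildFacetPipeline`; the leading `$match` on the search filter is an addition here so that the counts reflect the current search results, and the facet names are simply the field names rather than `categorizedByColor` etc.

```javascript
// Hypothetical helper: builds a single pipeline whose $facet stage returns
// counts for every facet field from one aggregate() call.
// The leading $match (an assumption) limits input to the search results.
function buildFacetPipeline(searchFilter, facetFields) {
  const facets = {};
  for (const field of facetFields) {
    facets[field] = [
      // Only documents that actually have this attribute
      { $match: { [field]: { $exists: true } } },
      // Group by value, count, and sort by count descending
      { $sortByCount: "$" + field },
    ];
  }
  return [{ $match: searchFilter }, { $facet: facets }];
}

// Usage in the mongo shell (assumes a "products" collection); the result
// is one document with an array of {_id, count} entries per facet:
//   db.products.aggregate(buildFacetPipeline(
//     { tags: { $all: ["tag1", "tag2"] } }, ["color", "size", "brand"]))
```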
You would still need to run two searches, though, correct? One for the documents, and then the example you have here for the associated facets?
– Ominus, Aug 2, 2017 at 22:23
Yes, it seems so. It just solves the use case of computing multiple facets in a single query.
– Rahul, Aug 3, 2017 at 3:36
Is there any way in MongoDB to run only one query that returns both the documents and the associated facets?
– qmn1711, Jul 4, 2018 at 7:48

A popular option for more advanced search with MongoDB is to use ElasticSearch in conjunction with the community-supported MongoDB River Plugin. The MongoDB River plugin feeds a stream of documents from MongoDB into ElasticSearch for indexing.

ElasticSearch is a distributed search engine based on Apache Lucene, and features a RESTful JSON interface over HTTP. There is a Facet Search API and a number of other advanced features such as Percolate and "More like this".

You can do the query; the question is whether it is fast enough. I.e., something like:

db.products.find({ size: 'S', color: 'Blue', brand: { $in: [...] } })

The question then is performance. There isn't yet any special facility for faceted search in the product. Down the road there might be set-intersection-like query plans that work well, but that is TBD/future.

  • If your properties are a predefined set and you know what they are, you could create an index on each of them. Only one of the indexes will be used per query in the current implementation, so this will help but only get you so far; if the data set is medium-plus in size, it might still be fine.

  • You could use compound indexes that combine two or more of the properties. If you have a small number of properties, this might work pretty well. The index need not include every field queried on: for the query above, a compound index on any two of the three fields is likely to perform better than an index on a single field.

  • If you don't have too many SKUs, brute force would work; e.g., with 1MM SKUs a collection scan in RAM might be fast enough. In this case I would make a collection with just the facet values, keep it as small as possible, and keep the full SKU documents in a separate collection, e.g.:

    facets_collection: {sz:1,brand:123,clr:'b',_id:}

    If the number of facet dimensions isn't too high, you could instead make a highly compound index on the facet dimensions, and you would get the equivalent of the above without the extra work.

    If you create quite a few indexes, it is probably best not to create so many that they no longer fit in RAM.

    Given that the query works and it is only a performance question, one might just start with Mongo and, if it isn't fast enough, bolt on Solr.
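The brute-force option amounts to scanning every small facet-only document and counting values per dimension. A sketch of that logic in plain JavaScript, under stated assumptions: `countFacets` is a hypothetical name, the field names `sz`, `brand` and `clr` follow the example document above, and the sample data is made up for illustration. In MongoDB itself the scan would be a query over the facets collection; this only shows the counting.

```javascript
// Brute-force faceting over small facet-only documents: keep the documents
// that match every selected facet value, then count the values of each
// requested facet dimension among them.
function countFacets(docs, selected, dimensions) {
  const counts = {};
  for (const dim of dimensions) counts[dim] = {};
  for (const doc of docs) {
    // Skip documents that don't match every selected facet value
    const matches = Object.entries(selected).every(([k, v]) => doc[k] === v);
    if (!matches) continue;
    for (const dim of dimensions) {
      if (doc[dim] === undefined) continue; // attribute missing on this SKU
      counts[dim][doc[dim]] = (counts[dim][doc[dim]] || 0) + 1;
    }
  }
  return counts;
}

// Illustrative (made-up) facet documents in the facets_collection shape
const sample = [
  { _id: 1, sz: 1, brand: 123, clr: "b" },
  { _id: 2, sz: 2, brand: 123, clr: "r" },
  { _id: 3, sz: 1, brand: 456, clr: "b" },
];
const blueCounts = countFacets(sample, { clr: "b" }, ["sz", "brand"]);
```

With `clr: "b"` selected, documents 1 and 3 match, so size 1 is counted twice and brands 123 and 456 once each.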

However, if you are able to feed data in the above format, where each facet and its value are joined together to form a consistent tag, then you can use the query below:

db.productcolon.aggregate([
    { $unwind: "$tags" },
    {
        $group: {
            _id: "$tags",
            count: { $sum: 1 }
        }
    }
])

See the result output below:

{ "_id" : "color:green", "count" : NumberInt(1) }
{ "_id" : "color:red", "count" : NumberInt(1) }
{ "_id" : "size:M", "count" : NumberInt(3) }
{ "_id" : "color:yellow", "count" : NumberInt(1) }
{ "_id" : "height:5", "count" : NumberInt(1) }

Beyond this step, your application server can do the color/size grouping before sending the results back to the client.
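The application-side grouping can be sketched in plain JavaScript. Assumptions: `groupTagCounts` is a hypothetical name, each `_id` is split at its first `:` into facet and value (tags without `:` are skipped), and `NumberInt(1)` from the shell output is treated as an ordinary number.

```javascript
// Groups the {_id: "facet:value", count} rows returned by the
// $unwind/$group query into one list per facet, ready for the sidebar.
function groupTagCounts(rows) {
  const grouped = {};
  for (const { _id, count } of rows) {
    const sep = _id.indexOf(":");
    if (sep === -1) continue; // not a facet tag
    const facet = _id.slice(0, sep);
    const value = _id.slice(sep + 1);
    if (!grouped[facet]) grouped[facet] = [];
    grouped[facet].push({ value, count });
  }
  // Most frequent values first within each facet
  for (const facet of Object.keys(grouped)) {
    grouped[facet].sort((a, b) => b.count - a.count);
  }
  return grouped;
}

// Usage with the rows from the output above
const sidebar = groupTagCounts([
  { _id: "color:green", count: 1 },
  { _id: "color:red", count: 1 },
  { _id: "size:M", count: 3 },
  { _id: "color:yellow", count: 1 },
  { _id: "height:5", count: 1 },
]);
```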

Note: combining each facet with its value gives you all facet values aggregated in a single query, so you avoid the problem described in Samuel García's answer: having to query MongoDB N times (first to get the matching results and then once per facet group), whereas a full-text search engine returns everything in one query.
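This approach relies on each product already carrying `facet:value` tags. One way to derive them when a product is written is sketched below; `toFacetTags` is a hypothetical helper, the list of faceted fields is an assumption supplied by the caller, and the tags would need to be regenerated whenever a faceted attribute changes.

```javascript
// Produces "facet:value" tags from an ordinary product document.
// Missing or null attributes produce no tag; array-valued attributes
// produce one tag per value.
function toFacetTags(product, facetFields) {
  const tags = [];
  for (const field of facetFields) {
    const value = product[field];
    if (value === undefined || value === null) continue;
    for (const v of Array.isArray(value) ? value : [value]) {
      tags.push(field + ":" + v);
    }
  }
  return tags;
}

// Usage: a product without a height gets only color and size tags
const shirtTags = toFacetTags(
  { name: "Shirt", color: "green", size: "M" },
  ["color", "size", "height"]
);
```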
