I am contemplating using MongoDB for my next project. One of the core requirements for this application is to provide facet search. Has anyone tried using MongoDB to achieve a facet search?
I have a product model with various attributes like size, color, brand etc. On searching a product, this Rails application should show facet filters on the sidebar. Facet filters will look something like this:
Size:
XXS (34)
XS (22)
S (23)
M (37)
L (19)
XL (29)
Color:
Black (32)
Blue (87)
Green (14)
Red (21)
White (43)
Brand:
Brand 1 (43)
Brand 2 (27)
I think Apache Solr or ElasticSearch gives you more flexibility and performance, but faceting is supported using the Aggregation Framework.
The main problem with MongoDB is that you have to query it N times: first to get the matching results and then once per group, while with a full-text search engine you get it all in one query.
Example
//'tags' filter simulates the search
//this query gets the products
db.products.find({tags: {$all: ["tag1", "tag2"]}})
//this query gets the size facet
db.products.aggregate([
  {$match: {tags: {$all: ["tag1", "tag2"]}}},
  {$group: {_id: "$size", count: {$sum: 1}}},
  {$sort: {count: -1}}
])
//this query gets the color facet
db.products.aggregate([
  {$match: {tags: {$all: ["tag1", "tag2"]}}},
  {$group: {_id: "$color", count: {$sum: 1}}},
  {$sort: {count: -1}}
])
//this query gets the brand facet
db.products.aggregate([
  {$match: {tags: {$all: ["tag1", "tag2"]}}},
  {$group: {_id: "$brand", count: {$sum: 1}}},
  {$sort: {count: -1}}
])
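To make clear what each of those pipelines computes, here is a minimal plain-JavaScript sketch (no MongoDB involved) that mimics `$match` with `$all`, then `$group` with `$sum: 1` and `$sort` by count descending, for one facet field at a time. The `products` array and the `matchesAllTags` and `facetCounts` helpers are hypothetical, for illustration only.

```javascript
// Hypothetical in-memory products; field names follow the example above.
const products = [
  { tags: ["tag1", "tag2"], size: "M", color: "Black", brand: "Brand 1" },
  { tags: ["tag1", "tag2", "tag3"], size: "S", color: "Blue", brand: "Brand 2" },
  { tags: ["tag1", "tag2"], size: "M", color: "Blue", brand: "Brand 1" },
  { tags: ["tag1"], size: "L", color: "Red", brand: "Brand 2" },
];

// Mimics {tags: {$all: required}}: every required tag must be present.
function matchesAllTags(doc, required) {
  return required.every((t) => doc.tags.includes(t));
}

// Mimics {$group: {_id: "$field", count: {$sum: 1}}} followed by {$sort: {count: -1}}.
function facetCounts(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field] ?? null; // $group puts docs missing the field under _id: null
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([value, count]) => ({ _id: value, count }))
    .sort((a, b) => b.count - a.count);
}

const matched = products.filter((p) => matchesAllTags(p, ["tag1", "tag2"]));
for (const field of ["size", "color", "brand"]) {
  console.log(field, facetCounts(matched, field));
}
```

In the sketch the match runs once and its result is reused; with separate `aggregate` calls, MongoDB re-runs the `$match` for every facet, which is the N-queries cost described above.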
Once the user filters the search using facets, you have to add this filter to both the query predicate and the $match predicate, as follows.
//user clicks on "Brand 1" facet
db.products.find({tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"})
db.products.aggregate([
  {$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
  {$group: {_id: "$size", count: {$sum: 1}}},
  {$sort: {count: -1}}
])
db.products.aggregate([
  {$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
  {$group: {_id: "$color", count: {$sum: 1}}},
  {$sort: {count: -1}}
])
db.products.aggregate([
  {$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
  {$group: {_id: "$brand", count: {$sum: 1}}},
  {$sort: {count: -1}}
])
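The drill-down step amounts to adding each selected facet value to one shared filter and recomputing every facet from the narrowed set. A minimal sketch in plain JavaScript, with hypothetical data and hypothetical `buildMatch`, `matches` and `countBy` helpers; `matches` is a toy evaluator that only understands the two shapes used here (`$all` and plain equality), not MongoDB's query language in general.

```javascript
// Build a MongoDB-style filter: the search tags plus every facet value the user clicked.
function buildMatch(searchTags, selected) {
  return { tags: { $all: searchTags }, ...selected };
}

// Toy evaluator supporting only {$all: [...]} and plain equality.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object" && "$all" in cond) {
      return cond.$all.every((v) => (doc[field] || []).includes(v));
    }
    return doc[field] === cond;
  });
}

// Count occurrences of each value of one field.
function countBy(docs, field) {
  const counts = {};
  for (const d of docs) counts[d[field]] = (counts[d[field]] || 0) + 1;
  return counts;
}

const products = [
  { tags: ["tag1", "tag2"], size: "M", color: "Black", brand: "Brand 1" },
  { tags: ["tag1", "tag2"], size: "S", color: "Blue", brand: "Brand 2" },
  { tags: ["tag1", "tag2"], size: "M", color: "Blue", brand: "Brand 1" },
];

// User clicks on "Brand 1": the same filter object feeds find() and each facet's $match.
const filter = buildMatch(["tag1", "tag2"], { brand: "Brand 1" });
const narrowed = products.filter((p) => matches(p, filter));
console.log(filter);
console.log(countBy(narrowed, "size"), countBy(narrowed, "color"));
```

Building the filter once and reusing it keeps the product list and every facet count consistent with each other.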
The $facet stage allows you to create multi-faceted aggregations which
characterize data across multiple dimensions, or facets, within a
single aggregation stage. Multi-faceted aggregations provide multiple
filters and categorizations to guide data browsing and analysis.
Input documents are passed to the $facet stage only once.
Now you don't need to query N times to retrieve aggregations on N groups.
$facet enables various aggregations on the same set of input documents,
without needing to retrieve the input documents multiple times.
A sample query for the OP's use case would look something like this:
db.products.aggregate([
  {
    $facet: {
      // $bucket requires a sorted "boundaries" array (suited to numeric ranges),
      // so plain per-value counts use $sortByCount, which groups and sorts by count.
      "categorizedByColor": [
        { $match: { color: { $exists: 1 } } },
        { $sortByCount: "$color" }
      ],
      "categorizedBySize": [
        { $match: { size: { $exists: 1 } } },
        { $sortByCount: "$size" }
      ],
      "categorizedByBrand": [
        { $match: { brand: { $exists: 1 } } },
        { $sortByCount: "$brand" }
      ]
    }
  }
])
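The key property of `$facet` is that the input is read once while several sub-pipelines consume it. The sketch below mimics that idea in plain JavaScript: one pass over hypothetical documents updates the counters for every dimension at once, producing one count list per dimension, similar in spirit to the sub-pipelines above. The `facetSinglePass` helper is hypothetical, and this illustrates the idea, not how the server executes `$facet` internally.

```javascript
// Hypothetical documents; the last one has no brand.
const products = [
  { color: "Black", size: "M", brand: "Brand 1" },
  { color: "Blue", size: "S", brand: "Brand 2" },
  { color: "Blue", size: "M" },
];

// One pass over the input, updating counts for every facet dimension.
function facetSinglePass(docs, fields) {
  const counts = Object.fromEntries(fields.map((f) => [f, new Map()]));
  for (const doc of docs) {
    for (const f of fields) {
      if (doc[f] === undefined) continue; // skip docs missing the field, like {$exists: 1}
      counts[f].set(doc[f], (counts[f].get(doc[f]) || 0) + 1);
    }
  }
  // Shape each dimension as [{ _id, count }] sorted by count, descending.
  return Object.fromEntries(
    fields.map((f) => [
      f,
      [...counts[f]]
        .map(([_id, count]) => ({ _id, count }))
        .sort((a, b) => b.count - a.count),
    ])
  );
}

const result = facetSinglePass(products, ["color", "size", "brand"]);
console.log(JSON.stringify(result, null, 2));
```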
A popular option for more advanced search with MongoDB is to use ElasticSearch in conjunction with the community-supported MongoDB River Plugin. The MongoDB River plugin feeds a stream of documents from MongoDB into ElasticSearch for indexing (note that rivers were deprecated in Elasticsearch 1.5 and removed in 2.0).
ElasticSearch is a distributed search engine based on Apache Lucene, and features a RESTful JSON interface over HTTP. There is a Facet Search API and a number of other advanced features such as Percolate and "More like this".
You can do the query; the question is whether it is fast enough. That is, something like:
db.products.find({ size: 'S', color: 'Blue', brand: { $in: [...] } })
The question is then how it performs. There isn't any special facility for faceted search in the product yet. Down the road there might be some set-intersection-like query plans that work well, but that is TBD/future.
If your properties are a predefined set and you know what they are, you could create an index on each of them. Only one of the indexes will be used in the current implementation, so this will help but only get you so far: if the data set is medium-plus in size, it might be fine.
You could use compound indexes, which perhaps compound two or more of the properties. If you have a small number of properties, this might work pretty well. The index need not cover all the fields queried on, but in the query above a compound index on any two of the three is likely to perform better than an index on a single field.
If you don't have too many SKUs, brute force would work; e.g. if you have 1MM SKUs, a table scan in RAM might be fast enough. In this case I would make a collection with just the facet values, make it as small as possible, and keep the full SKU docs in a separate collection, e.g.:
facets_collection:
{sz:1,brand:123,clr:'b',_id:}
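A minimal sketch of the brute-force idea in plain JavaScript: keep only the small facet documents in memory, scan all of them for each search, and count the remaining values per dimension. The short field names (`sz`, `brand`, `clr`) follow the example above; the data, the numeric `_id` values and the `scanFacets` helper are hypothetical.

```javascript
// Compact facet documents: only facet values plus a SKU id (hypothetical data).
const facets = [
  { _id: 1, sz: 1, brand: 123, clr: "b" },
  { _id: 2, sz: 2, brand: 123, clr: "r" },
  { _id: 3, sz: 1, brand: 456, clr: "b" },
  { _id: 4, sz: 1, brand: 123, clr: "b" },
];

// Full scan: keep docs matching every selected facet, then count values of each dimension.
function scanFacets(docs, selected, dims) {
  const hits = docs.filter((d) =>
    Object.entries(selected).every(([k, v]) => d[k] === v)
  );
  const counts = {};
  for (const dim of dims) {
    counts[dim] = {};
    for (const d of hits) counts[dim][d[dim]] = (counts[dim][d[dim]] || 0) + 1;
  }
  // The matching ids can then be used to fetch the full SKU docs from the other collection.
  return { ids: hits.map((d) => d._id), counts };
}

const { ids, counts } = scanFacets(facets, { clr: "b" }, ["sz", "brand"]);
console.log(ids, counts);
```

Keeping the scanned documents this small is what makes a full scan per search plausible.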
If the number of facet dimensions isn't too high, you could instead make a highly compound index of the facet dimensions, and you would get the equivalent of the above without the extra work.
If you create quite a few indexes, it is probably best not to create so many that they no longer fit in RAM.
Given that the query runs and it is a performance question, one might just start with Mongo and, if it isn't fast enough, bolt on Solr.
However, if one is able to feed data in a format where each facet and its value are joined together into a consistent tag (for example "color:red" or "size:M"), then one can use the query below:
db.productcolon.aggregate([
  { $unwind: "$tags" },
  { $group: { _id: "$tags", count: { $sum: 1 } } }
])
See the result output below:
{ "_id" : "color:green", "count" : NumberInt(1) }
{ "_id" : "color:red", "count" : NumberInt(1) }
{ "_id" : "size:M", "count" : NumberInt(3) }
{ "_id" : "color:yellow", "count" : NumberInt(1) }
{ "_id" : "height:5", "count" : NumberInt(1) }
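The two stages can be mimicked in plain JavaScript, which makes it easy to see why a single aggregation yields counts for every facet: each product contributes one record per `facet:value` tag. The sample documents below are hypothetical; they are not the data behind the output shown.

```javascript
// Hypothetical documents whose tags join facet name and value with a colon.
const products = [
  { tags: ["color:red", "size:M"] },
  { tags: ["color:green", "size:M", "height:5"] },
  { tags: ["size:M"] },
];

// Like $unwind: emit one record per array element.
const unwound = products.flatMap((p) => p.tags.map((tag) => ({ tags: tag })));

// Like $group {_id: "$tags", count: {$sum: 1}}: count records per tag value.
const grouped = new Map();
for (const { tags } of unwound) grouped.set(tags, (grouped.get(tags) || 0) + 1);
const output = [...grouped].map(([_id, count]) => ({ _id, count }));

console.log(output);
```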
Beyond this step, your application server can do a color/size grouping before sending the results back to the client.
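A minimal sketch of that application-side grouping, assuming every tag follows the `facet:value` format with the first colon as the separator. The `groupFacetTags` name is hypothetical; the input rows reuse the `_id` and count values from the output above, with plain numbers in place of `NumberInt`.

```javascript
// Split each "facet:value" _id at its first colon and nest the counts by facet name.
function groupFacetTags(rows) {
  const facets = {};
  for (const { _id, count } of rows) {
    const i = _id.indexOf(":");
    if (i < 0) continue; // skip tags that do not follow the facet:value convention
    const facet = _id.slice(0, i);
    const value = _id.slice(i + 1);
    if (!facets[facet]) facets[facet] = [];
    facets[facet].push({ value, count });
  }
  return facets;
}

const rows = [
  { _id: "color:green", count: 1 },
  { _id: "color:red", count: 1 },
  { _id: "size:M", count: 3 },
  { _id: "color:yellow", count: 1 },
  { _id: "height:5", count: 1 },
];
console.log(groupFacetTags(rows));
```

Splitting at the first colon keeps values that themselves contain a colon intact.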
Note: combining each facet with its value gives you all facet values aggregated in a single query, so you avoid having to query N times (once for the matching results and then once per facet group), which Garcia's answer identifies as the main problem with MongoDB compared with a full-text search engine.