How To Use Aggregations in MongoDB

Published on October 21, 2021
By Mateusz Papiernik

Software Engineer, CTO @Makimo

The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

Introduction

Prerequisites

  • A server prepared by following the initial server setup tutorial for Ubuntu 20.04.
  • MongoDB installed on your server. To set this up, follow our tutorial on How to Install MongoDB on Ubuntu 20.04.
  • Your server’s MongoDB instance secured by enabling authentication and creating an administrative user. To secure MongoDB like this, follow our tutorial on How To Secure MongoDB on Ubuntu 20.04.
  • Familiarity with querying MongoDB collections and filtering results. To learn how to use MongoDB queries, follow the tutorial How To Create Queries in MongoDB.

Note: The linked tutorials on how to configure your server, install MongoDB, and then secure the installation refer to Ubuntu 20.04. This tutorial concentrates on MongoDB itself, not the underlying operating system. It will generally work with any MongoDB installation regardless of the operating system, as long as authentication has been enabled.

    Understanding Aggregation Pipelines

    Step 1 — Preparing the Test Data

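    To create the sample data, connect to the MongoDB shell as your administrative user. This step follows the conventions of the prerequisite MongoDB security tutorial and assumes that the administrative user is named AdminSammy and that its authentication database is admin. Be sure to change these details in the following command to reflect your own setup, if different: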

    mongo -u AdminSammy -p --authenticationDatabase admin

      Enter the password you set during installation to gain access to the shell. After providing the password, your prompt will change to a greater-than sign (>).

      Note: On a fresh connection, the MongoDB shell will automatically connect to the test database by default. You can safely use this database to experiment with MongoDB and the MongoDB shell.

      Alternatively, you could also switch to another database to run all of the example commands given in this tutorial. To switch to another database, run the use command followed by the name of your database:

      use database_name

        To understand how the aggregation pipelines work, you’ll need a collection of documents with multiple fields of different types you can filter, sort, group, and summarize in different ways. This guide will use a sample collection describing the twenty most populated cities in the world. These documents will have the same format as the following sample document, which describes the city of Tokyo:

        The Tokyo document
        { "name": "Tokyo", "country": "Japan", "continent": "Asia", "population": 37.400 }

        This document contains the following information:

        • name: the city’s name.
        • country: the country where the city is located.
        • continent: the continent where the city is located.
        • population: the city’s population, in millions.

        Run the following insertMany() method in the MongoDB shell to simultaneously create a collection named cities and insert twenty sample documents into it. These documents describe the twenty most populated cities in the world:

        db.cities.insertMany([
            {"name": "Seoul", "country": "South Korea", "continent": "Asia", "population": 25.674 },
            {"name": "Mumbai", "country": "India", "continent": "Asia", "population": 19.980 },
            {"name": "Lagos", "country": "Nigeria", "continent": "Africa", "population": 13.463 },
            {"name": "Beijing", "country": "China", "continent": "Asia", "population": 19.618 },
            {"name": "Shanghai", "country": "China", "continent": "Asia", "population": 25.582 },
            {"name": "Osaka", "country": "Japan", "continent": "Asia", "population": 19.281 },
            {"name": "Cairo", "country": "Egypt", "continent": "Africa", "population": 20.076 },
            {"name": "Tokyo", "country": "Japan", "continent": "Asia", "population": 37.400 },
            {"name": "Karachi", "country": "Pakistan", "continent": "Asia", "population": 15.400 },
            {"name": "Dhaka", "country": "Bangladesh", "continent": "Asia", "population": 19.578 },
            {"name": "Rio de Janeiro", "country": "Brazil", "continent": "South America", "population": 13.293 },
            {"name": "São Paulo", "country": "Brazil", "continent": "South America", "population": 21.650 },
            {"name": "Mexico City", "country": "Mexico", "continent": "North America", "population": 21.581 },
            {"name": "Delhi", "country": "India", "continent": "Asia", "population": 28.514 },
            {"name": "Buenos Aires", "country": "Argentina", "continent": "South America", "population": 14.967 },
            {"name": "Kolkata", "country": "India", "continent": "Asia", "population": 14.681 },
            {"name": "New York", "country": "United States", "continent": "North America", "population": 18.819 },
            {"name": "Manila", "country": "Philippines", "continent": "Asia", "population": 13.482 },
            {"name": "Chongqing", "country": "China", "continent": "Asia", "population": 14.838 },
            {"name": "Istanbul", "country": "Turkey", "continent": "Europe", "population": 14.751 }
        ])

          The output will contain a list of object identifiers assigned to the newly inserted objects.

          Output
          { "acknowledged" : true, "insertedIds" : [ ObjectId("612d1e835ebee16872a109a4"), ObjectId("612d1e835ebee16872a109a5"), ObjectId("612d1e835ebee16872a109a6"), ObjectId("612d1e835ebee16872a109a7"), ObjectId("612d1e835ebee16872a109a8"), ObjectId("612d1e835ebee16872a109a9"), ObjectId("612d1e835ebee16872a109aa"), ObjectId("612d1e835ebee16872a109ab"), ObjectId("612d1e835ebee16872a109ac"), ObjectId("612d1e835ebee16872a109ad"), ObjectId("612d1e835ebee16872a109ae"), ObjectId("612d1e835ebee16872a109af"), ObjectId("612d1e835ebee16872a109b0"), ObjectId("612d1e835ebee16872a109b1"), ObjectId("612d1e835ebee16872a109b2"), ObjectId("612d1e835ebee16872a109b3"), ObjectId("612d1e835ebee16872a109b4"), ObjectId("612d1e835ebee16872a109b5"), ObjectId("612d1e835ebee16872a109b6"), ObjectId("612d1e835ebee16872a109b7") ] }

          You can verify that the documents were properly inserted by running the find() method on the cities collection with no arguments. This will retrieve all the documents in the collection:

          db.cities.find()

            Output
            { "_id" : ObjectId("612d1e835ebee16872a109a4"), "name" : "Seoul", "country" : "South Korea", "continent" : "Asia", "population" : 25.674 } . . .

            With the sample data in place, you can continue on to the next step to learn how to build an aggregation pipeline using the $match stage.

            Step 2 — Using the $match Aggregation Stage

            The $match stage narrows down the list of documents passing through the pipeline. Its query document follows the same syntax as the queries described in the How To Create Queries in MongoDB tutorial listed in the Prerequisites section. The biggest difference from find() is that $match can be used multiple times in the aggregation pipeline, allowing you to query documents that have already been processed and transformed earlier in the pipeline. You’ll learn more about using the same stage multiple times in the same aggregation pipeline later on in this guide.

              Run the following aggregate() method. This example includes a $match stage to select only cities from North America:

              db.cities.aggregate([
                  { $match: { "continent": "North America" } }
              ])

                This time the { "continent": "North America" } query document appears as the parameter to the $match stage. Consequently, MongoDB returns two cities from North America:

                Output
                { "_id" : ObjectId("612d1e835ebee16872a109b0"), "name" : "Mexico City", "country" : "Mexico", "continent" : "North America", "population" : 21.581 }
                { "_id" : ObjectId("612d1e835ebee16872a109b4"), "name" : "New York", "country" : "United States", "continent" : "North America", "population" : 18.819 }

                This command returns the same output as the following one which instead uses the find() method to query the database:

                db.cities.find({ "continent": "North America" })

                  The previous aggregate() method only returns two cities, so there isn’t much to experiment with. To return more results, alter this command so it returns cities from North America and Asia:

                  db.cities.aggregate([
                      { $match: { "continent": { $in: ["North America", "Asia"] } } }
                  ])

                    Notice that the query document syntax is once again identical to how you’d retrieve the same data using the find() method. This time MongoDB returns 14 different cities:

                    Output
                    { "_id" : ObjectId("612d1e835ebee16872a109a4"), "name" : "Seoul", "country" : "South Korea", "continent" : "Asia", "population" : 25.674 }
                    { "_id" : ObjectId("612d1e835ebee16872a109a5"), "name" : "Mumbai", "country" : "India", "continent" : "Asia", "population" : 19.98 }
                    . . .
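
Because $match accepts ordinary query documents, comparison operators can be used here as well. The following is a minimal sketch, assuming the cities collection from Step 1. The variable name largeAsianCities is illustrative, and the typeof db guard is there only so the snippet also loads without error outside the MongoDB shell:

```javascript
// Keep only cities in Asia with a population above 20 million.
// `var` allows the snippet to be pasted into the same shell session more than once.
var largeAsianCities = [
    { $match: { "continent": "Asia", "population": { $gt: 20.0 } } }
];

// `db` is defined only inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.cities.aggregate(largeAsianCities).forEach(printjson);
}
```

With the sample data from Step 1, this should leave Seoul, Shanghai, Tokyo, and Delhi.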

                    With that, you’ve learned how to execute an aggregation pipeline and use the $match stage to narrow down the collection’s documents. Continue reading to learn how to build more complex pipelines by using the $sort stage to order the results and by combining multiple stages together.

                    Step 3 — Using the $sort Aggregation Stage
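
As a minimal sketch of the stage this step covers: $sort takes a document that maps field names to 1 for ascending order or -1 for descending order. The example assumes the cities collection from Step 1 and combines $sort with a preceding $match stage. The variable name asianCitiesByPopulation is illustrative:

```javascript
// Stage 1 keeps only Asian cities; stage 2 orders them by population, largest first.
// `var` allows the snippet to be pasted into the same shell session more than once.
var asianCitiesByPopulation = [
    { $match: { "continent": "Asia" } },
    { $sort: { "population": -1 } }
];

// `db` is defined only inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.cities.aggregate(asianCitiesByPopulation).forEach(printjson);
}
```

With the sample data, the first document returned should be Tokyo, the most populated city in the collection.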

                        To learn more about indexes, see the tutorial How to Use Indexes in MongoDB.

                          Step 4 — Using the $group Aggregation Stage
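
As a minimal sketch of the stage this step covers: $group requires an _id expression that defines the grouping key, and it can compute further fields with accumulator operators such as $sum and $max. The example assumes the cities collection from Step 1. The output field names total_population and largest_city_population, like the variable name, are illustrative:

```javascript
// Group cities by continent and summarize each group.
// `var` allows the snippet to be pasted into the same shell session more than once.
var populationByContinent = [
    {
        $group: {
            "_id": "$continent",                                  // grouping key
            "total_population": { $sum: "$population" },          // sum over the group
            "largest_city_population": { $max: "$population" }    // maximum in the group
        }
    }
];

// `db` is defined only inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.cities.aggregate(populationByContinent).forEach(printjson);
}
```

$group does not guarantee the order of its output documents, so add a $sort stage afterwards if the order matters.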

                              To learn more about this stage, see the official MongoDB documentation.

                                Step 5 — Using the $project Aggregation Stage

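                                The $project stage reshapes documents using a projection document. Projection documents in an aggregation pipeline work like those used in queries, constructed as inclusion projections or exclusion projections. The projection document keys correspond to the keys from input documents entering the $project stage.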

                                  When the projection document contains keys with 1 as their values, it describes the list of fields that will be included in the result. If, on the other hand, projection keys are set to 0, the projection document describes the list of fields that will be excluded from the result.
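
The two kinds of projection can be sketched as follows, assuming the cities collection from Step 1. The variable names are illustrative:

```javascript
// Inclusion projection: keep only name and population.
// The _id field is still returned because it is included by default.
var inclusionPipeline = [
    { $project: { "name": 1, "population": 1 } }
];

// Exclusion projection: drop country and continent, keep every other field.
var exclusionPipeline = [
    { $project: { "country": 0, "continent": 0 } }
];

// `db` is defined only inside the MongoDB shell, so guard the calls.
if (typeof db !== "undefined") {
    db.cities.aggregate(inclusionPipeline).forEach(printjson);
    db.cities.aggregate(exclusionPipeline).forEach(printjson);
}
```

Apart from the _id field, a single projection document cannot mix 1 and 0 values.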

                                  In an aggregation pipeline, projections can also include additional computed fields. In such cases, the projection automatically becomes an inclusion projection, and only the _id field can be suppressed by appending "_id": 0 to the projection document. Computed fields use the dollar sign field path notation for their values and can refer to the values from input documents.
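
A sketch of a projection with computed fields, assuming the cities collection from Step 1. The field layout is inferred from the description and output that follow, and the variable name reshapePipeline is illustrative:

```javascript
// Reshape each city document: hide _id, nest country and continent
// under a new location field, and carry name and population over.
// `var` allows the snippet to be pasted into the same shell session more than once.
var reshapePipeline = [
    {
        $project: {
            "_id": 0,
            "location": {
                "country": "$country",
                "continent": "$continent"
            },
            "name": "$name",
            "population": "$population"
        }
    }
];

// `db` is defined only inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.cities.aggregate(reshapePipeline).forEach(printjson);
}
```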

                                  In this example, the document identifier is suppressed with "_id": 0, while name and population are computed fields referring to the name and population fields from the input documents, respectively. The location field becomes an embedded document with two additional keys, country and continent, each referring to a field from the input documents.

                                  Using this projection stage, MongoDB will return the following documents:

                                  Output
                                  { "location" : { "country" : "South Korea", "continent" : "Asia" }, "name" : "Seoul", "population" : 25.674 }
                                  { "location" : { "country" : "India", "continent" : "Asia" }, "name" : "Mumbai", "population" : 19.98 }
                                  { "location" : { "country" : "Nigeria", "continent" : "Africa" }, "name" : "Lagos", "population" : 13.463 }
                                  { "location" : { "country" : "China", "continent" : "Asia" }, "name" : "Beijing", "population" : 19.618 }
                                  { "location" : { "country" : "China", "continent" : "Asia" }, "name" : "Shanghai", "population" : 25.582 }
                                  { "location" : { "country" : "Japan", "continent" : "Asia" }, "name" : "Osaka", "population" : 19.281 }
                                  { "location" : { "country" : "Egypt", "continent" : "Africa" }, "name" : "Cairo", "population" : 20.076 }
                                  { "location" : { "country" : "Japan", "continent" : "Asia" }, "name" : "Tokyo", "population" : 37.4 }
                                  { "location" : { "country" : "Pakistan", "continent" : "Asia" }, "name" : "Karachi", "population" : 15.4 }
                                  { "location" : { "country" : "Bangladesh", "continent" : "Asia" }, "name" : "Dhaka", "population" : 19.578 }
                                  { "location" : { "country" : "Brazil", "continent" : "South America" }, "name" : "Rio de Janeiro", "population" : 13.293 }
                                  { "location" : { "country" : "Brazil", "continent" : "South America" }, "name" : "São Paulo", "population" : 21.65 }
                                  { "location" : { "country" : "Mexico", "continent" : "North America" }, "name" : "Mexico City", "population" : 21.581 }
                                  { "location" : { "country" : "India", "continent" : "Asia" }, "name" : "Delhi", "population" : 28.514 }
                                  { "location" : { "country" : "Argentina", "continent" : "South America" }, "name" : "Buenos Aires", "population" : 14.967 }
                                  { "location" : { "country" : "India", "continent" : "Asia" }, "name" : "Kolkata", "population" : 14.681 }
                                  { "location" : { "country" : "United States", "continent" : "North America" }, "name" : "New York", "population" : 18.819 }
                                  { "location" : { "country" : "Philippines", "continent" : "Asia" }, "name" : "Manila", "population" : 13.482 }
                                  { "location" : { "country" : "China", "continent" : "Asia" }, "name" : "Chongqing", "population" : 14.838 }
                                  { "location" : { "country" : "Turkey", "continent" : "Europe" }, "name" : "Istanbul", "population" : 14.751 }

                                  Each document now follows the new format transformed through the projection stage.

                                  Now that you’ve learned how to use the $project stage to construct a new document structure for the documents going through an aggregation pipeline, you’re ready to combine all the pipeline stages covered throughout this guide in a single aggregation pipeline.

                                  Step 6 — Putting All the Stages Together
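
A minimal sketch of how the four stages from the previous steps might be chained, assuming the cities collection from Step 1. The output field name highest_population and the variable name combinedPipeline are illustrative:

```javascript
// `var` allows the snippet to be pasted into the same shell session more than once.
var combinedPipeline = [
    // 1. Keep only cities in North America and Asia.
    { $match: { "continent": { $in: ["North America", "Asia"] } } },
    // 2. Group the remaining cities by country and record the largest population in each.
    { $group: { "_id": "$country", "highest_population": { $max: "$population" } } },
    // 3. Order the countries by that value, largest first.
    { $sort: { "highest_population": -1 } },
    // 4. Rename the grouping key to country and hide the _id field.
    { $project: { "_id": 0, "country": "$_id", "highest_population": 1 } }
];

// `db` is defined only inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    db.cities.aggregate(combinedPipeline).forEach(printjson);
}
```

Each stage receives the output of the stage before it, so the $sort and $project stages here operate on the grouped documents rather than on the original city documents. With the sample data, the first result should be Japan, whose largest city is Tokyo.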

                                  Refer to the official MongoDB documentation to learn more about aggregation pipelines and how they can help you work with data stored in the database.


    Tutorial Series: How To Manage Data with MongoDB

    MongoDB is a document-oriented NoSQL database management system (DBMS). Unlike traditional relational DBMSs, which store data in tables consisting of rows and columns, MongoDB stores data in JSON-like structures referred to as documents.

    This series provides an overview of MongoDB’s features and how you can use them to manage and interact with your data.