最近有把一个小项目的底层数据库由 MySQL 修改成 MongoDB,借此更进一步熟悉了 Aggregation Pipeline Stages

本文仅包含翻译小项目 SQL 版本所需的 MongoDB Aggregation Pipeline Stages 知识,成文时 MongoDB 最新版本为 V4.2

db.collection.aggregate( [ { <stage> }, ... ] )
    outmerge、$geoNear 所有 stage(阶段)均可在 pipeline(管道)中出现多次,意味着本文所提这些均可出现多次

Examples

示例仅演示本文相关 stage 主要用法,详细请访问各个 stage 的官方文档

db.message.insertMany([
    {"id": 1,"num": 2},
    {"id": 1,"num": 4},
    {"id": 2,"num": 6},
    {"id": 2,"num": 4},
    {"id": 2,"num": 2},
    {"id": 3,"num": 2}
db.user.insertMany([
    {"id": 1,"name": "n1"},
    {"id": 2,"name": "n2"},
    {"id": 3,"name": "n3"}
db.message.aggregate([{
    $match: {
        id: 2
    $sort: {
        num: 1
    $limit: 2
    $project: {
        _id: 0
    {"id": 2,"num": 2},
    {"id": 2,"num": 4}
db.message.aggregate([{
    $group: {
        _id: "$id",
        max: {
            $max: "$num"
        sum: {
            $sum: "$num"
        count: {
            $sum: 1
    {"_id": 3,"max": 2,"sum": 2,"count": 1},
    {"_id": 2,"max": 6,"sum": 12,"count": 3},
    {"_id": 1,"max": 4,"sum": 6,"count": 2}
db.user.aggregate([{
    $match: {
        id: 3
    $lookup: {
        from: "message",
        localField: "id",
        foreignField: "id",
        as: "messages"
    $project: {
        _id: 0,
        "messages._id": 0
{"id": 3,"name": "n3","messages": [
        {"id": 3,"num": 2}

$match

{ $match: { <query> } }

Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of processing down the pipe

If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes

$sort

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

set the sort order to 1 or -1 to specify an ascending or descending sort respectively

When a $sort precedes a $limit and there are no intervening stages that modify the number of documents, the optimizer can coalesce the $limit into the $sort

The $sort stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $sort will produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary files

$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the $project, $unwind, and $group aggregation operators. If $project, $unwind, or $group occur prior to the $sort operation, $sort cannot use any indexes

$limit

{ $limit: <positive integer> }

$limit takes a positive integer that specifies the maximum number of documents to pass along

$group

{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }

The _id field is mandatory; however, you can specify an _id value of null, or any other constant value, to calculate accumulated values for all the input documents as a whole

The remaining computed fields are optional and computed using the <accumulator> operators

Accumulator Operator

$avg Returns an average of numerical values. Ignores non-numeric values

$max Returns the highest expression value for each group

$min Returns the lowest expression value for each group

$sum Returns a sum of numerical values. Ignores non-numeric values

$lookup

Equality Match

To perform an equality match between a field from the input documents with a field from the documents of the “joined” collection

$lookup: from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field>

from Specifies the collection in the same database to perform the join with. The from collection cannot be sharded

localField Specifies the field from the documents input to the $lookup stage

foreignField Specifies the field from the documents in the from collection

as Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten

if does not contain the localField or foreignField, the $lookup treats the value as null for matching purposes

Join Conditions and Uncorrelated Sub-queries

$skip

{ $skip: <positive integer> }

$skip takes a positive integer that specifies the maximum number of documents to skip

$sample

{ $sample: { size: <positive integer> } }

Randomly selects the specified number of documents from its input

$sample may output the same document more than once in its result set

$count

{ $count: <string> }

<string> is the name of the output field which has the count as its value. <string> must be a non-empty string, must not start with $ and must not contain the . character

The $count stage is equivalent to the following $group + $project sequence

db.collection.aggregate( [
   { $group: { _id: null, myCount: { $sum: 1 } } },
   { $project: { _id: 0 } }
                                    MongoDB Sharded Cluster原理如果你还不了解 MongoDB Sharded Cluster,可以先看文档认识一下:
中文简介:MongoDB Sharded Cluster架构原理
英文汇总:https://docs.mongodb.com/manual/sharding/
什么时候考虑用Sharded Cluster?当你考虑使用Sharded Cluster时,通常是要解...
本文整理了一年多以来我常用MongoDB操作,涉及mongo-shell、pymongo,既有运维层面也有应用层面,内容有浅有深,这也就是我从零到熟练的历程。
MongoDB的使用之前也分享过一篇,稍微高阶点:见这里:《MongoDB使用小结》
1、shell登陆和显示
MongoDB分片集群,英文名称为: Sharded Cluster
旨在通过横向扩展,来提高数据吞吐性能、增大数据存储量。
分片集群由三个组件:“mongos”, “config server”, “shard” 组成。
框架如下(图片来自mongodb官网说明):
mongos:数据库请求路由。负责接收所有客户端应用程序的连接查询请求,并将请求路由到集群内部对应的分片上。...
                                    分片是将数据分布在多台机器的方法。MongoDB 使用分片来提供海量数据的部署和高吞吐操作 .
这种数据库系统可以挑战单服务器的最高容量,例如高频率的查询可以将服务器的cpu容量发挥到极致。超出系统内存的数据集则会充分调用磁盘的IO。
面对系统扩充有两种方法:垂直扩展和水平扩展
提高单台服务器的容量,例如使用更强的cpu,加内存,存储空间。技术的局限限制了单台机器面对高负载的能...
		lookup()
			(String from, String localField, String foreignField, String as)
			from:关联表localField:主记录关联字段,传入的是MongoDB中的字段名,非实体类字段名foreignField:关联表关联字段,字段名同上as:别名,及实体类映射字段名(lookup默认返回的类型是Arr
                                    使用MongoDB关联查询使用MongoDB关联查询Mongo shell 关联查询$lookup 简单教程$lookup 示例功能快捷键合理的创建标题,有助于目录的生成如何改变文本的样式插入链接与图片如何插入一段漂亮的代码片生成一个适合你的列表创建一个表格设定内容居中、居左、居右SmartyPants创建一个自定义列表如何创建一个注脚注释也是必不可少的KaTeX数学公式新的甘特图功能,丰富你的文章UML 图表FLowchart流程图导出与导入导出导入
使用MongoDB关联查询
在工作中,我们有时会使用M
$addFields
Adds new fields to documents. Similar to $project, addFieldsreshapeseachdocumentinthestream;specifically,byaddingnewfieldstooutputdocumentsthatcontainboththeexisti...
                                    第三篇笔记本想着记录一些简单的增删改查,由于中间很久没有写就一时懒得整理了,先把最近刚遇到的问题记录一下
通过Aggregationlookup进行多级关联查询
在SQL中可以通过left join操作进行多表的关联查询,在mongo中,类似的操作为Aggregation中的lookup,可以看一下如下数据结构
@Data
public class Subject {
    private ObjectId id;
    private String name;
    private String
                                    Sharding Introduction      	Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput ope...
副本集1:192.168.115.11:27017,192.168.115.12:27017,192.168.115.11:47017(arbiter)
副本集2:192.168.115.11:37017,192.168.1
                                    目录1 管道2 管道表达式3 聚合管道行为3.1 管道操作符和索引3.2 早期过滤(Early Filtering)4 注意事项4.1 分片集合4.2 聚合 vs Map-Reduce4.3 限制4.4 管道优化
进入MongoDB中文手册(4.2版本)专栏
聚合管道是一个基于数据处理管道概念建模的数据聚合框架。文档进入一个多阶段的管道,该管道将文档转换成聚合的结果。例如:
db.orders.aggregate([
   { $match: { status: "A" } },
   { $group: 
                                    原文地址https://docs.mongodb.org/manual/reference/limits/BSON文档BSON 文档大小BSON 最大是16M限制BSON文档大小的原因是:1.文档内存占用2.网络传输时带宽的占用3.要存储大文档请使用GridFSBSON 内嵌文档的深度MongoDB 支持...