IX mongodb
目录
架构
CRUD操作
mongodb aggregation framework
复制集
mongodb全家桶
常见部署方式
常用监控工具及手段
备份与恢复
安全架构
索引机制
读写性能机制
高级集群设计
什么是MongoDB?
以json为数据模型的文档数据库;
为什么叫文档数据库?
文档来自于JSON Document,并非我们一般理解的PDF、word文档;
谁在开发MongoDB?
上市公司MongoDB Inc.,总部位于美国纽约;
主要用途?
应用数据库,类似于Oracle|MySQL,海量数据处理,数据平台;
主要特点?
建模为可选|Json数据模型比较适合开发者|横向扩展可支撑很大数据量和并发|4.0开始支持ACID;
是免费的?
有两个发布版本,社区版(基于SSPL,一种和AGPL基本类似的开源协议)和企业版(基于商业协议,需付费使用);
版本变迁:
2008 0.x起步阶段;
2010 1.x支持复制集和分片集;
2012 2.x更丰富的数据库功能;
2014 3.x WiredTiger和周边生态环境;
2018 4.x分布式事务支持;
| | MongoDB | RDBMS |
|---|---|---|
| 数据模型 | 文档模型 | 关系模型 |
| 数据库类型 | OLTP | OLTP |
| CRUD操作 | MQL(mongo query language)/SQL | SQL |
| 高可用 | 复制集 | 集群模式 |
| 横向扩展能力 | 通过原生分片完善支持 | 数据分区或应用侵入式 |
| 索引支持 | B树、全文索引、地理位置索引、multikey多键索引、TTL索引 | B树 |
| 开发难度 | 容易 | 困难 |
| 数据容量 | 没有理论上限 | 千万、亿级 |
| 扩展方式 | 垂直扩展+水平扩展 | 垂直扩展 |
优势:面向开发者的易用+高效数据库;
简单直观,以自然的方式建模,以直观的方式与数据库交互;
结构灵活,弹性模式从容响应需求的频繁变化;
快速开发,做更多的事,写更少的代码;
一目了然的对象模型:
objects<-->database
Customer<-->Customer
Cart<-->Cart
Product<-->Product
Order<-->Order
json结构和对象模型接近,开发代码量低;
json的动态模型意味着更容易响应新的业务需求;
复制集提供99.999%高可用;
分片架构支持海量数据和无缝扩容;
灵活,快速响应业务变化(可动态增加新字段):
多形性,同一个集合中可以包含不同字段(类型)的文档对象;
动态性,线上修改数据模式,修改时应用与数据库均无须下线;
数据治理,支持使用json schema来规范数据模式,在保证模式灵活动态的前提下,提供数据治理能力;
快速,最简单快速的开发方式:
json模型之快速特性,数据库引擎只需要在一个存储区读写|反范式、无关联的组织极大优化查询速度|程序API自然、开发快速;
原生的高可用:
replica set,2到50个成员(建议3个,1个Primary2个Secondary,application通过driver只和Primary通信);
自恢复;
多中心容灾能力;
滚动升级,最小化服务中断;
横向扩展能力:
需要的时候无缝扩展;
应用全透明;
多种数据分布策略;
轻松支持TB-PB数量级;
安装:
正式版本的次版本号均偶数,如3.0|3.2|4.0|4.2;
https://raw.githubusercontent.com/tapdata/geektime-mongodb-course/master/aggregation/dump.tar.gz
tar xf dump.tar.gz
mongorestore -h localhost:27017
>show dbs
>use mock
>show collections # 或show tables
>db.orders.findOne()
MongoDB Compass连接工具;
db.<集合>.insertOne(<JSON对象>) # db.fruit.insertOne({name: "apple"})
db.<集合>.insertMany([<JSON1>, <JSON2>...<JSONn>]) # db.fruit.insertMany([{name: "apple"}, {name: "pear"}, {name: "orange"}])
find:
是mongo中查询数据的基本指令,相当于sql中的select;
返回的是游标;
后加.pretty()可显示更友好;
示例:
db.movies.find({"year": 1975}) # 单条件查询
db.movies.find({"year":1989, "title": "Batman"}) # 多条件and查询
db.movies.find({$and: [{"title": "Batman"}, {"category": "action"}]}) # and的另一种形式
db.movies.find({$or: [{"year":1989}, {"title": "Batman"}]}) # 多条件or查询
db.movies.find({"title": /^B/}) // 按正则查询
查询条件对照表:
| SQL | MQL |
|---|---|
| a = 1 | {a: 1} |
| a <> 1 | {a: {$ne: 1}} |
| a > 1 | {a: {$gt: 1}} |
| a >= 1 | {a: {$gte: 1}} |
| a < 1 | {a: {$lt: 1}} |
| a <= 1 | {a: {$lte: 1}} |
| a = 1 and b = 1 | {a: 1, b: 1} 或 {$and: [{a: 1}, {b: 1}]} |
| a = 1 or b = 1 | {$or: [{a: 1},{b: 1}]} |
| a is null | {a: {$exists: false}} |
| a in (1,2,3) | {a: {$in: [1,2,3]}} |
$lt,存在并小于;
$lte,存在并小于等于;
$gt,存在并大于;
$gte,存在并大于等于;
$ne,不存在或存在但不等于;
$in,存在并在指定数组中;
$nin,不存在或不在指定数组中;
$or,匹配2个或多个条件中的一个;
$and,匹配全部条件;
find支持使用field.sub_field的形式查询子文档:
db.fruit.insertOne({name: "apple", from: {country: "China", province: "Guangdong"}})
db.fruit.find({"from.country": "China"}) # 正确
db.fruit.find({"from": {country: "China"}}) # 错误
db.movies.find({"filming_locations.city": "Rome"}) # 查找城市是Rome的记录
使用find搜索数组:
find支持对数组中的元素进行搜索;
db.fruit.insert([{"name": "Apple", color: ["red", "green"]},{"name": "Mango", color: ["yellow", "green"]}])
db.fruit.find({color: "red"})
db.fruit.find({$or: [{color: "red"}, {color: "yellow"}]})
db.getCollection('movies').find({"filming_locations": { $elemMatch : {"city": "Rome", "country": "USA"}}}) # 在数组中搜索子对象的多个字段时,若使用 $elemMatch,表示必须是同一个子对象满足多个条件
db.getCollection('movies').find({"filming_locations.city": "Rome", "filming_locations.country": "USA"}) # 对比这2个查询
db.movies.find({"category": "action"} , {"_id": 0, title: 1} ) # find可指定只返回指定的字段,_id字段必须明确指明不返回否则默认返回,mongo中称这为projection投影,此例不返回_id返回title
db.testcol.remove({a: 1}) # remove命令需要配合查询条件使用,匹配查询条件的文件会被删除,指定一个空文档条件会删除所有文档
db.testcol.remove({a: {$lt: 5}}) # 删除a<5的记录
db.testcol.remove({}) # 删除所有记录
db.testcol.remove() # 报错
db.<集合>.update(<查询条件>, <更新字段>)
使用updateOne()|updateMany()时,要求<更新字段>部分必须包含以下操作符之一,否则将报错:$set(设置字段值)|$unset(删除指定字段)|
$push(增加一个对象到数组尾部)|
$pushAll(增加多个对象到数组尾部)|
$pop(从数组尾部删除一个对象)|
$pull(如果匹配指定的值,从数组中删除相应的对象)|
$pullAll(如果匹配任意的值,从数组中删除相应的对象)|
$addToSet(如果不存在则增加一个值到数组),如db.fruit.updateOne({name: "apple"}, {from: "China"})会报错;
db.fruit.updateOne({name: "apple"}, {$set: {from: "China"}} ) # 无论条件匹配多少条记录,始终只更新第一条匹配的记录:查询name为apple的记录,将找到的第一条记录的from设置为China
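下面给出数组类操作符的一个最小示意(fruit集合沿用上文示例,字段取值为假设):
db.fruit.insertOne({name: "banana", color: ["yellow"]})
db.fruit.updateOne({name: "banana"}, {$push: {color: "green"}}) # 向color数组尾部追加一个元素
db.fruit.updateOne({name: "banana"}, {$addToSet: {color: "green"}}) # 值已存在则不重复添加
db.fruit.updateOne({name: "banana"}, {$pull: {color: "yellow"}}) # 删除数组中匹配的元素
db.fruit.updateOne({name: "banana"}, {$pop: {color: 1}}) # 1删除尾部元素,-1删除头部元素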
db.<集合>.drop() # 删除一个集合,集合中的全部文档都会被删除,集合相关的索引也会被删除,即使有复制集也同样会把所有数据都删掉
>db # 列出当前所在库
>db.dropDatabase() # 删除当前库
pip install pymongo
>>> import pymongo
>>> pymongo.version # 查看驱动版本
mongodb://ip:port
from pymongo import MongoClient
uri = 'mongodb://127.0.0.1:27017'
client = MongoClient(uri)
print(client)
db = client['eshop'] # 获取到db对象
user_coll = db['users'] # 获取到collection对象
new_user = {'username': 'nina', 'password': 'xxxx', 'email': '123345@qq.com'}
result = user_coll.insert_one(new_user)
result
result = user_coll.update_one({'username': 'nina'}, {'$set': {'phone': '13888888888'}})
use eshop
db.users.find()
聚合框架;
是一个计算框架,它可以作用在一个或几个集合上,对集合中的数据进行的一系列运算,将这些数据转化为期望的形式;
从效果而言,聚合框架相当于sql查询中的group by|left outer join|as等;
使用场景,可用于OLAP(分析一段时间内的销售总额|均值,计算一段时间内的净利润|分析购买人的年龄分布|分析学生成绩分布|统计员工绩效)和OLTP(计算)场景;
pipeline管道和stage步骤:
整个聚合运算过程称为pipeline,它由多个stage组成;管道接受一系列文档(原始数据),每个步骤对这些文档进行一系列运算,并将结果文档输出给下一个步骤;
pipeline = [$stage1, $stage2, ... $stageN]
db.<collection>.aggregate(pipeline, {options}) # 聚合运算的基本格式
常见stage:
| 步骤 | 作用 | sql等价运算符 |
|---|---|---|
| $match | 过滤 | where |
| $project | 投影 | as |
| $sort | 排序 | order by |
| $group | 分组 | group by |
| $skip/$limit | 结果限制 | skip/limit |
| $lookup | 左外连接 | left outer join |
| $unwind | 展开数组 | N/A |
| $graphLookup | 图搜索 | N/A |
| $facet/$bucket | 分面搜索 | N/A |
常见步骤中的操作符:
$match:$eq|$gt|$gte|$lt|$lte|$and|$or|$not|$in|$geoWithin|$intersect;
$project:$map|$reduce|$filter|$range|$multiply|$divide|$subtract|$add|$year|$month|$dayOfMonth|$hour|$minute|$second;
$group:$sum|$avg|$push|$addToSet|$first|$last|$max|$min;
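以下是一个最小的聚合管道示意,集合与字段名(orders|status|userId|total)均为假设,仅用于说明$match-->$group-->$sort的串联方式:
db.orders.aggregate([
{$match: {status: "completed"}}, # 过滤,相当于where
{$group: {_id: "$userId", totalAmount: {$sum: "$total"}, cnt: {$sum: 1}}}, # 按用户分组求和、计数
{$sort: {totalAmount: -1}}, # 按总额降序
{$limit: 10} # 只返回前10名
])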
聚合查询实验:todo
主要意义在于实现服务高可用;
实现依赖2个方面的功能:数据写入时将数据迅速复制到另一个独立节点上;在接受写入的节点发生故障时自动选举出一个新的替代节点;
在实现高可用的同时,复制集实现了其它几个附加作用:
数据分发,将数据从一个区域复制到另一个区域,减少另一个区域的读延迟;
读写分离,不同类型的压力分别在不同的节点上执行;
异地容灾,在数据中心故障时快速切换到异地;
一个典型的复制集由3个以上具有投票权的节点组成,包括:
一个Primary主节点,接受写入操作和选举时投票;
2个或更多从节点Secondary,复制主节点上的新数据和选举时投票;
不推荐使用arbiter投票节点;
数据是如何复制的?
当一个修改操作,无论是插入、更新或删除,到达主节点时,它对数据的操作将被记录下来(经过一些必要的转换),这些记录称为oplog;
从节点通过在主节点上打开一个tailable游标不断获取新进入主节点的oplog,并在自己的数据上回放,以此保持跟主节点的数据一致;
通过选举完成故障恢复:
具有投票权的节点之间两两互相发送心跳(每2s一次);
当5次心跳未收到时判断为节点失联;
如果失联的是主节点,从节点会发起选举,选出新的主节点;
如果失联的是从节点则不会产生新的选举;
选举基于RAFT一致性算法实现,选举成功的必要条件是大多数投票节点存活;
复制集中最多可以有50个节点,但具有投票权的节点最多7个;
影响选举的因素:
整个集群必须有大多数节点存活;
被选举为主节点的节点必须:能够与多数节点建立连接;具有较新的oplog;具有较高的优先级(如果有配置);
复制集节点有以下常见的选项:
v参数,是否具有投票权,有则参与投票;
priority参数,优先级越高的节点优先成为主节点,优先级为0的节点无法成为主节点;
hidden参数,复制数据,但对应用不可见,隐藏节点可以具有投票权,但优先级必须为0;
slaveDelay参数,复制n秒之前的数据,保持与主节点的时间差,用于误删数据后从此节点恢复;
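组合使用上述选项的一个配置示意(成员下标为假设,修改需通过rs.reconfig生效):
cfg = rs.conf()
cfg.members[1].priority = 0 # 优先级0,不能成为主节点
cfg.members[1].hidden = true # 对应用不可见,hidden要求priority为0
cfg.members[1].slaveDelay = 3600 # 落后主节点1小时,用于误删数据恢复
cfg.members[2].votes = 0 # 不参与投票,即v参数
rs.reconfig(cfg)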
注意:
硬件,正常的复制集节点都有可能成为主节点,它们的地位是一样的,因此硬件配置上必须一致;为了保证节点不会同时宕机,各节点使用的硬件必须具有独立性;
软件,复制集各节点软件版本必须一致,以避免不可预知的问题;
增加节点不会增加系统写性能;
复制集实验:todo
mongod,MongoDB数据库软件;
mongo,命令行工具,管理数据库;
mongos,路由进程,分片环境下使用;
mongodump|mongorestore,命令行数据库备份与恢复工具;
mongoexport|mongoimport,csv|json导入与导出,用于不同系统间数据迁移;
compass,GUI管理工具;
ops manager(企业版),集群管理软件;
BI Connector(企业版),sql解释器|BI套接件;
MongoDB Charts(企业版),可视化软件;
Atlas(付费及免费),云托管服务,包括永久免费云数据库;
mongodump|mongorestore:
类似于mysqldump|mysqlrestore;
可以完成全库dump,不加条件;
也可根据条件dump部分数据,-q参数;
dump的同时跟踪数据,--oplog;
restore是反操作,把mongodump的输出导入到mongodb;
mongodump -h 127.0.0.1:27017 -d test -c test
mongorestore -h 127.0.0.1:27017 -d test -c test xxx.bson
单机版(开发与测试),仅primary;
复制集(高可用),1primary2secondary;
分片集群(横向扩展),N组shard(<=1024),每组1primary2secondary,另mongos;
为什么要使用分片集群?
数据容量日益增大,访问性能日渐降低;
新品上线异常火爆,如何支撑更多的并发用户;
单库已有10TB数据,恢复需要1-2天,如何加速;
地理分布数据;
mongos路由节点:
提供集群单一入口;
转发应用端请求;
选择合适的数据节点进行读写;
合并多个数据节点的返回;
无状态;
建议至少2个;
mongod配置(目录)节点:
提供集群元数据存储,分片数据分布的映射,如0-1000到shard0,1001-2000到shard1;
普通复制集架构(建议3个高可用);
mongod数据节点:
以复制集为单位;
横向扩展,最大1024分片;
分片之间数据不重复,所有分片在一起才可完整工作;
分片集群特点:
应用全透明,无特殊处理;
数据自动平衡;
动态扩容,无须下线;
提供三种分片方式(基于范围|基于hash|基于zone或tag);
分片集群数据分布方式-基于范围:
分片集群数据分布方式-基于哈希:
分片集群数据分布方式-自定义zone:
给分片打标签;
分片集群可以有效解决性能瓶颈及系统扩容问题;
分片额外消耗较多,管理复杂,尽量不要分片;
如何用好分片集群:
合理的架构:是否需要分片?需要多少分片?数据的分布规则?
正确的姿势:选择需要分片的表;选择正确的片键;使用适合的均衡策略;
足够的资源:cpu|RAM|存储,mongos与config通常消耗很少的资源,可选择低规格虚拟机,资源的重点在于shard服务器:
需要足以容纳热数据索引的内存;
正确创建索引后cpu通常不会成为瓶颈,除非涉及非常多的计算;
磁盘尽量选用SSD;
实际测试是最好的检验,用来验证资源配置是否完备;即使项目初期具备了足够的资源,仍需要考虑在合适的时候扩容,建议监控各项资源使用情况,无论哪一项达到60%以上则开始考虑扩容,原因如下:
扩容需要新的资源,申请新资源需要时间;
扩容后数据需要均衡,均衡需要时间,应保证新数据入库速度慢于均衡速度;
均衡需要资源,如果资源即将或已经耗尽,均衡也是很低效的;
合理的架构-分片大小:
关于数据,数据量不超过3TB,尽可能保持在2TB一个片;
关于索引,常用索引必须容纳进内存;
按以上标准初步确定分片后,还需要考虑业务压力,随着压力增大cpu|mem|disk中的任何一项出现瓶颈时,都可通过添加更多分片来解决;
合理的架构-需要多少分片:
A=所需存储总量 / 单服务器可挂载容量,8TB/2TB=4;
B=工作集大小/单服务器内存容量,400GB/(256*0.6)=3;
C=并发量总数/(单服务器并发量*0.7),30000/(9000*0.7)≈4.8,向上取整为5;
分片数量=max(A,B,C)=5;
合理的架构-分片的分布:
是否需要跨机房分布分片;
是否需要容灾;
高可用的要求如何;
正确的姿势-各种概念由小到大:
shard key片键,文档中的一个字段;
doc文档,包含shard key的一行数据;
chunk块,包含n个文档;
shard分片,包含n个chunk;
cluster集群,包含n个分片;
正确的姿势-选择合适片键:
影响片键的主要因素:cardinality取值基数;取值分布;分散写,集中读;被尽可能多的业务场景用到;避免单调递增或递减的片键;
选择基数大的片键:对于小基数的片键:
因为备选值有限,块的总数量就有限;
随着数据增多,块的大小会越来越大;
太大的块,会导致水平扩展时移动块非常困难;
例如存储一个高中的师生数据,以年龄作为片键(假设年龄范围为15-65岁且只为整数),最多只会有51个chunk;
结论:取值基数要大;
选择分布均匀的片键:对于分布不均匀的片键:
会造成某些块的数据量急剧增大,这些块的压力随之增大;
数据均衡以chunk为单位,所以系统无能为力;
例如存储一个学校的师生数据,以年龄(15-65)作为片键,大部分人的年龄范围为15-18,这4个chunk的数据量、访问压力远大于其他chunk;
结论:取值分布应尽可能均匀;
正确的姿势-定向性好:
4个分片的集群,希望读某条特定的数据;
如果用片键作为条件查询,mongos可直接定位到具体的分片;
如果不用片键,mongos需要把查询发到4个分片,等最后一个分片响应,mongos才能响应到应用端;
结论:对主要查询要具有定向能力;
例:
一个Email系统的片键
{
_id: ObjectId(),
user: 'test',
time: Date(),
subject: '',
recipients: [],
body: '',
attachments: []
}
1、片键用自增{_id:1},基数V,写分布X(由于自增量,写会有2次,先在一个分片上写,再搬到其他分片上),定向查询X(一般不会用随机数查询);
2、片键用{_id: "hashed"},基数V,写分布V,定向查询X;
3、片键用{user_id: 1},基数X,写分布V,定向查询V;
4、片键用{user_id: 1, time:1},基数V,写分布V,定向查询V;
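按第4种片键建分片表的示意(库名mail、集合messages为假设):
sh.enableSharding("mail") # 对库启用分片
db.getSiblingDB("mail").messages.createIndex({user_id: 1, time: 1}) # 片键需要索引支持
sh.shardCollection("mail.messages", {user_id: 1, time: 1}) # 以user_id+time作为片键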
分片实验:
配置域名解析;
准备分片目录;
创建第一个分片复制集并初始化;
创建config复制集并初始化;
初始化分片集群,加入第一个分片;
创建分片表;
加入第二个分片;
rs.status()
sh.status()
sh.enableSharding("foo") # 创建分片表
sh.shardCollection("foo.bar", {_id: "hashed"})
MongoDB Ops Manager;
percona;
通用监控平台;
程序脚本;
监控信息的来源:
db.serverStatus() # 主要,此监控信息是从上次开机到现在为止的累计数据,因此不能简单使用
db.isMaster() # 次要
mongostat命令行工具 # 只有部分信息
serverStatus()主要信息:
connections # 关于连接数的信息
locks # 使用的锁情况
network # 网络使用情况统计
opcounters # crud的执行次数统计
repl # 复制集配置信息
wiredTiger # 包含大量WiredTiger执行情况的信息,block-manager(WT数据块的读写情况),session(session使用数量),concurrentTransactions(Ticket使用情况),mem(内存使用情况),metrics(一系列性能指标统计信息)
监控报警的考量:
具备一定的容错机制以减少误报的发生;总结应用各指标峰值;适时调整报警阈值;留出足够的处理时间;
建议监控指标:
| 指标 | 意义 | 获取 |
|---|---|---|
| opcounters操作计数器 | 查询、更新、插入、删除、getmore和其它命令的数量 | db.serverStatus().opcounters |
| tickets令牌 | 对WiredTiger存储引擎的读写令牌数量,令牌数量表示了可以进入存储引擎的并发操作数量 | db.serverStatus().wiredTiger.concurrentTransactions |
| replication lag复制延迟 | 写操作到达从节点所需要的最小时间,过高的replication lag会减小从节点的价值且不利于配置了写关注w>1的操作(一般超过30s-1min应发告警) | db.adminCommand({'replSetGetStatus': 1}) |
| oplog window复制时间窗 | 表示oplog可容纳多长时间的写操作,即一个从节点可离线多长时间仍能追上主节点,建议该值>24h为佳 | db.oplog.rs.find().sort({$natural: -1}).limit(1).next().ts - db.oplog.rs.find().sort({$natural: 1}).limit(1).next().ts |
| connections连接数 | 每个连接都将消耗资源,应计算低峰、正常、高峰时间的连接数,并制定合理的报警阈值范围 | db.serverStatus().connections |
| query targeting查询专注度 | 每秒平均的索引键、文档扫描数量与返回文档数量之比,该值较高表示查询可能需要很多低效的扫描才能满足,通常代表索引不当或缺少索引;比值最好为1,若为100:1甚至1000000:1则说明没有用上合适的索引 | var status = db.serverStatus(); status.metrics.queryExecutor.scanned / status.metrics.document.returned; status.metrics.queryExecutor.scannedObjects / status.metrics.document.returned |
| scan and order扫描和排序 | 每秒内存排序操作所占的平均比例,内存排序可能十分昂贵,因为通常要求缓冲大量数据;有适当索引的情况下,内存排序是可以避免的 | var status = db.serverStatus(); status.metrics.operations.scanAndOrder / status.opcounters.query |
| 节点状态 | 每个节点的运行状态,如果状态不是PRIMARY、SECONDARY、ARBITER之一,或无法执行上述命令则报警 | db.runCommand("isMaster") |
| dataSize数据大小 | 整个实例数据总量(压缩前) | 每个db执行db.stats() |
| storageSize磁盘空间大小 | 已使用的磁盘空间占总空间的百分比 | |
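按上表oplog window公式写成脚本的一个示意(在复制集成员的mongo shell中运行,Timestamp.getTime()返回秒):
var oplog = db.getSiblingDB("local").oplog.rs
var first = oplog.find().sort({$natural: 1}).limit(1).next().ts
var last = oplog.find().sort({$natural: -1}).limit(1).next().ts
print("oplog window: " + (last.getTime() - first.getTime())/3600 + " hours")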
备份的目的:防止硬件故障引起的数据丢失;防止人为错误误删数据;时间回溯;监管要求;
首先,mongo生产集群已通过复制集的多节点实现;
mongo的备份机制分为:
延迟节点备份(如一个从节点比主节点晚1h,安全范围内的任意时间点状态=延迟从节点当前状态+定量重放oplog);
全量备份+oplog增量,最近的oplog已在oplog.rs集合中,因此可定期从集合中导出便得到了oplog;如果主节点上的oplog.rs集合足够大,全量备份足够密集,自然也可不用备份oplog;只要有覆盖整个时间段的oplog,就可结合全量备份得到任意时间点的备份;
常见的全量备份方式:mongodump|复制数据库文件|文件系统快照;
全量备份+oplog=任意时间点备份恢复(PIT);
复制数据库文件:
必须先关闭节点才能复制,否则复制到的文件无效;
也可选择db.fsyncLock()锁定节点,但完成后不要忘记db.fsyncUnlock()解锁;
可以且应该在从节点上完成;
该方法实际上会暂时宕机一个从节点,所以整个过程中应注意投票节点总数;
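使用fsyncLock复制数据文件的大致流程示意:
db.fsyncLock() # 刷盘并阻止写入,保证复制到的文件一致
# 此时在OS层将dbpath下的全部文件复制到备份位置
db.fsyncUnlock() # 复制完成后务必解锁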
文件系统快照:
mongo支持使用文件系统快照直接获取数据文件在某一时刻的镜像;
快照过程中可以不用停机;
数据文件和journal必须在同一个卷上;
快照完成后请尽快复制文件并删除快照;
mongodump:
备份最灵活,但速度上也是最慢的;
出来的数据不能表示某个时间点,只是某个时间段;
用幂等性解决一致性问题;
mongodump参数:
--oplog # 复制开始到结束过程中的所有oplog并输出到结果中,默认在当前路径下dump/oplog.bson
mongorestore参数:
--oplogReplay # 恢复完数据文件后再重放oplog,默认重放dump/oplog.bson,如果oplog不在这,可以--oplogFile指定位置
--oplogLimit # 重放oplog时,截止到指定时间点
for (var i=0; i<100000; i++) {
db.testtb.insertOne({x: Math.random()*100000});
}
mongodump -h 127.0.0.1:27017 --oplog
show tables;
db.testtb.drop()
show tables;
db.testtb.count()
mongorestore --host localhost:27017 --oplogReplay
更复杂的重放oplog:
假设全量备份已经恢复到数据库中(无论使用快照|mongodump|复制数据文件),要重放一部分增量怎么办?
导出主节点上的oplog,mongodump --host 127.0.0.1 -d local -c oplog.rs,可通过--query添加时间范围;
使用bsondump查看导出的oplog,找到需要截止的时间点;
恢复到指定时间点,用--oplogLimit指定恢复到这条记录之前,mongorestore -h 127.0.0.1 --oplogLimit "1577355175:1" --oplogFile dump/local/oplog.rs;
分片集备份:
大致与复制集原理相同,存在以下差异:
应分别为每个片和config备份;
分片集备份不仅要考虑一个分片内的一致性问题,还要考虑分片间的一致性问题,因此每个片要能够恢复到同一个时间点;
分片集的增量备份:
尽管理论上可使用与复制集同样的方式来为分片集完成增量备份,但实际上分片集的情况更加复杂,这种复杂性来自2个方面:
各个数据节点的时间不一致,每个数据节点很难完全恢复到一个真正的一致时间点上,通常只能做到大致一致,而这种大致一致通常足够好,除以下情况;
分片间的数据迁移,当一部分数据从一个片迁移到另一个片时,最终数据到底在哪里取决于config中的元数据,如果元数据与数据节点之间的时间差异正好导致数据实际已经迁移到新分片上,而元数据仍然认为数据在旧分片上,就会导致数据丢失情况发生,虽然这种情况发生的概率很小,但仍有可能导致问题;
要避免上述问题的发生,只有定期停止均衡器,只有在均衡器停止期间,增量恢复才能保证正确;
authentication认证(客户端-服务器认证|内部集群认证);
authorization鉴权(基于用户角色进行权限控制|基于不同数据库操作行为和不同目标作用域设定角色|除系统默认预设的角色,用户还可按业务需求定制特定的角色);
auditing审计(对DDL/DML|对数据库认证授权|对复制集及分片等集群操作|对不同层面的操作设定不同的过滤策略);
encryption加密(基于tls/ssl加密mongo驱动和mongo数据库之间的通信传输|基于tls/ssl加密数据库内部节点之间的通信传输|应用层加密特定字段,基于KMIP|X.509加密存储文件);
认证方式:
用户名+密码,默认,SCRAM-SHA-1哈希算法,用户信息存于mongo本地数据库;
证书方式,X.509标准,服务端需要提供证书文件启动,客户端需要证书文件连接服务端,证书由外部或内部CA颁发;
LDAP外部认证,连接到外部LDAP服务器,企业版功能;
Kerberos外部认证,连接到外部Kerberos服务器,企业版功能;
mongo集群节点认证:
keyfile,将统一keyfile文件(字符串)拷贝到不同的节点上;
X.509,更加安全,基于证书的认证模式,推荐不同的节点使用不同的证书;
鉴权:
mongo授权基于角色的权限控制,不同权限的用户对数据库的操作不同,如DBA角色可创建用户,应用开发者可插入数据,报表开发者可读取数据;
角色的组成,db.getRole('read', {showPrivileges: true});
mongo内置角色及权限继承关系:
自定义角色:
支持按需自定义角色,适合一些高安全要求的业务场景;
db.createRole({
role: 'sampleRole',
privileges: [
{resource: {db: 'sampledb', collection: 'sample'}, actions: ['update']}
],
roles: [
{role: 'read', db: 'sampledb'}
]
})
db.createUser({ # 将角色绑定至用户
user: 'sampleUser',
pwd: 'password',
roles: [{role: 'sampleRole', db: 'admin'}]
})
加密:
mongo支持tls/ssl来加密所有网络传输( 客户端应用和服务器之间 , 内部复制集之间 );
tls/ssl确保mongo网络传输仅可由允许的客户端读取;
落盘加密(企业级):
生成master key,用来加密每一个数据库的key;
生成每一个数据库的key,用来加密各自的数据库;
基于生成的数据库key加密各个数据库中的数据;
key管理(只针对master key,数据库key保存在数据库内部),通过外部服务KMIP;
字段级加密:
单独文档字段通过自身密钥加密;
数据库只看见密文;
优势,便捷(自动及透明),任务分开(简化基于服务的系统流程,服务端工程师无法看到明文),合规(满足监管的"被遗忘权"),快速(最小性能代偿);
字段级加密查询流程:
在mongo驱动程序客户端完成的(而落盘加密在服务端完成);
审计(企业版):
数据库等记录型系统通常使用审计监控数据库的一些活动,及对一些可疑的操作进行调查;
记录格式,json;
记录方式,本地文本或syslog;
记录内容,schema change(DDL)|CRUD(DML)|用户认证;
参数:
--auditDestination syslog # 审计日志记录到syslog
--auditDestination file --auditFormat JSON --auditPath /path/to/auditLog.json # 审计日志记录到指定文件
--auditDestination file --auditFormat JSON --auditPath auditLog.json --auditFilter '{atype: {$in: ["createCollection", "dropCollection"]}}'
最佳实践:
启用身份认证(启用访问控制并强制执行身份认证,使用强密码);
权限控制(基于deny all原则,不多给多余权限);
加密和审计(启用传输加密、数据保护和活动审计);
网络加固(内网部署服务器,设置防火墙,操作系统设置);
遵循安全准则(遵守不同行业或地区安全标准合规性要求);
合理配置权限:
创建管理员;
使用复杂密码;
不同用户不同账户;
应用隔离;
最小权限原则;
启用加密:
使用tls作为传输协议;
使用4.2版本的字段级加密对敏感字段加密;
如有需要,使用企业版进行落盘加密;
如有需要,使用企业版并启用审计日志;
网络和OS加固:
使用专用用户运行,不建议在OS层使用root;
限制网络开放度(通过防火墙iptables控制对mongo的访问|使用vpn/VPCs可创建一个安全通道,mongo服务不应直接暴露在公网上|使用白名单列表限制允许访问的网段|使用bind_ip绑定一个具体地址|修改默认端口27017);
使用安全配置选项运行mongo(如不需要执行js脚本使用--noscripting|如是3.2之前老版本关闭http接口和RestAPI接口,net.http.enabled=False,net.http.JSONPEnabled=False,net.http.RESTInterfaceEnabled=False);
启用认证:
--auth # 方式一,命令行,mongod --auth --port 27017 --dbpath /data/db --fork --logpath mongod.log
authorization: enabled # 在security配置文件下添加
#] mongo # 启用鉴权后,无密码可以登录,但只能执行创建用户操作
>use admin
>db.createUser({user: 'superuser', pwd: 'password', roles: [{role: 'root', db: 'admin'}]})
#] mongo -u superuser -p password --authenticationDatabase admin
> show collections
> db.test.insert({x: 123})
> db.createUser({user: 'writer', pwd: 'abc123', roles: [{role: 'readWrite', db: 'acme'}]})
> db.createUser({user: 'reader', pwd: 'abc123', roles: [{role: 'read', db: 'acme'}]})
> db.runCommand({getParameter: 1, authenticationMechanisms: 1})
> db.system.users.find()
index|key|datapage,索引|键|数据页;
covered query|fetch,查询覆盖|抓取,如果所有需要的字段都在索引中,不需要额外的字段,就可以不再需要从数据页加载数据,这就是查询覆盖,db.human.createIndex({firstName: 1, lastName:1, gender:1, age:1});
IXSCAN|COLLSCAN,索引扫描|集合扫描;
Big O Notation,时间复杂度;
query shape,查询的形状,等值或范围值,不同的形状对索引有影响;
index prefix,索引前缀,所有索引前缀都可被该索引覆盖,没有必要针对这些查询建立额外的索引;
selectivity,过滤性,应选择能过滤掉最多数据的字段;如10000条记录的集合中,满足gender=F的记录4000条、满足city=LA的记录100条、满足ln='parker'的记录10条,条件ln能过滤掉最多的数据,city其次,gender最弱,所以过滤性ln>city>gender;如果要查询同时满足gender=F且city=LA且ln='parker'的记录,但只允许为gender|city|ln中的一个建立索引,应把索引建在ln上;
B树结构,索引背后是B-树(即B-tree,与二叉树不同,每个节点的子节点数量可以超过2个);其工作过程较为复杂,但本质上它是一个有序的数据结构,可用有序数组来理解;
通过.explain(true)来查看查询是否使用了合适的索引,db.col.find({name:'test'}).explain(true);
索引类型:
单键索引;
compound index组合索引,最佳方式ESR原则(equal精确匹配的字段放最前面,sort排序条件放中间,range范围匹配的字段放最后),ES|ER同样适用,示例见本列表之后;
多值索引;
地理位置索引;
全文索引;
TTL索引;
部分索引;
哈希索引;
db.member.createIndex({city: 1}, {background: true}) # 后台创建索引
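ESR原则的一个示意(orders集合及字段名为假设):
db.orders.createIndex({userId: 1, createTime: 1, amount: 1}) # E:userId等值,S:createTime排序,R:amount范围
db.orders.find({userId: 123, amount: {$gte: 100}}).sort({createTime: 1}) # 可完整利用上述索引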
对BI|报表专用节点单独创建索引(该从节点priority设为0;关闭该从节点;以单机模式启动;添加索引(分析用);关闭该从节点以副本集模式启动);
应用端-选择节点:
对于复制集读操作,选择哪个节点由readPreference决定的,primary/primaryPreferred,secondary/secondaryPreferred,nearest;
如果不希望一个远距离节点被选择,应做到如下之一:将它设置为隐藏节点|通过tag控制可选的节点|使用nearest方式;
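通过连接字符串指定readPreference的示意(主机地址与复制集名为假设):
mongodb://host1:27017,host2:27017,host3:27017/test?replicaSet=rs0&readPreference=secondaryPreferred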
应用端-排除等待:
如何发生的?总连接数大于允许的最大连接数maxPoolSize;
解决:加大最大连接数(一定有用);优化查询性能;
应用端-连接与认证:
如果一个请求需要等待创建新连接和进行认证,相比直接从连接池获取连接,它将耗费更长时间;
可能解决方案:设置minPoolSize最小连接数,一次性创建足够的连接,避免突发的大量请求;
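连接池参数同样可通过连接字符串指定(数值仅为示意,应按压测结果设置):
mongodb://host1:27017/test?maxPoolSize=200&minPoolSize=50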
数据库端-排队等待:
由ticket不足引起的排队等待,问题往往不在ticket本身,而在于为什么正在执行的操作会长时间占用ticket;
可能解决方案:优化CRUD性能可减少ticket占用时间;zlib压缩方式也可引起ticket不足,因为zlib算法本身在进行压缩、解压时需要的时间较长,从而造成长时间的ticket占用;
磁盘速度必须比写入速度要快才能保持缓存水位;
数据库端-合并结果:
从分片集查询的结果集,在mongos端合并后返回应用程序;
如果顺序不重要则不要排序;
尽可能使用带片键的查询条件以减少参与查询的分片数;
网络的考量:
性能瓶颈总结:
| 应用端 | 服务端 | 网络 |
|---|---|---|
| 选择访问入口节点 | 排队等待ticket | 应用/驱动 -- mongos |
| 等待数据库连接 | 执行请求 | mongos -- 片 |
| 创建连接和完成认证 | 合并执行结果 | |
性能诊断工具:
mongostat,运行状态的工具,dirty超过20%时阻塞新请求,used超过95%时阻塞新请求,qrw排队的请求,conn连接数量;
mongotop,了解集合压力状态的工具,total总时间消耗,read|write;
mongod日志,默认>100ms的查询会记录,有COLLSCAN;
mtools,pip install mtools,mplotqueries日志文件(将所有慢查询通过图表形式展现),mloginfo --queries日志文件(总结出所有慢查询的模式和出现次数、消耗时间等);
容灾级别:
| 容灾级别 | 说明 | RPO(recovery point objective)恢复点目标 | RTO(recovery time objective)恢复时间目标 |
|---|---|---|---|
| L0无备源中心 | 没有灾难恢复能力,只在本地进行数据备份 | 24h | 4h |
| L1本地备份+异地保存 | 本地将关键数据备份,然后送到异地保存;灾难发生后,按预定数据恢复程序恢复系统和数据 | 24h | 8h |
| L2双中心主备模式 | 在异地建立一个热备份点,通过网络进行数据备份;当出现灾难时,备份站点接替主站点的业务,维护业务持续性 | 秒级 | 数分钟到半小时 |
| L3双中心双活 | 在相隔较远的地方分别建立两个数据中心,进行相互数据备份;当某个数据中心发生灾难时,另一个数据中心接替其工作任务 | 秒级 | 秒级 |
| L4双中心双活+异地热备=两地三中心 | 在同城分别建立两个数据中心,进行相互数据备份;当该城市的2个中心同时不可用(地震、大面积停电、网络等)时,快速切换到异地 | 秒级 | 分钟级 |
GSLB全局LB;
例1:三中心部署:
节点数5个(2+2+1);
主数据中心的2个节点要设置高一点的优先级,减少跨中心换主节点;
同城双中心间的网络要保证低延迟和频宽,满足writeConcern: Majority的双中心写需求;
使用Retryable Writes and Retryable Reads来保证零下线时间;
用户需要自行处理好业务层的双中心切换;
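writeConcern与可重试写入的示意(连接串参数为假设):
# 连接串显式启用可重试写入:mongodb://hostA,hostB,hostC/test?replicaSet=rs0&retryWrites=true
db.orders.insertOne({sku: "a1"}, {writeConcern: {w: "majority"}}) # 多数节点确认后才返回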
| 方案 | 异地容灾 | 多中心负载均衡 | 适合场景 | 注意 |
|---|---|---|---|---|
| 3中心复制集部署 | Yes | 仅支持读 | 容灾、高可用场景 | 资源利用率不高 |
| 全球多写集群global cluster | Yes | 支持 | 容灾高可用+双写(提高用户体验) | 多写有限制,需要数据模型支持 |
| 独立集群双向同步 | Yes | 支持 | 容灾高可用+双写(提高用户体验) | 第三方工具写冲突处理 |
例2,全球多写;
上线前,性能测试:
模拟真实压力,对集群完成充分的性能测试,了解集群概况;
性能测试的输出(压测过程中各项指标表现,如CRUD达到多少,连接数达到多少等;根据正常指标范围配置监控阈值;根据压测结果按需调整硬件资源);
上线前,环境检查:
根据最佳实践要求对生产环境所使用的OS进行检查和调整,最常见的需要调整的参数包括:
禁用NUMA,否则在某些情况下会引起突发大量swap交换;
禁用Transparent Huge Page,否则会影响数据库效率;
tcp_keepalive_time调整为120s,避免一些网络问题;
ulimit -n避免打开文件句柄不足的情况;
关闭atime,提高数据文件访问效率;
上线后:
性能监控(为防止突发状况,应对常见性能指标进行监控及时发现问题);
定期健康检查(mongod日志|环境设置是否有变动|mongo配置是否有变动);
版本发布规律:
如4.2.1
主版本号大约一年发布一次,如2.0|2.2|2.4|2.6,3.0|3.2|3.4|3.6,4.0|4.2;
小版本号,主版本发布后的bug fix及小功能增强,通常与主版本没有API不兼容问题,1-2个月发布1次,小版本通常可直接原地升级;
主版本升级流程:
检查release notes和驱动兼容性,每个版本升级时都有一些注意事项,如涉及到兼容性的改动等等,通过release notes可了解到升级过程将带来什么影响;
升级驱动本地测试,升级驱动程序,搭建新版本mongo,导入测试数据,详尽测试应用;
线上应用升级,应用应先完成驱动升级后的上线,新版本驱动一般会向前兼容;
线上mongo升级,按流程进行,mongo集群升级;
单机升级流程:
停止mongo;
备份数据库目录;
安装新版本mongodb binaries;
启动mongo;
复制集升级:
升级从节点,参考单机步骤;
逐个升级所有从节点;
主从切换,在主节点上执行rs.stepDown()(原主降为从,某从节点被选举为主);
升级切换后降为从节点的原主节点;
切换FCV(功能兼容性),db.adminCommand({setFeatureCompatibilityVersion: "4.2"})
分片集群升级流程:
禁用均衡器;
升级config,参考复制集;
升级分片,参考复制集;
轮流升级;
启用均衡器;
切换FCV;
在线升级:
升级过程中虽然会发生主从节点切换,存在短时间不可用,但是3.6版本开始支持自动写重试可自动恢复主从切换引起的集群暂时不可写,4.2开始支持的自动读重试则提供了包括主从切换在内的读问题的自动恢复;
升级需要逐版本完成,不可以跳版本,正确3.2-->3.4-->3.6-->4.0-->4.2,错误3.2-->4.2,原因mongo复制集仅仅允许相邻版本共存,有一些升级内部数据格式,如密码加密字段需要在升级过程中由mongo进行转换;
降级:
如果升级失败,则需要降级到原有旧版本,在降级过程中:
滚动降级过程中集群可保持在线,仅在切换节点时会产生一定的不可写时间;
降级前应先去除已经用到的新版本特性,如用到了NumberDecimal则应把所有使用NumberDecimal的文档先去除该字段;
通过设置FCV(feature compatibility version)可在功能上降到与旧版本兼容;
FCV设置完成后再滚动替换为旧版本;
NoSQL,not only sql;
bigdata大数据存储问题:
并行数据库(水平切分;分区查询);
NoSQL数据库管理系统:
非关系模型;
分布式;
不支持ACID数据库设计范式;
简单数据模型;
元数据和数据分离;
弱一致性;
高吞吐量;
高水平扩展能力和低端硬件集群;
代表(clustrix;GenieDB;ScaleBase;NimbusDB;Drizzle);
云数据管理(DBaaS);
大数据的分析处理(MapReduce);
CAP:consistency一致性、availability可用性、partition tolerance分区容错性,三者最多同时满足其二;
ACID vs BASE:

| ACID | BASE |
|---|---|
| atomicity原子性 | basically available基本可用 |
| consistency一致性 | soft state软状态 |
| isolation隔离性 | eventually consistent最终一致 |
| durability持久性 | |
注:
最终一致性细分:
因果一致性;
读自己写一致性;
会话一致性;
单调读一致性;
时间轴一致性;
注:
ACID(强一致性;隔离性;采用悲观保守的方法,难以变化);
BASE(弱一致性;可用性优先;采用乐观的方法,适用变化,更简单、更快);
数据一致性的实现技术:
NRW策略(
N,number,所有副本数;
R,read,完成r副本数;
W,write,完成w副本数;
R+W>N(强一致);
R+W<N(弱一致);
);
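例:N=3副本时,若W=2、R=2,则R+W=4>N,读写法定集必有交集,总能读到最新写入(强一致);若W=1、R=1,则R+W=2<N,读可能落在尚未更新的副本上(弱一致);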
2PC;
Paxos;
Vector clock(向量时钟);
http://www.nosql-database.org/
数据存储模型(各流派根据此模型来划分):
列式存储模型(
应用场景:在分布式FS之上,支持随机rw的分布式数据存储;
典型产品:HBase、Hypertable、Cassandra;
数据模型:以列为中心存储,将同一列数据存储在一起;
优点:快速查询,高可扩展性,易于实现分布式扩展;
);
文档模型(
应用场景:非强事务要求的web应用;
典型产品:MongoDB、ElasticSearch、CouchDB、Couchbase Server;
数据模型:键值模型,存储为文档;
优点:数据模型无须事先定义;
);
键值模型(
应用场景:内容缓存,用于大量并行数据访问的高负载场景;
典型产品:redis、DynamoDB(amazon)、Azure Table Storage、Riak;
数据模型:基于hash表实现的key-value;
优点:查询迅速;
);
图式模型(
应用场景:社交网络、推荐系统、关系图谱;
典型产品:Neo4j、Infinite Graph;
数据模型:图式结构;
优点:适用于图式计算场景;
);
introduction mongodb:
MongoDB(from humongous) is a scalable, high-performance, open source, schema-free, document-oriented NoSQL database;
what is MongoDB?
humongous(huge+monstrous);
document database(document-oriented database: uses JSON(BSON actually));
schema free;
C++;
open source;
GNU AGPL V3.0 License;
OSX,Linux,Windows,Solaris|32bit,64bit;
developed and supported by 10gen, first released in February 2009;
NoSQL;
performance(written in C++, full index support, no transactions(but has atomic operations)不支持事务但支持原子操作, memory-mapped files(delayed writes));
scalable(replication, auto-sharding);
commercially supported(10gen): lots of documentation;
document-based queries:
flexible document queries expressed in JSON/Javascript;
Map/Reduce:(flexible aggregation and data processing; queries run in parallel on all shards);
gridFS(store files of any size easily);
geospatial indexing(find objects based on location(i.e. find closest n items to x));
many production deployments;
features:
collection oriented storage: easy storage of object/JSON-style data;
dynamic queries;
full index support,including on inner objects and embedded arrays;
query profiling;
replication and fail-over support;
efficient storage of binary data including large objects(e.g. photos and videos);
auto-sharding for cloud-level scalability(currently in alpha);
great for(websites; caching; high volume, low value data; high scalability; storage of program objects and json);
not as great for(highly transactional; Ad-hoc business intelligence; problems requiring SQL);
注:
JSON,JavaScript Object Notation,名称/值对的集合;值的有序列表;
DDL;
DML
https://www.mongodb.com/
http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/RPMS/
mongodb-org-mongos-2.6.4-1.x86_64.rpm
mongodb-org-server-2.6.4-1.x86_64.rpm
mongodb-org-shell-2.6.4-1.x86_64.rpm
mongodb-org-tools-2.6.4-1.x86_64.rpm
]# yum -y install mongodb-org-server-2.6.4-1.x86_64.rpm mongodb-org-shell-2.6.4-1.x86_64.rpm mongodb-org-tools-2.6.4-1.x86_64.rpm
]# rpm -qi mongodb-org-server ( This package contains the MongoDB server software, default configuration files, and init.d scripts. )
<!-- Description :
MongoDB is built for scalability, performance and high availability, scaling from single server deployments to large, complex multi-site architectures. By leveraging in-memory computing, MongoDB provides high performance for both reads and writes. MongoDB’s native replication and automated failover enable enterprise-grade reliability and operational flexibility.
MongoDB is an open-source database used by companies of all sizes, across all industries and for a wide variety of applications. It is an agile database that allows schemas to change quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language and strict consistency.
MongoDB has a rich client ecosystem including hadoop integration, officially supported drivers for 10 programming languages and environments, as well as 40 drivers supported by the user community.
MongoDB features:
* JSON Data Model with Dynamic Schemas
* Auto-Sharding for Horizontal Scalability
* Built-In Replication for High Availability
* Rich Secondary Indexes, including geospatial
* TTL indexes
* Text Search
* Aggregation Framework & Native MapReduce
-->
]# rpm -qi mongodb-org-shell ( This package contains the mongo shell. )
]# rpm -qi mongodb-org-tools ( This package contains standard utilities for interacting with MongoDB. )
]# mkdir -pv /mongodb/data
mkdir: created directory `/mongodb'
mkdir: created directory `/mongodb/data'
]# id mongod
uid=497(mongod) gid=497(mongod) groups=497(mongod)
]# chown -R mongod.mongod /mongodb/
]# vim /etc/mongod.conf
#dbpath=/var/lib/mongo
dbpath=/mongodb/data
bind_ip=172.168.101.239
#httpinterface=true
httpinterface=true #28017
]# /etc/init.d/mongod start
Starting mongod: /usr/bin/dirname: extra operand `2>&1.pid'
Try `/usr/bin/dirname --help' for more information.
[ OK ]
]# vim /etc/init.d/mongod (解决2>&1.pid问题)
#daemon --user "$MONGO_USER" "$NUMACTL $mongod $OPTIONS >/dev/null 2>&1"
daemon --user "$MONGO_USER" "$NUMACTL $mongod $OPTIONS >/dev/null"
]# /etc/init.d/mongod restart
Stopping mongod: [ OK ]
Starting mongod: [ OK ]
]# ss -tnl | egrep "27017|28017"
LISTEN 0 128 127.0.0.1:27017 *:*
LISTEN 0 128 127.0.0.1:28017 *:*
]# du -sh /mongodb/data
3.1G /mongodb/data
]# ll -h !$
ll -h /mongodb/data
total 81M
drwxr-xr-x 2 mongod mongod 4.0K Jun 29 16:23 journal
-rw------- 1 mongod mongod 64M Jun 29 16:23 local.0
-rw------- 1 mongod mongod 16M Jun 29 16:23 local.ns
-rwxr-xr-x 1 mongod mongod 6 Jun 29 16:23 mongod.lock
]# mongo --host 172.168.101.239 (默认在test库)
MongoDB shell version: 2.6.4
connecting to: 172.168.101.239:27017/test
Welcome to the MongoDB shell.
……
> show dbs
admin (empty)
local 0.078GB
> show collections
> show users
> show logs
global
> help
db.help() help on db methods
db.mycoll.help() help on collection methods
sh.help() sharding helpers
rs.help() replica set helpers
……
> db.help()
……
> db.stats() (数据库状态)
{
"db" : "test",
"collections" : 0,
"objects" : 0,
"avgObjSize" : 0,
"dataSize" : 0,
"storageSize" : 0,
"numExtents" : 0,
"indexes" : 0,
"indexSize" : 0,
"fileSize" : 0,
"dataFileVersion" : {
},
"ok" : 1
}
> db.version()
2.6.4
> db.serverStatus() (mongodb数据库服务器状态)
……
> db.getCollectionNames()
[ ]
> db.mycoll.help()
……
> use testdb
switched to db testdb
> show dbs
admin (empty)
local 0.078GB
test (empty)
> db.stats()
{
"db" : "testdb",
"collections" : 0,
"objects" : 0,
"avgObjSize" : 0,
"dataSize" : 0,
"storageSize" : 0,
"numExtents" : 0,
"indexes" : 0,
"indexSize" : 0,
"fileSize" : 0,
"dataFileVersion" : {
},
"ok" : 1
}
> db.students.insert({name: "tom",age: 22})
WriteResult({ "nInserted" : 1 })
> show collections
students
system.indexes
> show dbs
admin (empty)
local 0.078GB
test (empty)
testdb 0.078GB
> db.students.stats()
{
"ns" : "testdb.students",
"count" : 1,
"size" : 112,
"avgObjSize" : 112,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 1,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
> db.getCollectionNames()
[ "students", "system.indexes" ]
> db.getCollectionNames()
[ "students", "system.indexes" ]
> db.students.insert({name: "jerry",age: 40,gender: "M"})
WriteResult({ "nInserted" : 1 })
> db.students.insert({name: "ouyang",age:80,course: "hamagong"})
WriteResult({ "nInserted" : 1 })
> db.students.insert({name: "yangguo",age: 20,course: "meivnquan"})
WriteResult({ "nInserted" : 1 })
> db.students.insert({name: "guojing",age: 30,course: "xianglong"})
WriteResult({ "nInserted" : 1 })
> db.students.find()
{ "_id" : ObjectId("5955b505853f9091c4bf4056"), "name" : "tom", "age" : 22 }
{ "_id" : ObjectId("595b3816853f9091c4bf4057"), "name" : "jerry", "age" : 40, "gender" : "M" }
{ "_id" : ObjectId("595b3835853f9091c4bf4058"), "name" : "ouyang", "age" : 80, "course" : "hamagong" }
{ "_id" : ObjectId("595b384f853f9091c4bf4059"), "name" : "yangguo", "age" : 20, "course" : "meivnquan" }
{ "_id" : ObjectId("595b3873853f9091c4bf405a"), "name" : "guojing", "age" : 30, "course" : "xianglong" }
> db.students.count()
5
mongodb documents:
mongodb stores data in the form of documents,which are JSON-like field and value pairs;
documents are analogous to structures in programming languages that associate keys with values,where keys may hold other pairs of keys and values(e.g. dictionaries,hashes,maps,and associative arrays);
formally,mongodb documents are BSON documents,which is a binary representation of JSON with additional type information;
field:value;
document:
stored in collection,think record or row;
can have _id key that works like primary key in MySQL;
two options for relationships: subdocument or db reference;
mongodb collections:
mongodb stores all documents in collections;
a collection is a group of related documents that have a set of shared common indexes;
collections are analogous to a table in relational databases;
collection:
think table,but with no schema;
for grouping into smaller query sets(speed);
each top entity in your app would have its own collection(users,articles,etc.);
full index support;
database operations: query
in mongodb a query targets a specific collection of documents;
queries specify criteria,or conditions,that identify the documents that mongodb returns to the clients;
a query may include a projection that specifies the fields from the matching documents to return;
you can optionally modify queries to impose limits,skips,and sort orders;
如:
db.users.find({age: {$gt: 18}}).sort({age: 1})
db.COLLECTION.find({FIELD: {QUERY CRITERIA}}).MODIFIER
query interface:
for query operations,mongodb provide a db.collection.find() method;
the method accepts both the query criteria and projections and returns a cursor to the matching documents
query behavior:
all queries in mongodb address a single collection;
you can modify the query to impose limits,skips,and sort orders;
the order of documents returned by a query is not defined and is not necessarily consistent unless you specify a sort();
operations that modify existing documents(i.e. updates) use the same query syntax as queries to select documents to update;
in aggregation pipeline,the $match pipeline stage provides access to mongodb queries;
data modification:
data modification refers to operations that create,update,or delete data;
in mongodb,these operations modify the data of a single collection;
all write operations in mongodb are atomic on the level of a single document;
for the update and delete operations,you can specify the criteria to select the documents to update or remove;
RDBMS vs mongodb

| RDBMS | mongodb |
|---|---|
| table,view | collection |
| row | json document |
| index | index |
| join | embedded document |
| partition | shard |
| partition key | shard key |
db.collection.find(<query>,<projection>)
类似于sql中的select语句,其中<query>相当于where子句,而<projection>相当于要选定的字段;
如果使用的find()方法中不包含<query>,则意味着要返回对应collection的所有文档;
db.collection.count()方法可统计指定collection中文档的个数;
举例:
>use testdb
> for (i=1;i<=100;i++)
{ db.testColl.insert({hostname: "www"+i+".magedu.com"})}
>db.testColl.find({hostname: {$gt: "www95.magedu.com"}})
find的高级用法:
mongodb的查询操作支持挑选机制有comparison,logical,element,javascript等几类;
比较运算comparison:
$gt;$gte;$lt;$lte;$ne,语法格式:{field: {$gt: VALUE}}
举例:
> db.students.find({age: {$gt: 30}})
{ "_id" : ObjectId("595b3816853f9091c4bf4057"), "name" : "jerry", "age" : 40, "gender" : "M" }
{ "_id" : ObjectId("595b3835853f9091c4bf4058"), "name" : "ouyang", "age" : 80, "course" : "hamagong" }
$in;$nin,语法格式:{field: {$in: [VALUE1,VALUE2]}}
举例:
> db.students.find({age: {$in: [20,80]}})
{ "_id" : ObjectId("595b3835853f9091c4bf4058"), "name" : "ouyang", "age" : 80, "course" : "hamagong" }
{ "_id" : ObjectId("595b384f853f9091c4bf4059"), "name" : "yangguo", "age" : 20, "course" : "meivnquan" }
> db.students.find({age: {$in: [20,80]}}).limit(1)
{ "_id" : ObjectId("595b3835853f9091c4bf4058"), "name" : "ouyang", "age" : 80, "course" : "hamagong" }
> db.students.find({age: {$in: [20,80]}}).count()
2
> db.students.find({age: {$in: [20,80]}}).skip(1)
{ "_id" : ObjectId("595b384f853f9091c4bf4059"), "name" : "yangguo", "age" : 20, "course" : "meivnquan" }
组合条件:
逻辑运算:$or或运算;$and与运算;$not非运算;$nor反运算,语法格式:{$or: [{<expression1>},{<expression2>}]}
举例:
> db.students.find({$or: [{age: {$nin: [20,40]}},{age: {$nin: [22,30]}}]})
{ "_id" : ObjectId("5955b505853f9091c4bf4056"), "name" : "tom", "age" : 22 }
{ "_id" : ObjectId("595b3816853f9091c4bf4057"), "name" : "jerry", "age" : 40, "gender" : "M" }
{ "_id" : ObjectId("595b3835853f9091c4bf4058"), "name" : "ouyang", "age" : 80, "course" : "hamagong" }
{ "_id" : ObjectId("595b384f853f9091c4bf4059"), "name" : "yangguo", "age" : 20, "course" : "meivnquan" }
{ "_id" : ObjectId("595b3873853f9091c4bf405a"), "name" : "guojing", "age" : 30, "course" : "xianglong" }
> db.students.find({$and: [{age: {$nin: [20,40]}},{age: {$nin: [22,30]}}]})
{ "_id" : ObjectId("595b3835853f9091c4bf4058"), "name" : "ouyang", "age" : 80, "course" : "hamagong" }
元素查询:
根据文档中是否存在指定的字段进行的查询,$exists;$mod;$type;
$exists,根据指定字段的存在性挑选文档,语法格式:{field: {$exists: <boolean>}},true则返回存在指定字段的文档,false则返回不存在指定字段的文档
$mod,将指定字段的值进行取模运算,并返回其余数为指定值的文档,语法格式:{field: {$mod: [divisor,remainder]}}
$type返回指定字段的值类型为指定类型的文档,语法格式:{field: {$type: <BSON type>}}
注:double,string,object,array,binary data,undefined,boolean,date,null,regular expression,javascript,timestamp
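元素查询示意(沿用上文students集合):
> db.students.find({gender: {$exists: true}}) (返回含gender字段的文档)
> db.students.find({age: {$mod: [10,0]}}) (返回age能被10整除的文档)
> db.students.find({name: {$type: 2}}) (2.6中BSON类型2表示string)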
更新操作:
update()方法可用于更改collection中的数据,默认情况下,update()只更新单个文档,若要一次更新所有符合指定条件的文档,则要使用multi选项;
语法格式:db.collection.update(<query>,<update>,<options>),其中query类似于sql语句中的where,而<update>相当于附带了LIMIT 1的SET,如果在<options>处提供multi选项,则update语句类似于不带LIMIT语句的update();
<update>参数的使用格式比较独特,其仅能包含使用update专有操作符来构建的表达式,其专有操作符大致包含field,array,bitwise,其中filed类常用的操作有:
$set,修改字段的值为新指定的值,语法格式:({field: value},{$set: {field: new_value}})
$unset,删除指定字段,语法格式:({field: value},{$unset: {field1: "",field2: "",...}})
$rename,更改字段名,语法格式:({$rename: {"oldname": "newname"}})
$inc,增大指定字段的值,语法格式:({field: value},{$inc: {field1: amount}}),其中{field: value}用于指定挑选标准,{$inc: {field1: amount}用于指定要提升其值的字段及提升大小
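上述操作符的使用示意(沿用上文students集合):
> db.students.update({name: "jerry"},{$set: {age: 41}}) (将jerry的age改为41)
> db.students.update({name: "jerry"},{$inc: {age: 1}}) (age加1)
> db.students.update({name: "jerry"},{$rename: {"course": "class"}}) (字段course改名为class)
> db.students.update({age: {$lt: 30}},{$set: {young: true}},{multi: true}) (更新所有匹配的文档)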
删除操作:
> db.students.remove({name: "tom"}) ( db.mycoll.remove(query) )
WriteResult({ "nRemoved" : 1 })
> db.collections
testdb.collections
> show dbs
admin (empty)
local 0.078GB
test (empty)
testdb 0.078GB
> db.students.drop() ( db.mycoll.drop() ,drop the collection)
true
> db.dropDatabase() (删除database)
{ "dropped" : "testdb", "ok" : 1 }
> show dbs
admin (empty)
local 0.078GB
test (empty)
> quit()
索引:
索引类型:
B+Tree;
hash;
空间索引;
全文索引;
indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form: the index stores the value of a specific field or set of fields,ordered by the value of the field;
mongodb defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a mongodb collection;
mongodb的索引类型:
single field indexes单字段索引:
a single field index only includes data from a single field of the documents in a collection;
mongodb supports single field indexes on fields at the top level of a document and on fields in sub-documents;
compound indexes组合索引:
a compound index includes more than one field of the documents in a collection;
multikey indexes多键索引:
a multikey index references an array and records a match if a query includes any value in the array;
geospatial indexes and queries空间索引;
geospatial indexes support location-based searches on data that is stored as either geoJSON objects or legacy coordinate pairs;
text indexes文本索引:
text indexes supports search of string content in documents;
hashed indexes,hash索引:
hashed indexes maintain entries with hashes of the values of the indexed field;
index creation:
mongodb provides several options that only affect the creation of the index;
specify these options in a document as the second argument to the db.collection.ensureIndex() method:
unique index: db.addresses.ensureIndex({"user_id": 1},{unique: true})
sparse index: db.addresses.ensureIndex({"xmpp_id": 1},{sparse: true})
mongodb与索引相关的方法:
db.mycoll.ensureIndex(field[,options]) #创建索引options有name,unique,dropDups,sparse(稀疏索引)
db.mycoll.dropIndex(INDEX_NAME) #删除索引
db.mycoll.dropIndexes()
db.mycoll.getIndexes()
db.mycoll.reIndex() #重建索引
> show dbs;
admin (empty)
local 0.078GB
test (empty)
> use testdb;
switched to db testdb
> for (i=1;i<=1000;i++){db.students.insert({name: "student"+i,age: (i%120),address: "#85 LeAi Road,ShangHai,China"})}
WriteResult({ "nInserted" : 1 })
> db.students.count()
1000
> db.students.find() (每次显示20行,用it继续查看后20行)
Type "it" for more
> db.students.ensureIndex({name: 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.students.getIndexes() 当前collection的index个数,索引名为name_1
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.students"
},
{
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.students"
}
]
> db.students.dropIndex("name_1") 删除索引,索引名要用引号
{ "nIndexesWas" : 2, "ok" : 1 }
> db.students.ensureIndex({name: 1},{unique: true}) 唯一键索引
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.students.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.students"
},
{
"v" : 1,
"unique" : true,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.students"
}
]
> db.students.insert({name: "student20",age: 20}) 在唯一键索引的基础上,再次插入相同的field时会报错
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.students.$name_1 dup key: { : \"student20\" }"
}
})
> db.students.find({name: "student500"})
{ "_id" : ObjectId("595c3a9bcf42a9afdbede7c0"), "name" : "student500", "age" : 20, "address" : "#85 LeAi Road,ShangHai,China" }
> db.students.find({name: "student500"}).explain() 显示执行过程
{
"cursor" : "BtreeCursor name_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"name" : [
[
"student500",
"student500"
]
]
},
"server" : "ebill-redis.novalocal:27017",
"filterSet" : false
}
> db.students.find({name: {$gt: "student500"}}).explain()
{
"cursor" : "BtreeCursor name_1",
"isMultiKey" : false,
"n" : 552,
"nscannedObjects" : 552,
"nscanned" : 552,
"nscannedObjectsAllPlans" : 552,
"nscannedAllPlans" : 552,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 1,
"indexBounds" : {
"name" : [
[
"student500",
{
}
]
]
},
"server" : "ebill-redis.novalocal:27017",
"filterSet" : false
}
> db.getCollectionNames()
[ "students", "system.indexes" ]
]# mongod -h 分为General options;Replication options;Master/slave options;Replica set options;Sharding options;
--bind_ip arg comma separated list of ip addresses to listen on
- all local ips by default
--port arg specify port number - 27017 by default
--maxConns arg max number of simultaneous connections - 1000000
by default ,并发最大连接数
--logpath arg log file to send write to instead of stdout - has
to be a file, not directory
--httpinterface enable http interface ,web页面监控mongodb统计信息
--fork fork server process ,mongod是否运行在后台
--auth run with security ,使用认证才能访问>db.createUser(Userdocument),>db.auth(user,password)
--repair run repair on all dbs ,上次未正常关闭,用于再启动时修复
--journal enable journaling ,启用事务日志,先将日志顺序写到事务日志,再每隔一段时间同步到数据文件,将随机IO变为顺序IO以提高性能;单实例的mongodb此选项一定要打开,这是保证数据持久性的
--journalOptions arg journal diagnostic options
--journalCommitInterval arg how often to group/batch commit (ms )
调试选项:
--cpu periodically show cpu and iowait utilization
--sysinfo print some diagnostic system information
--slowms arg (=100) value of slow for profile and console log ,超过此值的查询都是慢查询
--profile arg 0=off 1=slow, 2=all ,性能剖析
Replication options:
--oplogSize arg size to use (in MB) for replication op log. default is
5% of disk space (i.e. large is good)
Master/slave options (old; use replica sets instead):
--master master mode
--slave slave mode
--source arg when slave: specify master as <server:port>
--only arg when slave: specify a single database to replicate
--slavedelay arg specify delay (in seconds) to be used when applying
master ops to slave
--autoresync automatically resync if slave data is stale
Replica set options:
--replSet arg arg is <setname>[/<optionalseedhostlist>]
--replIndexPrefetch arg specify index prefetching behavior (if secondary)
[none|_id_only|all]
Sharding options:
--configsvr declare this is a config db of a cluster; default port
27019; default dir /data/configdb
--shardsvr declare this is a shard db of a cluster; default port
27018
mongodb的复制:
两种类型(master/slave;replica sets):
master/slave,与mysql近似,此方式已很少用;
replica sets:
一般为奇数个节点(至少3个节点),可使用arbiter参与选举;
heartbeat2S,通过选举方式实现自动失效转移;
复制集(副本集),可实现自动故障转移,易于实现更高级的部署操作,服务于同一数据集的多个mongodb实例,一个复制集只能有一个主节点(rw)其它都是从节点(r),主节点将数据修改操作保存在oplog中;
arbiter仲裁者,一般至少3个节点,一主两从,从每隔2s与主节点传递一次heartbeat信息,若从在10s内监测不到主节点心跳,会自动在两从节点上重新选举一个为主节点;
replica sets中的节点分类:
0优先级的节点,不会被选举为主节点,但可参与选举过程,属冷备节点,可被client读;
被隐藏的从节点,首先是一个0优先级的从节点,且对client不可见;
延迟复制的从节点,首先是一个0优先级的从节点,且复制时间落后于主节点一个固定时长;
arbiter仲裁节点;
mongodb的复制架构:
oplog,大小固定的文件,存储在local数据库中,复制集中,仅主可在本地写oplog,即使从有oplog也不会被使用,只有在从被选举为主时才可在本地写自己的oplog,mysql的二进制日志随着使用会越来越大,而oplog大小固定,在首次启动mongodb时会划分oplog大小(oplog.rs),一般为disk空间的5%,若5%的结果小于1G,则至少被指定为1G;
从节点有如下类型对应主节点的操作:
initial sync初始同步;
post-rollback catch-up回滚后追赶;
sharding chunk migrations切分块迁移;
注:
local库存放了复制集的所有元数据和oplog,local库不参与复制过程,除此之外其它库都被复制到从节点,用于存储oplog的是一个名为oplog.rs的collection,当一个实例加入某复制集后,在第一次启动时被自动创建oplog.rs,oplog.rs的大小依赖于OS及FS,可用选项--oplogSize自定义大小;
> show dbs
admin (empty)
local 0.078GB
test 0.078GB
testdb (empty)
> use local
switched to db local
> show collections
startup_log
system.indexes
mongodb的数据同步类型:
initial sync初始同步,节点没有任何数据,节点丢失副本,复制历史,初始同步的步骤:克隆所有数据库、应用数据集的所有改变(复制oplog并应用在本地)、为所有collection构建索引;
复制,主上写一点会同步到从;
注:
一致性(仅主可写);
--journal,保证数据持久性,单实例的mongodb一定要打开;
支持多线程复制,对于单个collection仅可单线程复制(在同一个namespace内依然是单线程);
可实现索引预取(数据预取),提高性能;
purpose of replication:
replication provides redundancy and increases data availability;
with multiple copies of data on different database servers,replication protects a database from the loss of a single server;
replication also allows you to recover from hardware failure and service interruptions;
with additional copies of the data,you can dedicate one to disaster recovery,reporting,or backup;
replication in mongodb:
a replica set is a group of mongod instances that host the same data set:
one mongod,the primary,receives all write operations ;
all other instances,secondaries,apply operations from the primary so that they have the same data set;
the primary accepts all write operations from clients,replica set can have only one primary:
because only one member can accept write operations,replica sets provide strict consistency;
to support replication,the primary logs all changes to its data sets in its oplog;
the secondaries replicate the primary's oplog and apply the operations to their data sets;
secondaries data sets reflect the primary's data set;
if the primary is unavailable,the replica set will elect a secondary to be primary;
by default,clients read from the primary,however,clients can specify a read preference to send read operations to secondaries;
arbiter:
you may add an extra mongod instance a replica set as an arbiter;
arbiters do not maintain a data set,arbiters only exist to vote in elections;
if your replica set has an even number of members,add an arbiter to obtain a majority of votes in an election for primary;
arbiters do not require dedicated hardware;
automatic failover:
when a primary does not communicate with the other members of the set for more than 10 seconds,the replica set will attempt to select another member to become the new primary;
the first secondary that receives a majority of the votes becomes primary;
priority 0 replica set members:
a priority 0 member is a secondary that cannot become primary;
priority 0 members cannot trigger elections:
otherwise these members function as normal secondaries;
a priority 0 member maintains a copy of the data set,accepts read operations,and votes in elections;
configure a priority 0 member to prevent secondaries from becoming primary,which is particularly useful in multi-data center deployments;
in a three-member replica set,one data center hosts the primary and a secondary;
a second data center hosts one priority 0 member that cannot become primary;
复制集重新选举的影响条件:
heartbeat;
priority(优先级,默认为1,0优先级的节点不能触发选举操作);
optime;
网络连接;
网络分区;
选举机制:
触发选举的事件:
新复制集初始化时;
从节点联系不到主节点时;
主节点下台(不可用)时,如在主节点上执行rs.stepDown(),或某从节点有更高的优先级且已满足成为主节点的其它所有条件,或主节点无法联系到副本集的多数方;
操作:
172.168.101.239 #master
172.168.101.233 #secondary
172.168.101.221 #secondary
]# vim /etc/mongod.conf (三个节点都用此配置)
fork=true
dbpath=/mongodb/data
bind_ip=172.168.101.239
# in replicated mongo databases, specify the replica set name here
#replSet=setname
replSet=testSet (复制集名称,集群名称)
replIndexPrefetch=_id_only ( specify index prefetching behavior (if secondary) [none|_id_only|all] ,副本索引预取,使复制更高效,仅在secondary上有效)
]# mongo --host 172.168.101.239 (在primary上操作)
> show dbs
admin (empty)
local 0.078GB
test 0.078GB
> use local
switched to db local
> show collections
startup_log
system.indexes
> rs.status()
{
"startupStatus" : 3,
"info" : "run rs.initiate(...) if not yet done for the set",
"ok" : 0,
"errmsg" : "can't get local.system.replset config from self or any seed (EMPTYCONFIG)"
}
> rs.initiate()
{
"info2" : "no configuration explicitly specified -- making one",
"me" : "172.168.101.239:27017",
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
testSet:PRIMARY> rs.add("172.168.101.233:27017")
{ "ok" : 1 }
testSet:PRIMARY> rs.add("172.168.101.221:27017") #(另可用 rs.addArb(hostportstr) 添加仲裁节点)
{ "ok" : 1 }
testSet:PRIMARY> rs.status()
{
"set" : "testSet",
"date" : ISODate("2017-07-05T07:22:59Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "172.168.101.239:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 970,
"optime" : Timestamp(1499238843, 1),
"optimeDate" : ISODate("2017-07-05T07:14:03Z"),
"electionTime" : Timestamp(1499238673, 2),
"electionDate" : ISODate("2017-07-05T07:11:13Z"),
"self" : true
},
{
"_id" : 1,
"name" : "172.168.101.233:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 557,
"optime" : Timestamp(1499238843, 1),
"optimeDate" : ISODate("2017-07-05T07:14:03Z"),
"lastHeartbeat" : ISODate("2017-07-05T07:22:57Z"),
"lastHeartbeatRecv" : ISODate("2017-07-05T07:22:57Z"),
"pingMs" : 1,
"syncingTo" : "172.168.101.239:27017"
},
{
"_id" : 2,
"name" : "172.168.101.221:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 536,
"optime" : Timestamp(1499238843, 1),
"optimeDate" : ISODate("2017-07-05T07:14:03Z"),
"lastHeartbeat" : ISODate("2017-07-05T07:22:57Z"),
"lastHeartbeatRecv" : ISODate("2017-07-05T07:22:57Z"),
"pingMs" : 0,
"syncingTo" : "172.168.101.239:27017"
}
],
"ok" : 1
}
testSet:PRIMARY> use testdb
switched to db testdb
testSet:PRIMARY> db.students.insert({name: "tom",age: 22,course: "hehe"})
WriteResult({ "nInserted" : 1 })
]# mongo --host 172.168.101.233 (在secondary上操作)
testSet:SECONDARY> rs.slaveOk() (执行此操作后才可查询)
testSet:SECONDARY> rs.status()
{
"set" : "testSet",
"date" : ISODate("2017-07-05T07:24:54Z"),
"myState" : 2,
"syncingTo" : "172.168.101.239:27017",
"members" : [
{
"_id" : 0,
"name" : "172.168.101.239:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 669,
"optime" : Timestamp(1499238843, 1),
"optimeDate" : ISODate("2017-07-05T07:14:03Z"),
"lastHeartbeat" : ISODate("2017-07-05T07:24:53Z"),
"lastHeartbeatRecv" : ISODate("2017-07-05T07:24:53Z"),
"pingMs" : 0,
"electionTime" : Timestamp(1499238673, 2),
"electionDate" : ISODate("2017-07-05T07:11:13Z")
},
{
"_id" : 1,
"name" : "172.168.101.233:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1292,
"optime" : Timestamp(1499238843, 1),
"optimeDate" : ISODate("2017-07-05T07:14:03Z"),
"self" : true
},
{
"_id" : 2,
"name" : "172.168.101.221:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 649,
"optime" : Timestamp(1499238843, 1),
"optimeDate" : ISODate("2017-07-05T07:14:03Z"),
"lastHeartbeat" : ISODate("2017-07-05T07:24:53Z"),
"lastHeartbeatRecv" : ISODate("2017-07-05T07:24:53Z"),
"pingMs" : 5,
"syncingTo" : "172.168.101.239:27017"
}
],
"ok" : 1
}
testSet:SECONDARY> use testdb
switched to db testdb
testSet:SECONDARY> db.students.find()
{ "_id" : ObjectId("595c95095800d50614819643"), "name" : "tom", "age" : 22, "course" : "hehe" }
]# mongo --host 172.168.101.221 (在另一secondary上操作)
testSet:SECONDARY> use testdb
switched to db testdb
testSet:SECONDARY> db.students.find() (未先执行rs.slaveOk())
error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
testSet:SECONDARY> rs.slaveOk()
testSet:SECONDARY> use testdb
switched to db testdb
testSet:SECONDARY> db.students.find()
{ "_id" : ObjectId("595c95095800d50614819643"), "name" : "tom", "age" : 22, "course" : "hehe" }
testSet:SECONDARY> db.students.insert({name: jowin,age: 30,course: "mage"}) (只可在primary上写;注意此处jowin未加引号,因此报的是ReferenceError,即便加上引号,secondary上的写入也会被拒绝)
2017-07-05T15:37:05.069+0800 ReferenceError: jowin is not defined
testSet:PRIMARY> rs.stepDown() (在primary上操作, step down as primary (momentarily) (disconnects) ,模拟让主下线)
2017-07-05T15:38:46.819+0800 DBClientCursor::init call() failed
2017-07-05T15:38:46.820+0800 Error: error doing query: failed at src/mongo/shell/query.js:81
2017-07-05T15:38:46.821+0800 trying reconnect to 172.168.101.239:27017 (172.168.101.239) failed
2017-07-05T15:38:46.822+0800 reconnect 172.168.101.239:27017 (172.168.101.239) ok
testSet:SECONDARY> rs.status()
{
"set" : "testSet",
"date" : ISODate("2017-07-05T07:39:46Z"),
"myState" : 2,
"syncingTo" : "172.168.101.221:27017",
"members" : [
{
"_id" : 0,
"name" : "172.168.101.239:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1977,
"optime" : Timestamp(1499239689, 1),
"optimeDate" : ISODate("2017-07-05T07:28:09Z"),
"infoMessage" : "syncing to: 172.168.101.221:27017",
"self" : true
},
{
"_id" : 1,
"name" : "172.168.101.233:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1564,
"optime" : Timestamp(1499239689, 1),
"optimeDate" : ISODate("2017-07-05T07:28:09Z"),
"lastHeartbeat" : ISODate("2017-07-05T07:39:45Z"),
"lastHeartbeatRecv" : ISODate("2017-07-05T07:39:45Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "syncing to: 172.168.101.239:27017",
"syncingTo" : "172.168.101.239:27017"
},
{
"_id" : 2,
"name" : "172.168.101.221:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1543,
"optime" : Timestamp(1499239689, 1),
"optimeDate" : ISODate("2017-07-05T07:28:09Z"),
"lastHeartbeat" : ISODate("2017-07-05T07:39:46Z"),
"lastHeartbeatRecv" : ISODate("2017-07-05T07:39:46Z"),
"pingMs" : 0,
"electionTime" : Timestamp(1499240328, 1),
"electionDate" : ISODate("2017-07-05T07:38:48Z")
}
],
"ok" : 1
}
testSet:SECONDARY> (在新选举的primary上操作)
testSet:PRIMARY>
testSet:PRIMARY> rs.status()
……
testSet:PRIMARY> db.printReplicationInfo() (查看oplog大小及时间)
configured oplog size: 1180.919921875MB
log length start to end: 846secs (0.24hrs)
oplog first event time: Wed Jul 05 2017 15:14:03 GMT+0800 (CST)
oplog last event time: Wed Jul 05 2017 15:28:09 GMT+0800 (CST)
now: Wed Jul 05 2017 15:42:57 GMT+0800 (CST)
testSet:SECONDARY> quit() (将172.168.101.239停服务,当前的primary在10S内连不上,会将其health标为0)
[root@ebill-redis ~]# /etc/init.d/mongod stop
Stopping mongod: [ OK ]
testSet:PRIMARY> rs.status()
{
"set" : "testSet",
"date" : ISODate("2017-07-05T07:51:58Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "172.168.101.239:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(1499239689, 1),
"optimeDate" : ISODate("2017-07-05T07:28:09Z"),
"lastHeartbeat" : ISODate("2017-07-05T07:51:57Z"),
"lastHeartbeatRecv" : ISODate("2017-07-05T07:44:24Z"),
"pingMs" : 0,
"syncingTo" : "172.168.101.221:27017"
},
……
testSet:PRIMARY> cfg=rs.conf() (将 172.168.101.233 从节点的优先级调为2,使之触发选举时成为主,立即生效)
{
"_id" : "testSet",
"version" : 3,
"members" : [
{
"_id" : 0,
"host" : "172.168.101.239:27017"
},
{
"_id" : 1,
"host" : "172.168.101.233:27017"
},
{
"_id" : 2,
"host" : "172.168.101.221:27017"
}
]
}
testSet:PRIMARY> cfg.members[1].priority=2
2
testSet:PRIMARY> rs.reconfig(cfg) (必须执行rs.reconfig(cfg)新配置才会生效,生效后优先级最高的节点将成为primary)
2017-07-05T16:04:10.101+0800 DBClientCursor::init call() failed
2017-07-05T16:04:10.103+0800 trying reconnect to 172.168.101.221:27017 (172.168.101.221) failed
2017-07-05T16:04:10.104+0800 reconnect 172.168.101.221:27017 (172.168.101.221) ok
reconnected to server after rs command (which is normal)
testSet:PRIMARY> cfg=rs.conf() (添加仲裁节点,若当前的secondary数据已复制完成则不能修改此属性值,要先>rs.remove(hostportstr)再>rs.addArb(hostportstr))
{
"_id" : "testSet",
"version" : 6,
"members" : [
{
"_id" : 0,
"host" : "172.168.101.239:27017",
"priority" : 3
},
{
"_id" : 1,
"host" : "172.168.101.233:27017",
"priority" : 2
},
{
"_id" : 2,
"host" : "172.168.101.221:27017"
}
]
}
testSet:PRIMARY> cfg.members[2].arbiterOnly=true
true
testSet:PRIMARY> rs.reconfig(cfg)
{
"errmsg" : "exception: arbiterOnly may not change for members",
"code" : 13510,
"ok" : 0
}
testSet:PRIMARY> rs.printSlaveReplicationInfo()
source: 172.168.101.233:27017
syncedTo: Wed Jul 05 2017 16:08:07 GMT+0800 (CST)
0 secs (0 hrs) behind the primary
source: 172.168.101.221:27017
syncedTo: Wed Jul 05 2017 16:08:07 GMT+0800 (CST)
0 secs (0 hrs) behind the primary
mongodb sharding:
数据量大导致单机性能不足;
注:
mysql的分片借助Gizzard(twitter)、HiveDB、MySQL proxy+HSCALE、Hibernate shard(google)、Pyshard;
写离散、读要集中;
只要是集群,各node间时间要同步;
分片架构中的角色:
mongos #router or proxy,生产中mongos要作HA;
config server #元数据服务器,元数据实际是collection的index,生产中config server3台,能实现内部选举;
shard #数据节点,即mongod实例节点,生产中至少3台;
range #基于范围切片
list #基于列表分片
hash #基于hash分片
sharding is the process of storing data records across multiple machines and is mongodb's approach to meeting the demands of data growth;
as the size of the data increases,a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput;
sharding solves the problem with horizontal scaling;
with sharding,you add more machines to support data growth and the demands of read and write operation;
purpose of sharding:
database systems with large data sets and high throughput applications can challenge the capacity of a single server;
high query rates can exhaust the cpu capacity of the server;
larger data sets exceed the storage capacity of a single machine;
finally,working set sizes larger than the system's ram stress the IO capacity of disk drives;
sharding in mongodb:
sharded cluster has the following components: shards,query routers and config servers:
shards store the data. to provide high availability and data consistency,in a production sharded cluster,each shard is a replicaset;
query routers,or mongos instances,interface with client applications and direct operations to the appropriate shard or shards;
the query router processes and targets operations to shards and then returns results to the clients;
a sharded cluster can contain more than one query router to divide the client request load;
a client sends requests to one query router;
most sharded cluster have many query routers;
config servers store the cluster's metadata:
this data contains a mapping of the cluster's data set to the shards;
the query router uses this metadata to target opertations to specific shards;
production sharded clusters have exactly 3 config servers;
data partitioning:
shard keys:
to shard a collection,you need to select a shard key;
a shard key is either an indexed field or an indexed compound field that exists in every document in the collection;
mongodb divides the shard key values into chunks and distributes the chunks evenly across the shards;
to divide the shard key values into chunks,mongodb uses either range based partitioning or hash based partitioning;
range based sharding:
for range-based sharding,mongodb divides the data set into ranges determined by the shard key values to provide range based partitioning;
hash based sharding:
for hash based partitioning,mongodb computes a hash of a field's value,and then uses these hashes to create chunks;
performance distinctions between range and hash based partitioning:
range based partitioning supports more efficient range queries;
however,range based partitioning can result in an uneven distribution of data,which may negate some of the benefits of sharding;
hash based partitioning,by contrast,ensures an even distribution of data at the expense of efficient range queries;
but random distribution makes it more likely that a range query on the shard key will not be able to target a few shards but would more likely query every shard in order to return a result;
maintaining a balanced data distribution:
MongoDB keeps the cluster balanced using two background processes: splitting and the balancer;
splitting:
a background process that keeps chunks from growing too large;
when a chunk grows beyond the specified chunk size, MongoDB splits the chunk in half;
inserts and updates trigger splits;
splits are an efficient metadata change;
to create splits, MongoDB does not migrate any data or affect the shards;
balancing:
the balancer is a background process that manages chunk migrations;
the balancer runs in all of the query routers in a cluster;
when the distribution of a sharded collection in a cluster is uneven, the balancer process migrates chunks from the shard with the largest number of chunks to the shard with the least number of chunks until the collection balances;
the shards manage chunk migrations as a background operation;
during migration, all requests for a chunk's data address the origin shard;
primary shard:
every database has a primary shard that holds all the un-sharded collections in that database;
broadcast operations:
mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores the data;
targeted operations:
all insert() operations target one shard;
all single update() operations, including upserts, and remove() operations must target one shard;
sharded cluster metadata:
config servers store the metadata for a sharded cluster:
the metadata reflects the state and organization of the sharded data sets and system;
the metadata includes the list of chunks on every shard and the ranges that define the chunks;
the mongos instances cache this data and use it to route read and write operations to shards;
config servers store the metadata in the config database;
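The config database can be inspected read-only from a mongos to see this metadata; a minimal sketch (never modify these collections by hand):
mongos> use config
mongos> db.shards.find() #(one document per shard)
mongos> db.databases.find() #(which databases are partitioned, and their primary shard)
mongos> db.chunks.find().limit(1) #(the shard key ranges that define each chunk)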
Operations:
172.168.101.228 #mongos
172.168.101.239 #config server
172.168.101.233 & 172.168.101.221 #shard{1,2}
On the config server:
]# vim /etc/mongod.conf
dbpath=/mongodb/data
configsvr=true
bind_ip=172.168.101.239
httpinterface=true
]# /etc/init.d/mongod start
Starting mongod: [ OK ]
]# ss -tnl | egrep "27019|28019"
LISTEN 0 128 172.168.101.239:27019 *:*
LISTEN 0 128 172.168.101.239:28019 *:*
On shard{1,2}:
]# vim /etc/mongod.conf
dbpath=/mongodb/data
bind_ip=172.168.101.233
httpinterface=true
]# /etc/init.d/mongod start
Starting mongod: [ OK ]
]# ss -tnl | egrep "27017|28017"
LISTEN 0 128 172.168.101.233:27017 *:*
LISTEN 0 128 172.168.101.233:28017 *:*
On the mongos node:
]# yum -y localinstall mongodb-org-mongos-2.6.4-1.x86_64.rpm mongodb-org-server-2.6.4-1.x86_64.rpm mongodb-org-shell-2.6.4-1.x86_64.rpm mongodb-org-tools-2.6.4-1.x86_64.rpm (only the mongos node needs mongodb-org-mongos-2.6.4-1.x86_64.rpm; the other nodes install only server, shell, and tools)
]# mongos --configdb=172.168.101.239:27019 --logpath=/var/log/mongodb/mongos.log --httpinterface --fork
2017-07-06T15:00:32.926+0800 warning: running with 1 config server should be done only for testing purposes and is not recommended for production
about to fork child process, waiting until server is ready for connections.
forked process: 5349
child process started successfully, parent exiting
]# ss -tnl | egrep "27017|28017"
LISTEN 0 128 *:27017 *:*
LISTEN 0 128 *:28017 *:*
]# mongo --host 172.168.101.228
MongoDB shell version: 2.6.4
connecting to: 172.168.101.228:27017/test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
mongos> sh.help()
……
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("595ddf9b52f218b1b6f9fcf2")
}
shards:
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
mongos> sh.addShard("172.168.101.233")
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard("172.168.101.221")
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("595ddf9b52f218b1b6f9fcf2")
}
shards:
{ "_id" : "shard0000", "host" : "172.168.101.233:27017" }
{ "_id" : "shard0001", "host" : "172.168.101.221:27017" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
mongos> sh.enableSharding("testdb") (enable sharding on the testdb database)
{ "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("595ddf9b52f218b1b6f9fcf2")
}
shards:
{ "_id" : "shard0000", "host" : "172.168.101.233:27017" }
{ "_id" : "shard0001", "host" : "172.168.101.221:27017" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
{ "_id" : "testdb", "partitioned" : true, "primary" : "shard0001" }
mongos> for (i=1;i<=10000;i++) {db.students.insert({name: "student"+i,age: (i%120),classes: "class"+(i%10),address: "www.magedu.com"})}
WriteResult({ "nInserted" : 1 })
mongos> db.students.find().count()
10000
mongos> sh.shardCollection("testdb.students",{age: 1})
{
"proposedKey" : {
"age" : 1
},
"curIndexes" : [
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "testdb.students"
}
],
"ok" : 0,
"errmsg" : "please create an index that starts with the shard key before sharding."
}
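The errmsg above means the shard key must be backed by an index before the collection can be sharded; a minimal fix sketch:
mongos> use testdb
mongos> db.students.ensureIndex({age: 1})
mongos> sh.shardCollection("testdb.students", {age: 1})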
sh.moveChunk(fullName,find,to) moves the chunk containing 'find' to 'to' (the name of a shard); this is manual balancing and is not recommended
mongos> use admin
switched to db admin
mongos> db.runCommand("listShards")
{
"shards" : [
{
"_id" : "shard0000",
"host" : "172.168.101.233:27017"
},
{
"_id" : "shard0001",
"host" : "172.168.101.221:27017"
}
],
"ok" : 1
}
mongos> sh.isBalancerRunning() (shows true only while the balancer is actively doing work)
false
mongos> sh.getBalancerState()
true
mongos> db.printShardingStatus() (same as >sh.status())
--- Sharding Status ---
……
Note:
> help
db.help() help on db methods
db.mycoll.help() help on collection methods
sh.help() sharding helpers
rs.help() replica set helpers
help admin administrative help
help connect connecting to a db help
help keys key shortcuts
help misc misc things to know
help mr mapreduce
show dbs show database names
show collections show collections in current database
show users show users in current database
show profile show most recent system.profile entries with time >= 1ms
show logs show the accessible logger names
show log [name] prints out the last segment of log in memory, 'global' is default
use <db_name> set current database
db.foo.find() list objects in collection foo
db.foo.find( { a : 1 } ) list objects in foo where a == 1
it result of the last line evaluated; use to further iterate
DBQuery.shellBatchSize = x set default number of items to display on shell
exit quit the mongo shell
> db.help()
DB methods:
db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [ just calls db.runCommand(...) ]
db.auth(username, password)
db.cloneDatabase(fromhost)
db.commandHelp(name) returns the help for the command
db.copyDatabase(fromdb, todb, fromhost)
db.createCollection(name, { size : ..., capped : ..., max : ... } )
db.createUser(userDocument)
db.currentOp() displays currently executing operations in the db
db.dropDatabase()
db.eval(func, args) run code server-side
db.fsyncLock() flush data to disk and lock server for backups
db.fsyncUnlock() unlocks server following a db.fsyncLock()
db.getCollection(cname) same as db['cname'] or db.cname
db.getCollectionNames()
db.getLastError() - just returns the err msg string
db.getLastErrorObj() - return full status object
db.getMongo() get the server connection object
db.getMongo().setSlaveOk() allow queries on a replication slave server
db.getName()
db.getPrevError()
db.getProfilingLevel() - deprecated
db.getProfilingStatus() - returns if profiling is on and slow threshold
db.getReplicationInfo()
db.getSiblingDB(name) get the db at the same server as this one
db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set
db.hostInfo() get details about the server's host
db.isMaster() check replica primary status
db.killOp(opid) kills the current operation in the db
db.listCommands() lists all the db commands
db.loadServerScripts() loads all the scripts in db.system.js
db.logout()
db.printCollectionStats()
db.printReplicationInfo()
db.printShardingStatus()
db.printSlaveReplicationInfo()
db.dropUser(username)
db.repairDatabase()
db.resetError()
db.runCommand(cmdObj) run a database command. if cmdObj is a string, turns it into { cmdObj : 1 }
db.serverStatus()
db.setProfilingLevel(level,<slowms>) 0=off 1=slow 2=all
db.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the db
db.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the db
db.setVerboseShell(flag) display extra information in shell output
db.shutdownServer()
db.stats()
db.version() current version of the server
> db.mycoll.help()
DBCollection help
db.mycoll.find().help() - show DBCursor help
db.mycoll.count()
db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.
db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes}} command
db.mycoll.dataSize()
db.mycoll.distinct( key ) - e.g. db.mycoll.distinct( 'x' )
db.mycoll.drop() drop the collection
db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : 1 } )
db.mycoll.dropIndexes()
db.mycoll.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups
db.mycoll.reIndex()
db.mycoll.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.
e.g. db.mycoll.find( {x:77} , {name:1, x:1} )
db.mycoll.find(...).count()
db.mycoll.find(...).limit(n)
db.mycoll.find(...).skip(n)
db.mycoll.find(...).sort(...)
db.mycoll.findOne([query])
db.mycoll.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )
db.mycoll.getDB() get DB object associated with collection
db.mycoll.getPlanCache() get query plan cache associated with collection
db.mycoll.getIndexes()
db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
db.mycoll.insert(obj)
db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )
db.mycoll.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor
db.mycoll.remove(query)
db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.
db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
db.mycoll.save(obj)
db.mycoll.stats()
db.mycoll.storageSize() - includes free space allocated to this collection
db.mycoll.totalIndexSize() - size in bytes of all the indexes
db.mycoll.totalSize() - storage allocated for all data and indexes
db.mycoll.update(query, object[, upsert_bool, multi_bool]) - instead of two flags, you can pass an object with fields: upsert, multi
db.mycoll.validate( <full> ) - SLOW
db.mycoll.getShardVersion() - only for use with sharding
db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster
db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function
db.mycoll.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set
db.mycoll.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection
db.mycoll.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection
> db.mycoll.find().help()
find() modifiers
.sort( {...} )
.limit( n )
.skip( n )
.count(applySkipLimit) - total # of objects matching query. by default ignores skip,limit
.size() - total # of objects cursor would return, honors skip,limit
.explain([verbose])
.hint(...)
.addOption(n) - adds op_query options -- see wire protocol
._addSpecial(name, value) - http://dochub.mongodb.org/core/advancedqueries#AdvancedQueries-Metaqueryoperators
.batchSize(n) - sets the number of docs to return per getMore
.showDiskLoc() - adds a $diskLoc field to each returned object
.min(idxDoc)
.max(idxDoc)
.comment(comment)
.snapshot()
.readPref(mode, tagset)
Cursor methods
.toArray() - iterates through docs and returns an array of the results
.forEach( func )
.map( func )
.hasNext()
.next()
.objsLeftInBatch() - returns count of docs left in current batch (when exhausted, a new getMore will be issued)
.itcount() - iterates through documents and counts them
.pretty() - pretty print each document, possibly over multiple lines
> rs.help()
rs.status() { replSetGetStatus : 1 } checks repl set status
rs.initiate() { replSetInitiate : null } initiates set with default settings
rs.initiate(cfg) { replSetInitiate : cfg } initiates set with configuration cfg
rs.conf() get the current configuration object from local.system.replset
rs.reconfig(cfg) updates the configuration of a running replica set with cfg (disconnects)
rs.add(hostportstr) add a new member to the set with default attributes (disconnects)
rs.add(membercfgobj) add a new member to the set with extra attributes (disconnects)
rs.addArb(hostportstr) add a new member which is arbiterOnly:true (disconnects)
rs.stepDown([secs]) step down as primary (momentarily) (disconnects)
rs.syncFrom(hostportstr) make a secondary to sync from the given member
rs.freeze(secs) make a node ineligible to become primary for the time specified
rs.remove(hostportstr) remove a host from the replica set (disconnects)
rs.slaveOk() shorthand for db.getMongo().setSlaveOk()
rs.printReplicationInfo() check oplog size and time range
rs.printSlaveReplicationInfo() check replica set members and replication lag
db.isMaster() check who is primary
reconfiguration helpers disconnect from the database so the shell will display
an error, even if the command succeeds.
see also http://<mongod_host>:28017/_replSet for additional diagnostic info
mongos> sh.help()
sh.addShard( host ) server:port OR setname/server:port
sh.enableSharding(dbname) enables sharding on the database dbname
sh.shardCollection(fullName,key,unique) shards the collection
sh.splitFind(fullName,find) splits the chunk that find is in at the median
sh.splitAt(fullName,middle) splits the chunk that middle is in at middle
sh.moveChunk(fullName,find,to) move the chunk where 'find' is to 'to' (name of shard)
sh.setBalancerState( <bool on or not> ) turns the balancer on or off true=on, false=off
sh.getBalancerState() return true if enabled
sh.isBalancerRunning() return true if the balancer has work in progress on any mongos
sh.addShardTag(shard,tag) adds the tag to the shard
sh.removeShardTag(shard,tag) removes the tag from the shard
sh.addTagRange(fullName,min,max,tag) tags the specified range of the given collection
sh.status() prints a general overview of the cluster
References:
《NoSQL数据库入门》 (Introduction to NoSQL Databases)
《MongoDB权威指南》 (MongoDB: The Definitive Guide)
《深入学习MongoDB》 (Scaling MongoDB)
Learning path:
basic concepts;
installation;
data types and data models;
integrating with applications;
shell commands;
replication;
sharding;
administration and maintenance;
application case studies;
mongodb:
a document-oriented NoSQL database;
https://www.mongodb.com/
the name comes from "humongous";
written in C++; runs on Linux/Windows (32-bit/64-bit), Solaris 64-bit, and OS X 64-bit;
a document-oriented database from 10gen (a cloud-platform company; development started in October 2007 and the first release shipped in February 2009); the latest version at the time of writing is 3.4.2;
open-source software under the AGPL, with a commercial license from 10gen also available (enterprise-grade support, training, and consulting);
https://repo.mongodb.com/yum/redhat/
mongodb is the most mature of the NoSQL databases, the closest to a relational DB, and the most likely candidate to replace one;
mongodb needs little manual tuning beyond mastering indexes; its built-in auto-sharding cluster absorbs the load;
the mongodb query language evolved from JavaScript;
LAMP-->LNMP (nginx, mongodb, python)
Note:
hadoop and redis were started by individual developers, whereas mongodb is developed by a company;
Technical features:
document-oriented storage engine that handles unstructured data naturally;
full index support; an index can be built on any attribute;
replication and high availability built into the database itself;
automatic sharding clusters supported natively;
rich document-oriented query capabilities;
atomic update operations;
map/reduce support;
GridFS;
Unstructured data (examples: survey forms, company asset management):
data whose column structure cannot be fixed up front (whereas SQL, the Structured Query Language, requires the table structure, i.e. the columns, to be decided first);
a common misconception is that "unstructured data" means multimedia or big data;
the pain of unstructured data (no fixed schema/model; the data structure keeps changing; DBAs and developers are put under needlessly inflated pressure);
Document (row):
think of it as a survey form: one row is one form, composed of multiple "key:value" ("attribute: value") pairs; across rows, indexes can be put on keys (attributes) that are not fixed, and values are distinguished by data type;
Example:
{"foo":3,"greeting":"Hello,world!"}
Number:
{"foo":3}
String:
{"foo":"3"}
Keys are case-sensitive (these are two different keys):
{"foo":3}
{"Foo":3}
A key must not repeat within a document:
{"greeting":"Hello world!","greeting":"Hello,MongoDB!"}
Documents can be embedded inside documents
Collection:
a collection is a group of documents;
a document is analogous to a row in a relational database;
a collection is analogous to a table;
collections are schema-less: the documents inside one collection can take wildly different shapes and need no fixed structure (see the sketch after this list);
collection naming (see the note below);
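A minimal sketch of that schema-less property: documents of entirely different shapes can live in one collection (the survey collection here is hypothetical):
> db.survey.insert({name: "form1", age: 30})
> db.survey.insert({name: "form2", hobbies: ["reading", "hiking"]})
> db.survey.find({hobbies: "hiking"}) #(matches only the second document)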
Database:
a database is composed of multiple collections;
database naming and namespace rules: see "MongoDB: The Definitive Guide";
special databases (the default databases below);
Note:
Collection naming:
a collection is identified by its name; collection names can be any UTF-8 string, with a few restrictions:
1. the empty string ("") is not a valid collection name;
2. collection names may not contain the character \0 (the null character) because this delineates the end of a collection name;
3. you should not create any collections that start with "system.", a prefix reserved for system collections; for example, the system.users collection contains the database's users, and the system.namespaces collection contains information about all of the database's collections;
4. user-created collections should not contain the reserved character $ in the name; the various drivers available for the database do support using $ in collection names because some system-generated collections contain it, but you should not use $ in a name unless you are accessing one of those collections.
Database naming:
1. the empty string ("") is not a valid database name;
2. a database name cannot contain any of these characters: ' ' (single space), -, $, /, \, \0 (the null character);
3. database names should be all lowercase;
4. database names are limited to a maximum of 64 bytes;
Default databases:
admin (this is the "root" database in terms of authentication; if a user is added to the admin database, the user automatically inherits permissions for all databases; there are also certain server-wide commands that can be run only from the admin database, such as listing all of the databases or shutting down the server);
local (this database is never replicated and can be used to store any collections that should be local to a single server; see the replication chapter for more about replication and the local database);
config (when mongo is used in a sharded setup, the config database is used internally to store information about the shards);
Note:
Condor Cluster (2010-12-07): the US Air Force Research Laboratory built what it called the Department of Defense's fastest supercomputer out of 1,760 Sony PS3 game consoles, reaching about 500 teraflops and ranking as the world's 33rd-largest supercomputer; the project began four years earlier, when each PS3 cost about $400, so the consoles at the machine's core cost roughly $2 million, which AFRL high-performance computing director Mark Barnell said was only 5-10% of the cost of an equivalent system built from off-the-shelf computer parts; another advantage is energy efficiency, as it consumes only about 10% of the power of comparable supercomputers; besides the PS3s, it includes 168 separate graphics processing units and 84 coordinating servers;
Tianhe-1: developed by China's National University of Defense Technology and deployed at the National Supercomputing Center in Tianjin, with a tested speed of 2,570 teraflops; it debuted in Changsha, Hunan on 2009-10-29 as China's first domestically built petaflop supercomputer, with a peak of 1,206 teraflops double-precision and a measured LINPACK speed of 563.1 teraflops; it cost about 1 billion RMB;
II. Operations:
[root@test1 ~]# uname -rm
2.6.32-431.el6.x86_64 x86_64
[root@test1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[root@test1 ~]# vim /etc/yum.repos.d/mongodb-enterprise.repo (change $releasever in baseurl to 6)
[mongodb-enterprise]
name=MongoDB Enterprise Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/3.4/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.4.asc
[root@test1 ~]# yum clean all
[root@test1 ~]# yum makecache
[root@test1 ~]# yum -y install mongodb-enterprise
Installing:
mongodb-enterprise x86_64 3.4.2-1.el6 mongodb-enterprise 5.8 k
Installing for dependencies:
cyrus-sasl-gssapi x86_64 2.1.23-15.el6_6.2 base 34 k
mongodb-enterprise-mongos x86_64 3.4.2-1.el6 mongodb-enterprise 12 M
mongodb-enterprise-server x86_64 3.4.2-1.el6 mongodb-enterprise 20 M
mongodb-enterprise-shell x86_64 3.4.2-1.el6 mongodb-enterprise 12 M
mongodb-enterprise-tools x86_64 3.4.2-1.el6 mongodb-enterprise 61 M
……
Downloading Packages:
(1/6): cyrus-sasl-gssapi-2.1.23-15.el6_6.2.x86_64.rpm | 34 kB 00:00
(2/6): mongodb-enterprise-3.4.2-1.el6.x86_64.rpm | 5.8 kB 00:00
(3/6): mongodb-enterprise-mongos-3.4.2-1.el6.x86_64.rpm | 12 MB 00:36
(4/6): mongodb-enterprise-server-3.4.2-1.el6.x86_64.rpm | 20 MB 01:04
(5/6): mongodb-enterprise-shell-3.4.2-1.el6.x86_64.rpm | 12 MB 00:46
(6/6): mongodb-enterprise-tools-3.4.2-1.el6.x86_64.rpm | 61 MB 04:06
[root@test1 ~]# yum info mongodb-enterprise
Name : mongodb-enterprise
Arch : x86_64
Version : 3.4.2
Release : 1.el6
Size : 0.0
Repo : installed
From repo : mongodb-enterprise
Summary : MongoDB open source document-oriented database system (enterprise metapackage)
URL : http://www.mongodb.org
License : Commercial
Description : MongoDB is built for scalability, performance and high availability, scaling from single server deployments
: to large, complex multi-site architectures. By leveraging in-memory computing, MongoDB provides high
: performance for both reads and writes. MongoDB’s native replication and automated failover enable
: enterprise-grade reliability and operational flexibility.
:
: MongoDB is an open-source database used by companies of all sizes, across all industries and for a wide
: variety of applications. It is an agile database that allows schemas to change quickly as applications
: evolve, while still providing the functionality developers expect from traditional databases, such as
: secondary indexes, a full query language and strict consistency.
:
: MongoDB has a rich client ecosystem including hadoop integration, officially supported drivers for 10
: programming languages and environments, as well as 40 drivers supported by the user community.
:
: MongoDB features:
: * JSON Data Model with Dynamic Schemas
: * Auto-Sharding for Horizontal Scalability
: * Built-In Replication for High Availability
: * Rich Secondary Indexes, including geospatial
: * TTL indexes
: * Text Search
: * Aggregation Framework & Native MapReduce
:
: This metapackage will install the mongo shell, import/export tools, other client utilities, server
: software, default configuration, and init.d scripts.
[root@test1 ~]# vim /etc/mongod.conf
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
storage:
dbPath: /var/lib/mongo
journal:
enabled: true
processManagement:
fork: true # fork and run in background
pidFilePath: /var/run/mongodb/mongod.pid # location of pidfile
net:
port: 27017
bindIp: 127.0.0.1 # Listen to local interface only, comment to listen on all interfaces.
#security:
#operationProfiling:
#replication:
#sharding:
## Enterprise-Only Options
#auditLog:
#snmp:
[root@test1 ~]# which mongod
/usr/bin/mongod
[root@test1 ~]# which mongo
/usr/bin/mongo
[root@test1 ~]# /etc/init.d/mongod start (or: #mongod -f /etc/mongod.conf)
Starting mongod: [ OK ]
[root@test1 ~]# mongo (running the shell)
MongoDB shell version v3.4.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.2
Welcome to the MongoDB shell.
……
MongoDB Enterprise > help
MongoDB Enterprise > db.help()
MongoDB Enterprise > db.mycollection.help()
MongoDB Enterprise > db.mycollection.find #(typing a function name without parentheses prints its JavaScript source)
MongoDB Enterprise > db.mycollection.find() (read the documents in collection mycollection)
MongoDB Enterprise > x=200
200
MongoDB Enterprise > x/5
40
MongoDB Enterprise > function factorial (n) {
... if (n <= 1) return 1;
... return n * factorial(n - 1);
... }
MongoDB Enterprise > factorial(5)
120
MongoDB Enterprise > Math.sin(Math.PI/2);
1
MongoDB Enterprise > new Date("2017/3/14");
ISODate("2017-03-14T07:00:00Z")
MongoDB Enterprise > "Hello,MongoDB!".replace("World","hello");
Hello,MongoDB!
Switching databases:
MongoDB Enterprise > db (show the current database)
test
MongoDB Enterprise > use foobar (switch databases; a nonexistent database can still be switched to, and it comes into existence once a collection is created in it)
switched to db foobar
MongoDB Enterprise > db
foobar
Create:
MongoDB Enterprise > post = {"title":"My Blog Post","content":"Here's my blog post","data":new Date()}
{
"title" : "My Blog Post",
"content" : "Here's my blog post",
"data" : ISODate("2017-03-14T07:15:40.245Z")
}
MongoDB Enterprise > db.blog.insert(post) (insert document post into collection blog)
WriteResult({ "nInserted" : 1 })
Read:
MongoDB Enterprise > db.blog.find()
{ "_id" : ObjectId("58c798af28fceb7f7c84d505"), "title" : "My Blog Post", "content" : "Here's my blog post", "data" : ISODate("2017-03-14T07:15:40.245Z") }
MongoDB Enterprise > db.blog.findOne() (formatted display)
{
"_id" : ObjectId("58c798af28fceb7f7c84d505"),
"title" : "My Blog Post",
"content" : "Here's my blog post",
"data" : ISODate("2017-03-14T07:15:40.245Z")
}
Update:
MongoDB Enterprise > post.comments = [] (add an empty comments array to the local post object)
[ ]
MongoDB Enterprise > db.blog.update({title:"My Blog Post"},post)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.blog.findOne()
{
"_id" : ObjectId("58c798af28fceb7f7c84d505"),
"title" : "My Blog Post",
"content" : "Here's my blog post",
"data" : ISODate("2017-03-14T07:15:40.245Z"),
"comments" : [ ]
}
Delete:
MongoDB Enterprise > db.blog.remove({title:"My Blog Post"}) #(remove matching documents: every document containing this key:value pair is deleted)
WriteResult({ "nRemoved" : 1 })
MongoDB Enterprise > db.blog.findOne()
null
MongoDB Enterprise > db.blog.remove() (remove all documents, emptying the collection; version 2.4.1 supported this bare form, 3.4 requires a query document, e.g. remove({}))
MongoDB Enterprise > db.blog.drop() (drop the collection; when a collection holds too many documents, dropping is faster than remove, then rebuild indexes with db.blog.ensureIndex(keypattern[,options]))
true
MongoDB Enterprise > db.blog.findOne()
null
Document replacement:
Original document:
{
"_id":ObjectId("……"),
"name":"joe",
"friend":32,
"enemies":2
}
Desired result:
{
"_id":ObjectId("……"),
"username":"joe",
"relationships":
{
"friends":32,
"enemies":2
}
}
Operations:
var joe = db.users.findOne({"name":"joe"});
joe.relationships = {"friends": joe.friends,
"enemies": joe.enemies};
joe.username = joe.name;
delete joe.friends;
delete joe.enemies;
delete joe.name;
Data types in MongoDB:
types supported in interactive mode (the shell): null; boolean; 64-bit floating point; string; object id; date; regular expression; code; array; embedded document; undefined;
other data types (outside the shell more types are available, a superset of the shell's, e.g. when an app connects to the db via a driver):
32-bit/64-bit integers; symbol; binary data; max key (the largest value the system allows); min key (the smallest value the system allows);
Note: the shell does not support integers; any integer is automatically converted to a 64-bit float;
_id and ObjectId:
_id uniquely identifies a document, similar to rowid in Oracle;
ObjectId is the default way an _id is generated;
an ObjectId is 12 bytes, i.e. 24 hexadecimal digits;
bytes 0-3 are a timestamp, bytes 4-6 a machine identifier (typically a hash of the hostname), bytes 7-8 the pid, and bytes 9-11 a counter;
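The embedded timestamp can be read back directly in the shell:
> id = ObjectId()
> id.getTimestamp() #(decodes bytes 0-3 as an ISODate, i.e. the creation time)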
Arrays:
arrays in mongodb can be treated as ordered objects (like lists) or as unordered objects (like sets), and mongodb can modify or remove array elements atomically;
Examples:
"Status":0 #(not an array)
"Messages":[] #(an empty array)
"ResponseBody":[
"test1",
"test2",
"test3",
] #(an array of 3 elements)
Nested documents:
{"Status":0,
"Responsebody":{"CityName":"Beijing","Keyword":"diaoyutai",
"LandMarks":[{"Name":"diaoyutai beijing luxingshe"},{"Name":"diaoyutai"}
]}}
Inserting documents:
on insert, mongodb checks whether the document contains an _id; if none is specified, mongodb creates one;
for multiple documents, batch inserts are recommended; the advantages are fewer round trips, less header overhead, and flexible control over the target collection;
when inserting, mongodb by default only checks that the data contains an _id and does not exceed 16MB (4MB in v1.8); beyond that it performs no validation, which yields higher performance at the risk of admitting invalid data;
mongodb executes no code while inserting data, so there is no injection-attack risk on insert;
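The 16MB limit can be checked from the shell before inserting; a minimal sketch:
> Object.bsonsize({"title":"My Blog Post","comments":[]}) #(bytes this document would occupy as BSON)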
Update:
db.users.update({"name":"joe"},joe);
upsert mode:
in upsert mode, a matching record is updated if found; if no match is found, a new record is created;
passing true as the third argument enables upsert mode (db.users.update({"name":"joe"}, joe, true););
multi mode:
by default update only modifies the first matching document; multi mode must be enabled to update all matching documents;
passing true as the fourth argument enables multi mode (db.users.update({"name":"joe"}, joe, true, true););
Data manipulation (JavaScript-like syntax):
Update modifiers (a usage sketch follows this list):
$inc (numbers; increments or decrements a value, creating the key if absent);
$set (sets the value of the given key, creating the key if absent);
$unset (the inverse of $set; removes the key and its value);
$push (arrays; appends an element to the end of an array, creating the array if absent);
$pushAll (arrays; the batch form of $push);
$addToSet (arrays; like $pushAll but automatically skips duplicate elements);
$pop (arrays; {$pop:{key:1}} removes an element from the end of the array, {$pop:{key:-1}} from the beginning);
$pull (arrays; removes all matching elements from an array);
$pullAll (arrays; the batch form of $pull, removing every listed value);
$rename (keys; renames the given key);
$bit (numbers; performs bitwise "and"/"or" operations on integer values);
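A minimal sketch of the array modifiers not exercised in the transcript below (the tags field is hypothetical):
> db.hits.update({"url":"www.datagru.cn"},{"$addToSet":{"tags":"news"}}) #(appends only if "news" is not already present)
> db.hits.update({"url":"www.datagru.cn"},{"$push":{"tags":"news"}}) #(appends unconditionally, duplicates allowed)
> db.hits.update({"url":"www.datagru.cn"},{"$pull":{"tags":"news"}}) #(removes every matching element)
> db.hits.update({"url":"www.datagru.cn"},{"$pop":{"tags":1}}) #(removes the last element; -1 removes the first)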
MongoDB Enterprise > db.hits.insert({"url":"www.datagru.cn","pv":100});
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100 }
MongoDB Enterprise > db.hits.update({"url":"www.itpub.net"},{"$inc":{"pv":1}});
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100 }
MongoDB Enterprise > db.hits.update({"_id":ObjectId("58cf33123b9c3d55707193ff")},{"$set":{"ip":20}}); ($set creates the key if absent, otherwise changes its value)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100, "ip" : 20 }
MongoDB Enterprise > db.hits.update({"_id":ObjectId("58cf33123b9c3d55707193ff")},{"$unset":{"ip":20}}); (remove the key and its value)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100 }
MongoDB Enterprise > db.hits.update({"_id":ObjectId("58cf33123b9c3d55707193ff")},{"$set":{"users":[{"name":"Zhang","age":20},{"name":"Huang","age":21}]}});
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.update({"_id":ObjectId("58cf33123b9c3d55707193ff")},{"$push":{"users":{"name":"He","age":22}}});
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100, "users" : [ { "name" : "Zhang", "age" : 20 }, { "name" : "Huang", "age" : 21 }, { "name" : "He", "age" : 22 } ] }
Positional updates:
MongoDB Enterprise > db.hits.update({"_id":ObjectId("58cf33123b9c3d55707193ff")},{"$set":{"users.2.name":"Lin"}}); (array indexes start at 0, so 2 here is the 3rd element)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100, "users" : [ { "name" : "Zhang", "age" : 20 }, { "name" : "Huang", "age" : 21 }, { "name" : "Lin", "age" : 22 } ] }
MongoDB Enterprise > db.hits.update({"_id":ObjectId("58cf33123b9c3d55707193ff")},{"$set":{"users.0.name":"Xie"}});
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100, "users" : [ { "name" : "Xie", "age" : 20 }, { "name" : "Huang", "age" : 21 }, { "name" : "Lin", "age" : 22 } ] }
The positional operator $:
MongoDB Enterprise > db.hits.update({"users.name":"Xie"},{"$set":{"users.$.name":"Gong"}});
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
MongoDB Enterprise > db.hits.find()
{ "_id" : ObjectId("58cf33123b9c3d55707193ff"), "url" : "www.datagru.cn", "pv" : 100, "users" : [ { "name" : "Gong", "age" : 20 }, { "name" : "Huang", "age" : 21 }, { "name" : "Lin", "age" : 22 } ] }
>db.analytics.update({"url":"/blog"},{"$inc":{"visits":1}},true) #(upsert mode)
>db.analytics.findOne()
>db.users.update({birthday:"10/13/1978"},{$set:{gift:"Happy Birthday!"}},false,true) #(multi mode)
getLastError:
>db.runCommand({getLastError:1}) #(returns the outcome of the last operation)
findAndModify:
like performing a SQL SELECT and UPDATE together as one atomic operation;
>ps=db.runCommand({"findAndModify":"processes",
"query":{"status":"READY"},
"sort":{"priority":-1},
"update":{"$set":{"status":"RUNNING"}}})
>db.processes.findOne({"_id":ps.value._id})
Fire-and-forget vs. safe operations:
fire-and-forget (the client sends the document to the server and moves on without waiting for any response (there in fact is none), similar to UDP; "an arrow off the bowstring");
safe operations (the client sends the document to the server, then runs getLastError and waits for the server's reply before continuing; if the reply is not what was expected, an exception is thrown);
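A sketch of a safe write that also waits for replication, using the legacy getLastError options of this era (assumes a replica set with at least 2 data-bearing members):
>db.runCommand({getLastError:1, w:2, wtimeout:5000}) #(wait until 2 members acknowledge, or give up after 5 seconds)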
Testing modifier speed with Python:
$inc is faster than $push;
[root@test1 ~]# yum -y install gcc python-devel
[root@test1 ~]# wget -q http://peak.telecommunity.com/dist/ez_setup.py
[root@test1 ~]# python ez_setup.py
[root@test1 ~]# easy_install pymongo
Searching for pymongo
Best match: pymongo 3.4.0
Processing pymongo-3.4.0-py2.6-linux-x86_64.egg
……
[root@test1 ~]# vim test.py (tested: pymongo-3.4.0 no longer provides the Connection class; MongoClient replaces it)
#!/usr/bin/python
#
# times 10000 $inc updates against a single document
from pymongo import MongoClient
import time
db = MongoClient().performance_test
db.drop_collection("updates")
collection = db.updates
collection.insert_one({"x": 1})
# make sure the insert is complete before we start timing
collection.find_one()
start = time.time()
for i in range(10000):
    collection.update_one({}, {"$inc": {"x": 1}})
    # collection.update_one({}, {"$push": {"x": 1}})
# make sure the updates are complete before we stop timing
collection.find_one()
print time.time() - start
[root@test1 ~]# chmod +x !$
chmod +x test.py
#python test.py
mongodb queries, SQL vs. MongoDB:
Operation | SQL | MongoDB
All records | SELECT * FROM users; | db.users.find()
Records with age=33 | SELECT * FROM users WHERE age=33; | db.users.find({age:33});
Field projection | SELECT a,b FROM users WHERE age=33; | db.users.find({age:33},{a:1,b:1});
Sorting | SELECT * FROM users WHERE age=33 ORDER BY name; | db.users.find({age:33}).sort({name:1});
Comparison | SELECT * FROM users WHERE age>33; | db.users.find({age:{$gt:33}});
Regex (fuzzy match) | SELECT * FROM users WHERE name LIKE "Joe%"; | db.users.find({name:/^Joe/});
Skip and limit | SELECT * FROM users LIMIT 10 SKIP 20; | db.users.find().limit(10).skip(20);
OR | SELECT * FROM users WHERE a=1 OR b=2; | db.users.find({$or:[{a:1},{b:2}]});
Return only 1 record | SELECT * FROM users LIMIT 1; | db.users.findOne();
distinct aggregation | SELECT DISTINCT last_name FROM users; | db.users.distinct("last_name");
count aggregation | SELECT COUNT(AGE) FROM users; | db.users.find({age:{"$exists":true}}).count();
Query plan | EXPLAIN SELECT * FROM users WHERE z=3; | db.users.find({z:3}).explain();
Operators:
1. $lt, $lte, $gt, $gte
2. $all #(whether the array contains all the listed elements)
3. $exists #(true or false)
4. $mod #(modulo, e.g. a%10==1)
5. $ne #(not equals)
6. $in #(like SQL's IN)
7. $nin #(the inverse of $in, i.e. SQL's NOT IN)
8. $nor #(the inverse of $or: matches none of the clauses)
9. $or #(OR clause; note $or cannot be nested)
10. $size #(matches arrays of the given length)
11. $type #(matches the BSON type of a key)
Field projection:
>db.abc.find({"a":23});
>db.abc.find({"a":23},{"b":1}); #(return only field b)
>db.abc.find({"a":23},{"b":1,"c":1}); #(return only fields b and c)
>db.abc.find({"a":23},{"b":0}); #(return everything except field b)
>db.abc.find({"a":23},{"b":1,"_id":0}); #(return field b and suppress the _id column)
Queries with comparisons:
>db.abc.find({b:{"$gte":30,"$lt":35}});
>db.abc.find({b:{"$gte":24,"$ne":32}});
>db.things.find({a:{$all:[2,3]}});
>db.things.find({a:{$exists:true}});
>db.things.find({a:{$mod:[10,1]}}); #(value % 10 == 1)
>db.things.find({a:{$ne:3}});
>db.things.find({j:{$in:[2,4,6]}});
>db.things.find({j:{$nin:[2,4,6]}});
>db.things.find({name:"bob",$nor:[{a:1},{b:2}]});
>db.things.find({name:"bob",$or:[{a:1},{b:2}]});
>db.things.find({a:{$size:1}});
>db.things.find({a:{$type:2}});
Logical operators:
>db.abc.find({"b":{"$in":[25,32]}});
>db.abc.find({"b":{"$mod":[5,1]}});
>db.abc.find({"b":{"$not":{"$mod":[5,1]}}});
mongodb regular expressions are more powerful than SQL's, using Perl-compatible (PCRE) syntax; but only prefix-anchored expressions like /^.+/ can use an index, while every other form triggers a full collection scan;
>db.users.find({"name":/joe/i}); #(case-insensitive)
>db.users.find({"name":/joey?/i});
Array queries:
>db.food.insert({"fruit":["apple","banana","peach"]});
>db.food.find({"fruit":"banana"});
>db.food.drop();
>db.food.insert({"_id":1,"fruit":["apple","banana","peach"]});
>db.food.insert({"_id":2,"fruit":["apple","kumquat","orange"]});
>db.food.insert({"_id":3,"fruit":["cherry","banana","apple"]});
>db.food.find()
>db.food.find({fruit:{$all:["apple","banana"]}});
>db.food.find({"fruit":{"$size":3}}); #(arrays with exactly 3 elements)
>db.blog.posts.findOne(criteria,{"comments":{"$slice":10}}); #(the first 10 elements)
>db.blog.posts.findOne(criteria,{"comments":{"$slice":-10}}); #(the last 10 elements)
>db.blog.posts.findOne(criteria,{"comments":{"$slice":[23,10]}}); #(skip 23 elements, return the next 10)
Querying embedded documents:
>db.people.insert(
{
"name":{
"first":"Joe",
"last":"Schmoe"
},
"age":45
})
>db.people.find({"name":{"first":"Joe","last":"Schmoe"}});
>db.people.find({"name.first":"Joe","name.last":"Schmoe"});
$where:
>db.foo.insert({"apple":1,"banana":6,"peach":3});
>db.foo.insert({"apple":8,"spinach":4,"watermelon":4});
>db.foo.find()
>db.foo.find({"$where":function() {
for (var current in this) {
for (var other in this) {
if (current != other && this[current] == this[other]) {
return true;
}
}
}
return false;
}});
Cursors:
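A minimal sketch of explicit cursor handling in the shell, assuming a users collection; find() returns a cursor that is consumed lazily:
> var cur = db.users.find();
> while (cur.hasNext()) { printjson(cur.next()); } #(iterate one document at a time)
> db.users.find().forEach(function(doc) { printjson(doc); }); #(equivalent functional form)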
I. First start MongoDB in non-authenticated mode
Non-authenticated:
linux/Mac: mongod -f /mongodb/etc/mongo.conf
windows: mongod --config c:\mongodb\etc\mongo.conf, or net start mongodb (if mongo was installed as a Windows service)
Note:
/mongodb/etc/mongo.conf is the path to the mongo configuration file
Authenticated:
mongod -f /mongodb/etc/mongo.conf --auth
Notes:
1. --auth starts the server with authorization enabled, so a username and password are required for access
2. alternatively, auth=true can be put into mongo.conf for centralized management
II. Create the administrator
1. start mongo in non-authenticated mode
2. > use admin (switch to the admin database)
3. add the administrator user
> db.createUser({user: "admin",pwd: "admin",roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]})
4. authenticate
use admin #(db.auth("admin","admin") only works from the admin database)
db.auth("admin", "admin")
III. Restart Mongo with authentication enabled, switch to admin and run db.auth("admin","admin"), then add users to each database that needs them
Enter the mongo shell, use the admin database and authenticate; without authenticating, no operation is allowed.
Even after authenticating you still cannot touch data, because admin only grants user-management rights; so create per-database users next, and each user belongs to the database it is created in
#/etc/init.d/mongod restart
#mongo --host 10.113.128.235
>use admin
>db.auth("admin","admin")
> use test (switch to the database in which the user will be created)
> db.createUser({user: "root",pwd: "rte-mongodb",roles: [{ role: "readWrite", db: "rte" }]}) (create the user)
]# mongo 10.113.128.235/rte -uroot -prte-mongodb (connect as the new user from a client)