All Ceph Tools in One Article (Including Ceph Performance and Resource Analysis)

Preface

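Ceph ships with many tools, covering cluster management and operations as well as performance analysis.

This article tries to collect all of them in one place. It also serves as my own summary, so that I know which tools are available when I need them.

My knowledge is limited, so some useful tools have probably slipped through. Please let me know if you spot any.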

Ceph RADOS tools

General

ceph auth [ add | caps | del | export | get | get-key | get-or-create | get-or-create-key | import | list | print-key | print_key ] …
ceph compact
ceph config [ dump | ls | help | get | show | show-with-defaults | set | rm | log | reset | assimilate-conf | generate-minimal-conf ] …
ceph config-key [ rm | exists | get | ls | dump | set ] …
ceph daemon <name> | <path> <command> …
ceph daemonperf <name> | <path> [ interval [ count ] ]
ceph df {detail}
ceph fs [ ls | new | reset | rm | authorize ] …
ceph fsid
ceph health {detail}
ceph injectargs <injectedargs> [ <injectedargs>… ]
ceph log <logtext> [ <logtext>… ]
ceph mds [ compat | fail | rm | rmfailed | set_state | stat | repaired ] …
ceph mon [ add | dump | getmap | remove | stat ] …
ceph osd [ blocklist | blocked-by | create | new | deep-scrub | df | down | dump | erasure-code-profile | find | getcrushmap | getmap | getmaxosd | in | ls | lspools | map | metadata | ok-to-stop | out | pause | perf | pg-temp | force-create-pg | primary-affinity | primary-temp | repair | reweight | reweight-by-pg | rm | destroy | purge | safe-to-destroy | scrub | set | setcrushmap | setmaxosd | stat | tree | unpause | unset ] …
ceph osd crush [ add | add-bucket | create-or-move | dump | get-tunable | link | move | remove | rename-bucket | reweight | reweight-all | reweight-subtree | rm | rule | set | set-tunable | show-tunables | tunables | unlink ] …
ceph osd pool [ create | delete | get | get-quota | ls | mksnap | rename | rmsnap | set | set-quota | stats ] …
ceph osd pool application [ disable | enable | get | rm | set ] …
ceph osd tier [ add | add-cache | cache-mode | remove | remove-overlay | set-overlay ] …
ceph pg [ debug | deep-scrub | dump | dump_json | dump_pools_json | dump_stuck | getmap | ls | ls-by-osd | ls-by-pool | ls-by-primary | map | repair | scrub | stat ] …
ceph quorum_status
ceph report { <tags> [ <tags>… ] }
ceph status
ceph sync force {--yes-i-really-mean-it} {--i-know-what-i-am-doing}
ceph tell <name (type.id)> <command> [options…]
ceph version

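ceph is a control utility for manually deploying and maintaining a Ceph cluster. It provides a diverse set of commands for deploying monitors, OSDs, placement groups, and MDS daemons, and for overall cluster maintenance and administration.

One command worth singling out is ceph osd dump <epoch>, which prints the osdmap at the specified epoch.

rados is a utility for interacting with a Ceph object storage cluster (RADOS), which is part of the Ceph distributed storage system.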


OSD daemon commands (including OSD performance/resources)

usrname@hostname:~$ sudo ceph daemon osd.148  help
{
    "bluestore allocator dump block": "dump allocator free regions",
    "bluestore allocator dump bluefs-db": "dump allocator free regions",
    "bluestore allocator score block": "give score on allocator fragmentation (0-no fragmentation, 1-absolute fragmentation)",
    "bluestore allocator score bluefs-db": "give score on allocator fragmentation (0-no fragmentation, 1-absolute fragmentation)",
    "bluestore bluefs available": "Report available space for bluefs. If alloc_size set, make simulation.",
    "calc_objectstore_db_histogram": "Generate key value histogram of kvdb(rocksdb) which used by bluestore",
    "compact": "Commpact object store's omap. WARNING: Compaction probably slows your requests",
    "config diff": "dump diff of current config and default config",
    "config diff get": "dump diff get <field>: dump diff of current and default config setting <field>",
    "config get": "config get <field>: get the config value",
    "config help": "get config setting schema and descriptions",
    "config set": "config set <field> <val> [<val> ...]: set a config variable",
    "config show": "dump current config settings",
    "dump_blacklist": "dump blacklisted clients and times",
    "dump_blocked_ops": "show the blocked ops currently in flight",
    "dump_historic_ops": "show recent ops",
    "dump_historic_ops_by_duration": "show slowest recent ops, sorted by duration",
    "dump_historic_slow_ops": "show slowest recent ops",
    "dump_mempools": "get mempool stats",
    "dump_objectstore_kv_stats": "print statistics of kvdb which used by bluestore",
    "dump_op_pq_state": "dump op priority queue state",
    "dump_ops_in_flight": "show the ops currently in flight",
    "dump_osd_network": "Dump osd heartbeat network ping times",
    "dump_pgstate_history": "show recent state history",
    "dump_reservations": "show recovery reservations",
    "dump_scrubs": "print scheduled scrubs",
    "dump_watchers": "show clients which have active watches, and on which objects",
    "flush_journal": "flush the journal to permanent store",
    "flush_store_cache": "Flush bluestore internal cache",
    "get_command_descriptions": "list available commands",
    "get_heap_property": "get malloc extension heap property",
    "get_latest_osdmap": "force osd to update the latest map from the mon",
    "getomap": "output entire object map",
    "git_version": "get git sha1",
    "heap": "show heap usage info (available only if compiled with tcmalloc)",
    "help": "list available commands",
    "injectdataerr": "inject data error to an object",
    "injectfull": "Inject a full disk (optional count times)",
    "injectmdataerr": "inject metadata error to an object",
    "log dump": "dump recent log entries to log file",
    "log flush": "flush log entries to log file",
    "log reopen": "reopen log file",
    "objecter_requests": "show in-progress osd requests",
    "ops": "show the ops currently in flight",
    "perf dump": "dump perfcounters value",
    "perf histogram dump": "dump perf histogram values",
    "perf histogram schema": "dump perf histogram schema",
    "perf reset": "perf reset <name>: perf reset all or one perfcounter name",
    "perf schema": "dump perfcounters schema",
    "rmomapkey": "remove omap key",
    "set_heap_property": "update malloc extension heap property",
    "set_recovery_delay": "Delay osd recovery by specified seconds",
    "setomapheader": "set omap header",
    "setomapval": "set omap key",
    "status": "high-level status of OSD",
    "trigger_deep_scrub": "Trigger a scheduled deep scrub ",
    "trigger_scrub": "Trigger a scheduled scrub ",
    "truncobj": "truncate object to length",
    "version": "get ceph version"
}
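The dump_historic_ops family of commands returns JSON, so the slowest requests can be picked out with a few lines of scripting. The following is a minimal sketch, assuming the output has a top-level "ops" array whose entries carry "description" and "duration" (seconds) fields; those field names may differ between releases, and the sample data is invented for illustration.

```python
import json

# Hypothetical sample shaped like `ceph daemon osd.N dump_historic_ops` output.
# The field names ("ops", "description", "duration") are assumptions; check
# them against the output of your own Ceph release.
SAMPLE = """
{
  "size": 20,
  "duration": 600,
  "ops": [
    {"description": "osd_op(client.1 write obj_a)", "duration": 0.012},
    {"description": "osd_op(client.2 read obj_b)", "duration": 1.250},
    {"description": "osd_op(client.3 write obj_c)", "duration": 0.430}
  ]
}
"""


def slowest_ops(dump_text, threshold_s=0.1):
    """Return (duration, description) pairs at or above threshold_s, slowest first."""
    ops = json.loads(dump_text).get("ops", [])
    slow = [(op["duration"], op["description"]) for op in ops
            if op["duration"] >= threshold_s]
    return sorted(slow, reverse=True)


if __name__ == "__main__":
    for duration, description in slowest_ops(SAMPLE):
        print(f"{duration:8.3f}s  {description}")
```

In practice the text passed to slowest_ops would be the captured output of the daemon command rather than the SAMPLE string.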


ceph osd df tree

ceph osd perf

ceph balancer

The ceph balancer tool optimizes the placement of PGs across OSDs to achieve a balanced distribution. It can run automatically or be driven manually.

pg upmap

ceph osd pg-upmap <pgid> <osdname (id|osd.id)> [<osdname (id|osd.id)>...]
ceph osd pg-upmap-items <pgid> <osdname (id|osd.id)> [<osdname (id|osd.id)>...]

These commands explicitly map specific PGs to specific OSDs.

ceph-volume [-h] [--cluster CLUSTER] [--log-level LOG_LEVEL] [--log-path LOG_PATH]
ceph-volume inventory
ceph-volume lvm [ trigger | create | activate | prepare | zap | list | batch | new-wal | new-db | migrate ]
ceph-volume simple [ trigger | scan | activate ]

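ceph-volume is a single-purpose command-line tool for deploying logical volumes as OSDs. It tries to maintain an API similar to ceph-disk when preparing, activating, and creating OSDs.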

ceph-mon -i monid [ --mon-data mondatapath ] // start the mon process

The article "Removing a mon from an unhealthy cluster" uses monmaptool and ceph-mon to resolve a tricky problem on a production Ceph cluster.

ceph-osd -i osdnum [ --osd-data datapath ] [ --osd-journal journal ] [ --mkfs ] [ --mkjournal ] [--flush-journal] [--check-allows-journal] [--check-wants-journal] [--check-needs-journal] [ --mkkey ] [ --osdspec-affinity ]

ceph-syn – ceph synthetic workload generator

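ceph-syn is a simple synthetic workload generator for the Ceph distributed file system. It uses the userspace client library to generate simple workloads against a currently running file system. The file system need not be mounted via ceph-fuse(8) or the kernel client.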

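crushdiff [ --osdmap osdmap ] [ --pg-dump pg-dump ] [ --compiled ] [ --verbose ] command crushmap

crushdiff is a utility that lets you test the effect of a crushmap change: the number of PGs, objects, and bytes moved. It is a wrapper around osdmaptool and relies on its --test-map-pgs-dump option to get the list of changed PGs. It also uses PG stats to calculate the number of objects and bytes moved.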

Assorted tools

ceph-kvstore-tool <leveldb|rocksdb|bluestore-kv> <store path> command [args…]

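ceph-kvstore-tool is a kvstore manipulation tool. It lets users manipulate leveldb/rocksdb data (such as an OSD's omap) offline.

monmaptool is a utility to create, view, and modify a monitor cluster map for the Ceph distributed storage system. The monitor map specifies the only fixed addresses in the Ceph distributed system. All other daemons bind to arbitrary addresses and register themselves with the monitors.

The article "Removing a mon from an unhealthy cluster" uses monmaptool and ceph-mon to resolve a tricky problem on a production Ceph cluster.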

ceph-authtool keyringfile [ -l | --list ] [ -p | --print-key ] [ -C | --create-keyring ] [ -g | --gen-key ] [ --gen-print-key ] [ --import-keyring otherkeyringfile ] [ -n | --name entityname ] [ -a | --add-key base64_key ] [ --cap subsystem capability ] [ --caps capfile ] [ --mode mode ]

ceph-authtool is a utility to create, view, and modify a Ceph keyring file. A keyring file stores one or more Ceph authentication keys and possibly an associated capability specification. Each key is associated with an entity name of the form {client,mon,mds,osd}.name.

crushtool ( -d map | -c map.txt | --build --num_osds numosds layer1 … | --test ) [ -o outfile ]

crushtool is a utility that lets you create, compile, decompile, and test CRUSH map files.

osdmaptool mapfilename [--print] [--createsimple numosd [--pgbits bitsperosd ] ] [--clobber]
osdmaptool mapfilename [--import-crush crushmap]
osdmaptool mapfilename [--export-crush crushmap]
osdmaptool mapfilename [--upmap file] [--upmap-max max-optimizations] [--upmap-deviation max-deviation] [--upmap-pool poolname] [--save] [--upmap-active]
osdmaptool mapfilename [--upmap-cleanup] [--upmap file]

osdmaptool is a utility that lets you create, view, and manipulate OSD cluster maps from the Ceph distributed storage system. Notably, it lets you extract the embedded CRUSH map or import a new CRUSH map. It can also simulate the upmap balancer mode so you can get a sense of what is needed to balance your PGs.

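  • ceph-objectstore-tool

ceph-objectstore-tool is a tool for modifying the state of an OSD. It facilitates manipulating an object's content, removing an object, listing the omap, manipulating the omap header, manipulating omap keys, listing object attributes, and manipulating object attribute keys.

ceph-objectstore-tool --data-path path to osd [--op list ]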
Possible object operations:
(get|set)-bytes [file]
set-(attr|omap) <key> [file]
(get|rm)-(attr|omap) <key>
get-omaphdr
set-omaphdr [file]
list-attrs
list-omap
remove|removeall
set-size
clear-data-digest
remove-clone-metadata

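Here is a practical example.
BlueStore no longer exposes a directory structure the way FileStore did, so object attrs are not visible through the file system either. The only way to see them is to export the attr from the db with ceph-objectstore-tool and then decode it with ceph-dencoder.

1. Locate the OSD that holds the object (ceph osd map <poolname> <objectname>) and stop that OSD (ceph osd set noout prevents data migration while it is down).
2. List the object's attributes with ceph-objectstore-tool.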
[root@node01 cephtools]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --type bluestore rbd_data.81.a8c6bd6b8b4567.0000000000003249 list-attrs
_                 // "_" is the object's attr attribute
hinfo_key
snapset
3. Export an object attribute to a file.
[root@node01 cephtools]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --type bluestore rbd_data.81.a8c6bd6b8b4567.0000000000003249 get-attr hinfo_key > /home/yg/cephtools/attrhinfo_key.dat
[root@node01 cephtools]# ll
total 8
-rw-r--r-- 1 root root 298 Oct 10 14:39 attr_.dat
-rw-r--r-- 1 root root  18 Oct 10 14:40 attrhinfo_key.dat
4. Decode the binary attribute with ceph-dencoder.
[root@node01 cephtools]# ceph-dencoder import attr_.dat type object_info_t decode dump_json   // available types are listed by ceph-dencoder list_types
{
    "oid": {
        "oid": "rbd_data.81.a8c6bd6b8b4567.0000000000003249",
        "key": "",
        "snapid": -2,
        "hash": 1909457131,
        "max": 0,
        "pool": 83,
        "namespace": ""
    },
    "version": "22364'14561",
    "prior_version": "22364'13751",
    ......

Reference article: "Ceph Luminous BlueStore: viewing object information"

ceph-objectstore-tool – modify or examine the state of an OSD

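  • ceph-bluestore-tool
ceph-bluestore-tool command [ --dev device … ] [ --path osd path ] [ --out-dir dir ] [ --log-file | -l filename ] [ --deep ]
// Run a consistency check (fsck), or also repair whatever errors can be fixed (repair)
ceph-bluestore-tool fsck|repair --path osd path [ --deep ]
// Show the device label
ceph-bluestore-tool show-label --dev device
ceph-bluestore-tool prime-osd-dir --dev device --path osd path
// Export the contents of BlueFS (i.e. the rocksdb files) to an output directory
ceph-bluestore-tool bluefs-export --path osd path --out-dir dir
// Add a WAL device to BlueFS; fails if a WAL device already exists
ceph-bluestore-tool bluefs-bdev-new-wal --path osd path --dev-target new-device
// Add a DB device to BlueFS; fails if a DB device already exists
ceph-bluestore-tool bluefs-bdev-new-db --path osd path --dev-target new-device
// Move BlueFS data from the source device(s) to the target device. On success the source devices (except the main one) are removed. The target can be an already attached device or a new one. In the latter case it is added to the OSD, replacing one of the source devices. The replacement rules from the man page apply (in order of precedence, stopping at the first match)
ceph-bluestore-tool bluefs-bdev-migrate --path osd path --dev-target new-device --devs-source device1 [--devs-source device2]
ceph-bluestore-tool free-dump|free-score --path osd path [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]

An example of using this tool:
Using tools to analyze the metadata stored in BlueStore's kvstore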

// kv store prefixes
// BlueStore metadata prefixes
const string PREFIX_SUPER = "S";   // field -> value
const string PREFIX_STAT = "T";    // field -> value(int64 array)
const string PREFIX_COLL = "C";    // collection name -> cnode_t
const string PREFIX_OBJ = "O";     // object name -> onode_t
const string PREFIX_OMAP = "M";    // u64 + keyname -> value
const string PREFIX_DEFERRED = "L";  // id -> deferred_transaction_t
const string PREFIX_ALLOC = "B";   // u64 offset -> u64 length (freelist)
const string PREFIX_SHARED_BLOB = "X"; // u64 offset -> shared_blob_t
// Export the metadata with ceph-bluestore-tool
// and analyze it with ceph-kvstore-tool and ceph-dencoder
sudo ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-41 --out-dir /ceph-1
hzwuhongsong@pubt2-ceph1:~$ sudo  ceph-kvstore-tool rocksdb /ceph-1/db/ list|head -100
B	blocks
B	blocks_per_key
B	bytes_per_block
B	size
C	10.5_head
C	14.2_head
C	15.5_head
C   2.6e8_head
O	%7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo
O	%7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%00%00%00x
O	%7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%01%00%00x
O	%7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%02%00%00x
O	%7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%03%00%00x
hzwuhongsong@pubt2-ceph1:~$ sudo  ceph-kvstore-tool rocksdb /ceph-1/db/  get C  2.6e8_head    out 1.txt
hzwuhongsong@pubt2-ceph1:~$ ceph-dencoder import 1.txt  type bluestore_cnode_t    decode dump_json
    "bits": 11
/// collection metadata
struct bluestore_cnode_t {
  uint32_t bits;   ///< how many bits of coll pgid are significant
  explicit bluestore_cnode_t(int b=0) : bits(b) {}
  DENC(bluestore_cnode_t, v, p) {
    DENC_START(1, 1, p);
    denc(v.bits, p);
    DENC_FINISH(p);
  }
  // ...
};
sudo  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-30/ list B > list-B
hzwuhongsong@pubt1-ceph72:~/txt2$ cat list-B
B	blocks
B	blocks_per_key
B	bytes_per_block
B	size
sudo  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-30/ list S > list-S
hzwuhongsong@pubt1-ceph72:~/txt2$ cat list-S
S	blobid_max
S	bluefs_extents
S	freelist_type
S	min_alloc_size
S	min_compat_ondisk_format
S	nid_max
S	ondisk_format

ceph-bluestore-tool is a utility for performing low-level administrative operations on a BlueStore instance.
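The "O" keys printed by ceph-kvstore-tool in the listing above are percent-escaped binary strings. As a rough sketch, the leading fields can be pulled out in Python. The layout used here (1 byte shard offset by 0x80, 8 bytes pool offset by 2^63, 4 bytes bit-reversed hash, '!'-terminated namespace and object name, a comparison byte, then an 8-byte snap id) is an assumption based on a reading of BlueStore's key encoding and should be verified against the source of your release. The helper name parse_onode_key is hypothetical.

```python
from urllib.parse import unquote_to_bytes

# First "O" key from the listing above.
KEY = ("%7f%80%00%00%00%00%00%00%02%19%d0%8fi%21"
       "rbd_data.174b9a6b8b4567.00000000000016d9"
       "%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo")


def parse_onode_key(escaped_key):
    """Split a percent-escaped BlueStore "O" key into its leading fields.

    Assumed layout (not taken from the article): shard, pool, hash,
    namespace + '!', name + '!', comparison byte, snap id.
    Names that use BlueStore's own '#xx' / '~xx' escapes are not handled.
    """
    raw = unquote_to_bytes(escaped_key)
    shard = raw[0] - 0x80                              # 0x7f means "no shard" (-1)
    pool = int.from_bytes(raw[1:9], "big") - 2**63     # undo the sign bias
    rest = raw[13:]                                    # skip the 4 hash bytes
    namespace, _, rest = rest.partition(b"!")
    name, _, rest = rest.partition(b"!")
    # rest[0] is the '=' / '<' / '>' marker; the snap id follows it.
    snap = int.from_bytes(rest[1:9], "big", signed=True)
    return {"shard": shard, "pool": pool, "namespace": namespace.decode(),
            "name": name.decode(), "snap": snap}


if __name__ == "__main__":
    print(parse_onode_key(KEY))
```

Under these assumptions the key decodes to pool 2, which agrees with the 2.6e8_head collection in the same listing, and to snap id -2, the same value the object_info_t dump earlier shows for a head object.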

RBD tools

General tools

rbd [ -c ceph.conf ] [ -m monaddr ] [--cluster cluster-name] [ -p | --pool pool ] [ command … ]

rbd-fuse [ -p pool ] [-c conffile] mountpoint [ fuse options ]

rbd-fuse is a FUSE ("Filesystem in USErspace") client for RADOS block device (rbd) images. Given a pool containing rbd images, it mounts a userspace file system that allows the images to be accessed as regular files at the mountpoint.

rbd-fuse is not recommended for production use.

rbd-nbd [-c conf] [--read-only] [--device nbd device] [--nbds_max limit] [--max_part limit] [--exclusive] [--notrim] [--encryption-format format] [--encryption-passphrase-file passphrase-file] [--io-timeout seconds] [--reattach-timeout seconds] map image-spec | snap-spec
rbd-nbd unmap nbd device | image-spec | snap-spec
rbd-nbd list-mapped
rbd-nbd attach --device nbd device image-spec | snap-spec
rbd-nbd detach nbd device | image-spec | snap-spec

rbd-nbd is a client for RADOS block device (rbd) images, similar to the rbd kernel module. It maps an rbd image to an nbd (Network Block Device) device, allowing it to be accessed as a regular local block device.

rbd-replay is a utility for replaying rbd workloads.

Performance analysis tools

  • rbd perf image iostat & rbd perf image iotop

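Invoking "rbd perf image iostat" or "rbd perf image iotop" without any options or positional arguments combines all RBD pools into a single view. In v15.2.14, such an invocation was accidentally limited to the default pool (rbd_default_pool).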

ceph.com/en/news/blog/2

  • N release (Nautilus)

The new rbd perf image iotop and rbd perf image iostat commands provide an iotop- and iostat-like IO monitor for all RBD images.

docs.ceph.com/en/quincy

rbd perf image iostat
NAME                WR   RD   WR_BYTES  RD_BYTES  WR_LAT     RD_LAT
ceph/vm-152-disk-0  1/s  0/s  71 KiB/s  0 B/s     13.04 ms   0.00 ns
ceph/vm-136-disk-0  0/s  0/s  819 B/s   0 B/s     919.79 us  0.00 ns

CephFS tools

MDS load

ceph daemonperf mds.a
--------------mds---------------- --mds_cache--- ------mds_log------ -mds_mem- ----mds_server----- mds_ -----objecter------ purg
req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs repl|ino  dn  |hcr  hcs  hsr  cre |sess|actv rd   wr   rdwr|purg|
  1    0    0  3.3M 3.5M   0    0 |928    0    0 |  1  112k 133    0 |3.3M 3.3M|  1    0    0    0 |516 |  6    0    1    0 |  1
588    1    0  3.3M 3.5M   0    0 |929    0    0 |353  112k 132    0 |3.3M 3.3M|589   32    0    0 |516 |  1   17   25    0 |  0
1.0k   1    0  3.3M 3.5M   0    0 |929    0    0 |365  113k 132    0 |3.3M 3.3M|1.0k  24    0    0 |516 |  0   19   15    0 |  0

Daemon commands (including performance/resources)

  • mds
hzwuhongsong@dl-ceph2:~$ sudo ceph daemon mds.dl-ceph2   help
{
    "cache drop": "drop cache",
// Check the Ceph MDS cache usage
    "cache status": "show cache status",
    "config diff": "dump diff of current config and default config",
    "config diff get": "dump diff get <field>: dump diff of current and default config setting <field>",
    "config get": "config get <field>: get the config value",
    "config help": "get config setting schema and descriptions",
    "config set": "config set <field> <val> [<val> ...]: set a config variable",
    "config show": "dump current config settings",
// Check whether a directory in the file system holds dirty data
    "dirfrag ls": "List fragments in directory",
    "dirfrag merge": "De-fragment directory by path",
    "dirfrag split": "Fragment directory by path",
    "dump cache": "dump metadata cache (optionally to a file)",
    "dump loads": "dump metadata loads",
    "dump tree": "dump metadata cache for subtree",
// Various views of ops
    "ops": "show the ops currently in flight",
    "dump_blocked_ops": "show the blocked ops currently in flight",
    "dump_historic_ops": "show slowest recent ops",
    "dump_historic_ops_by_duration": "show slowest recent ops, sorted by op duration",
    "dump_ops_in_flight": "show the ops currently in flight",
    "dump_mempools": "get mempool stats",
    "export dir": "migrate a subtree to named MDS",
    "flush journal": "Flush the journal to the backing store",
    "flush_path": "flush an inode (and its dirfrags)",
    "force_readonly": "Force MDS to read-only mode",
    "get subtrees": "Return the subtree map",
    "get_command_descriptions": "list available commands",
    "git_version": "get git sha1",
    "help": "list available commands",
    "log dump": "dump recent log entries to log file",
    "log flush": "flush log entries to log file",
    "log reopen": "reopen log file",
    "objecter_requests": "show in-progress osd requests",
    "osdmap barrier": "Wait until the MDS has this OSD map epoch",
// MDS performance counters
    "perf dump": "dump perfcounters value",
    "perf histogram dump": "dump perf histogram values",
    "perf histogram schema": "dump perf histogram schema",
    "perf reset": "perf reset <name>: perf reset all or one perfcounter name",
    "perf schema": "dump perfcounters schema",
    "scrub_path": "scrub an inode and output results",
// Inspect CephFS session connections
    "session evict": "Evict a CephFS client",
    "session ls": "Enumerate connected CephFS clients",
    "status": "high-level status of MDS",
    "tag path": "Apply scrub tag recursively",
    "version": "get ceph version"
}
  • Client
root@pubt2-k8s-for-iaas1:/var/run/ceph# sudo ceph --admin-daemon=/var/run/ceph/ceph-client.10137.asok help
{
    "config diff": "dump diff of current config and default config",
    "config diff get": "dump diff get <field>: dump diff of current and default config setting <field>",
    "config get": "config get <field>: get the config value",
    "config help": "get config setting schema and descriptions",
    "config set": "config set <field> <val> [<val> ...]: set a config variable",
    "config show": "dump current config settings",
    "dump_cache": "show in-memory metadata cache contents",
    "dump_mempools": "get mempool stats",
    "get_command_descriptions": "list available commands",
    "git_version": "get git sha1",
    "help": "list available commands",
    "kick_stale_sessions": "kick sessions that were remote reset",
    "log dump": "dump recent log entries to log file",
    "log flush": "flush log entries to log file",
    "log reopen": "reopen log file",
    "mds_requests": "show in-progress mds requests",
    "mds_sessions": "show mds session state",
    "objecter_requests": "show in-progress osd requests",
    "perf dump": "dump perfcounters value",
    "perf histogram dump": "dump perf histogram values",
    "perf histogram schema": "dump perf histogram schema",
    "perf reset": "perf reset <name>: perf reset all or one perfcounter name",
    "perf schema": "dump perfcounters schema",
    "status": "show overall client status",
    "version": "get ceph version"
}
  • Sessions between a client and the MDS
root@pubt2-ceph13:/home/hzwuhongsong#  sudo ceph --admin-daemon=/var/run/ceph/ceph-client.admin.1684307.94420974780416.asok mds_sessions
{
    "id": 284139,
    "inst": {
        "name": {
            "type": "client",
            "num": 284139
        },
        "addr": {
            "nonce": 2584420922,
            "addr": "10.182.30.13:0"
        }
    },
    "inst_str": "client.284139 10.182.30.13:0/2584420922",
    "addr_str": "10.182.30.13:0/2584420922",
    "sessions": [
        {
            "mds": 0,
            "addr": "10.182.30.13:6810/1219491430",
            "seq": 0,
            "cap_gen": 0,
            "cap_ttl": "2019-11-29 17:17:44.358388",
            "last_cap_renew_request": "2019-11-29 17:16:44.358388",
            "cap_renew_seq": 8861,
            "num_caps": 14,
            "state": "open"
        }
    ],
    "mdsmap_epoch": 49
}
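In the mds_sessions output above, cap_ttl and last_cap_renew_request are plain timestamps, so the remaining cap lease can be computed directly. The sketch below assumes the timestamp format shown in that output; the helper name cap_lease_seconds is hypothetical. For this sample the gap is 60 seconds, which would be consistent with a 60-second session timeout.

```python
from datetime import datetime

# Timestamp format as it appears in the mds_sessions output above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def cap_lease_seconds(session):
    """Seconds between the last cap renew request and the cap expiry time."""
    ttl = datetime.strptime(session["cap_ttl"], FMT)
    renewed = datetime.strptime(session["last_cap_renew_request"], FMT)
    return (ttl - renewed).total_seconds()


# Values copied from the sample session above.
session = {
    "mds": 0,
    "cap_ttl": "2019-11-29 17:17:44.358388",
    "last_cap_renew_request": "2019-11-29 17:16:44.358388",
}

if __name__ == "__main__":
    print(cap_lease_seconds(session))
```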
  • View the requests the client has sent to the MDS
ceph daemon /var/run/ceph/ceph-client.${id}.asok mds_requests
  • View the requests the client has sent to the OSDs:
ceph daemon /var/run/ceph/ceph-client.${id}.asok objecter_requests

Performance analysis tools

cephfs-top & ceph fs perf stats

  • cephfs-top (Pacific release):

cephfs-top provides a top-like tool that displays various Ceph file system metrics in real time. It is development preview quality and will have bugs.

docs.ceph.com/en/quincy


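The cephfs-top display includes the following fields:

1. The number of clients, broken down by FUSE, kernel, and libcephfs users.
2. The client id.
3. The CephFS directory mounted by the client.
4. The client's local mount point and IP address.
5. chit: the cap hit rate.
6. rlat: total read latency (in seconds).
7. wlat: total write latency (in seconds).
8. mlat: total metadata operation latency (in seconds).
9. dlease: dentry lease, i.e. the rate at which the client's dentries are usable.
10. oicaps: the number of caps held by the client.
11. oinodes: the number of inodes held by the client for open files.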

  • ceph fs perf stats

ceph fs perf stats [<mds_rank>] [<client_id>] [<client_ip>]

Without mds_rank, the output covers the metrics of the whole file system. To view it as formatted JSON:

ceph fs perf stats | python3 -m json.tool
{
    "version": 1,          // version of the stats format
    "global_counters": [   // global performance counters: 8 in total
        "cap_hit",
        "read_latency",
        "write_latency",
        "metadata_latency",
        "dentry_lease",
        "opened_files",
        "pinned_icaps",
        "opened_inodes"
    ],
    "counters": [],        // per-MDS performance counters
    "client_metadata": {   // client metadata (not file metadata)
        "client.6459": {
            "IP": "10.101.17.11",
            "hostname": "jtfast01",
            "root": "/",
            "mount_point": "/mnt/cephfs",
            "valid_metrics": [
                "cap_hit",
                "read_latency",
                "write_latency",
                "metadata_latency",
                "dentry_lease",
                "opened_files",
                "pinned_icaps",
                "opened_inodes"
            ]
        }
    },
    "global_metrics": {    // global performance statistics
        "client.6459": [
                31697
            .....
        ]
    },
    "metrics": {           // per-MDS statistics
        "delayed_ranks": [],
        "mds.0": {
            "client.6459": []
        },
        "mds.1": {
            "client.6459": []
        }
    }
}

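Among the client-side parameters, note that the latency values are cumulative totals. Dividing a total by the number of operations gives the average latency.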
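As a minimal sketch of that arithmetic: the helper name average_latency and the numbers below are hypothetical, and the operation count is assumed to come from a separate counter, since it is not part of the output shown above.

```python
def average_latency(total_latency_s, op_count):
    """Average per-operation latency in seconds; None when no ops were recorded."""
    if op_count <= 0:
        return None
    return total_latency_s / op_count


if __name__ == "__main__":
    # Hypothetical numbers for illustration only:
    # 12.5 s of accumulated read latency over 5000 reads.
    print(average_latency(12.5, 5000))
```

Returning None for a zero count avoids a division error for clients that have not issued any operations yet.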


Other general tools

  • ceph fs/mds commands
/* Get the mdsmap, from which session_timeout, session_autoclose, and similar values can be read */
COMMAND_WITH_FLAG("mds dump "
	"name=epoch,type=CephInt,req=false,range=0", \
	"dump legacy MDS cluster info, optionally from epoch","mds", "r", "cli,rest", FLAG(DEPRECATED))
COMMAND("fs dump "
	"name=epoch,type=CephInt,req=false,range=0", \
	"dump all CephFS status, optionally from epoch", "mds", "r", "cli,rest")
/* session_timeout and session_autoclose are set by command rather than by config options; the defaults are 60 s and 300 s respectively. */
/* The ceph mds dump command shows the current values of both. */
COMMAND("fs set " \
	"name=fs_name,type=CephString " \
	"name=var,type=CephChoices,strings=max_mds|max_file_size" \
	"|allow_new_snaps|inline_data|cluster_down|allow_multimds|allow_dirfrags|balancer" \
	"|standby_count_wanted|session_timeout|session_autoclose " \
	"name=val,type=CephString "					\
	"name=confirm,type=CephString,req=false",			\
	"set fs parameter <var> to <val>", "mds", "rw", "cli,rest")
COMMAND_WITH_FLAG("mds set_max_mds " \
	"name=maxmds,type=CephInt,range=0", \
	"set max MDS index", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds set " \
	"name=var,type=CephChoices,strings=max_mds|max_file_size"
	"|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags " \
	"name=val,type=CephString "					\
	"name=confirm,type=CephString,req=false",			\
	"set mds parameter <var> to <val>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND("mds stat", "show MDS status", "mds", "r", "cli,rest")	
COMMAND_WITH_FLAG("mds getmap " \
	"name=epoch,type=CephInt,req=false,range=0", \
	"get MDS map, optionally from epoch", "mds", "r", "cli,rest", FLAG(DEPRECATED))
COMMAND("mds metadata name=who,type=CephString,req=false",
	"fetch metadata for mds <who>",
	"mds", "r", "cli,rest")
COMMAND("mds count-metadata name=property,type=CephString",
	"count MDSs by metadata field property",
	"mds", "r", "cli,rest")
COMMAND("mds versions",
	"check running versions of MDSs",
	"mds", "r", "cli,rest")
COMMAND_WITH_FLAG("mds tell " \
	"name=who,type=CephString " \
	"name=args,type=CephString,n=N", \
	"send command to particular mds", "mds", "rw", "cli,rest", FLAG(OBSOLETE))
COMMAND("mds compat show", "show mds compatibility settings", \
	"mds", "r", "cli,rest")
COMMAND_WITH_FLAG("mds stop name=who,type=CephString", "stop mds", \
	"mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND("mds deactivate name=who,type=CephString",
        "clean up specified MDS rank (use with `set max_mds` to shrink cluster)", \
	"mds", "rw", "cli,rest")
COMMAND("mds set_state " \
	"name=gid,type=CephInt,range=0 " \
	"name=state,type=CephInt,range=0|20", \
	"set mds state of <gid> to <numeric-state>", "mds", "rw", "cli,rest")
COMMAND("mds fail name=who,type=CephString", \
	"Mark MDS failed: trigger a failover if a standby is available",
        "mds", "rw", "cli,rest")
COMMAND("mds repaired name=rank,type=CephString", \
	"mark a damaged MDS rank as no longer damaged", "mds", "rw", "cli,rest")
COMMAND("mds rm " \
	"name=gid,type=CephInt,range=0", \
	"remove nonactive mds", "mds", "rw", "cli,rest")
COMMAND("mds rmfailed name=who,type=CephString name=confirm,type=CephString,req=false", \
	"remove failed mds", "mds", "rw", "cli,rest")
COMMAND_WITH_FLAG("mds cluster_down", "take MDS cluster down", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds cluster_up", "bring MDS cluster up", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND("mds compat rm_compat " \
	"name=feature,type=CephInt,range=0", \
	"remove compatible feature", "mds", "rw", "cli,rest")
COMMAND("mds compat rm_incompat " \
	"name=feature,type=CephInt,range=0", \
	"remove incompatible feature", "mds", "rw", "cli,rest")
COMMAND_WITH_FLAG("mds add_data_pool " \
	"name=pool,type=CephString", \
	"add data pool <pool>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds remove_data_pool " \
	"name=pool,type=CephString", \
	"remove data pool <pool>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds rm_data_pool " \
	"name=pool,type=CephString", \
	"remove data pool <pool>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds newfs " \
	"name=metadata,type=CephInt,range=0 " \
	"name=data,type=CephInt,range=0 " \
	"name=sure,type=CephChoices,strings=--yes-i-really-mean-it,req=false", \
	"make new filesystem using pools <metadata> and <data>", \
	"mds", "rw", "cli,rest", FLAG(OBSOLETE))
COMMAND("fs new " \
	"name=fs_name,type=CephString " \
	"name=metadata,type=CephString " \
	"name=data,type=CephString " \
	"name=force,type=CephChoices,strings=--force,req=false " \
	"name=sure,type=CephChoices,strings=--allow-dangerous-metadata-overlay,req=false", \
	"make new filesystem using named pools <metadata> and <data>", \
	"fs", "rw", "cli,rest")
COMMAND("fs rm " \
	"name=fs_name,type=CephString " \
	"name=sure,type=CephChoices,strings=--yes-i-really-mean-it,req=false", \
	"disable the named filesystem", \
	"fs", "rw", "cli,rest")
COMMAND("fs reset " \
	"name=fs_name,type=CephString " \
	"name=sure,type=CephChoices,strings=--yes-i-really-mean-it,req=false", \
	"disaster recovery only: reset to a single-MDS map", \
	"fs", "rw", "cli,rest")
COMMAND("fs ls ", \
	"list filesystems", \
	"fs", "r", "cli,rest")
COMMAND("fs get name=fs_name,type=CephString", \
	"get info about one filesystem", \
	"fs", "r", "cli,rest")
COMMAND("fs flag set name=flag_name,type=CephChoices,strings=enable_multiple "
        "name=val,type=CephString " \
	"name=confirm,type=CephChoices,strings=--yes-i-really-mean-it,req=false", \
	"Set a global CephFS flag", \
	"fs", "rw", "cli,rest")
COMMAND("fs add_data_pool name=fs_name,type=CephString " \
	"name=pool,type=CephString", \
	"add data pool <pool>", "mds", "rw", "cli,rest")
COMMAND("fs rm_data_pool name=fs_name,type=CephString " \
	"name=pool,type=CephString", \
	"remove data pool <pool>", "mds", "rw", "cli,rest")
COMMAND_WITH_FLAG("fs set_default name=fs_name,type=CephString",	\
		  "set the default to the named filesystem",		\
		  "fs", "rw", "cli,rest",				\
		  FLAG(DEPRECATED))
COMMAND("fs set-default name=fs_name,type=CephString", \
	"set the default to the named filesystem", \
	"fs", "rw", "cli,rest")
hzwuhongsong@music-data-k8s-0:~$ sudo ceph fs status
ceph_fs - 161 clients
=======
+------+--------+------------------+---------------+-------+-------+
| Rank | State  |       MDS        |    Activity   |  dns  |  inos |
+------+--------+------------------+---------------+-------+-------+
|  0   | active | music-data-k8s-0 | Reqs:    0 /s |  241k |  241k |
+------+--------+------------------+---------------+-------+-------+
+-------------+----------+-------+-------+
|     Pool    |   type   |  used | avail |
+-------------+----------+-------+-------+
| cephfs_meta | metadata |  210M |  706G |
| cephfs_data |   data   | 2561G |  118T |
+-------------+----------+-------+-------+
+------------------+
|   Standby MDS    |
+------------------+
| music-data-k8s-2 |
| music-data-k8s-1 |
+------------------+
MDS version: ceph version 12.2.12+netease+1.0 (4a72ccba99ce63500c90e875d211ad04e8ec15a9) luminous (stable)
  • List sessions
ceph tell mds.0 session ls
  • List or evict clients
ceph tell mds.0 client ls
ceph tell mds.0 client evict id=25085