CephFS (the Ceph File System) provides shared file-system storage: clients mount it over the Ceph protocol and use the Ceph cluster as the backing data store.
CephFS requires the Metadata Server (MDS) service, whose daemon is ceph-mds. The ceph-mds daemon manages the metadata of files stored on CephFS and coordinates access to the Ceph storage cluster.
![CephFS usage](http://www.lijiach.com/wp-content/uploads/2023/01/image-172.png)
1. Deploy the MDS service
If this is the first time CephFS is used on the cluster, the ceph-mds service has to be deployed first.
[root@mgr01 ~]# yum install -y ceph-mds
[ceph@deploy ceph-cluster]$ ceph-deploy mds create mgr01
2. Create the CephFS metadata and data pools
Before CephFS can be used, a file system must be created in the cluster, with separate pools assigned for its metadata and its data. Below, a file system named mycephfs is created for testing; it uses cephfs-metadata as the metadata pool and cephfs-data as the data pool:
# pool for the metadata
[ceph@deploy ceph-cluster]$ ceph osd pool create cephfs-metadata 32 32
# pool for the data
[ceph@deploy ceph-cluster]$ ceph osd pool create cephfs-data 64 64
# current ceph status
[ceph@deploy ceph-cluster]$ ceph -s
cluster:
id: 845224fe-1461-48a4-884b-99b7b6327ae9
health: HEALTH_WARN
application not enabled on 1 pool(s)
1 pools have pg_num > pgp_num
services:
mon: 3 daemons, quorum mon01,mon02,mon03
mgr: mgr01(active), standbys: mgr02
mds: mycephfs-1/1/1 up {0=mgr01=up:active}
osd: 15 osds: 15 up, 15 in
data:
pools: 9 pools, 288 pgs
objects: 349 objects, 439 MiB
usage: 16 GiB used, 1.4 TiB / 1.5 TiB avail
pgs: 288 active+clean
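The HEALTH_WARN items above do not affect CephFS itself and can be cleared separately. A minimal sketch of how to do so, assuming the affected pool is first identified with `ceph health detail` (the pool and application names below are placeholders, not taken from this cluster):
# see which pool each warning refers to
[ceph@deploy ceph-cluster]$ ceph health detail
# tag the reported pool with an application
[ceph@deploy ceph-cluster]$ ceph osd pool application enable <pool-name> rbd
# raise pgp_num to match pg_num on the reported pool
[ceph@deploy ceph-cluster]$ ceph osd pool set <pool-name> pgp_num <pg_num>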
3. Create the CephFS file system and verify it
[ceph@deploy ceph-cluster]$ ceph fs new mycephfs cephfs-metadata cephfs-data
[ceph@deploy ceph-cluster]$ ceph fs ls
name: mycephfs, metadata pool: cephfs-metadata, data pools: [cephfs-data ]
[ceph@deploy ceph-cluster]$ ceph fs status mycephfs
mycephfs - 0 clients
========
+------+--------+-------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+-------+---------------+-------+-------+
| 0 | active | mgr01 | Reqs: 0 /s | 11 | 14 |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs-metadata | metadata | 7690 | 469G |
| cephfs-data | data | 1004k | 469G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
[ceph@deploy ceph-cluster]$ ceph mds stat
mycephfs-1/1/1 up {0=mgr01=up:active}
4. Create a client account
[ceph@deploy ceph-cluster]$ ceph auth add client.user2 mon 'allow r' mds 'allow rw' osd 'allow rwx pool=cephfs-data'
added key for client.user2
[ceph@deploy ceph-cluster]$ ceph auth get client.user2
exported keyring for client.user2
[client.user2]
key = AQBQy7tjM1EfHxAADAtJMqtlRVUKeTPtyi6Vmw==
caps mds = "allow rw"
caps mon = "allow r"
caps osd = "allow rwx pool=cephfs-data"
[ceph@deploy ceph-cluster]$ ceph auth get client.user2 -o ceph.client.user2.keyring
exported keyring for client.user2
[ceph@deploy ceph-cluster]$ ceph auth print-key client.user2 > user2.key
[ceph@deploy ceph-cluster]$ cat ceph.client.user2.keyring
[client.user2]
key = AQBQy7tjM1EfHxAADAtJMqtlRVUKeTPtyi6Vmw==
caps mds = "allow rw"
caps mon = "allow r"
caps osd = "allow rwx pool=cephfs-data"
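As an alternative to `ceph auth add` (not used in this walkthrough), newer releases (Luminous and later) also provide a one-step helper that creates a CephFS client with suitable caps and prints its keyring; a sketch, where `client.user3` is a purely illustrative name:
# grants the client rw access to the root of mycephfs
[ceph@deploy ceph-cluster]$ ceph fs authorize mycephfs client.user3 / rw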
5. Install the Ceph client
[root@ceph-client3 ~]# yum install -y https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/noarch/ceph-release-1-1.el7.noarch.rpm
[root@ceph-client3 ~]# yum install epel-release -y
[root@ceph-client3 ~]# yum install ceph-common -y
6. Verify client permissions
[ceph@deploy ceph-cluster]$ scp ceph.conf ceph.client.user2.keyring user2.key root@192.168.1.13:/etc/ceph/
[root@ceph-client3 ~]# ceph --user user2 -s
cluster:
id: 845224fe-1461-48a4-884b-99b7b6327ae9
health: HEALTH_WARN
application not enabled on 1 pool(s)
1 pools have pg_num > pgp_num
services:
mon: 3 daemons, quorum mon01,mon02,mon03
mgr: mgr01(active), standbys: mgr02
mds: mycephfs-1/1/1 up {0=mgr01=up:active}
osd: 15 osds: 15 up, 15 in
data:
pools: 9 pools, 288 pgs
objects: 349 objects, 439 MiB
usage: 16 GiB used, 1.4 TiB / 1.5 TiB avail
pgs: 288 active+clean
7. Mount CephFS in kernel space
There are two ways for a client to mount CephFS: in kernel space or in user space. A kernel-space mount requires the kernel's ceph module; a user-space mount requires ceph-fuse.
Mount on the client using a secret key file:
[root@ceph-client3 ~]# mkdir /data
[root@ceph-client3 ~]# mount -t ceph 192.168.1.114:6789,192.168.1.115:6789,192.168.1.116:6789:/ /data -o name=user2,secretfile=/etc/ceph/user2.key
[root@ceph-client3 ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 2.0G 0 2.0G 0% /dev
tmpfs tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs tmpfs 2.0G 13M 2.0G 1% /run
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mapper/centos-root xfs 19G 2.4G 16G 13% /
/dev/sda1 xfs 1.1G 158M 906M 15% /boot
tmpfs tmpfs 396M 0 396M 0% /run/user/0
192.168.1.114:6789,192.168.1.115:6789,192.168.1.116:6789:/ ceph 504G 0 504G 0% /data
[root@ceph-client3 ~]# cp /etc/issue /data/
[root@ceph-client3 ~]# dd if=/dev/zero of=/data/testfile bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.408162 s, 514 MB/s
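To confirm that the test data actually landed in the cephfs-data pool, pool usage can be checked from the deploy node (the exact figures will differ):
[ceph@deploy ceph-cluster]$ ceph df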
Mount on the client using the key directly:
[root@ceph-client3 ~]# tail /etc/ceph/user2.key
AQBQy7tjM1EfHxAADAtJMqtlRVUKeTPtyi6Vmw==
[root@ceph-client3 ~]# umount /data
[root@ceph-client3 ~]# mount -t ceph 192.168.1.114:6789,192.168.1.115:6789,192.168.1.116:6789:/ /data -o name=user2,secret=AQBQy7tjM1EfHxAADAtJMqtlRVUKeTPtyi6Vmw==
[root@ceph-client3 ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 2.0G 0 2.0G 0% /dev
tmpfs tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs tmpfs 2.0G 13M 2.0G 1% /run
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mapper/centos-root xfs 19G 2.4G 16G 13% /
/dev/sda1 xfs 1.1G 158M 906M 15% /boot
tmpfs tmpfs 396M 0 396M 0% /run/user/0
192.168.1.114:6789,192.168.1.115:6789,192.168.1.116:6789:/ ceph 504G 210M 504G 1% /data
[root@ceph-client3 ~]# cp /etc/yum.repos.d/local.repo /data/
[root@ceph-client3 ~]# stat -f /data/
File: "/data/"
ID: 49bd190c4d3253a2 Namelen: 255 Type: ceph
Block size: 4194304 Fundamental block size: 4194304
Blocks: Total: 120110 Free: 120060 Available: 120060
Inodes: Total: 53 Free: -1
Mount at boot via /etc/fstab:
[root@ceph-client3 ~]# cat /etc/fstab
192.168.1.114:6789,192.168.1.115:6789,192.168.1.116:6789:/ /data ceph defaults,name=user2,secretfile=/etc/ceph/user2.key,_netdev 0 0
[root@ceph-client3 ~]# umount /data
[root@ceph-client3 ~]# mount -a
[root@ceph-client3 ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 2.0G 0 2.0G 0% /dev
tmpfs tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs tmpfs 2.0G 13M 2.0G 1% /run
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mapper/centos-root xfs 19G 2.4G 16G 13% /
/dev/sda1 xfs 1.1G 158M 906M 15% /boot
tmpfs tmpfs 396M 0 396M 0% /run/user/0
192.168.1.114:6789,192.168.1.115:6789,192.168.1.116:6789:/ ceph 504G 210M 504G 1% /data
Client kernel module:
The client kernel loads the ceph.ko module in order to mount the CephFS file system.
[root@ceph-client3 ~]# lsmod | grep ceph
ceph 363016 1
libceph 306750 1 ceph
dns_resolver 13140 1 libceph
libcrc32c 12644 2 xfs,libceph
[root@ceph-client3 ~]# modinfo ceph
filename: /lib/modules/3.10.0-1160.el7.x86_64/kernel/fs/ceph/ceph.ko.xz
license: GPL
description: Ceph filesystem for Linux
author: Patience Warnick <patience@newdream.net>
author: Yehuda Sadeh <yehuda@hq.newdream.net>
author: Sage Weil <sage@newdream.net>
alias: fs-ceph
retpoline: Y
rhelversion: 7.9
srcversion: EB765DDC1F7F8219F09D34C
depends: libceph
intree: Y
vermagic: 3.10.0-1160.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: E1:FD:B0:E2:A7:E8:61:A1:D1:CA:80:A2:3D:CF:0D:BA:3A:A4:AD:F5
sig_hashalgo: sha256
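If the module were not already loaded (for example on a freshly installed client that has not mounted CephFS yet), it could be loaded manually before mounting; a minimal sketch:
[root@ceph-client3 ~]# modprobe ceph
[root@ceph-client3 ~]# lsmod | grep ceph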
8. Mount CephFS in user space
If the kernel is too old to provide the ceph module, ceph-fuse can be installed and used for mounting instead, although the kernel-module mount is recommended.
Install ceph-fuse:
[root@ceph-client ~]# yum install -y https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/noarch/ceph-release-1-1.el7.noarch.rpm
[root@ceph-client ~]# yum install -y epel-release
[root@ceph-client ~]# yum install -y ceph-fuse ceph-common
Mount with ceph-fuse:
[ceph@deploy ceph-cluster]$ scp ceph.conf ceph.client.user2.keyring user2.key root@192.168.1.11:/etc/ceph/
[root@ceph-client ~]# mkdir /data
[root@ceph-client ~]# ceph-fuse --name client.user2 -m 192.168.1.114:6789,192.168.1.115:6789,192.168.1.116:6789 /data
ceph-fuse[1656]: starting ceph client
2023-01-09 17:19:59.000 7fae961a5c00 -1 init, newargv = 0x55a3951f8300 newargc=7
ceph-fuse[1656]: starting fuse
[root@ceph-client ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 2.0G 0 2.0G 0% /dev
tmpfs tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs tmpfs 2.0G 13M 2.0G 1% /run
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mapper/centos-root xfs 19G 2.4G 16G 13% /
/dev/sda1 xfs 1.1G 158M 906M 15% /boot
tmpfs tmpfs 396M 0 396M 0% /run/user/0
ceph-fuse fuse.ceph-fuse 504G 210M 504G 1% /data
[root@ceph-client ~]# dd if=/dev/zero of=/data/ceph-fuse-data bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 2.89092 s, 72.5 MB/s
Mount at boot: when the client name is specified, the matching keyring and the ceph.conf configuration file are loaded automatically.
[root@ceph-client ~]# vim /etc/fstab
none /data fuse.ceph ceph.id=user2,ceph.conf=/etc/ceph/ceph.conf,_netdev,defaults 0 0
[root@ceph-client ~]# umount /data
[root@ceph-client ~]# mount -a
ceph-fuse[1783]: starting ceph client
2023-01-09 17:22:31.904 7f6657b9ec00 -1 init, newargv = 0x5579ffd81960 newargc=9
ceph-fuse[1783]: starting fuse
[root@ceph-client ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 2.0G 0 2.0G 0% /dev
tmpfs tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs tmpfs 2.0G 13M 2.0G 1% /run
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mapper/centos-root xfs 19G 2.4G 16G 13% /
/dev/sda1 xfs 1.1G 158M 906M 15% /boot
tmpfs tmpfs 396M 0 396M 0% /run/user/0
ceph-fuse fuse.ceph-fuse 504G 420M 504G 1% /data
9. Ceph MDS high availability
The Ceph MDS (Metadata Server) is the access entry point for CephFS, so it needs both high performance and redundancy. Suppose 4 MDS daemons are started and 2 ranks are configured: 2 of the MDS daemons are assigned to the two ranks, and the remaining 2 serve as their standbys.
A standby MDS can be designated for each rank, so that if the rank's current MDS fails, the standby takes over immediately.
There are several ways to configure standbys; the common options are listed below, followed by a configuration sketch:
mds_standby_replay: true or false. When true, replay mode is enabled and the standby continuously replays the metadata journal of the active MDS, so failover is fast if the active MDS dies. When false, the standby only catches up after a failure, which causes a short interruption.
mds_standby_for_name: make this MDS daemon act as standby only for the MDS with the given name.
mds_standby_for_rank: make this MDS daemon act as standby only for the given rank number. When multiple CephFS file systems exist, mds_standby_for_fscid can be added to select which file system the rank belongs to.
mds_standby_for_fscid: the CephFS file system ID. Combined with mds_standby_for_rank it targets that specific rank of the given file system; without mds_standby_for_rank it covers all ranks of that file system.
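As a sketch of how these options combine (the section name, rank and fscid values below are illustrative and not taken from this cluster), a standby pinned to one rank of one file system could be declared in ceph.conf like this:
[mds.mds-node-a]
# stand by only for rank 0 of the file system whose fscid is 1
mds_standby_for_rank = 0
mds_standby_for_fscid = 1
# keep a warm replay of the active MDS journal for fast failover
mds_standby_replay = true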
Current MDS status:
[ceph@deploy ceph-cluster]$ ceph mds stat
mycephfs-1/1/1 up {0=mgr01=up:active}
Add MDS servers:
Add mgr02, mon02 and mon03 to the Ceph cluster in the MDS role, so that the final layout is a two-active, two-standby MDS structure that is both highly available and high-performance.
# install the ceph-mds package on each MDS server
[root@mgr02 ~]# yum install -y ceph-mds
[root@mon02 ~]# yum install -y ceph-mds
[root@mon03 ~]# yum install -y ceph-mds
# add the MDS servers
[ceph@deploy ceph-cluster]$ ceph-deploy mds create mgr02
[ceph@deploy ceph-cluster]$ ceph-deploy mds create mon02
[ceph@deploy ceph-cluster]$ ceph-deploy mds create mon03
# verify the current state of the MDS servers
[ceph@deploy ceph-cluster]$ ceph mds stat
mycephfs-1/1/1 up {0=mgr01=up:active}, 3 up:standby
Verify the current state of the Ceph cluster:
One MDS server is currently active and three are standby.
[ceph@deploy ceph-cluster]$ ceph fs status
mycephfs - 2 clients
========
+------+--------+-------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+-------+---------------+-------+-------+
| 0 | active | mgr01 | Reqs: 0 /s | 15 | 18 |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs-metadata | metadata | 55.1k | 468G |
| cephfs-data | data | 400M | 468G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| mgr02 |
| mon02 |
| mon03 |
+-------------+
Current file system state:
[ceph@deploy ceph-cluster]$ ceph fs get mycephfs
Filesystem 'mycephfs' (1)
fs_name mycephfs
epoch 4
flags 12
created 2023-01-06 15:42:24.691707
modified 2023-01-06 15:42:25.694040
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
min_compat_client -1 (unspecified)
last_failure 0
last_failure_osd_epoch 0
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=4485}
failed
damaged
stopped
data_pools [8]
metadata_pool 7
inline_data disabled
balancer
standby_count_wanted 1
4485: 192.168.1.117:6801/435597672 'mgr01' mds.0.3 up:active seq 57
Set the number of active MDS daemons:
There are currently four MDS servers, one active and three standby. The deployment can be improved by changing it to two active and two standby.
# set the maximum number of simultaneously active MDS daemons to 2
[ceph@deploy ceph-cluster]$ ceph fs set mycephfs max_mds 2
[ceph@deploy ceph-cluster]$ ceph fs status
mycephfs - 2 clients
========
+------+--------+-------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+-------+---------------+-------+-------+
| 0 | active | mgr01 | Reqs: 0 /s | 15 | 18 |
| 1 | active | mon03 | Reqs: 0 /s | 0 | 0 |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs-metadata | metadata | 55.1k | 468G |
| cephfs-data | data | 400M | 468G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| mgr02 |
| mon02 |
+-------------+
MDS high-availability tuning:
At this point mgr01 and mon03 are active, while mgr02 and mon02 are standby. We can now make mgr02 the dedicated standby for mgr01 and mon02 the dedicated standby for mon03, so that every active MDS has a fixed backup role. Modify the configuration file as follows:
[ceph@deploy ceph-cluster]$ vim ceph.conf
[global]
# ... the existing [global] settings of this cluster (fsid, networks, mon_host, cephx auth, etc.) stay unchanged ...
[mds.mgr02]
#mds_standby_for_fscid = mycephfs
mds_standby_for_name = mgr01
mds_standby_replay = true
[mds.mon02]
mds_standby_for_name = mon03
mds_standby_replay = true
Distribute the configuration file and restart the MDS services:
# push the configuration file so that the change is effective when each MDS restarts
[ceph@deploy ceph-cluster]$ ceph-deploy --overwrite-conf config push mgr01
[ceph@deploy ceph-cluster]$ ceph-deploy --overwrite-conf config push mgr02
[ceph@deploy ceph-cluster]$ ceph-deploy --overwrite-conf config push mon02
[ceph@deploy ceph-cluster]$ ceph-deploy --overwrite-conf config push mon03
[root@mon02 ~]# systemctl restart ceph-mds@mon02.service
[root@mon03 ~]# systemctl restart ceph-mds@mon03.service
[root@mgr01 ~]# systemctl restart ceph-mds@mgr01.service
[root@mgr02 ~]# systemctl restart ceph-mds@mgr02.service
MDS high-availability state of the Ceph cluster:
[ceph@deploy ceph-cluster]$ ceph fs status
mycephfs - 2 clients
========
+------+--------+-------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+-------+---------------+-------+-------+
| 0 | active | mgr02 | Reqs: 0 /s | 15 | 18 |
| 1 | active | mon02 | Reqs: 0 /s | 10 | 13 |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs-metadata | metadata | 57.6k | 468G |
| cephfs-data | data | 400M | 468G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| mon03 |
| mgr01 |
+-------------+
# check the active/standby mapping
[ceph@deploy ceph-cluster]$ ceph fs get mycephfs
Filesystem 'mycephfs' (1)
fs_name mycephfs
epoch 29
flags 12
created 2023-01-06 15:42:24.691707
modified 2023-01-09 17:37:58.143140
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
min_compat_client -1 (unspecified)
last_failure 0
last_failure_osd_epoch 132
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 2
in 0,1
up {0=6294,1=6282}
failed
damaged
stopped
data_pools [8]
metadata_pool 7
inline_data disabled
balancer
standby_count_wanted 1
6294: 192.168.1.118:6800/2455480909 'mgr02' mds.0.25 up:active seq 6 (standby for rank 0 'mgr01')
6282: 192.168.1.115:6800/899553899 'mon02' mds.1.16 up:active seq 8 (standby for rank 1 'mon03')
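To actually exercise the failover (optional, and not shown in the output above), an active MDS can be stopped while `ceph fs status` is watched to see a standby take over its rank; a sketch:
# stop the MDS currently active for rank 0
[root@mgr02 ~]# systemctl stop ceph-mds@mgr02.service
# a standby should take over rank 0; verify, then start the stopped daemon again
[ceph@deploy ceph-cluster]$ ceph fs status
[root@mgr02 ~]# systemctl start ceph-mds@mgr02.service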