CEPH
To follow the cluster log, use the following command:
ceph -w
To see the most recent n lines from the cluster log:
ceph log last [n]
Running the health command returns OK, a warning, or an error for the overall cluster health status:
$ sudo ceph
ceph> health
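For a per-check breakdown of any warning or error, health detail can be used in the same shell:
ceph> health detail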
More detailed information can be retrieved with ceph status, which gives a few lines about the monitors, storage nodes and placement groups:
ceph> status
ceph> quorum_status
ceph> mon_status
To know what the OSD nodes are doing, we have a few different options:
ceph> osd stat
ceph> osd tree
Monitor nodes:
ceph> mon stat
To monitor cluster capacity and data distribution across the different pools, use the df command:
ceph> df
#Or by OSD node:
ceph> osd df
Don't overlook the I/O metrics: latency and reads/writes, both in operations per second and in bandwidth, can be checked with osd perf:
ceph> osd perf
Be on time: in distributed systems, clock skew is often an unexpected source of issues. Ceph monitors allow clocks to drift by up to 0.05 seconds. Set up NTP monitoring and keep an eye on the NTP offset metric.
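A quick way to check skew from the cluster side and the NTP offset on each node (the exact client tooling depends on whether ntpd or chrony is installed):
ceph time-sync-status    # monitor-side view of clock skew
ntpq -pn                 # ntpd: per-peer offset
chronyc sources -v       # chrony: per-source offset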
ceph osd crush tree
Create a new pool in Ceph
ceph --cluster cluster01 osd pool create Lab-Pool 1024
ceph --cluster cluster01 osd pool create Template-Pool 1024
Add it in Proxmox: Datacenter -> Storage, then "Add" with type "RBD". Example (a roughly equivalent storage.cfg stanza follows the list below):
ID: Lab-Pool
Pool: Lab-Pool
Monitor(s): 10.1.1.1 10.1.1.2 10.1.1.3
User name: admin
Nodes: All (No restrictions)
Enable: yes
Content: Disk image
KRBD: no
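The same definition can also be written directly in /etc/pve/storage.cfg; a rough sketch of the equivalent stanza (field names per the Proxmox storage.cfg format):
rbd: Lab-Pool
        pool Lab-Pool
        monhost 10.1.1.1 10.1.1.2 10.1.1.3
        username admin
        content images
        krbd 0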
rbd – manage rados block device (RBD) images
Synopsis
rbd [ -c ceph.conf ] [ -m monaddr ] [ --cluster cluster-name ] [ -p | --pool pool ] [ command ... ]
Description
rbd is a utility for manipulating rados block device (RBD) images, used by the Linux rbd driver and the rbd storage driver for Qemu/KVM. RBD images are simple block devices that are striped over objects and stored in a RADOS object store. The size of the objects the image is striped over must be a power of two.
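For example, a new format-2 image can be created in the lab pool like this (sketch; the image name test-disk is only illustrative, size is in MB and objects default to 4 MB, i.e. order 22):
rbd --cluster cluster01 create --image-format 2 --size 40960 Lab-Pool/test-disk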
Export a Ceph image to a temporary file
rbd export ecpool/vm-7011-disk-1 /var/mnt/cephfs/temp_migration/vm-7011-disk-1-e
Import the exported file into a Ceph pool
rbd import --image-format 2 /var/mnt/cephfs/temp_migration/vm-7011-disk-1-e Prod-Pool/vm-7011-disk-1
Rename an image
rbd mv ecpool/vm-7011-disk-1 ecpool/vm-7011-disk-1-delme
Delete an image
rbd rm ecpool/vm-7011-disk-1-delme
Delete a snapshot
rbd snap rm ecpool/vm-7011-disk-1-delme
or, if an image has one or more snapshots, purge them all (see also the note below):
rbd snap purge ecpool/vm-7011-disk-1-delme
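Note that rbd snap rm normally takes an image@snapshot spec; to see which snapshots exist before removing one by name (the @Config snapshot below is the one used later in these notes):
rbd snap ls ecpool/vm-7011-disk-1-delme
rbd snap rm ecpool/vm-7011-disk-1-delme@Config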
Delete the temporary file
rm /var/mnt/cephfs/temp_migration/vm-7011-disk-1-e
List the contents of a pool
rbd --cluster cluster01 -p Prod-Pool list -l
rbd --cluster cluster01 -p ecpool list -l
To list with Proxmox:
pvesm list Prod-Pool
pvesm list ecpool
To find out whether there is a lock and on which Proxmox (PMX) node:
find /etc/pve/nodes/**/qemu-server/ -name "*.tmp.*"
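If a VM stays locked after an interrupted operation, the lock can normally be released on the node that owns the VM (qm unlock is a standard Proxmox command; VMID 7011 is just the example used elsewhere in these notes):
qm unlock 7011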
List the versions of all components
pveversion -v
Exercise: Locate an Object
As an exercise, let's create an object. Specify an object name, a path to a test file containing some object data, and a pool name using the rados put command on the command line. For example:
rados put {object-name} {file-path} --pool=data
rados put test-object-1 testfile.txt --pool=data
To verify that the Ceph Object Store stored the object, execute the following:
rados -p data ls
Now, identify the object location:
ceph osd map {pool-name} {object-name}
ceph osd map data test-object-1
Ceph should output the object’s location. For example:
osdmap e537 pool 'data' (0) object 'test-object-1' -> pg 0.d1743484 (0.4) -> up [1,0] acting [1,0]
To remove the test object, simply delete it using the rados rm command. For example:
rados rm test-object-1 --pool=data
OSD
root@OSD-01:/var/log# ceph --cluster cluster01 daemon osd.0 config show
{
"name": "osd.0",
"cluster": "cluster01",
"debug_none": "0\/5",
"debug_lockdep": "0\/1",
"debug_context": "0\/1",
"debug_crush": "1\/1",
"debug_mds": "1\/5",
...
Log
/var/log/ceph/cluster01.log
New versions of Ceph complain about slow requests:
{date} {osd.num} [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.005692 secs
{date} {osd.num} [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
grep 'slow request' cluster01.log | awk '{print $3}' | sort | uniq
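To count how many slow requests each OSD has logged, the same pipeline can be extended:
grep 'slow request' cluster01.log | awk '{print $3}' | sort | uniq -c | sort -rn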
Export a snapshot
root@PMX-01:/# rbd export ecpool/vm-7011-disk-1-delme@Config /mnt/pve/standby/dump/vm-7011-disk-1@Config
Exporting image: 100% complete...done.
Import
root@PMX-01:/# rbd import --image-format 2 /mnt/pve/standby/dump/vm-7011-disk-1@Config Prod-Pool/vm-7011-disk-1-Config
Importing image: 100% complete...done.
List the "7011" images
rbd --cluster cluster01 -p Prod-Pool list -l | grep 7011
# Output:
vm-7011-disk-1 40960M 2
vm-7011-disk-1@Test_Ceph 40960M 2
vm-7011-disk-1-Config 40960M 2
rbd --cluster cluster01 -p Prod-Pool info vm-7011-disk-1-Config
# Output:
rbd image 'vm-7011-disk-1-Config':
size 40960 MB in 10240 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.04da716b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
Cache pool
Drain the cache in preparation for turning it off:
ceph osd tier cache-mode foo-hot forward
rados -p foo-hot cache-flush-evict-all
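Before removing the overlay, it is worth confirming that the cache pool really is empty:
rados -p foo-hot ls
ceph df | grep foo-hot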
When the cache pool is finally empty, disable it:
ceph osd tier remove-overlay foo
ceph osd tier remove foo foo-hot
ceph osd pool get Prod-Pool pg_num
# Output:
pg_num: 1024
ceph osd pool get SSD-Pool pg_num
# Output:
pg_num: 256
ceph osd pool get cephfs_data pg_num
# Output:
pg_num: 256
ceph osd pool get cephfs_metadata pg_num
# Output:
pg_num: 256
ceph --cluster cluster01 daemon osd.0 config show | grep ratio | grep full
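On Luminous and later, the cluster-wide thresholds can also be adjusted directly (the values below are only the illustrative defaults):
ceph osd set-nearfull-ratio 0.85
ceph osd set-full-ratio 0.95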
Enable Ceph dashboard
apt install ceph-mgr-dashboard (on all nodes running the manager (mgr) service)
ceph mgr module enable dashboard
ceph dashboard ac-user-create cephdash [password] administrator
ceph config-key set mgr/dashboard/server_addr ::
ceph dashboard create-self-signed-cert
ceph mgr module disable dashboard
ceph mgr module enable dashboard
systemctl restart ceph-mgr@[servername].service
Then browse to https://[IP or FQDN]:8443 or http://[IP or FQDN]:8080
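To confirm which URL the active manager is actually serving the dashboard on:
ceph mgr services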