This article was last modified 699 days ago, so some of its content may be out of date. If in doubt, ask the author.

Errors

The platform.linux_distribution problem

Symptom

RuntimeError: AttributeError: module 'platform' has no attribute 'linux_distribution'

Cause

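platform.linux_distribution was deprecated in Python 3.5 and removed in Python 3.8. ceph-deploy still calls it, so on Python 3.8 and later it fails with the error above.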
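To confirm this on a given host, check whether the running interpreter still exposes the function. A minimal sketch; `has_linux_distribution` is a hypothetical helper name, not part of ceph-deploy:

```python
import platform
import sys


def has_linux_distribution() -> bool:
    """Return True if this interpreter still provides platform.linux_distribution."""
    # Deprecated in Python 3.5 and removed in 3.8, so on 3.8+ the attribute is gone.
    return hasattr(platform, "linux_distribution")


if __name__ == "__main__":
    # Print the interpreter version next to the result of the check.
    print(sys.version.split()[0], has_linux_distribution())
```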

Fix

The fix is crude but simple: patch the code directly.
Edit /usr/lib/python3/dist-packages/ceph_deploy/hosts/remotes.py,
find the platform_information function, and wrap the following two lines in a try statement:

linux_distribution = _linux_distribution or platform.linux_distribution
distro, release, codename = linux_distribution()

Then give the three variables default values. The patched code looks like this:

def platform_information(_linux_distribution=None):
    """ detect platform information from remote host """
    # Original lines, kept for reference:
    # linux_distribution = _linux_distribution or platform.linux_distribution
    # distro, release, codename = linux_distribution()

    # Defaults used when platform.linux_distribution no longer exists (Python 3.8+)
    distro = release = codename = None
    try:
        linux_distribution = _linux_distribution or platform.linux_distribution
        distro, release, codename = linux_distribution()
    except AttributeError:
        pass
    # ... the rest of the function is unchanged

Save, exit, and run the command again.
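To see what the patch does outside ceph-deploy, here is a standalone sketch that mirrors only the patched lines. `platform_information_sketch` is a hypothetical name; the real function in remotes.py continues past these lines, and that part is not reproduced here. As in the original signature, `_linux_distribution` lets you inject a fake provider.

```python
import platform


def platform_information_sketch(_linux_distribution=None):
    """Mirror of the patched lines in ceph_deploy/hosts/remotes.py (sketch only)."""
    # Defaults used when the interpreter no longer has platform.linux_distribution.
    distro = release = codename = None
    try:
        # On Python 3.8+ the attribute lookup raises AttributeError
        # unless a provider was injected via _linux_distribution.
        linux_distribution = _linux_distribution or platform.linux_distribution
        distro, release, codename = linux_distribution()
    except AttributeError:
        pass
    return distro, release, codename


if __name__ == "__main__":
    # An injected provider short-circuits the `or`, so platform is never touched.
    print(platform_information_sketch(lambda: ("Ubuntu", "20.04", "focal")))
    # Without one, Python 3.8+ ends up with the None defaults.
    print(platform_information_sketch())
```

With the patch in place the function simply carries on with None values when the lookup fails; how the rest of platform_information handles them depends on the ceph-deploy version you have installed.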

The ceph-deploy text (bytes vs. str) problem

Symptom

[a1][INFO  ] Running command: sudo fdisk -l
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python3/dist-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python3/dist-packages/ceph_deploy/cli.py", line 166, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python3/dist-packages/ceph_deploy/osd.py", line 434, in disk
[ceph_deploy][ERROR ]     disk_list(args, cfg)
[ceph_deploy][ERROR ]   File "/usr/lib/python3/dist-packages/ceph_deploy/osd.py", line 375, in disk_list
[ceph_deploy][ERROR ]     if line.startswith('Disk /'):
[ceph_deploy][ERROR ] TypeError: startswith first arg must be bytes or a tuple of bytes, not str
[ceph_deploy][ERROR ] 

Cause

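A type mismatch: under Python 3 the lines of fdisk output that disk_list iterates over are bytes, so the prefix passed to startswith must be bytes as well.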
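The mismatch is easy to reproduce in an interpreter: a bytes object only accepts a bytes prefix. A minimal sketch; the sample fdisk line is made up for illustration and is not taken from the log above:

```python
# One line of `fdisk -l` output as bytes (sample text is illustrative).
line = b"Disk /dev/sdb: 1 TiB, 1099511627776 bytes, 2147483648 sectors"

str_prefix_failed = False
try:
    line.startswith("Disk /")  # str prefix on a bytes object
except TypeError as exc:
    str_prefix_failed = True
    print("TypeError:", exc)

print(line.startswith(b"Disk /"))  # bytes prefix: prints True
```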

Fix

Open /usr/lib/python3/dist-packages/ceph_deploy/osd.py
and find this line:

if line.startswith('Disk /'):

Replace it with:

if line.startswith(b'Disk /'):

Save, exit, and run the command again.
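An alternative to changing the literal is to decode each line before matching, so the check works whether the output arrives as bytes or str. This is a sketch of that idea, not the upstream fix; `disk_header_lines` is a hypothetical helper, and UTF-8 is an assumed output encoding.

```python
from typing import Iterable, List, Union


def disk_header_lines(out: Iterable[Union[bytes, str]]) -> List[str]:
    """Return the 'Disk /...' header lines from fdisk -l output as str."""
    headers = []
    for line in out:
        # Normalize bytes to str; errors="replace" keeps odd bytes from aborting.
        if isinstance(line, bytes):
            line = line.decode("utf-8", errors="replace")
        if line.startswith("Disk /"):
            headers.append(line)
    return headers


if __name__ == "__main__":
    sample = [
        b"Disk /dev/sda: 50 GiB, 53687091200 bytes, 104857600 sectors",
        b"Units: sectors of 1 * 512 = 512 bytes",
        "Disk /dev/sdb: 1 TiB, 1099511627776 bytes, 2147483648 sectors",
    ]
    for header in disk_header_lines(sample):
        print(header)
```

A side benefit of decoding is that any later logging of the line shows plain text rather than a b'...' repr.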

The leftover block problem

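If the previous cluster was forcibly removed with ceph-deploy, its old OSDs may still be present on the disks.
When you deploy again, zapping such a disk with ceph-deploy can then fail like this: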

[x1][INFO  ] Running command: /usr/sbin/ceph-volume lvm zap /dev/sdb
[x1][WARNIN] --> Zapping: /dev/sdb
[x1][WARNIN] --> --destroy was not specified, but zapping a whole device will remove the partition table
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN]  stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[x1][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[x1][WARNIN] -->  RuntimeError: could not complete wipefs on device: /dev/sdb
[x1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: /usr/sbin/ceph-volume lvm zap /dev/sdb

To fix this you have to remove the leftover block from the disk by hand. First check the disk layout with lsblk:

lsblk

On my machine the output was:

NAME                                                                                          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop1                                                                                           7:1    0 55.5M  1 loop /snap/core18/2409
loop2                                                                                           7:2    0 71.3M  1 loop /snap/lxd/16099
loop3                                                                                           7:3    0 61.9M  1 loop /snap/core20/1518
loop4                                                                                           7:4    0 67.8M  1 loop /snap/lxd/22753
loop5                                                                                           7:5    0   47M  1 loop /snap/snapd/16292
loop6                                                                                           7:6    0 55.6M  1 loop /snap/core18/2538
sda                                                                                             8:0    0   50G  0 disk 
├─sda1                                                                                          8:1    0    1M  0 part 
└─sda2                                                                                          8:2    0   50G  0 part /
sdb                                                                                             8:16   0    1T  0 disk 
└─ceph--66fb0189--7b8a--423e--a26c--f4a85545f396-osd--block--df953059--5020--4c8c--8b82--4dd8a22a0b1c
                                                                                              253:0    0 1024G  0 lvm  
rbd0                                                                                          252:0    0   20G  0 disk 
rbd1                                                                                          252:16   0   20G  0 disk 
rbd2                                                                                          252:32   0    2G  0 disk 
rbd3                                                                                          252:48   0   20G  0 disk 
rbd4                                                                                          252:64   0    4G  0 disk 
rbd5                                                                                          252:80   0    8G  0 disk 
rbd6                                                                                          252:96   0    8G  0 disk 
rbd7                                                                                          252:112  0    8G  0 disk 

Under /dev/sdb, the disk to be zapped, there is a leftover block (TYPE lvm) from the old ceph cluster: ceph--66fb0189--7b8a--423e--a26c--f4a85545f396-osd--block--df953059--5020--4c8c--8b82--4dd8a22a0b1c
Remove it:

sudo dmsetup remove --force ceph--66fb0189--7b8a--423e--a26c--f4a85545f396-osd--block--df953059--5020--4c8c--8b82--4dd8a22a0b1c
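When several disks carry leftovers, copying names out of lsblk by hand gets tedious. A sketch that reads the JSON output of `lsblk -J` (util-linux's `--json` flag) and returns the ceph volumes under one disk. The helper name `leftover_ceph_lvs` and the "ceph--" prefix check are assumptions based on the listing above; the code only prints the commands and runs nothing.

```python
import json
from typing import List


def leftover_ceph_lvs(lsblk_json: str, disk: str) -> List[str]:
    """Names of lvm children of `disk` that look like old ceph OSD volumes."""
    data = json.loads(lsblk_json)
    names = []
    for dev in data.get("blockdevices", []):
        if dev.get("name") != disk:
            continue
        for child in dev.get("children", []):
            # Assumption: old OSD volumes show up as TYPE lvm with a "ceph--" prefix.
            if child.get("type") == "lvm" and child.get("name", "").startswith("ceph--"):
                names.append(child["name"])
    return names


if __name__ == "__main__":
    # Trimmed, illustrative `lsblk -J` output modelled on the listing above.
    sample = json.dumps({
        "blockdevices": [
            {"name": "sda", "type": "disk", "children": [
                {"name": "sda1", "type": "part"},
                {"name": "sda2", "type": "part"},
            ]},
            {"name": "sdb", "type": "disk", "children": [
                {"name": "ceph--66fb0189--7b8a--423e--a26c--f4a85545f396"
                         "-osd--block--df953059--5020--4c8c--8b82--4dd8a22a0b1c",
                 "type": "lvm"},
            ]},
        ]
    })
    for name in leftover_ceph_lvs(sample, "sdb"):
        # Each name is what `dmsetup remove --force` takes as its argument.
        print("sudo dmsetup remove --force", name)
```

On a real host the input would come from something like `subprocess.run(["lsblk", "-J"], capture_output=True, text=True).stdout` (Python 3.7+).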

Once the block is removed, the disk may still hold LVM metadata or partitions; force-erase them with wipefs:

wipefs -af /dev/sdb
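The two cleanup steps (remove the leftover device-mapper node, then wipe the disk signatures) can be strung together in a small script. A sketch with a dry-run default, since both commands destroy data. `cleanup_plan` and `run_plan` are hypothetical helpers, `sudo` is added to both commands on the assumption that you are not running as root, and the device names are the ones from this example, so substitute your own.

```python
import subprocess
from typing import List


def cleanup_plan(disk: str, leftover_names: List[str]) -> List[List[str]]:
    """Commands to clear a disk that still holds old ceph OSD volumes."""
    plan = [["sudo", "dmsetup", "remove", "--force", name] for name in leftover_names]
    # Afterwards, erase any remaining LVM/partition signatures on the whole disk.
    plan.append(["sudo", "wipefs", "-af", disk])
    return plan


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    names = ["ceph--66fb0189--7b8a--423e--a26c--f4a85545f396"
             "-osd--block--df953059--5020--4c8c--8b82--4dd8a22a0b1c"]
    run_plan(cleanup_plan("/dev/sdb", names))  # dry run: prints only
```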