Take `bluestore_min_alloc_size` into account · Issue #34 · TheJJ/ceph-balancer · GitHub
Take bluestore_min_alloc_size into account #34
Open
@patrakov

Description


I have access to a cluster that was created long ago and later expanded by adding new OSDs. I found that, in order to balance it properly, I had to use `--osdsize device balance --osdused delta`. Otherwise, the balancer's idea of how full an OSD is disagrees with what `ceph osd df` says, and disagrees differently for different OSDs.

Today, with the help of my colleagues, we root-caused this: old OSDs have `bluestore_min_alloc_size=65536`, while new ones have `bluestore_min_alloc_size=4096`. This means that the average per-object allocation overhead differs between them. This overhead is what makes the sum of PG sizes (i.e., the sum of all stored object sizes) differ from the used space on the OSD.

Please assume by default that each stored object comes with an overhead of `bluestore_min_alloc_size / 2`, and take this into account when estimating how much space a PG move would use or free. On Ceph 17.2.7 and later, you can read the per-OSD value from `ceph osd metadata`.
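A minimal sketch of how this value could be collected, assuming `ceph osd metadata --format json` is available and the release exports the `bluestore_min_alloc_size` key (the helper name here is made up for illustration):

```python
import json
import subprocess

def min_alloc_size_per_osd() -> dict[int, int]:
    """Map OSD id -> bluestore_min_alloc_size in bytes, where reported."""
    out = subprocess.check_output(["ceph", "osd", "metadata", "--format", "json"])
    sizes = {}
    for osd in json.loads(out):
        # The key is only present on releases that export it (17.2.7+
        # per the above); metadata values are reported as strings.
        raw = osd.get("bluestore_min_alloc_size")
        if raw is not None:
            sizes[osd["id"]] = int(raw)
    return sizes
```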

For example, an OSD holding a total of 56613739 objects across all of its PGs would have about 1.7 TB of overhead with `bluestore_min_alloc_size=65536`, but only about 100 GB with `bluestore_min_alloc_size=4096`.
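To make the arithmetic explicit, here is the same estimate as a sketch (`estimated_overhead_bytes` is a hypothetical helper; real overhead depends on the object size distribution, so this is only an average-case guess):

```python
def estimated_overhead_bytes(num_objects: int, min_alloc_size: int) -> int:
    # Heuristic from above: each object wastes min_alloc_size / 2 on average.
    return num_objects * min_alloc_size // 2

objects = 56613739
for mas in (65536, 4096):
    o = estimated_overhead_bytes(objects, mas)
    print(f"min_alloc_size={mas}: ~{o / 2**40:.2f} TiB ({o / 2**30:.0f} GiB)")
# min_alloc_size=65536: ~1.69 TiB (1728 GiB)
# min_alloc_size=4096: ~0.11 TiB (108 GiB)
```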

Here is `ceph osd df` (please ignore the first bunch of OSDs at only 0.75% utilization; they sit outside the CRUSH root, waiting for an "ok" before being placed into the proper hierarchy):
ceph-osd-df.txt

Here is `ceph pg ls-by-osd 221` (this OSD was redeployed recently, so it has `bluestore_min_alloc_size=4096`):
ceph-pg-ls-by-osd-221.txt

Here is `ceph pg ls-by-osd 223`:
ceph-pg-ls-by-osd-223.txt

As you can see, these two OSDs have almost the same size and almost the same number of PGs (differing by only one), yet their utilization differs by 1.9 TB, which roughly matches the overhead calculation presented above.

Sorry, I am not allowed to post the full osdmap.

P.S. I am also going to file the same bug against the built-in Ceph balancer.
