Addition of method to add OSDs to managed spec by pdhiran · Pull Request #4833 · red-hat-storage/cephci

Merged: 1 commit merged into red-hat-storage:main on May 30, 2025

Conversation

@pdhiran (Contributor) commented May 27, 2025

Description

  1. Adds a method to move OSDs from an unmanaged spec to a managed spec.
     Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2304314
  2. Updates all rados tests to use the rados-specific modules for managed/unmanaged
     services. This avoids hitting issues when the placement key is not present for
     newly added OSDs.

The new command in action:

```
# ceph orch osd set-spec-affinity osd.all-available-devices 17 18
Updated service for osd 17,18
```
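For illustration, a minimal sketch of what such a helper could look like, assuming a cephci-style `exec_command` client. The function name, the `client` object, and the return handling are hypothetical, not the actual method added by this PR:

```python
def set_osd_spec_affinity(client, spec_name: str, osd_ids: list) -> bool:
    """Hypothetical sketch: move the given OSDs under the named spec."""
    ids = " ".join(str(osd_id) for osd_id in osd_ids)
    # Mirrors the command shown above:
    #   ceph orch osd set-spec-affinity <spec_name> <osd_id> [<osd_id> ...]
    out, _ = client.exec_command(
        sudo=True, cmd=f"ceph orch osd set-spec-affinity {spec_name} {ids}"
    )
    # Expected output: "Updated service for osd 17,18"
    return "Updated service" in out
```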

@pdhiran pdhiran requested a review from neha-gangadhar as a code owner May 27, 2025 07:50
@pdhiran pdhiran self-assigned this May 27, 2025
@pdhiran pdhiran added the RADOS (Rados Core), 8.1 (Ceph 8.1 feature), and automation labels May 27, 2025
@pdhiran pdhiran requested a review from harshkumarRH May 27, 2025 07:55
@pdhiran pdhiran force-pushed the pdhiran-latest branch 2 times, most recently from 42fdd68 to c4482ba, May 28, 2025 04:05
@pdhiran pdhiran force-pushed the pdhiran-latest branch 4 times, most recently from b62f229 to 31d6722, May 29, 2025 05:22
@pdhiran (Contributor, Author) commented May 29, 2025

@pdhiran pdhiran requested review from harshkumarRH and s-vipin May 29, 2025 05:55
Review thread on:

```python
    return True

if not self.unmanaged_osd_service_exists():
    log.info("No unmanaged OSDs exist on the cluster on the start of test")
```

Reviewer (Contributor): Should we return True here?

Author (pdhiran): No. With the new behaviour, a new proper spec gets created and the OSD will be part of it. We want to move the OSD from one spec to another, which would not be possible if we returned here. Hence just a message.

Reviewer (Contributor): But we are claiming that no OSDs are unmanaged, so if all the OSDs are already managed, why do we need to do anything at all?

Comment on lines 427 to 438:

```python
if not self.get_osd_spec(osd_id=osd_id):
    log.debug("Setting OSD ID %s to managed state", osd_id)
    set_osds_managed([osd_id])
    time.sleep(10)
```

Reviewer (Contributor): To save execution time, how about we write a method to fetch the list of unmanaged OSDs on the cluster, and then run a single command to add all of them together? Maybe as an enhancement in the next PR?

Author (pdhiran): Updated the workflow. Now collecting all the OSDs with an un-manage-able placement spec and then running a single command.
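A rough sketch of that batched flow; `get_osd_list` and `run_ceph_command` here stand in for whatever the cephci RadosOrchestrator actually exposes, while `get_osd_spec` is the check quoted above:

```python
def move_stray_osds_to_spec(rados_obj, spec_name: str) -> bool:
    """Hypothetical sketch: batch all spec-less OSDs into one command."""
    # Collect every OSD whose spec lookup comes back empty, i.e. OSDs
    # sitting under an un-manage-able placement spec.
    stray_osds = [
        osd_id
        for osd_id in rados_obj.get_osd_list(status="up")  # assumed helper
        if not rados_obj.get_osd_spec(osd_id=osd_id)
    ]
    if not stray_osds:
        return True  # nothing to move
    # One set-spec-affinity call for all stray OSDs instead of one per OSD.
    ids = " ".join(map(str, stray_osds))
    rados_obj.run_ceph_command(
        cmd=f"ceph orch osd set-spec-affinity {spec_name} {ids}"
    )
    return True
```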

Comment on lines +800 to +805:

```python
rados_obj.set_service_managed_type(service_type="osd", unmanaged=False)
# log cluster health
```

Reviewer (Contributor): I suppose for builds older than 8.1 this section of the code will still get executed, and if the cluster already has any OSD services that are not managed by ceph orch, then this would fail.

Author (pdhiran): I think we already handle the condition where an un-manage-able placement spec is present. I believe it would not fail.
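One hedged way a test could guard this for older builds; `rhbuild` is an assumed cephci-style config key and the version cut-off is purely illustrative:

```python
import logging

log = logging.getLogger(__name__)

def set_osds_managed_if_supported(rados_obj, config: dict) -> None:
    """Illustrative guard only; not the handling the PR actually uses."""
    # Skip the managed-state flip on builds that predate the new
    # set-spec-affinity workflow (version strings are assumptions).
    rhbuild = config.get("rhbuild", "")
    if rhbuild.startswith(("5.", "6.", "7.", "8.0")):
        log.info("Build %s predates this workflow; skipping", rhbuild)
        return
    rados_obj.set_service_managed_type(service_type="osd", unmanaged=False)
```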

Comment on lines 753 to 760:

```python
log.info(
    "No Unmanaged service present on the cluster"
    "Proceeding to change the OSD spec to different one"
)
```

Reviewer (Contributor): Getting inside this block means an unmanaged service is present; the log statement seems to say the opposite.

Author (pdhiran): Thanks. Updated the log.

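Worth noting while touching this line: adjacent string literals in Python concatenate with no separator, so the quoted call would also log "...clusterProceeding..." as one run-on word. One possible corrected form, matching the updated meaning:

```python
log.info(
    "Unmanaged service present on the cluster. "
    "Proceeding to change the OSD spec to a different one"
)
```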

Review thread on:

```python
log.debug("Completed addition of the OSD back to cluster.")

# Checking if the newly added OSD is without placement
if service_obj.unmanaged_osd_service_exists():
```

Reviewer (Contributor): At this point, aren't all the OSD orch services in an unmanaged state due to the execution of line 736?

Author (pdhiran): This is a check for an OSD orch service that cannot be set to a managed state, basically a check for an un-manage-able placement spec. It is not a check for a spec that is currently in an unmanaged state.
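For context, a hedged reconstruction of what a check like `unmanaged_osd_service_exists` could do, assuming `run_ceph_command` returns the parsed JSON output of the given command:

```python
def unmanaged_osd_service_exists(rados_obj) -> bool:
    """Hypothetical sketch of the placement-spec check discussed here."""
    services = rados_obj.run_ceph_command(
        cmd="ceph orch ls --service-type osd --format json"
    )
    for svc in services:
        # A service spec with no placement at all cannot be brought back
        # to a managed state, which is the condition being tested.
        if not svc.get("placement"):
            return True
    return False
```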

Comment on lines +776 to +783:

```python
# Currently OSD metadata is not updated with the spec change. Open discussion in progress.
# if osd_spec_name != new_osd_spec_name:
#     log.error("Spec could not be updated for the OSD : %s", target_osd)
#     return 1
```

Reviewer (Contributor): Did OSD restart help?

Author (pdhiran): No, the OSD restart did not help. Even after deleting the osd.default spec, with it no longer listed in `ceph orch ls`, the metadata remained the same.

@pdhiran pdhiran force-pushed the pdhiran-latest branch 2 times, most recently from 4e32e91 to ae34a12, May 29, 2025 13:50
Signed-off-by: Pawan Dhiran <pdhiran@redhat.com>
@harshkumarRH (Contributor) left a comment

Looks good!

@openshift-ci openshift-ci bot added the lgtm label May 30, 2025
@pdhiran (Contributor, Author) commented May 30, 2025

@neha-gangadhar (Contributor) left a comment

Approved

openshift-ci bot commented May 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harshkumarRH, neha-gangadhar, pdhiran

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify mergify bot merged commit 704df7b into red-hat-storage:main May 30, 2025
5 of 6 checks passed
Labels: 8.1 (Ceph 8.1 feature), automation, lgtm, RADOS (Rados Core)