8000 rbd: add support for changed block tracking RPCs by Rakshith-R · Pull Request #5347 · ceph/ceph-csi · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

rbd: add support for changed block tracking RPCs #5347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: devel
Choose a base branch
from

Conversation

Rakshith-R
Copy link
Contributor

Describe what this PR does

rbd: add support for changed block tracking RPCs

This commit adds support for GetMetadataAllocated
and GetMetadataDelta RPCs.
CSI Snapshot Metadata Service RPCs

Updates: #5346

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer
    guide
    .
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release
    notes

    updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@mergify mergify bot added the component/rbd Issues related to RBD label May 30, 2025
@Rakshith-R Rakshith-R requested review from a team May 30, 2025 10:04
@@ -1709,3 +1709,67 @@ func (cs *ControllerServer) ControllerUnpublishVolume(

return &csi.ControllerUnpublishVolumeResponse{}, nil
}

// GetSnapshotMetadata streams the allocated block metadata for a snapshot.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// GetSnapshotMetadata streams the allocated block metadata for a snapshot.
// GetMetadataAllocated streams the allocated block metadata for a snapshot.

req *csi.GetMetadataDeltaRequest,
stream csi.SnapshotMetadata_GetMetadataDeltaServer,
) error {
ctx := context.WithValue(stream.Context(), "Base-Snapshot-ID", req.GetBaseSnapshotId())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base Snapshot ID


targetSnapshotID := req.GetTargetSnapshotId()
if targetSnapshotID == "" {
return status.Error(codes.InvalidArgument, "base snapshot ID cannot be empty")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return status.Error(codes.InvalidArgument, "base snapshot ID cannot be empty")
return status.Error(codes.InvalidArgument, "target snapshot ID cannot be empty")

@Rakshith-R Rakshith-R force-pushed the add-snapshot-diff branch from b60a6f5 to 3892027 Compare June 2, 2025 07:06
@@ -1709,3 +1709,67 @@ func (cs *ControllerServer) ControllerUnpublishVolume(

return &csi.ControllerUnpublishVolumeResponse{}, nil
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move this to a new file as its a new functionality, This fine is already having 1770 lines of code.


rbdSnap, err := genSnapFromSnapID(ctx, snapshotID, cr, req.GetSecrets())
if err != nil {
return status.Error(codes.Internal, err.Error())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the NotFound error?

maxResults int32) error {
image, err := rbdSnap.open()
if err != nil {
return err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return GRPC error message

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take care of this in other places in this function

}
}
// If no error, we successfully completed the diff operation.
if len(changedBlocks) > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a function to send the response and can be reused here and above like

Suggested change
if len(changedBlocks) > 0 {
send := func() error {
if len(changedBlocks) == 0 {
return nil
}
log.DebugLog(ctx, "Sending %d changed blocks to the stream", len(changedBlocks))
if err := stream.Send(&csi.GetMetadataAllocatedResponse{
BlockMetadataType: csi.BlockMetadataType_VARIABLE_LENGTH,
VolumeCapacityBytes: rbdSnap.VolSize,
BlockMetadata: changedBlocks,
}); err != nil {
log.ErrorLog(ctx, "failed to send response: %v", err)
return status.Error(codes.Internal, fmt.Sprintf("failed to send response: %v", err))
}
changedBlocks = changedBlocks[:0] // Reset slice
return nil
}


baseRBDSnap, err := genSnapFromSnapID(ctx, baseSnapshotID, cr, req.GetSecrets())
if err != nil {
return status.Error(codes.Internal, err.Error())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle NotFound?


targetRBDSnap, err := genSnapFromSnapID(ctx, targetSnapshotID, cr, req.GetSecrets())
if err != nil {
return status.Error(codes.Internal, err.Error())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle NotFound?

return nil
}

func (rbdSnap *rbdSnapshot) getMetadataDelta(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks similar to above and have lot of duplication . Can we rewrite this to avoid duplication of the core logic? That will help us to keep the outer implementation simple

@Rakshith-R Rakshith-R force-pushed the add-snapshot-diff branch 2 times, most recently from 85a53a1 to 288a354 Compare June 2, 2025 10:43
@Rakshith-R Rakshith-R force-pushed the add-snapshot-diff branch 2 times, most recently from 56db368 to 6a3d211 Compare June 2, 2025 12:18
@Rakshith-R
Copy link
Contributor Author
Rakshith-R commented Jun 2, 2025

Thanks for the quick detailed review

The new main modifications are:

  • moved the new calls to snap_diff.go file
  • debug logs for feature support check errors
  • refactored to have a single flow with sendStreamResponse handling which response to send allocated or delta
  • modified return statement and ensured they are formatted and have correct error codes
  • added validation for req parameters
  • added check for snapshot's existence + returns correct error code now
  • added additional checks for startingOffset (add out of range error code as specified in the spec)
  • additional checks for maxResults and internal value to limit memory usage ([spec](spec allows us to send lower number of blocks as well))

failing golang lintci is being handled in #5351

Please take a look again

Copy link
Contributor
mergify bot commented Jun 2, 2025

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

defer cr.DeleteCredentials()

snapshotID := req.GetSnapshotId()
if snapshotID == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed? already handled in validateMetadataAllocatedReq

Copy link
Member
@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st pass going through this. Needs cleanup and clear splitting of internal RBD things and ControllerServer Snapshot usage. Extend the types.Snapshot interface with required functions, so that ControllerServer does not need to use rbdSnap types directly.


// streamDoneErrCode is a custom error code used to indicate that the
// stream was closed by the client.
streamDoneErrCode = -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not use google.golang.org/grpc/codes constants? Or, maybe base it on return values of read(), like EOF and so on?

// internalMaxResults is the maximum number of results to return in a single
// response. This is used to limit the number of changed blocks that needs
// to be stored in memory before sending a response to the stream.
internalMaxResults = 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't 20 very small?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't 20 very small?

I'm fine with any value here on the lower
The CSI spec gives us freedom to send lower number than asked for.

We'll have a followup to make this configurable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that the size of a single metadata for a changed block is relatively small. Start with something like 128, so that the number of repeated calls is reduced (will be less efficient?).

A configuration option is definitely good to do at a later point in time. Does the sidecar provide metrics about the number of changed blocks returned per diff? With that, a more educated guess for a default can be made.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that the size of a single metadata for a changed block is relatively small. Start with something like 128, so that the number of repeated calls is reduced (will be less efficient?).

👍

A configuration option is definitely good to do at a later point in time. Does the sidecar provide metrics about the number of changed blocks returned per diff? With that, a more educated guess for a default can be made.

Currently only metrics regarding start & end time of requests is added
@Nikhil-Ladha ^

@Rakshith-R Rakshith-R force-pushed the add-snapshot-diff branch 3 times, most recently from 72390e5 to 4ff0a22 Compare June 5, 2025 12:26
@@ -4,6 +4,8 @@

## Features

- rbd: add support for [CSI Snapshot Metadata Service RPCs](https://github.com/container-storage-interface/spec/blob/master/spec.md#snapshot-metadata-service-rpcs)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add something about it to the rbd documentation too?

@@ -68,3 +68,8 @@ var (
// ErrGroupNotFound is returned when group is not found in the cluster on the given pool and/or namespace.
ErrGroupNotFound = fmt.Errorf("%w: RBD group not found", librbd.ErrNotFound)
)

type ErrorCode interface {
// Code returns the error code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment Code -> ErrorCode.

Can you also add a comment for the interface itself, with a small explanation how (which kind of errors) it can be used?

// StreamMetadata streams the block metadata for a snapshot.
// If baseRBDSnap is provided, it calculates the delta between the
// current snapshot and the base snapshot.
func (rbdSnap *rbdSnapshot) StreamMetadata(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling this function does too much. The rbdSnapshot struct should probably not send anything, but instead return []*csi.BlockMetadata to the caller which can then send it out? Would that not make it a cleaner implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling this function does too much.

Broke down this function to include multiple helpers

The rbdSnapshot struct should probably not send anything, but instead return []*csi.BlockMetadata to the caller which can then send it out? Would that not make it a cleaner implementation?

It's a streaming API, so we would be sending more than once.
Used this as inspiration to instead send down a func that only takes []*csi.BlockMetadata as input and send stream response internally.

// If baseRBDSnap is provided, the delta between the
// current snapshot and the base snapshot is streamed.
StreamMetadata(ctx context.Context,
stream grpc.ServerStream,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc.ServerStream really isn't nice here, see my other comment.

}

if baseRBDSnap != nil {
fromSnapID, err = baseRBDSnap.GetRBDSnapID(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of exporting the GetRBDSnapID() function in the Snapshot interface, it can be an internal function too. In this case, typecasting baseRBDSnap to a *rbdSnapshot would be ok.

Consider a helper function for typecasting:

func rbdSnapFromSnapshot(snap types.Snapshot) (*rbdSnapshot, error) {}

GetVolSize() int64

// GetRBDSnapID retrieves the RBD snapshot ID for the snapshot.
GetRBDSnapID(ctx context.Context) (uint64, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a non-backend-specific naming, drop RBD from it. Ideally remove it from the interface, see a previous comment.

Signed-off-by: Rakshith R <rar@redhat.com>
@Rakshith-R Rakshith-R force-pushed the add-snapshot-diff branch 4 times, most recently from 4eeea62 to 9956ddf Compare June 20, 2025 10:58
Signed-off-by: Rakshith R <rar@redhat.com>
@Rakshith-R Rakshith-R force-pushed the add-snapshot-diff branch 2 times, most recently from 9c9afb4 to 755348e Compare June 20, 2025 11:25
This commit adds support for GetMetadataAllocated
and GetMetadataDelta RPCs.

Signed-off-by: Rakshith R <rar@redhat.com>
@Rakshith-R Rakshith-R requested review from nixpanic, iPraveenParihar and a team June 20, 2025 11:55
Copy link
Collaborator
@Madhu-1 Madhu-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits, overall LGTM


* Streams metadata about block differences between two snapshots
* Optimal for incremental backups
* Returns only blocks that changed between snapshots
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc is for someone who wants to deploy and test it out, can we add relevant steps for the same?

caps = append(caps, &snapDiffCaps)
}
if err != nil {
log.DebugLog(ctx, "error checking for snapshot diff by ID support: %s", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log as error/warning?

Comment on lines +176 to +183
if errors.Is(err, rbderrors.ErrImageNotFound) {
log.ErrorLog(ctx, "snapshot %q not found: %v", baseSnapshotID, err)

return status.Errorf(codes.NotFound, "snapshot %q not found: %v", baseSnapshotID, err)
}
if err != nil {
return status.Error(codes.Internal, err.Error())
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if errors.Is(err, rbderrors.ErrImageNotFound) {
log.ErrorLog(ctx, "snapshot %q not found: %v", baseSnapshotID, err)
return status.Errorf(codes.NotFound, "snapshot %q not found: %v", baseSnapshotID, err)
}
if err != nil {
return status.Error(codes.Internal, err.Error())
}
if err != nil {
if errors.Is(err, rbderrors.ErrImageNotFound) {
log.ErrorLog(ctx, "snapshot %q not found: %v", baseSnapshotID, err)
return status.Errorf(codes.NotFound, "snapshot %q not found: %v", baseSnapshotID, err)
}
return status.Error(codes.Internal, err.Error())
}
it would be nice to have one block for the error handling, take care of it in other places as well.

diffIterateErr := image.DiffIterateByID(diffIterateByIDConfig)
err = handleDiffIterateError(ctx, diffIterateErr)
if err != nil {
log.ErrorLog(ctx, "failed to get diff: %s", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log it at the caller? we dont need to log this at multiple places in this function?

// handleDiffIterateError checks the error returned by DiffIterateByID.
// It handles specific error codes and returns a formatted error message
// for unrecognized error codes.
func handleDiffIterateError(ctx context.Context, err error) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let the caller log it and we dont need to pass the ctx?

@@ -527,7 +527,7 @@ If custom image is built for the rbd-plugin instance, make sure that it contains

## Changed Block Tracking (CBT)

> **Warning**: Requires Ceph version that supports RBD snap diff by ID feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes should be part of the previous doc: add documentation for CBT feature commit.

// defaultMaxResults is the internal limit for max results in metadata streaming
// This is set to a reasonable value to prevent excessive memory usage
// and ensure efficient processing of metadata requests.
// TODO: Consider making this configurable via a cmdline flag.
Copy link
Contributor
@Nikhil-Ladha Nikhil-Ladha Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have an issue created for this, so it's easier to track.

Copy link
Contributor
mergify bot commented Jun 26, 2025

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/rbd Issues related to RBD
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0