10000 Support GetFileSize API in FSRandomAccessFile by xingbowang · Pull Request #13676 · facebook/rocksdb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Support GetFileSize API in FSRandomAccessFile #13676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xingbowang
Copy link

Summary:

This change is used to address this issue
#13619
It supports GetFileSize API in FSRandomAccessFile. This allows ReadFooterFromFile to quickly get the file size for file size validation.

Test Plan:
make check

Reviewers:
Peter Dillinger

Subscribers:

Tasks:

Tags:

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor
@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would include as many built-in implementations as you can in this PR (git grep 'public FSRandomAccessFile') and add a basic unit test in env_test that simply writes a file of some random size, up to ~100KB, opens it for reads and checks that the written size agrees with all the file size APIs.

We also need to override this function in all the wrapper classes. Basically, if you can't compile the full repository with this function pure virtual instead of defaulting to NotSupported, you should have an explanation for each of the internal cases you have not implemented in this PR.

@@ -909,6 +909,13 @@ class FSRandomAccessFile {
return IOStatus::NotSupported("Prefetch");
}

// Get the file size. The default implementation is not supported, as some of
Copy link
Contributor
@pdillinger pdillinger Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say

The default implementation returns "not supported" so that user implementations
of FSRandomAccessFile do not need to immediately implement this function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I will address them.
I also noticed another class RandomAccessFile, which had exact same functions as FSRandomAccessFile. Should I add the new API there as well. I recall you mentioned that it was from a legacy system used in env. The code seems indicate that as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be needed for the legacy Env-related class RandomAccessFile. Just adds to the confusion about what should be overridden. #9274

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomAccessFile is used in LegacyRandomAccessFileWrapper. To make the wrapper work, I had to add the new API GetFileSize in RandomAccessFile. Let me know if it is a concern.

@pdillinger
Copy link
Contributor

Also, use unreleased_history/add.sh to add a release note. At this point, the user should care that they have the option to override this new function to make some file metadata queries more efficient. And we could say that this function might be required in the future.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor
@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I typed up these review comments a few days ago and forgot to submit

env/env_test.cc Outdated

constexpr size_t kSectorSize = 4096;
constexpr size_t kNumSectors = 128;
constexpr size_t expectedFileSize = kSectorSize * kNumSectors;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not enough entropy, imho. Could be something random up to something like this size.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean randomize the number of kNumSectors in each run?

kNumSectors = 64 + (std::rand() % 64);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's not enough entropy. A FileSystem must provide files of any number of bytes (up to some limit). The notion of "sector" seems irrelevant here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

env/env_test.cc Outdated
ASSERT_OK(fs->NewRandomAccessFile(fname, FileOptions(), &file, nullptr));

uint64_t actualFileSize;
file->GetFileSize(&actualFileSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is suspicious/wrong that this is not reported by our ASSERT_STATUS_CHECKED CI test. (The return value is ignored and needs to be checked.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check with ASSERT_OK

env/env_test.cc Outdated
};

// Validate GetFileSize API returns the right value.
// Use the default implementation from Posix File system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be posix depending on platform.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Env* env_;
};

// Validate GetFileSize API returns the right value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about FileSystem::GetFileSize()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate more about this? Do you mean validate the file size against FileSystem::GetFileSize API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that. I was just pointing out an inconsistency/incompleteness/ambiguity in what was claimed to be tested vs. actually tested.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

env/env_test.cc Outdated
std::unique_ptr<FSRandomAccessFile> file;
ASSERT_OK(fs->NewRandomAccessFile(fname, FileOptions(), &file, nullptr));

uint64_t actualFileSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APIs can lie or have bugs. I would call this "reportedFileSize".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1051,6 +1051,14 @@ class FSRandomAccessFile {
// open.
virtual Temperature GetTemperature() const { return Temperature::kUnknown; }

// Get the file size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... on an open-for-reading file without re-seeking the file's path in the filesystem.

In other words, API documentation should not only include "what" something is but "why" something exists (if it's potentially redundant).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

uint64_t file_size_from_file_system = 0;
Status s;
s = file->file()->GetFileSize(&file_size_from_file_system);
if (!s.ok()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we discussed that it would be OK for implementations not to implement the new function immediately. (Allow NotSupported here.) In fact, we should be testing that scenario in unit tests. We could consider doing that by holding back the env_encryption implementation until we make the function is required. (Justification for that state described well in comments in env_encryption.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

table/format.cc Outdated
// manifest. Otherwise it is not guaranteed.
return Status::Corruption(
"Sst file size mismatch: " + file->file_name() + ". Expected " +
std::to_string(expected_file_size) + ", actual size " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer "reported" to "actual"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Deprecate skip_checking_sst_file_sizes_on_db_open option.

Summary:
This change is used to address this issue
facebook#13619
It supports file size validation in ReadFooterFromFile. In favor of this
change, CheckConsistency function and
skip_checking_sst_file_sizes_on_db_open flag are deprecated.

The CheckConsistency function checks each file size matches what was
recorded in manifest during DB open. Meantime, ReadFooterFromFile was
called for each file in LoadTables function. Since ReadFooterFromFile
always validates file size, the CheckConsistency is redundant.

In addtion, CheckConsistency is ecxecuted in a single thread. This could
slow down DB open when a network file system is used. Therefore, the
flag skip_checking_sst_file_sizes_on_db_open was added to skip this
check. After this change, ReadFooterFromFile was executed in parallel
through multiple threads. Therefore, the concern of DB open slowness is
eliminated, and the flag could be deprecated.

There is 2 slight concerns of this change.

*If max_open_files is set with smaller value, engine will not open all
the files during DB open. This means if there is a corruption on file
size, it will not be detected during DB open, but rather at a later
time. Since the default is -1, which means open all the files, and it is
rarely overridden and a lot of new features rely on it to be -1, the
risk is very low.

*If FIFO compaction is used, engine could fail to open DB unnecessarily
on the corrupted files that would never be used again. However, this is
a very rare case as well. The error could still be ignored by setting
paranoid_checks operationally. The risk is very low.

To remain backward compatibility. The public facing flag was kept and
marked as no-op internally. Another change is required to fully remove
the flag.

Test Plan:
make check
A new unit test was added to validate file size check API works as
expected.
@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0