-
Notifications
You must be signed in to change notification settings - Fork 6.5k
Support GetFileSize API in FSRandomAccessFile #13676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would include as many built-in implementations as you can in this PR (git grep 'public FSRandomAccessFile'
) and add a basic unit test in env_test that simply writes a file of some random size, up to ~100KB, opens it for reads and checks that the written size agrees with all the file size APIs.
We also need to override this function in all the wrapper classes. Basically, if you can't compile the full repository with this function pure virtual instead of defaulting to NotSupported, you should have an explanation for each of the internal cases you have not implemented in this PR.
include/rocksdb/file_system.h
Outdated
@@ -909,6 +909,13 @@ class FSRandomAccessFile { | |||
return IOStatus::NotSupported("Prefetch"); | |||
} | |||
|
|||
// Get the file size. The default implementation is not supported, as some of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say
The default implementation returns "not supported" so that user implementations
of FSRandomAccessFile do not need to immediately implement this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I will address them.
I also noticed another class RandomAccessFile, which had exact same functions as FSRandomAccessFile. Should I add the new API there as well. I recall you mentioned that it was from a legacy system used in env. The code seems indicate that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not be needed for the legacy Env-related class RandomAccessFile. Just adds to the confusion about what should be overridden. #9274
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RandomAccessFile is used in LegacyRandomAccessFileWrapper. To make the wrapper work, I had to add the new API GetFileSize in RandomAccessFile. Let me know if it is a concern.
Also, use |
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I typed up these review comments a few days ago and forgot to submit
env/env_test.cc
Outdated
|
||
constexpr size_t kSectorSize = 4096; | ||
constexpr size_t kNumSectors = 128; | ||
constexpr size_t expectedFileSize = kSectorSize * kNumSectors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not enough entropy, imho. Could be something random up to something like this size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean randomize the number of kNumSectors in each run?
kNumSectors = 64 + (std::rand() % 64);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's not enough entropy. A FileSystem must provide files of any number of bytes (up to some limit). The notion of "sector" seems irrelevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
env/env_test.cc
Outdated
ASSERT_OK(fs->NewRandomAccessFile(fname, FileOptions(), &file, nullptr)); | ||
|
||
uint64_t actualFileSize; | ||
file->GetFileSize(&actualFileSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is suspicious/wrong that this is not reported by our ASSERT_STATUS_CHECKED CI test. (The return value is ignored and needs to be checked.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check with ASSERT_OK
env/env_test.cc
Outdated
}; | ||
|
||
// Validate GetFileSize API returns the right value. | ||
// Use the default implementation from Posix File system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be posix depending on platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Env* env_; | ||
}; | ||
|
||
// Validate GetFileSize API returns the right value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about FileSystem::GetFileSize()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate more about this? Do you mean validate the file size against FileSystem::GetFileSize API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that. I was just pointing out an inconsistency/incompleteness/ambiguity in what was claimed to be tested vs. actually tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
env/env_test.cc
Outdated
std::unique_ptr<FSRandomAccessFile> file; | ||
ASSERT_OK(fs->NewRandomAccessFile(fname, FileOptions(), &file, nullptr)); | ||
|
||
uint64_t actualFileSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APIs can lie or have bugs. I would call this "reportedFileSize".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
include/rocksdb/file_system.h
Outdated
@@ -1051,6 +1051,14 @@ class FSRandomAccessFile { | |||
// open. | |||
virtual Temperature GetTemperature() const { return Temperature::kUnknown; } | |||
|
|||
// Get the file size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... on an open-for-reading file without re-seeking the file's path in the filesystem.
In other words, API documentation should not only include "what" something is but "why" something exists (if it's potentially redundant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
uint64_t file_size_from_file_system = 0; | ||
Status s; | ||
s = file->file()->GetFileSize(&file_size_from_file_system); | ||
if (!s.ok()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we discussed that it would be OK for implementations not to implement the new function immediately. (Allow NotSupported here.) In fact, we should be testing that scenario in unit tests. We could consider doing that by holding back the env_encryption implementation until we make the function is required. (Justification for that state described well in comments in env_encryption.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
table/format.cc
Outdated
// manifest. Otherwise it is not guaranteed. | ||
return Status::Corruption( | ||
"Sst file size mismatch: " + file->file_name() + ". Expected " + | ||
std::to_string(expected_file_size) + ", actual size " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer "reported" to "actual"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Deprecate skip_checking_sst_file_sizes_on_db_open option. Summary: This change is used to address this issue facebook#13619 It supports file size validation in ReadFooterFromFile. In favor of this change, CheckConsistency function and skip_checking_sst_file_sizes_on_db_open flag are deprecated. The CheckConsistency function checks each file size matches what was recorded in manifest during DB open. Meantime, ReadFooterFromFile was called for each file in LoadTables function. Since ReadFooterFromFile always validates file size, the CheckConsistency is redundant. In addtion, CheckConsistency is ecxecuted in a single thread. This could slow down DB open when a network file system is used. Therefore, the flag skip_checking_sst_file_sizes_on_db_open was added to skip this check. After this change, ReadFooterFromFile was executed in parallel through multiple threads. Therefore, the concern of DB open slowness is eliminated, and the flag could be deprecated. There is 2 slight concerns of this change. *If max_open_files is set with smaller value, engine will not open all the files during DB open. This means if there is a corruption on file size, it will not be detected during DB open, but rather at a later time. Since the default is -1, which means open all the files, and it is rarely overridden and a lot of new features rely on it to be -1, the risk is very low. *If FIFO compaction is used, engine could fail to open DB unnecessarily on the corrupted files that would never be used again. However, this is a very rare case as well. The error could still be ignored by setting paranoid_checks operationally. The risk is very low. To remain backward compatibility. The public facing flag was kept and marked as no-op internally. Another change is required to fully remove the flag. Test Plan: make check A new unit test was added to validate file size check API works as expected.
@xingbowang has updated the pull request. You must reimport the pull request before landing. |
@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary:
This change is used to address this issue
#13619
It supports GetFileSize API in FSRandomAccessFile. This allows ReadFooterFromFile to quickly get the file size for file size validation.
Test Plan:
make check
Reviewers:
Peter Dillinger
Subscribers:
Tasks:
Tags: