-
Notifications
You must be signed in to change notification settings - Fork 37.4k
Add cs_main annotation to WriteBatchSync(), drop lock in CDiskBlockIndex #24199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cs_main annotation to WriteBatchSync(), drop lock in CDiskBlockIndex #24199
Conversation
This allows us to remove the cs_main lock in CDiskBlockIndex::SERIALIZE_METHODS with the remaining thread safety warnings generated only by the tests.
Concept NACK. Prefer recursive locks over fragile assumptions about the rest of the codebase. |
My point was that the lock isn't needed to be recursively taken during serialization since the data is copied before serialization. |
Specifically, I don't think we should need |
I agree but didn't see how to update the fuzz tests to do it -- happy to update with suggestions. |
To me it looks like that it is not just the tests that would be problematic:
Both So, we have some methods in the (non-test) code that invoke |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 0e052eb
re: #24199 (comment)
To me it looks like that it is not just the tests that would be problematic [...]
This is all true, CDiskBlockIndex
flow is a little convoluted. I think following my suggestion below to make relevant CBlockIndex
member variables private & nonguarded, you could keep the CDiskBlockIndex
class working by declaring it as a friend of CBlockIndex
, or by changing it to use SER_READ / SER_WRITE macros with specialized Set/Get accessor methods that don't require cs_main.
// and (b) if the analysis is enabled the nStatus compiler warnings are only | ||
// generated by the tests, as the callers in the codebase that invoke this | ||
// serialization hold cs_main and have lock annotations and assertions. | ||
SERIALIZE_METHODS(CDiskBlockIndex, obj) NO_THREAD_SAFETY_ANALYSIS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Drop lock and thread safety analysis in CDiskBlockIndex serialization" (0e052eb)
It'd be nice to do this without disabling thread safety checks. Though I can see it would require more work, and maybe better to do separately or as a followup. (IMO ideally we would have a linter to make sure every NO_THREAD_SAFETY_ANALYSIS
usage was accompanied by a link to a github issue where we could discuss the plan to reenable thread safety checks.)
In this case it looks like most obvious way of removing NO_THREAD_SAFETY_ANALYSIS
would be to replace public guarded variables with private unguarded variables, changing:
public:
int nFile GUARDED_BY(::cs_main){0};
unsigned int nDataPos GUARDED_BY(::cs_main){0};
unsigned int nUndoPos GUARDED_BY(::cs_main){0};
uint32_t nStatus GUARDED_BY(::cs_main){0};
to:
private:
int m_file_num = 0;
unsigned int m_data_pos = 0;
unsigned int m_undo_pos = 0;
uint32_t m_status = 0;
And add EXCLUSIVE_LOCKS_REQUIRED
accessor methods as needed to ensure variables are always accessed with the right locks held.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, how would that be different than the current code?
current: LOCK(cs_main); read nStatus
proposed: LOCK(cs_main); call GetNStatus()
and the ser/deser methods would still need to acquire cs_main
, no?
The access to all nStatus
/nDataPos
/nUndoPos
from within ser/deser has to happen in one atomic block. I mean that this is not ok:
lock
read nStatus
unlock
lock
read nDataPos
unlock
it has to be
lock
read nStatus
read nDataPos
unlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All CDiskBlockIndex
are short-lived copied data on the stack, shared with no other thread, so they don't need any locks. If there is a missing lock then it can at most be missed while copying the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanofsky, now I see your point - to change the members of the parent class CBlockIndex
to be not-tagged as guarded by cs_main
and change them to private, in the hopes that the methods that touch them respect the implicit guarding. Then, access them without guarding from the child class CDiskBlockIndex
. I do not like this much because the members in CBlockIndex
need to be guarded, but will be without a GUARDED_BY()
tag. Hmm...
@MarcoFalke, I did not realize this before, thanks for pointing it out!
If there is a missing lock then it can at most be missed while copying the data
there is not - both users of CDiskBlockIndex
own cs_main
when creating such objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoFalke, I did not realize this before, thanks for pointing it out!
I pointed it only out thrice up to now 😅 . #24199 (comment) and #22932 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like this much because the members in CBlockIndex need to be guarded, but will be without a GUARDED_BY() tag.
I should probably explain reasoning behind what I suggested yesterday. I don't think this is a question what anybody likes or doesn't like, but a question of what's right and wrong. It's wrong to annotate these struct members as being guarded by cs_main, because they aren't guarded by it. The whole struct is not even guarded by cs_main, just certain instances of it are. What is guarded by cs_main is the m_block_index
map and pointers & references to CBlockIndex instances specifically inside that map.
The correct way to annotate this code would be not to annotate anything in CBlockIndex (Ideally I would rename it to struct BlockInfo
and make it plain struct with no methods), but to not expose this low-level struct to high level application code. Higher level code would use a BlockReference
accessor class that has a BlockInfo&
member and methods appropriately annotated with EXCLUSIVE_LOCKS_REQUIRED(cs_main) to enforce the access pattern which enables our lock usage. CBlockIndex*
pointers throughout the code would be replaced by BlockReference
objects. In my ideal world, this would look something like:
struct BlockInfo {
uint256 hash;
int height = 0;
BlockInfo* prev = nullptr;
/// ... more members here ... ///
};
struct BlockState : public BlockInfo {
int flags = 0;
int file_num = -1;
unsigned data_pos = 0;
unsigned undo_pos = 0;
};
class BlockReference {
private:
BlockState& m_block;
public:
BlockInfo& info() const { return m_block; }
BlockState& state() EXCLUSIVE_LOCKS_REQUIRED(cs_main) { return m_block; }
BlockReference prev() const { return {static_cast<BlockState&>(*Assert(m_block.prev))}; }
// Could add more fine-grained accessors like height() or flags() or whatever.
};
class BlockManager {
private:
std::unordered_set<BlockState> m_blocks GUARDED_BY(cs_main);
public:
BlockReference FindBlock(const uint256& hash) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
/// ... more stuff ... ///
}
The appoach I suggested yesterday does the same thing but keeps everything mushed into the same CBlockIndex class/struct hybrid instead of splitting it out. It would be pretty easy to implement what I suggested yesterday as a followup to this PR to re-enable thread safety checking. In any case I was not suggesting doing something bigger like that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to use another wording (not like/dislike):
It is a design smell to have a variable without GUARDED_BY()
and rely on EXCLUSIVE_LOCKS_REQUIRED()
on some methods to protect it. This way, if the code goes wrong (like, new access is added to the variable from a function not marked with EXCLUSIVE_LOCKS_REQUIRED()
) then the compiler has no way to detect that. This defeats the purpose of thread safety annotations which is to enforce proper access and detect mistakes if the code goes wrong.
Your suggestion above does that:
replace public guarded variables with private unguarded variables ... And add
EXCLUSIVE_LOCKS_REQUIRED
accessor methods as needed to ensure variables are always accessed with the right locks held.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review feedback! Given the number of pulls that are currently touching this area of the codebase, it seems good to keep this change minimal and perhaps propose these ideas as a follow-up (happy to review them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a design smell to have a variable without
GUARDED_BY()
and rely onEXCLUSIVE_LOCKS_REQUIRED()
on some methods to protect it. This way, if the code goes wrong (like, new access is added to the variable from a function not marked withEXCLUSIVE_LOCKS_REQUIRED()
) then the compiler has no way to detect that. This defeats the purpose of thread safety annotations which is to enforce proper access and detect mistakes if the code goes wrong.
These are mostly incorrect claims made without specific reasoning or justification. If you have an inner variable that does not require lucking, and an outer data structure containing the variable, or an outer pointer pointing to the variable, or an outer function accessing the variable, and the outer container/pointer/function does require locking, the correct way of annotating the code is to not annotate the variable, and to annotate the outer container/pointer/function.
If you look at the actual code I posted in #24199 (comment), you can see that the compiler will detect all unsafe accesses. If you want to talk about a code smell, NO_THREAD_SAFETY_ANALYSIS is the code smell, and the design in that post shows how to get rid of it.
That said, there is a difference between the bigger change I posted yesterday #24199 (comment), and the smaller suggestion #24199 (comment) I posted two days ago. In both cases the compiler can guarantee thread safety outside of chain.h/chain.cpp if these files are annotated correctly. But in the bigger change where I split up CBlockIndex class into separate BlockInfo / BlockState / BlockReference components, it's easy to annotate everything correctly, while in the smaller change where CBlockIndex is kept as a monolithic class, what you are saying about it being trickier to see notice missing EXCLUSIVE_LOCKS_REQUIRED annotations on CBlockIndex methods is true.
If you want to argue that using incorrect annotations is better than using correct ones for practical reasons (e.g. it is better to use NO_THREAD_SAFETY_ANALYSIS in one method, so the rest of the file is easier to maintain), that's fine. But let's not create confusion about what correct annotations are and what incorrect annotations are. Correct annotations do not require using NO_THREAD_SAFETY_ANALYSIS, and that's just the most obvious clue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrt NO_THREAD_SAFETY_ANALYSIS
I tend to agree with @luke-jr: #24199 (comment). This is why I did not ACK this PR, which contains NO_THREAD_SAFETY_ANALYSIS
. Personally, I do not see a good and elegant solution to the problem this PR aims to fix (yet).
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Are you still working on this? |
Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened. |
Yes, sorry not to have seen this comment and replied. There is valuable review feedback here so it may be helpful to re-open here rather than open a new pull (mentioning as I do not have the privileges to re-open it.) |
Moved this to draft. It's not clear what the status of this is. There is clearly disagreement about the approach amongst contributors. Was re-opened to address feedback, but no activity in the 4 months since. |
🐙 This pull request conflicts with the target branch and needs rebase. |
Closing temporarily so that I can re-open it -- am still interested to work on this. |
Addresses review feedback by Marco Falke in #22932 (comment):
add
cs_main
thread safety annotation toCBlockTreeDB::WriteBatchSync()
, which allows us toremove the
cs_main
lock inCDiskBlockIndex::SERIALIZE_METHODS
with respect to its 2 callers in the codebase,CBlockTreeDB::LoadBlockIndexGuts()
andCBlockTreeDB::WriteBatchSync()
, that already hold the lock and whose callers hold the lock.I didn't manage to remove the thread safety warnings generated by the fuzz tests (suggestions welcome) and resolved them here with a
NO_THREAD_SAFETY_ANALYSIS
annotation and explanatory documentation.