Replace most maps in CFG with vectors #3346
Conversation
virtual std::string toString(const core::GlobalState &gs) const;
virtual std::string showRaw(const core::GlobalState &gs, int tabs = 0) const;
virtual std::string toString(const core::GlobalState &gs, const CFG &cfg) const;
virtual std::string showRaw(const core::GlobalState &gs, const CFG &cfg, int tabs = 0) const;
};
CheckSize(Send, 160, 8);
Why didn't size change here?
Not sure; I'd think that VariableUseSite would be smaller. Will investigate.
VariableUseSite's size did not change. It contains a 16-byte TypePtr and a 4-byte LocalRef, so, with padding, it's still 24 bytes.
Force-pushed from 530b77d to 2b3e9bd
}
// Convert sets to sorted vectors.
For large input sets, this up-front conversion cost pays huge dividends later when we take the union of sets / iterate through the entries.
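As a rough sketch of that pattern (names are illustrative, not from the PR): pay the sort once so every later union or iteration over the ids is a linear pass over contiguous memory.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Sketch: one O(n log n) conversion up front buys cheap linear merges later.
std::vector<int> toSortedVector(const std::unordered_set<int> &ids) {
    std::vector<int> out(ids.begin(), ids.end());
    std::sort(out.begin(), out.end());
    return out;
}
```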
const auto &blockWrites = target.writes[blockId];
vector<int> blockReadsAndWrites;
blockReadsAndWrites.reserve(max(blockReads.size(), blockWrites.size()));
set_union(blockReads.begin(), blockReads.end(), blockWrites.begin(), blockWrites.end(),
This is O(size(blockReads) + size(blockWrites)), since we're merging two sorted vectors.
}
}
}
{
Timer timeit(ctx.state.tracer(), "privates2");
for (const auto &[local, usages] : usageCounts) {
auto local = 0;
vector<vector<int>> writesToRemove(maxBasicBlockId);
Individual item removal from a sorted list is O(log n) (if you use binary search) or O(n) (if you iterate in-order). But if you're removing a list of sorted items from a sorted list, it's just O(n), because you can remove all of them in one pass.
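A minimal sketch of that one-pass removal, using std::set_difference (the helper name is hypothetical):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Remove every element of `toRemove` (sorted) from `values` (sorted) in a
// single linear pass, instead of erasing items one at a time.
void removeSortedItems(std::vector<int> &values, const std::vector<int> &toRemove) {
    std::vector<int> kept;
    kept.reserve(values.size());
    std::set_difference(values.begin(), values.end(), toRemove.begin(), toRemove.end(),
                        std::back_inserter(kept));
    values = std::move(kept);
}
```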
std::vector<UnorderedSet<core::LocalVariable>> reads;
std::vector<UnorderedSet<core::LocalVariable>> writes;
std::vector<UnorderedSet<int>> readsSet;
std::vector<std::vector<int>> reads;
Some places need fast readset lookup, which is most efficient as an UnorderedSet. Other places need fast iteration / merging of read sets, which is most efficient as a vector. Since ReadsAndWrites is not mutated once it's created, I stored them as both.
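Roughly the idea, as a sketch with illustrative names rather than the PR's actual ReadsAndWrites fields:

```cpp
#include <unordered_set>
#include <vector>

// Keep each block's reads in two forms, built once and never mutated afterwards:
// a sorted vector for fast iteration/merging, and a hash set for fast lookups.
struct BlockReads {
    std::vector<int> sorted;
    std::unordered_set<int> lookup;

    explicit BlockReads(std::vector<int> sortedIds)
        : sorted(std::move(sortedIds)), lookup(sorted.begin(), sorted.end()) {}
};
```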
BasicBlock *entry = res->entry();
auto selfClaz = md.symbol.data(ctx)->rebind();
Indented below so I can use RAII w/ UnfreezeCFGLocalVariables
if (fnd != aliases.end()) {
return fnd->second;
} else {
return what;
I simplified the function so the default return value is only in one place.
if (!bb->backEdges.empty()) {
// Take the intersection of all of the back edges' aliases.
This comment helps explain the code changes below.

The previous code did the following to compute this intersection:
- Block enters the loop with an empty set of aliases.
- If the block has at least one back edge:
  - Set the current block's aliases to the aliases of the first back edge.
  - Take the intersection of the current block's aliases and all of its back edges' aliases.

The new code does the following to compute this intersection:
- Block enters the loop with an empty set of aliases.
- If the block has at least one back edge:
  - Set the current block's aliases to the aliases of the first back edge.
  - Take the intersection of the current block's aliases and all of its back edges' aliases except for the first one.

The change:
- Removes the loop from the path of blocks without back edges (a common case).
- For blocks with back edges, avoids a no-op loop iteration that intersects two identical sets. This results in noticeable savings, since many blocks only have a single back edge.
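A simplified sketch of the new flow, using placeholder types rather than Sorbet's actual alias maps:

```cpp
#include <unordered_map>
#include <vector>

// Copy the first back edge's aliases, then intersect with the remaining back
// edges only; blocks without back edges skip the whole computation.
using AliasMap = std::unordered_map<int, int>; // variable id -> alias id

AliasMap intersectBackEdgeAliases(const std::vector<const AliasMap *> &backEdges) {
    AliasMap result;
    if (backEdges.empty()) {
        return result; // common case: nothing to do
    }
    result = *backEdges.front();
    for (size_t i = 1; i < backEdges.size(); ++i) {
        const AliasMap &other = *backEdges[i];
        for (auto it = result.begin(); it != result.end();) {
            auto fnd = other.find(it->first);
            if (fnd == other.end() || fnd->second != it->second) {
                it = result.erase(it); // not present (or different) in this back edge
            } else {
                ++it;
            }
        }
    }
    return result;
}
```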
}
}
}
// Overapproximation of the set of variables that have aliases.
// Will have false positives, but no false negatives. Avoids an expensive inner loop below.
This optimization avoids an inner loop below that otherwise makes the loop O(# bindings * # of aliases). On a file I was benchmarking, no invalidations ever occurred, so it was wasting time on that loop. I made it an overapproximation because otherwise this needs to be a map with reference counting, which seems like more effort than is needed.
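A hedged sketch of such an overapproximation, with illustrative names:

```cpp
#include <unordered_set>

// Record every variable that has ever had an alias. Entries are never removed,
// so the set may contain stale variables (false positives), but a variable that
// is absent definitely has no alias (no false negatives), which lets callers
// skip the expensive per-alias inner loop.
struct AliasedVars {
    std::unordered_set<int> maybeAliased;

    void onAliasAdded(int var) {
        maybeAliased.insert(var); // never erased -> overapproximation
    }

    bool mightHaveAlias(int var) const {
        return maybeAliased.count(var) != 0;
    }
};
```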
auto &upperBoundsForBlock = upperBounds1[bb->id];
upperBoundsForBlock.insert(readsByBlock[bb->id].begin(), readsByBlock[bb->id].end());
}
upperBounds1 = readsByBlock;
Like the code it replaces, upperBounds1 is initialized to readsByBlock.
cfg/builder/builder_finalize.cc (outdated)
if (bb->bexit.thenb != cfg.deadBlock()) {
upperBoundsForBlock.insert(upperBounds1[bb->bexit.thenb->id].begin(),
upperBounds1[bb->bexit.thenb->id].end());
setUnionInplace(upperBoundsForBlock, upperBounds1[bb->bexit.thenb->id]);
This function takes the union of two sorted vectors that are acting as sets. It's equivalent to the code it's replacing.
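For reference, a minimal sketch of what a setUnionInplace-style helper could look like; the PR's actual helper is written differently (see the discussion of its insertion strategy further down), so this only shows the semantics:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// `into` and `from` are sorted, duplicate-free vectors acting as sets;
// afterwards `into` holds their union, still sorted and duplicate-free.
void setUnionInplace(std::vector<int> &into, const std::vector<int> &from) {
    std::vector<int> merged;
    merged.reserve(into.size() + from.size());
    std::set_union(into.begin(), into.end(), from.begin(), from.end(),
                   std::back_inserter(merged));
    into = std::move(merged);
}
```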
if (bb->outerLoops <= cfg.minLoops[dead]) {
upperBoundsForBlock.erase(dead);
const auto &deadForBlock = deadByBlock[bb->id];
if (!deadForBlock.empty()) {
Avoids a vector allocation and a call to setDifferenceInplace if nothing is dead.
upperBoundsForBlock.erase(dead);
const auto &deadForBlock = deadByBlock[bb->id];
if (!deadForBlock.empty()) {
vector<int> toRemove;
It's more efficient to remove everything at once than it is to remove items individually.
const auto set1El = *set1It;
const auto set2El = *set2It;
if (set1El == set2El) {
it->args.emplace_back(set1El);
Since set1 and set2 are sorted, and it->args is empty at the start of this loop, we know that it->args will be sorted too.
This would be a good comment to move into the code.
@@ -288,13 +299,12 @@ void CFGBuilder::computeMinMaxLoops(core::Context ctx, const CFG::ReadsAndWrites
continue;
}

for (const core::LocalVariable what : RnW.reads[bb->id]) {
const auto minIt = cfg.minLoops.find(what);
int curMin = minIt != cfg.minLoops.end() ? minIt->second : INT_MAX;
Note: minLoops is now initialized to INT_MAX when we resize it to size maxVariableId.
const auto minIt = cfg.minLoops.find(what);
const auto maxIt = cfg.maxLoopWrite.find(what);
int curMin = minIt != cfg.minLoops.end() ? minIt->second : INT_MAX;
int curMax = maxIt != cfg.maxLoopWrite.end() ? maxIt->second : 0;
Note: maxLoopWrite is now initialized to 0 when we resize it to size maxVariableId.
} else {
// intoEl > fromEl
// Present in from, not in into.
// Rather than insert in the middle of the vector, break out of loop and do a set union.
This saves a small amount of time when unioning larger vectors: repeatedly inserting into the middle of a vector involves moving all subsequent entries. It's meager gains, so I can be convinced to use insert instead and have simpler code.
This seems fine to me, too bad you can't use absl::c_set_union here :(
infer/environment.cc (outdated)
@@ -24,11 +24,12 @@ core::TypePtr dropConstructor(core::Context ctx, core::Loc loc, core::TypePtr tp
}

KnowledgeFilter::KnowledgeFilter(core::Context ctx, unique_ptr<cfg::CFG> &cfg) {
used_vars.resize(cfg->maxVariableId);
used_vars is now a vector<bool>.
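Roughly the shape of that change, as an illustrative sketch rather than Sorbet's actual KnowledgeFilter:

```cpp
#include <vector>

// A membership test backed by vector<bool> indexed by the variable's dense
// CFG-local id, replacing a hash-set lookup.
class UsedVars {
    std::vector<bool> used_vars;

public:
    explicit UsedVars(int maxVariableId) : used_vars(maxVariableId, false) {}

    void markUsed(int localRefId) {
        used_vars[localRefId] = true;
    }

    bool isUsed(int localRefId) const {
        return used_vars[localRefId];
    }
};
```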
@@ -127,37 +127,37 @@ class Environment {
TestedKnowledge knowledge;
bool knownTruthy;
};
UnorderedMap<core::LocalVariable, VariableState> vars;
// TODO(jvilk): Use vectors.
I plan to revisit these data structures in a future change.
But keep around UnorderedSets for efficient read lookups.
It's now much faster than before on large input.
…ration. Shaves 100ms off of a big file.
…bles table. Makes it clear when localVariables is no longer growing.
Force-pushed from ee2ce07 to 1029f1f
🚀
This PR replaces most maps/sets in CFG with vectors for a significant speedup on individual files with many basic blocks/variables (70%+ faster on one particular input I iterated on). This translates into a noticeable end-to-end speedup on large codebases and improves IDE responsiveness.
I was able to do this by introducing a LocalRef abstraction, which is a CFG-local reference to a local variable. Like other Ref classes, a LocalRef has a numerical ID that is the index into a vector of LocalVariables. A LocalRef is also half the size of a LocalVariable (4 bytes).

I use LocalRef for two data structures that were key to the speedup:

While working on this change, I made a few additional common case optimizations. If distracting, they can be moved to subsequent PRs.
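For illustration, a hedged sketch of the LocalRef idea described above (not Sorbet's actual class): a 4-byte id that indexes into a per-CFG table of LocalVariables, so hot data structures can store ids instead of full LocalVariable values.

```cpp
#include <cstdint>
#include <vector>

struct LocalVariable { /* details omitted */ };

class LocalRef {
    uint32_t _id; // compact, CFG-local handle

public:
    explicit LocalRef(uint32_t id) : _id(id) {}
    uint32_t id() const { return _id; }

    // Resolve the ref against the CFG's variable table when the full variable is needed.
    const LocalVariable &data(const std::vector<LocalVariable> &localVariables) const {
        return localVariables[_id];
    }
};
```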
Motivation
Speed up typecheck.
Test plan
Covered by existing tests.