Replace most maps in CFG with vectors by jvilk-stripe · Pull Request #3346 · sorbet/sorbet

Replace most maps in CFG with vectors #3346


Merged

jvilk-stripe merged 39 commits into master from jvilk/faster-cfg on Aug 15, 2020

Conversation

@jvilk-stripe (Collaborator) commented Aug 6, 2020

This PR replaces most maps/sets in CFG with vectors for a significant speedup on individual files with many basic blocks/variables (70%+ faster on one particular input I iterated on). This translates into a noticeable end-to-end speedup on large codebases and improves IDE responsiveness.

I was able to do this by introducing a LocalRef abstraction, which is a CFG-local reference to a local variable. Like other Ref classes, a LocalRef has a numerical ID that is the index into a vector of LocalVariables. A LocalRef is also half the size of a LocalVariable (4 bytes).
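
For illustration, here is a minimal sketch of the LocalRef idea (the member and method names are assumptions, not the exact Sorbet declarations):

#include <cstdint>

// A 4-byte handle that indexes into the owning CFG's table of LocalVariables.
class LocalRef final {
    uint32_t _id;

public:
    explicit LocalRef(uint32_t id) : _id(id) {}
    uint32_t id() const {
        return _id;
    }
    // Resolving the handle back to a LocalVariable goes through the CFG's
    // table, roughly cfg.localVariables[ref.id()].
};
static_assert(sizeof(LocalRef) == 4, "half the size of an 8-byte LocalVariable");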

I use LocalRef for two data structures that were key to the speedup (both sketched below):

  • Vectors of size # of locals that act like maps (vector[LocalRef.id] = value for that LocalRef). These can be preallocated, and they support insertion and deletion with no further memory allocation, at the cost of slower iteration through the elements of the vector.
  • Sorted vectors of LocalRef IDs that act as sets. These are great for sparse sets that code needs to iterate through, and it's cheap to take the union of two of them (linear time). They are less good for lookup (O(n), or O(log n) with binary search).
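
A sketch of both patterns, with plain integers standing in for LocalRef IDs and hypothetical variable names:

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

int main() {
    uint32_t numLocals = 128; // one slot per LocalRef id

    // (1) Vector acting as a map keyed by LocalRef id: preallocated once,
    // insertion/deletion is a plain store with no further allocation.
    std::vector<int> useCounts(numLocals, 0);
    useCounts[7] += 1;

    // (2) Sorted vectors of LocalRef ids acting as sets: cheap to iterate and
    // to union (linear time); lookup is O(log n) via binary search.
    std::vector<uint32_t> reads = {1, 4, 9};
    std::vector<uint32_t> writes = {4, 7};
    std::vector<uint32_t> readsOrWrites;
    readsOrWrites.reserve(reads.size() + writes.size());
    std::set_union(reads.begin(), reads.end(), writes.begin(), writes.end(),
                   std::back_inserter(readsOrWrites)); // {1, 4, 7, 9}

    bool readsNine = std::binary_search(reads.begin(), reads.end(), 9u);
    return readsNine ? 0 : 1;
}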

While working on this change, I made a few additional common-case optimizations. If they are distracting, they can be moved to subsequent PRs.

Motivation

Speed up typecheck.

Test plan

Covered by existing tests.

virtual std::string toString(const core::GlobalState &gs) const;
virtual std::string showRaw(const core::GlobalState &gs, int tabs = 0) const;
virtual std::string toString(const core::GlobalState &gs, const CFG &cfg) const;
virtual std::string showRaw(const core::GlobalState &gs, const CFG &cfg, int tabs = 0) const;
};
CheckSize(Send, 160, 8);
Collaborator

Why didn't size change here?

Collaborator Author

Not sure; I'd think that VariableUseSite would be smaller. Will investigate.

Collaborator Author

VariableUseSite's size did not change. It contains a 16-byte TypePtr and a 4-byte LocalRef, so with padding it's still 24 bytes.
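
The padding arithmetic, shown with a stand-in layout rather than the real class:

#include <cstdint>

// Stand-in for VariableUseSite's layout: a 16-byte, 8-byte-aligned member
// (like TypePtr) followed by a 4-byte id (like LocalRef).
struct UseSiteLayout {
    alignas(8) unsigned char typePtr[16];
    uint32_t variable;
};
static_assert(sizeof(UseSiteLayout) == 24, "20 bytes of data + 4 bytes of tail padding");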

@jvilk-stripe changed the title from "WIP: Replace maps in CFG with flat vectors" to "WIP: Replace most maps in CFG with vectors" on Aug 14, 2020
}

// Convert sets to sorted vectors.
Collaborator Author

For large input sets, this up-front conversion cost pays huge dividends later when we take the union of sets / iterate through the entries.
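
A sketch of the conversion, using std::unordered_set as a stand-in for Sorbet's UnorderedSet:

#include <algorithm>
#include <unordered_set>
#include <vector>

// Flatten a hash set of LocalRef ids into a sorted vector, paying O(n log n)
// once so later passes can merge and scan it in linear time.
std::vector<int> toSortedVector(const std::unordered_set<int> &ids) {
    std::vector<int> result(ids.begin(), ids.end());
    std::sort(result.begin(), result.end());
    return result;
}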

const auto &blockWrites = target.writes[blockId];
vector<int> blockReadsAndWrites;
blockReadsAndWrites.reserve(max(blockReads.size(), blockWrites.size()));
set_union(blockReads.begin(), blockReads.end(), blockWrites.begin(), blockWrites.end(),
Collaborator Author

This is O(|blockReads| + |blockWrites|), since we're merging two sorted vectors.

}
}
}
{
Timer timeit(ctx.state.tracer(), "privates2");
for (const auto &[local, usages] : usageCounts) {
auto local = 0;
vector<vector<int>> writesToRemove(maxBasicBlockId);
Collaborator Author

Individual item removal from a sorted list is O(log n) (if you use binary search) or O(n) (if you iterate in-order). But if you're removing a list of sorted items from a sorted list, it's just O(n) because you can remove all of them in one pass.
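
A sketch of the batched removal, assuming both vectors hold sorted LocalRef ids:

#include <algorithm>
#include <iterator>
#include <vector>

// One pass over both inputs instead of one erase per removed element.
std::vector<int> removeAll(const std::vector<int> &from, const std::vector<int> &toRemove) {
    std::vector<int> result;
    result.reserve(from.size());
    std::set_difference(from.begin(), from.end(), toRemove.begin(), toRemove.end(),
                        std::back_inserter(result)); // O(|from| + |toRemove|)
    return result;
}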

std::vector<UnorderedSet<core::LocalVariable>> reads;
std::vector<UnorderedSet<core::LocalVariable>> writes;
std::vector<UnorderedSet<int>> readsSet;
std::vector<std::vector<int>> reads;
Collaborator Author

Some places need fast readset lookup, which is most efficient as an UnorderedSet. Other places need fast iteration / merging of read sets, which is most efficient as a vector. Since ReadsAndWrites is not mutated once it's created, I stored them as both.
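
Schematically (illustrative field names, with std::unordered_set standing in for UnorderedSet), the read information is kept per block in both forms:

#include <unordered_set>
#include <vector>

struct ReadsAndWritesSketch {
    // Indexed by basic block id; built once and never mutated afterwards.
    std::vector<std::vector<int>> reads;           // sorted ids: fast iteration and merging
    std::vector<std::unordered_set<int>> readsSet; // same ids: fast membership lookup
};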

BasicBlock *entry = res->entry();
auto selfClaz = md.symbol.data(ctx)->rebind();
Collaborator Author

Indented below so I can use RAII with UnfreezeCFGLocalVariables.

if (fnd != aliases.end()) {
return fnd->second;
} else {
return what;
Collaborator Author

I simplified the function so the default return value is only in one place.

if (!bb->backEdges.empty()) {
// Take the intersection of all of the back edges' aliases.
Collaborator Author
@jvilk-stripe Aug 14, 2020

This comment helps explain the code changes below.

The previous code did the following to compute this intersection:

  • Block enters loop with an empty set of aliases.
  • If block has at least one backedge:
    • Set the current block's aliases to the aliases of the first backedge.
  • Take the intersection of current block's aliases and all of its backedges' aliases.

The new code does the following to compute this intersection (sketched after this list):

  • Block enters loop with an empty set of aliases.
  • If block has at least one backedge:
    • Set the current block's aliases to the aliases of the first backedge.
    • Take the intersection of current block's aliases and all of its backedges' aliases except for the first one.

The change:

  • Removes the loop from the path for blocks without backedges (a common case)
  • For blocks with backedges, avoids a no-op first iteration that intersects two identical sets.
    • This results in noticeable savings since many blocks only have a single backedge.
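
A sketch of the new strategy, with a hypothetical Aliases map keyed by LocalRef id:

#include <cstddef>
#include <unordered_map>
#include <vector>

using Aliases = std::unordered_map<int, int>; // LocalRef id -> aliased value (stand-in)

Aliases intersectBackedgeAliases(const std::vector<const Aliases *> &backEdges) {
    Aliases current;
    if (backEdges.empty()) {
        return current; // no backedges: skip the loop entirely (common case)
    }
    current = *backEdges.front(); // seed from the first backedge
    // Intersect with the remaining backedges only; intersecting with the
    // first one again would be a no-op.
    for (std::size_t i = 1; i < backEdges.size(); i++) {
        const Aliases &other = *backEdges[i];
        for (auto it = current.begin(); it != current.end();) {
            auto fnd = other.find(it->first);
            if (fnd == other.end() || fnd->second != it->second) {
                it = current.erase(it);
            } else {
                ++it;
            }
        }
    }
    return current;
}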

}
}
}

// Overapproximation of the set of variables that have aliases.
// Will have false positives, but no false negatives. Avoids an expensive inner loop below.
Collaborator Author

This optimization avoids an inner loop below that would otherwise make the pass O(# bindings * # of aliases). On a file I was benchmarking, no invalidations ever occurred, so all of the time spent in that loop was wasted.

I made it an overapproximation because otherwise this would need to be a map with reference counting, which seems like more effort than is needed.
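
A sketch of the overapproximation (hypothetical names), using one bit per LocalRef id:

#include <cstdint>
#include <vector>

struct MayHaveAlias {
    // Bits are set when a variable gains an alias and never cleared, so the
    // set can only over-approximate: false positives, never false negatives.
    std::vector<bool> bits;

    explicit MayHaveAlias(uint32_t numLocals) : bits(numLocals, false) {}

    void recordAlias(uint32_t refId) {
        bits[refId] = true;
    }

    // A `false` answer lets the caller skip the expensive invalidation loop.
    bool mayHaveAlias(uint32_t refId) const {
        return bits[refId];
    }
};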

auto &upperBoundsForBlock = upperBounds1[bb->id];
upperBoundsForBlock.insert(readsByBlock[bb->id].begin(), readsByBlock[bb->id].end());
}
upperBounds1 = readsByBlock;
Collaborator Author

Like the code it replaces, upperBounds1 is initialized to readsByBlock.

if (bb->bexit.thenb != cfg.deadBlock()) {
upperBoundsForBlock.insert(upperBounds1[bb->bexit.thenb->id].begin(),
upperBounds1[bb->bexit.thenb->id].end());
setUnionInplace(upperBoundsForBlock, upperBounds1[bb->bexit.thenb->id]);
Collaborator Author
@jvilk-stripe Aug 14, 2020

This function takes the union of two sorted vectors that are acting as sets. It's equivalent to the code it replaces.
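
A minimal sketch of what such a helper could do (the PR's actual implementation also has an in-place fast path, discussed further down):

#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Union of two sorted, duplicate-free vectors; the result replaces `into`.
void setUnionInplace(std::vector<int> &into, const std::vector<int> &from) {
    std::vector<int> merged;
    merged.reserve(into.size() + from.size());
    std::set_union(into.begin(), into.end(), from.begin(), from.end(),
                   std::back_inserter(merged)); // O(|into| + |from|)
    into = std::move(merged);
}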

if (bb->outerLoops <= cfg.minLoops[dead]) {
upperBoundsForBlock.erase(dead);
const auto &deadForBlock = deadByBlock[bb->id];
if (!deadForBlock.empty()) {
Collaborator Author

Avoids a vector allocation and a call to setDifferenceInplace if nothing is dead.

upperBoundsForBlock.erase(dead);
const auto &deadForBlock = deadByBlock[bb->id];
if (!deadForBlock.empty()) {
vector<int> toRemove;
Collaborator Author

It's more efficient to remove everything at once than it is to remove items individually.

const auto set1El = *set1It;
const auto set2El = *set2It;
if (set1El == set2El) {
it->args.emplace_back(set1El);
Collaborator Author

Since set1 and set2 are sorted, and it->args is empty at the start of this loop, we know that it->args will be sorted too.

Collaborator

This would be a good comment to move into the code.

@@ -288,13 +299,12 @@ void CFGBuilder::computeMinMaxLoops(core::Context ctx, const CFG::ReadsAndWrites
continue;
}

for (const core::LocalVariable what : RnW.reads[bb->id]) {
const auto minIt = cfg.minLoops.find(what);
int curMin = minIt != cfg.minLoops.end() ? minIt->second : INT_MAX;
Collaborator Author

Note: minLoops is now initialized to INT_MAX when we resize it to size maxVariableId.

const auto minIt = cfg.minLoops.find(what);
const auto maxIt = cfg.maxLoopWrite.find(what);
int curMin = minIt != cfg.minLoops.end() ? minIt->second : INT_MAX;
int curMax = maxIt != cfg.maxLoopWrite.end() ? maxIt->second : 0;
Collaborator Author

Note: maxLoopWrite is now initialized to 0 when we resize it to size maxVariableId.
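
In vector form, the old map-miss defaults become the fill values at resize time (a sketch with hypothetical surrounding code):

#include <climits>
#include <vector>

void initLoopBounds(std::vector<int> &minLoops, std::vector<int> &maxLoopWrite,
                    int maxVariableId) {
    minLoops.resize(maxVariableId, INT_MAX); // "not found" used to mean INT_MAX
    maxLoopWrite.resize(maxVariableId, 0);   // "not found" used to mean 0
    // Lookups become plain indexing, e.g. minLoops[what.id()], with no
    // find()/end() checks.
}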

} else {
// intoEl > fromEl
// Present in from, not in into.
// Rather than insert in the middle of the vector, break out of loop and do a set union.
Collaborator Author
@jvilk-stripe Aug 14, 2020

This saves a small amount of time when unioning larger vectors: repeatedly inserting into the middle of a vector involves moving all subsequent entries.

It's meager gains, so I can be convinced to use insert instead and have simpler code.

Collaborator

This seems fine to me, too bad you can't use absl::c_set_union here :(

@@ -24,11 +24,12 @@ core::TypePtr dropConstructor(core::Context ctx, core::Loc loc, core::TypePtr tp
}

KnowledgeFilter::KnowledgeFilter(core::Context ctx, unique_ptr<cfg::CFG> &cfg) {
used_vars.resize(cfg->maxVariableId);
Collaborator Author

used_vars is now a vector<bool>
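
Schematically (hypothetical member and method names):

#include <vector>

struct KnowledgeFilterSketch {
    std::vector<bool> used_vars; // one flag per LocalRef id

    explicit KnowledgeFilterSketch(int maxVariableId) : used_vars(maxVariableId, false) {}

    bool isNeeded(int refId) const {
        return used_vars[refId];
    }
};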

@@ -127,37 +127,37 @@ class Environment {
TestedKnowledge knowledge;
bool knownTruthy;
};
UnorderedMap<core::LocalVariable, VariableState> vars;
// TODO(jvilk): Use vectors.
Collaborator Author

I plan to revisit these data structures in a future change.

@jvilk-stripe changed the title from "WIP: Replace most maps in CFG with vectors" to "Replace most maps in CFG with vectors" on Aug 14, 2020
@jvilk-stripe marked this pull request as ready for review on August 14, 2020 19:03
Collaborator
@elliottt left a comment

🚀

@jvilk-stripe merged commit 978f01c into master on Aug 15, 2020
@jvilk-stripe deleted the jvilk/faster-cfg branch on August 15, 2020 03:12