u3: adds new, page-oriented memory allocator #812
Draft: joemfb wants to merge 123 commits into develop from jb/palloc
… pack also fixes use-after-free in _ca_prag bitmaps on realloc
This PR replaces the u3 allocator with a new, page-aware design (cribbed directly from phkmalloc). Opened as a draft for now pending a migration from the old allocator/snapshot.
The primary design consideration of phkmalloc is to minimize the number of pages that it touches. This aligns perfectly with the constraints of a persistent allocation arena, minimizing write-amplification when updating a snapshot. phkmalloc further ensures natural alignment at every allocation size up to the page size (all powers of two), and page alignment after that. These are stronger alignment guarantees than we require in any scenario, so all alignment is now implicit (other than the page-alignment of a new road), and all heap objects are aligned to at least 16 bytes (allowing an additional bit of pointer compression). All allocation metadata and freelists are stored out-of-band, allocated "as if" with the allocator itself. This shrinks our cells from 24 to 16 bytes and ensures that free pages are actually free, allowing us to store more data and further reducing churn in snapshot updates. Rounding small allocations up to power-of-two sizes reduces the overhead of searching free lists, allowing us to remove a performance hack from the old allocator that had significantly increased fragmentation. In practice, persistent state and snapshots have been observed to be roughly 20% smaller with this allocator.
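For readers unfamiliar with phkmalloc-style size classes, here is a minimal sketch of the rounding and pointer-compression ideas described above. It is not the code in this PR: the names `size_class`, `compress`, and `expand` are hypothetical, and the 4096-byte page size is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size, for illustration only */
#define MIN_ALLOC   16u  /* minimum size and alignment, per the description */

/* Round a request up to its size class: the next power of two (at least
** MIN_ALLOC) for requests up to a page, whole pages beyond that.
*/
static size_t
size_class(size_t len)
{
  if ( len > PAGE_SIZE ) {
    return (len + PAGE_SIZE - 1) & ~(size_t)(PAGE_SIZE - 1);
  }

  size_t siz = MIN_ALLOC;
  while ( siz < len ) {
    siz <<= 1;
  }
  return siz;
}

/* With 16-byte alignment the low four bits of every offset are zero,
** so a reference can drop them and recover them by shifting back.
*/
static uint32_t
compress(size_t off)
{
  assert( 0 == (off & (MIN_ALLOC - 1)) );
  return (uint32_t)(off >> 4);
}

static size_t
expand(uint32_t ref)
{
  return (size_t)ref << 4;
}
```

Because every class up to a page is a power of two that divides the page size, each chunk is naturally aligned within its page, and a 17-byte request simply lands in the 32-byte class.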
The absence of extra space for inline metadata in our allocations (no more "boxes") dictates redesigns of our garbage collection mechanisms. Reference counts are now in "userspace": an ad-hoc part of the application data inside an allocation (conventionally, the first word) instead of allocator-level metadata. So our mark and sweep collector (actually two collectors, switched by `u3o_debug_ram`) requires that extra space be allocated on the side to track mark bits or recalculated refcounts. This has the advantageous effect that `|mass` no longer dirties clean, persisted pages on the home road, making it much faster and cheaper (and therefore practical to run automatically). Similarly, our mark/compact collector (`|pack`) requires out-of-band relocation state. To minimize overhead, those relocations are not stored, but calculated from compact bitmaps in a design cribbed from the Clozure Common Lisp compiler by way of Factor.
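To illustrate the general technique of deriving relocations from mark bitmaps, here is a simplified sketch. It is not the implementation in this PR: it assumes a single region of fixed-size slots, and the names `marks`, `mark_slot`, `finish`, and `relocate` are hypothetical. Mark bits live in a side table rather than in the objects, and an object's post-compaction slot is the number of live slots below it, computed from a per-word running total plus a population count.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 256u            /* hypothetical number of fixed-size slots */
#define WORDS (SLOTS / 64u)

typedef struct {
  uint64_t mark[WORDS];       /* one mark bit per slot, kept out-of-band */
  uint32_t pre[WORDS];        /* live slots before each bitmap word */
} marks;

/* Count set bits portably (Kernighan's method). */
static uint32_t
popcount64(uint64_t x)
{
  uint32_t n = 0;
  while ( x ) {
    x &= x - 1;
    n++;
  }
  return n;
}

/* Marking writes only to the side table, never to the object itself. */
static void
mark_slot(marks* m, uint32_t i)
{
  m->mark[i >> 6] |= (uint64_t)1 << (i & 63);
}

static int
is_marked(const marks* m, uint32_t i)
{
  return (int)((m->mark[i >> 6] >> (i & 63)) & 1);
}

/* After marking, record the running total of live slots per word. */
static void
finish(marks* m)
{
  uint32_t sum = 0;
  for ( size_t w = 0; w < WORDS; w++ ) {
    m->pre[w] = sum;
    sum += popcount64(m->mark[w]);
  }
}

/* New slot index of live slot i: the count of live slots below it. */
static uint32_t
relocate(const marks* m, uint32_t i)
{
  uint64_t below = m->mark[i >> 6] & (((uint64_t)1 << (i & 63)) - 1);
  return m->pre[i >> 6] + popcount64(below);
}
```

No forwarding table is stored: the only per-region state is the bitmap and one counter per 64 slots, and any pointer can be rewritten on demand by calling `relocate`.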