Don't skip munmap of mtcp_restart regions. by karya0 · Pull Request #353 · mpickpt/mana · GitHub

Don't skip munmap of mtcp_restart regions. #353

Open. karya0 wants to merge 5 commits into base: main

@karya0 (Collaborator) commented Aug 28, 2023

No description provided.

@karya0 requested review from gc00 and jiamingz9925 on August 28, 2023 18:26

@gc00 (Collaborator) left a comment

Please modify the comment and add the extra comment requested below. Otherwise, we will look back at this mysterious special case in the future and wonder why it is there.

Separately, I assume that you're going to squash the two commits together before pushing this in.

I'd like to see the added comment before approving, just to make sure we're documenting the code well. Thanks.

@@ -849,6 +849,11 @@ mtcp_plugin_skip_memory_region_munmap(Area *area, RestoreInfo *rinfo)
  LhCoreRegions_t *lh_regions_list = NULL;
  int total_lh_regions = lh_info->numCoreRegions;

  // Don't skip munmap of mtcp_restart regions.

Let's change the comment to remove the double negative:
// Do an munmap of mtcp_restart regions during restart. Don't skip this.

Also, please add a comment about why we need to munmap the mtcp_restart regions within MANA, but we don't need to do that within ordinary DMTCP. Where is the potential address conflict that we're trying to avoid?
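The thread does not say where the conflict occurs. As a generic illustration of what an address conflict between two mappings means, here is a minimal sketch in C. The `AddrRange` type and `ranges_conflict` function are hypothetical and are not part of MANA or DMTCP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end); not a MANA/DMTCP type. */
typedef struct {
  uintptr_t start;
  uintptr_t end;
} AddrRange;

/* Two half-open ranges conflict when each one begins before the other ends. */
static bool ranges_conflict(AddrRange a, AddrRange b)
{
  /* Precondition: both ranges are well formed. */
  assert(a.start <= a.end && b.start <= b.end);
  return a.start < b.end && b.start < a.end;
}
```

Under this definition, a region that a restart needs to place at its original address conflicts with any mapping still occupying part of that range, while two ranges that merely touch at a boundary do not conflict.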

I want to add that we should not skip every [heap] region, only the [heap] right after the mtcp_restart region.

@@ -849,6 +849,12 @@ mtcp_plugin_skip_memory_region_munmap(Area *area, RestoreInfo *rinfo)
  LhCoreRegions_t *lh_regions_list = NULL;
  int total_lh_regions = lh_info->numCoreRegions;

  // Don't skip munmap of mtcp_restart regions.
  if (mtcp_strendswith(area->name, "/mtcp_restart") ||
      mtcp_strendswith(area->name, "[heap]")) {

We only want to skip the [heap] right after mtcp_restart. Also, in mpi_plugin.cpp we need to do the same and skip those areas when considering libsStart.
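One way to single out only the [heap] entry that directly follows mtcp_restart, rather than every [heap], might look like the sketch below. It assumes the regions are available as an address-ordered array, as in /proc/self/maps. The `Region` type, `ends_with`, and `belongs_to_mtcp_restart` are hypothetical stand-ins; the real code receives a single `Area *` and uses `mtcp_strendswith`, so it would need some other way to remember the previous region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one memory-map entry; MANA's Area has more fields. */
typedef struct {
  const char *name;
} Region;

/* Returns true if `name` ends with `suffix` (plain-C stand-in for mtcp_strendswith). */
static bool ends_with(const char *name, const char *suffix)
{
  size_t n = strlen(name);
  size_t s = strlen(suffix);
  return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Returns true if regions[i] is an mtcp_restart region, or is the [heap]
 * entry immediately after one. Any other [heap] entry returns false.
 * Assumes `regions` is sorted by address.
 */
static bool belongs_to_mtcp_restart(const Region *regions, size_t count, size_t i)
{
  assert(i < count);
  if (ends_with(regions[i].name, "/mtcp_restart")) {
    return true;
  }
  return i > 0 &&
         ends_with(regions[i].name, "[heap]") &&
         ends_with(regions[i - 1].name, "/mtcp_restart");
}
```

The same predicate could in principle be shared by the munmap decision and the mpi_plugin.cpp logic mentioned above, so that both treat exactly the same set of regions as belonging to mtcp_restart.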

Update: For some reason, unmapping the heap region here causes a segfault, but without unmapping it we still run into a conflict later. @karya0

@gc00 (Collaborator) commented Aug 29, 2023

@karya0,
This issue is a blocker for MANA development. I hope you can get back to it soon.
Best,
Gene

@karya0 (Collaborator, Author) commented Aug 30, 2023

@gc00: This PR is insufficient for the fix. The problem lies in how the lower half/lh-proxy account for "core" regions versus the rest of the regions. The current logic in the split process considers all areas up to [heap] as core regions and refuses to munmap them. This includes the mtcp_restart region as well.

Further, the logic in the upper-half plugin, mpi_plugin.cpp, incorrectly labels the heap created by the new lh-proxy process as part of the upper half and saves it as part of the checkpoint. That's why the heap also sees a conflict on the second restart.

We need to come up with a proper fix to handle both cases. This PR can plaster over the mtcp_restart conflict but not the heap conflict.
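To make the first problem concrete, here is a minimal sketch of an "everything before [heap] is core" rule. It is not the lh-proxy code: `MapEntry` and `count_core_regions` are hypothetical, the entries are assumed to be in address order, and whether [heap] itself counts as core is not stated in the thread, so this sketch stops just before it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical map entry, listed in address order as in /proc/self/maps. */
typedef struct {
  const char *name;
} MapEntry;

/*
 * Counts the leading entries that an "all areas before [heap] are core"
 * rule would protect from munmap. Returns `count` if there is no [heap].
 */
static size_t count_core_regions(const MapEntry *maps, size_t count)
{
  assert(maps != NULL || count == 0);
  size_t i = 0;
  while (i < count && strcmp(maps[i].name, "[heap]") != 0) {
    i++;
  }
  return i;
}
```

With a rule like this, any mtcp_restart entry that appears before [heap] falls inside the protected prefix, which matches the symptom described above: the region is treated as core and is never unmapped.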

@jiamingz9925 (Collaborator) commented

@karya0 @gc00 I can also try some experiments in my forked repo, based on this PR.

@gc00 (Collaborator) commented Sep 3, 2023

See PR #357 for the continuation of this analysis. We should probably close this PR without committing.

@gc00 (Collaborator) commented Sep 13, 2023

@karya0, if this PR #353 is now obsolete, can you close it?
