8000 Simplify copy bits by gc00 · Pull Request #354 · mpickpt/mana · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Simplify copy bits #354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Simplify copy bits #354

wants to merge 2 commits into from

Conversation

gc00
Copy link
Collaborator
@gc00 gc00 commented Aug 28, 2023

This has various bug fixes from dev/gdc0/simplifyCopyBits on origin, as well as cleaning up some code. I've tried to re-organize the order of the commits so that the obviously important commits come first. There was a report by @JainTwinkle that some of the commits caused problems when experimenting with Open MPI. I'm hoping that at least up to 31c9538 , everything is vanilla, and should work.

@JainTwinkle , could you test up to this commit (any maybe to the last commit), and tell us what works and what doesn't work for Open MPI?

Thanks.

@gc00 gc00 added the bug Something isn't working label Aug 28, 2023
@gc00 gc00 requested a review from JainTwinkle August 28, 2023 23:40
@gc00 gc00 force-pushed the simplifyCopyBits branch from 58b49b8 to 94e527e Compare August 30, 2023 23:27
@gc00
Copy link
Collaborator Author
gc00 commented Aug 30, 2023

@JainTwinkle ,
Thanks again for agreeing to test this PR against MANA/Open MPI.
If you want to simply do something quick, I suggest testing with only the commits through:
3608f94
"[TODO: split_process.cpp: MANA_USE_LH_FIXED_ADDRES]"

This is the same as:

  git fetch gc00 simplifyCopyBits:simplifyCopyBits
  git checkout simplifyCopyBits
  # Next, ignore the last two commits for your testing
  git checkout simplifyCopyBits^^

Please let me know if this works with MANA/Open MPI, and if not, then how many of the earlier commits will work. Then I will push into 'main' whichever commits seem to work with MANA/Open MPI.
Most of this code is a few very simple fixes, followed by massive cleaning up of the code that should still retain the original semantics.

As I explained, I'm trying to clean up the ugly code from the academic prototype. Then I'll push the cleaner code into 'main'.
The alternative would be for me to add on top of the existing ugly code, and then later port it to the clean code. But it will make my life a lot easier if you can test the clean code now, and then I can add on top of the clean code directly. Thanks!!

@gc00
Copy link
Collaborator Author
gc00 commented Sep 1, 2023

@JainTwinkle ,
Could you please test this PR against MANA/Open MPI one more time?

Thanks for catching my earlier bug with where to save lh_info.  I believe I've fixed it now.  I'm only using lh_info_addr (LowerHalfInfo_t *), and not lh_info itself.  So, it's a pointer into the lh_proxy memory.

If you want to simply do something quick, I suggest testing with _only_ the commits through:
24bd2120ef77b1f1ce4e53cd7a0f5454bf47c8e5
"[TODO: split_process.cpp: MANA_USE_LH_FIXED_ADDRES]"

This is the same as:

  git fetch gc00 simplifyCopyBits:simplifyCopyBits
  git checkout simplifyCopyBits
  # Next, ignore the last two commits for your testing
  git checkout simplifyCopyBits^^

Please let me know if this works with MANA/Open MPI, and if not, then how many of the earlier commits will work. Then I will push into 'main' whichever commits seem to work with MANA/Open MPI.
As before, each individual commit is small, and the goal for most of the commits is to keep the same semantics, but make the code cleaner.

For some reason, I don't see these bugs when testing against wave_mpi. So, your test with MANA/OpenMPI is very valuable. Thanks!

@gc00
Copy link
Collaborator Author
gc00 commented Sep 1, 2023

@JainTwinkle ,
I just force-pushed to PR #354 to fix the issue with the dmtcp submodule that you observed. The DMTCP submodule should now be updated only with the corresponding commit, and you're testing only before that commit.

The code to test is exactly the same.
When you do git checkout simplifyCopyBits^^, the code will be the same, but the hash for that commit will have changed.

You can continue to work with your old version, or do a git pull --rebase on this updated PR.

Thanks again for this testing!!!

 * restart_plugin/mtcp_restart.c:reserveUpperHalfMemoryRegionsForCkptImgs()
   requires start1, end1, start2, end2.
 * start2 is roughly low end of stack, and end2 is 8 MB higher.
 * But because of address space randomization, the stacks
   of different ranks can be 1 GB away or more.  So, choosing
   [ minHighMemStart, minHighMemStart + 8MB] is not sufficiient
   to reserve all possible stack regions.
 * So, this adds maxHighMemEnd (the maximum of highMemStart+8MB
   for the highMemStart (stack) in each rank.
@gc00
Copy link
Collaborator Author
gc00 commented Sep 13, 2023

This has two remaining commits that may or may not be useful. Most of the original commits disappeared when we pushed a lot of dev/gdc0/simplifyCopyBits into main.

I'll come back later and see if these commits are useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant
0