-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Advanced path from env generation #1263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLA< 8000 h5 dir="auto">Individual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
@googlebot I signed it! |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
Apologies for delay. I guess I am finally okay with just doing this. Wasn't sure if changing behavior is fine or not. But I have a comment about change itself. So now that we always do pid thing, doesn't look like we need "or bit 7" trick anymore. Can you please clean up this part? (Or did I misundertand something) |
Yes, I don't think we need the extra bit trick anymore. Let me have a look. I assume we need to avoid parents setting it. So it'll be in a different place. |
8545600
to
1d15462
Compare
Done cleaning up and updating the PR. |
Eh. I am very very sorry. Looking at this once more I cannot agree with "always append pid" approach. For most people and me too, main workload with cpu profiler is something like:
Forcing pid in all file names there will be super-annoying. I cannot agree to that. I am truly sorry for saying yes to this and making you do the rebase thingy. Perhaps the idea with %p in file name is workable one. It will also let us deal with high bit thingy in better and cleaner way. Alternative is to look at extra variables like CPUPROFILE_USE_PID to signal that we need pids appended. Please excuse me. And do let me know if you want to work on this larger change or not. In later case I am happy to make up for the trouble and author right change myself. |
It's ok I understand.
NP
This seems like the easiest way to do as long as you don't envision more items to be added to the filename. So I can implement that. In addition to that I'd propose to add the code that can detect HPC environments and adaptively do useful stuff there. For example
|
Sure. Having common mpi environments autodetected seems okay. Go ahead. |
@alk I've pushed the non-tested code to this PR. It builds but I've not checked it at runtime. |
Conceptually looks fine. I recommend adding more comments about motivation for detecting HPC environment (basically, what you wrote above likelyhood of same machine running multiple replicas of same program). If you don't mind, also consider adding comments on the high bit trick (why etc). |
@alk please check the comments and approve. |
LGTM. Lets get this in. Yes, please, squash and I'll merge. |
ffa6d22
to
a396ac8
Compare
Squashed ... note that I haven't tested it with OMPI & Slurm yet |
Noted. You want to test it first? Or we assume if it doesn't work you/someone will fix it later ? |
I plan to do release in about week or so. So you don't need to rush. |
If you ok testing the common cases yourself and if they work I'll take care of testing the HPC case sometime later (in a week timeframe) and file a fix if it doesn't work. |
Yes I am fine testing basic case myself. So I still see 2 commits being proposed here. You still plan to squash them together? (I'd prefer that for sure) |
a396ac8
to
72d3ab9
Compare
Fixed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow it broke unittests. So needs revision
7992816
to
87f0c76
Compare
Unit tests are passing |
tests are failing. For example heap-profiler_unittest For the new file copyright year is not 2005 is not google. See for example safe_strerror.cc how it can be done. Also you sure that unique path test requires linking to tcmalloc_minimal ? Try minimal thing; only sysinfo and nothing else. Why do you do fancy ptr and len thingy for AppendHPCEnvironment? It is always appending to empty PATH_MAX sized string. Why bother will all the extra logic? Also if you choose to keep append{Ptr,Len} then do make sure to keep naming style consistent. |
Ok, I'll take a look. I was referring to my new test, haven't tried existing.
Ok
Ok, will try that. I wasn't sure.
It is not appending to the empty prefix always. At least this is not the case for the PID suffix. PID suffix may be on its own but also appended to the PMIX or Slurm suffixes already present.
Ok |
I wasn't sure about how to link with the whole LD_PRELOAD-able library. I can try adding only "sysinfo", but assumed it might have dependencies either now or in the future that would introduce additional maintenance hassle later. |
For me the heap profiler is passing:
Am I looking at the wrong place? Should I run it in some special way? |
Will |
Ok, I guess I was able to reproduce
Fixing now |
Tests should be passing now.
I also fixed the naming of the functions per below:
|
bd6a6ea
to
53efafa
Compare
|
b66f6be
to
4fc3cb8
Compare
Thanks for update. Several more questions. (Some I can fix myself but this is never ideal if I have to amend people's work)
which saves all the trouble with AppendXYZ stuff (and confusion inspiring O(N^2) behaviors; yes, I might by only one irked by this and this code isn't exactly speed critical but still) |
Ah and yes, fine to leave makefile bits to me. Dependency on libtcmalloc is unfortunate and fixable, but several other unit tests are in same shape. So I'll be fixing that separately. And I can't be asking people to learn autotools bits. |
I see you closed it. What are the next steps? From responses it seems you were asking for further fixes, but in this case it should not be closed right? |
Oh. I apologize. Had no intention to close. Must have hit by mistake. Yes, if you could adjust further I'd be thankful. |
I will fix as requested |
* Add support for known HPC environments (TODO: needs to be extended with more nevironments) * Added the "CPUPROFILE_USE_PID" environment variable to force appending PID for the non-covered environments * Preserve the old way of handling the Child-Parent case Signed-off-by: Artem Polyakov <artpol84@gmail.com>
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
4fc3cb8
to
86450ad
Compare
@alk, I did a pass to address the comments. Please see below
Fixed - I've removed that item from the Makefile.am as I'm not sure how to properly generate it.
N/A - I've got rid of this function per one of the requests below
Not sure what's the best way to handle it. #define TC_ENV_MAX_NAME 256 that defines the max expected variable name.
Good catch, fixed
Fixed
Indeed, this is very subjective. I tried to address, but I'm not sure I am able to guess what you would prefer:
Fixed I've also added some macros to sysinfo.h to make sure that the unit test and sysinfo.cc are aligned. I know you didn't want to use the macro for this, but I believe it's more reliable as we actually using these things from 2 places. #define TC_ENV_PID_SUFFIX "_"
#define TC_ENV_PMIX_RANK "PMIX_RANK"
#define TC_ENV_PMIX_SUFFIX ".rank-"
#define TC_ENV_SLURM_SUFFIX ".slurmid-"
#define TC_ENV_SLURM_JOBID "SLURM_JOB_ID"
#define TC_ENV_SLURM_PROCID "SLURM_PROCID" |
Looking. Thanks a lot for taking your time to work on this. |
applied. thanks. (I ended up massaging your changes in separate commit) |
Thanks! |
Currently, only the children output files will get the PID.
However if using this tool to analyze MPI applications, it is
possible that multiple processes not related with parent-child
relations are running on the same host.
The other option would be to do a printf-like thing
HEAPPROFILE=/tmp/profile-%p
where
%p
portion would be replaced with the PID.However, having the PID in the file name is always a good idea IMO.
Signed-off-by: Artem Y. Polyakov artpol84@gmail.com