8000 Advanced path from env generation by artpol84 · Pull Request #1263 · gperftools/gperftools · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Advanced path from env generation #1263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 9, 2023
Merged

Conversation

artpol84
Copy link
Contributor
@artpol84 artpol84 commented Apr 1, 2021

Currently, only the children output files will get the PID.
However if using this tool to analyze MPI applications, it is
possible that multiple processes not related with parent-child
relations are running on the same host.

The other option would be to do a printf-like thing
HEAPPROFILE=/tmp/profile-%p
where %p portion would be replaced with the PID.
However, having the PID in the file name is always a good idea IMO.

Signed-off-by: Artem Y. Polyakov artpol84@gmail.com

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

< 8000 h5 dir="auto">Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@artpol84
Copy link
Contributor Author
artpol84 commented Apr 1, 2021

@googlebot I signed it!

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@alk
Copy link
Contributor
alk commented Jul 9, 2023

Apologies for delay. I guess I am finally okay with just doing this. Wasn't sure if changing behavior is fine or not. But I have a comment about change itself.

So now that we always do pid thing, doesn't look like we need "or bit 7" trick anymore. Can you please clean up this part? (Or did I misundertand something)

@artpol84
Copy link
Contributor Author

Yes, I don't think we need the extra bit trick anymore. Let me have a look. I assume we need to avoid parents setting it. So it'll be in a different place.

@artpol84 artpol84 force-pushed the topic/fix_outfile branch from 8545600 to 1d15462 Compare July 10, 2023 21:29
@artpol84
Copy link
Contributor Author
artpol84 commented Jul 10, 2023

@alk

Can you please clean up this part? (Or did I misundertand something)

Done cleaning up and updating the PR.

@alk
Copy link
Contributor
alk commented Jul 10, 2023

Eh. I am very very sorry. Looking at this once more I cannot agree with "always append pid" approach. For most people and me too, main workload with cpu profiler is something like:

$ CPUPROFILE=out ./myprogram
$ pprof --web ./myprogram out

Forcing pid in all file names there will be super-annoying. I cannot agree to that.

I am truly sorry for saying yes to this and making you do the rebase thingy. Perhaps the idea with %p in file name is workable one. It will also let us deal with high bit thingy in better and cleaner way. Alternative is to look at extra variables like CPUPROFILE_USE_PID to signal that we need pids appended.

Please excuse me. And do let me know if you want to work on this larger change or not. In later case I am happy to make up for the trouble and author right change myself.

@artpol84
Copy link
Contributor Author
artpol84 commented Jul 10, 2023

Forcing pid in all file names there will be super-annoying. I cannot agree to that.

It's ok I understand.

Please excuse me. And do let me know if you want to work on this larger change or not. In later case I am happy to make up for the trouble and author right change myself.

NP

Alternative is to look at extra variables like CPUPROFILE_USE_PID to signal that we need pids appended.

This seems like the easiest way to do as long as you don't envision more items to be added to the filename. So I can implement that.

In addition to that I'd propose to add the code that can detect HPC environments and adaptively do useful stuff there. For example

  • Open MPI has OMPI_x, Slurm resource manager would have SLURM_x environment variables. MPICH is likely to have something similar (but need to check). We can use this as a sign that you are likely to want to have PID.
  • In addition, if MPI is using PMIx (https://github.com/openpmix/openpmix) which is de-facto standard for Open MPI and is supported by MPICH, we can actually append MPI rank (instead of PID) to the output file name which will be even more convenient.
    Let me know if those options are interesting for you.

@alk
Copy link
Contributor
alk commented Jul 10, 2023

Sure. Having common mpi environments autodetected seems okay. Go ahead.

@artpol84 artpol84 changed the title Always add the PID to the profile output. Advanced path from env generation Jul 11, 2023
@artpol84
Copy link
Contributor Author

@alk I've pushed the non-tested code to this PR. It builds but I've not checked it at runtime.
Could you give it a look and let me know if that works conceptually?

@alk
Copy link
Contributor
alk commented Jul 11, 2023

Conceptually looks fine. I recommend adding more comments about motivation for detecting HPC environment (basically, what you wrote above likelyhood of same machine running multiple replicas of same program). If you don't mind, also consider adding comments on the high bit trick (why etc).

@artpol84
Copy link
Contributor Author

@alk please check the comments and approve.
If ok - I'll squash and force-push it.
Next I'll do the runtime testing.

@alk
Copy link
Contributor
alk commented Jul 12, 2023

LGTM. Lets get this in. Yes, please, squash and I'll merge.

@artpol84 artpol84 force-pushed the topic/fix_outfile branch from ffa6d22 to a396ac8 Compare July 12, 2023 18:19
@artpol84
Copy link
Contributor Author

Squashed ... note that I haven't tested it with OMPI & Slurm yet

@alk
Copy link
Contributor
alk commented Jul 12, 2023

Noted. You want to test it first? Or we assume if it doesn't work you/someone will fix it later ?

@alk
Copy link
Contributor
alk commented Jul 12, 2023

I plan to do release in about week or so. So you don't need to rush.

@artpol84
Copy link
Contributor Author

If you ok testing the common cases yourself and if they work I'll take care of testing the HPC case sometime later (in a week timeframe) and file a fix if it doesn't work.

@alk
Copy link
Contributor
alk commented Jul 13, 2023

Yes I am fine testing basic case myself. So I still see 2 commits being proposed here. You still plan to squash them together? (I'd prefer that for sure)

@artpol84 artpol84 force-pushed the topic/fix_outfile branch from a396ac8 to 72d3ab9 Compare July 13, 2023 18:01
@artpol84
Copy link
Contributor Author

So I still see 2 commits being proposed here. You still plan to squash them together? (I'd prefer that for sure)

Fixed!

Copy link
Contributor
@alk alk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow it broke unittests. So needs revision

@artpol84 artpol84 force-pushed the topic/fix_outfile branch 5 times, most recently from 7992816 to 87f0c76 Compare July 14, 2023 02:16
@artpol84
Copy link
Contributor Author
artpol84 commented Jul 14, 2023

Unit tests are passing

@alk
Copy link
Contributor
alk commented Jul 14, 2023

tests are failing. For example heap-profiler_unittest

For the new file copyright year is not 2005 is not google. See for example safe_strerror.cc how it can be done. Also you sure that unique path test requires linking to tcmalloc_minimal ? Try minimal thing; only sysinfo and nothing else.

Why do you do fancy ptr and len thingy for AppendHPCEnvironment? It is always appending to empty PATH_MAX sized string. Why bother will all the extra logic? Also if you choose to keep append{Ptr,Len} then do make sure to keep naming style consistent.

@artpol84
Copy link
Contributor Author
artpol84 commented Jul 14, 2023

tests are failing. For example heap-profiler_unittest

Ok, I'll take a look. I was referring to my new test, haven't tried existing.

For the new file copyright year is not 2005 is not google. See for example safe_strerror.cc how it can be done.

Ok

Also you sure that unique path test requires linking to tcmalloc_minimal ? Try minimal thing; only sysinfo and nothing else.

Ok, will try that. I wasn't sure.

Why do you do fancy ptr and len thingy for AppendHPCEnvironment? It is always appending to empty PATH_MAX sized string. Why bother will all the extra logic?

It is not appending to the empty prefix always. At least this is not the case for the PID suffix. PID suffix may be on its own but also appended to the PMIX or Slurm suffixes already present.

Also if you choose to keep append{Ptr,Len} then do make sure to keep naming style consistent.

Ok

@artpol84
Copy link
Contributor Author

Also you sure that unique path test requires linking to tcmalloc_minimal ? Try minimal thing; only sysinfo and nothing else.

I wasn't sure about how to link with the whole LD_PRELOAD-able library. I can try adding only "sysinfo", but assumed it might have dependencies either now or in the future that would introduce additional maintenance hassle later.
IMO it's a good practice to link the test to the whole library. In this case, you always have all dependencies correct

@artpol84
Copy link
Contributor Author
artpol84 commented Jul 14, 2023

For me the heap profiler is passing:

$ git log -1 && ./heap-profiler_unittest 
commit bd6a6ea2be62e6a456fd4d83c4b18cf30ced353a (HEAD -> topic/fix_outfile_v2, origin/topic/fix_outfile)
Author: Artem Polyakov <...>
Date:   Thu Jul 13 17:43:29 2023 -0700

    Add unit test for GetUniquePathFromEnv()
    
    Signed-off-by: Artem Polyakov <...>
Starting tracking the heap
Starting tracking the heap
DONE.

Am I looking at the wrong place? Should I run it in some special way?

@artpol84
Copy link
Contributor Author

Also if you choose to keep append{Ptr,Len} then do make sure to keep naming style consistent.

Will AppendPtr() and AppendLen() be consistent with the naming style?

@artpol84
Copy link
Contributor Author

Ok, I guess I was able to reproduce

$ ./heap-profiler_unittest.sh
Profile not found: /tmp/heap-profiler_unittest.beLeuR/test_*.1329.heap

Fixing now

@artpol84
Copy link
Contributor Author

Tests should be passing now.

$ ./heap-profiler_debug_unittest.sh && ./heap-profiler_unittest.sh 
PASS
PASS

I also fixed the naming of the functions per below:

Will AppendPtr() and AppendLen() be consistent with the naming style?

@artpol84 artpol84 force-pushed the topic/fix_outfile branch from bd6a6ea to 53efafa Compare July 14, 2023 21:46
@artpol84
Copy link
Contributor Author

@alk

  • I've fixed the file header to be "Copyright (c) 2023, gperftools Contributors"
  • I tried building just sysinfo.cc into the test but was getting some errors at the 'autogen.sh' running stage (see below). What I did is I removed deps from tcmalloc and simply put "sysinfo.cc" to the _SORCES variable in Makefile.am.
$ ./autogen.sh
Makefile.am: error: object 'src/base/sysinfo.$(OBJEXT)' created both with libtool and without
autoreconf: automake failed with exit status: 1

@artpol84 artpol84 force-pushed the topic/fix_outfile branch 2 times, most recently from b66f6be to 4fc3cb8 Compare July 16, 2023 17:10
@alk
Copy link
Contributor
alk commented Jul 23, 2023

Thanks for update. Several more questions. (Some I can fix myself but this is never ideal if I have to amend people's work)

  • I see that you added makefile reference to unique_path_unittest.vcxproj but there is no such file. It is not great that we're not testing this on windows, but is better than breaking make dist command.
  • CheckPIDIsForced should be static
  • using PATH_MAX-sized buffer for environment name string is mildly confusing. Breaks principle of least surprise. I am not sure how to size this one but PATH_MAX we should probably avoid.
  • Slurm's docs mention that SLURM_JOBID is deprecated. Also can you clarify why we need this approach "if jobid is there but not procid then we force pid" ?
  • For testing checking XYZ_USE_PID thing, please use EnvToBool
  • and most general, this is adding a lot of abstractions. Most used exactly once. And many of them are just giving names to obvious things and result is big diff with logic spread quite thinly. I understand it is matter of taste, but we don't do that kind of style in gperftools. Bigger offenders here are stuff like PID_SUFFIX but those numerous functions are questionable as well.
  • I think general logic can be simplified with hpc detection bits (which don't need to be own function) set up something like mpi_sep and mpi_val. And then you format it all at once something like this:
  if (append_pid) {
    snprintf(path, PATH_MAX, "%s%s%s_%d", envval, mpi_sep, mpi_val, getpid());
  } else {
    snprintf(path, PATH_MAX, "%s%s%s", envval, mpi_sep, mpi_val);
  }

which saves all the trouble with AppendXYZ stuff (and confusion inspiring O(N^2) behaviors; yes, I might by only one irked by this and this code isn't exactly speed critical but still)

@alk
Copy link
Contributor
alk commented Jul 23, 2023

Ah and yes, fine to leave makefile bits to me. Dependency on libtcmalloc is unfortunate and fixable, but several other unit tests are in same shape. So I'll be fixing that separately. And I can't be asking people to learn autotools bits.

@alk alk closed this Jul 23, 2023
@artpol84
Copy link
Contributor Author

I see you closed it. What are the next steps? From responses it seems you were asking for further fixes, but in this case it should not be closed right?

@alk
Copy link
Contributor
alk commented Jul 23, 2023

Oh. I apologize. Had no intention to close. Must have hit by mistake. Yes, if you could adjust further I'd be thankful.

@artpol84
Copy link
Contributor Author

I will fix as requested

* Add support for known HPC environments (TODO: needs to be extended
with more nevironments)
* Added the "CPUPROFILE_USE_PID" environment variable to force appending
PID for the non-covered environments
* Preserve the old way of handling the Child-Parent case

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
@artpol84 artpol84 force-pushed the topic/fix_outfile branch from 4fc3cb8 to 86450ad Compare August 8, 2023 23:57
@artpol84
Copy link
Contributor Author
artpol84 commented Aug 9, 2023

@alk, I did a pass to address the comments. Please see below

  • I see that you added makefile reference to unique_path_unittest.vcxproj but there is no such file. It is not great that we're not testing this on windows, but is better than breaking make dist command.

Fixed - I've removed that item from the Makefile.am as I'm not sure how to properly generate it.

  • CheckPIDIsForced should be static

N/A - I've got rid of this function per one of the requests below

  • using PATH_MAX-sized buffer for environment name string is mildly confusing. Breaks principle of least surprise. I am not sure how to size this one but PATH_MAX we should probably avoid.

Not sure what's the best way to handle it.
I've introduced the following macro's in the sysinfo.h:

#define TC_ENV_MAX_NAME 256

that defines the max expected variable name.

  • Slurm's docs mention that SLURM_JOBID is deprecated. Also can you clarify why we need this approach "if jobid is there but not procid then we force pid" ?

Good catch, fixed

  • For testing checking XYZ_USE_PID thing, please use EnvToBool

Fixed

  • and most general, this is adding a lot of abstractions. Most used exactly once. And many of them are just giving names to obvious things and result is big diff with logic spread quite thinly. I understand it is matter of taste, but we don't do that kind of style in gperftools. Bigger offenders here are stuff like PID_SUFFIX but those numerous functions are questionable as well.

Indeed, this is very subjective. I tried to address, but I'm not sure I am able to guess what you would prefer:

  1. I turned Child bit manipulation into macro's
  2. I've merged CheckPIDIsForced() into GetUniquePathFromEnv()
  3. As of now, only 1 new function (QueryHPCEnvironment()) introduced that I'd prefer to keep separate from both readability and logic standpoints.
  • I think general logic can be simplified with hpc detection bits (which don't need to be own function) set up something like mpi_sep and mpi_val. And then you format it all at once something like this:
  if (append_pid) {
    snprintf(path, PATH_MAX, "%s%s%s_%d", envval, mpi_sep, mpi_val, getpid());
  } else {
    snprintf(path, PATH_MAX, "%s%s%s", envval, mpi_sep, mpi_val);
  }

which saves all the trouble with AppendXYZ stuff (and confusion inspiring O(N^2) behaviors; yes, I might by only one irked by this and this code isn't exactly speed critical but still)

Fixed

I've also added some macros to sysinfo.h to make sure that the unit test and sysinfo.cc are aligned. I know you didn't want to use the macro for this, but I believe it's more reliable as we actually using these things from 2 places.
If you still think we need to get rid of this - it's fine, I can copy/paste in all the places. But IMO this makes sense for easier maintenance.

#define TC_ENV_PID_SUFFIX "_"
#define TC_ENV_PMIX_RANK "PMIX_RANK"
#define TC_ENV_PMIX_SUFFIX ".rank-"
#define TC_ENV_SLURM_SUFFIX ".slurmid-"
#define TC_ENV_SLURM_JOBID "SLURM_JOB_ID"
#define TC_ENV_SLURM_PROCID "SLURM_PROCID"

@alk
Copy link
Contributor
alk commented Aug 9, 2023

Looking. Thanks a lot for taking your time to work on this.

@alk alk merged commit 86450ad into gperftools:master Aug 9, 2023
@alk
Copy link
Contributor
alk commented Aug 9, 2023

applied. thanks. (I ended up massaging your changes in separate commit)

@artpol84
Copy link
Contributor Author
artpol84 commented Aug 9, 2023

Thanks!
"massaging" looks great - I do not think in terms of OOP, unfortunately. So I doubt I'd come with up something close to that one on my own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0