Runtime report for MPI jobs #482

rcarson3 · 2023-04-27T21:50:44Z

@daboehme I've recently been trying to run Caliper on Frontier and have been running into some issues. My previous approach for running Caliper on Summit, which made use of your response here: #151 (comment), causes my program to crash due to a missing mmap call in a shared library. I believe this is likely a linking issue somewhere, as that function should be available from the standard C/C++ libraries that I link to in my code.

On Summit, I'd initialize Caliper in my app using a macro that just called the functions below at startup:

#define CALI_INIT \
   cali_mpi_init(); \
   cali_init();

int main(int argc, char *argv[])
{
   CALI_INIT
   CALI_CXX_MARK_FUNCTION;
   CALI_MARK_BEGIN("main_driver_init");
   // Initialize MPI.
   int num_procs, myid;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);
   ...
   MPI_Finalize();
 }

I've been working with the OLCF on possible workarounds, and they pointed me to the standard configs such as runtime-report. To try and get something up and running, I've been looking into using these standard configurations. So, I've modified my code slightly to use a cali::ConfigManager (mgr) via another set of macros, as seen below:

#define CALI_MPI_INIT \
   cali_mpi_init();
#define CALI_INIT(mpi_rank) \
   cali::ConfigManager mgr; \
   if(const char* env_p = std::getenv("CALI_CONFIG")) { mgr.add(env_p); if (mpi_rank == 0) { std::cout << env_p << std::endl; }} \
   if (mgr.error() && mpi_rank == 0) { std::cerr << "ConfigManager: " << mgr.error_msg() << std::endl; } \
   mgr.start();
#define CALI_FINALIZE \
   mgr.flush(); \
   mgr.stop();
   
int main(int argc, char *argv[])
{
   CALI_MPI_INIT
   // Initialize MPI.
   int num_procs, myid;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);
   
   CALI_INIT(myid)
   CALI_CXX_MARK_FUNCTION;
   CALI_MARK_BEGIN("main_driver_init");
   ...
   CALI_FINALIZE
   MPI_Finalize();
 }

The code does appear to run and gives me timing metrics. However, even on my very simple 1-node test with 8 MPI ranks, the Caliper output contains two sets of metrics when I use the following input:

export CALI_CONFIG="runtime-report(aggregate_across_ranks=true,calc.inclusive=true,profile.mpi,output=stderr)"
srun -N 1 -n 8 -c 7 --threads-per-core=1 --cpu-bind=threads --gpus-per-task=1 ./mechanics -opt ./options_frontier.toml

Path                                             Min time/rank Max time/rank Avg time/rank Time %
main                                                  3.919222      3.921087      3.919999 99.998473
  MPI_Wait                                            0.000001      0.000002      0.000002  0.000038
  MPI_Test                                            0.000012      0.001207      0.000594  0.013249
  ...
Path                                             Min time/rank Max time/rank Avg time/rank Time %
main                                                  3.925248      3.927085      3.926022 99.991143
  MPI_Wait                                            0.000001      0.000002      0.000002  0.000038
  MPI_Test                                            0.000012      0.001204      0.000592  0.013190

As I've only ever used that other input file, where this never occurred, I'm not sure whether this is expected behavior or whether there is a way to fix it. Any help would be appreciated, as I'm trying to put together some timings for some reports.

daboehme · 2023-04-27T22:23:52Z

Maintainer

Hi @rcarson3, you don't strictly need your own ConfigManager instance in this case. Caliper has an internal ConfigManager that runs the configuration provided in the CALI_CONFIG environment variable. Here you're creating another ConfigManager instance that runs the same configuration, which is why you're essentially getting the same output twice. So you should use either CALI_CONFIG or your own ConfigManager, not both. If you use your own ConfigManager instance, you'd typically use some application-specific mechanism, e.g. a command-line argument, to pass in the runtime-report... configuration string. Hope this helps!
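
For readers who do want their own ConfigManager, here is a minimal sketch of the "command-line argument" route mentioned above. The `-P` flag name and the `profile_from_args` helper are assumptions made for illustration and are not part of Caliper. The Caliper calls in the trailing comment are the same ones already used in the macros in the question (`add`, `error`, `error_msg`, `start`, `flush`).

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not part of Caliper): return the value that follows
// a "-P" flag on the command line, or an empty string if the flag is absent
// or has no value after it.
std::string profile_from_args(int argc, const char* const argv[])
{
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "-P") == 0) {
            return argv[i + 1];
        }
    }
    return "";
}

// Sketch of the call site, assuming <caliper/cali-manager.h> is included
// and MPI has already been initialized:
//
//   cali::ConfigManager mgr;
//   const std::string profile = profile_from_args(argc, argv);
//   if (!profile.empty()) { mgr.add(profile.c_str()); }
//   if (mgr.error()) {
//       std::cerr << "ConfigManager: " << mgr.error_msg() << std::endl;
//   }
//   mgr.start();
//   // ... application work ...
//   mgr.flush();
```

With this approach the application would be launched with something like `./mechanics -opt ./options_frontier.toml -P "runtime-report(...)"` and CALI_CONFIG would be left unset, so that only one ConfigManager produces a report.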

1 reply

rcarson3 Apr 27, 2023
Author

Thanks, that appears to have been the issue! I can now start getting some timings :)

Also good to know that I don't need my own ConfigManager for what I'm doing. I'll probably look into that in more detail in the future, but for now I'll just revert to using Caliper's internal ConfigManager.
