-
@daboehme I've recently been trying to run Caliper on Frontier and have run into some issues: my previous approach for running Caliper on Summit, which made use of your response here: #151 (comment), now causes my program to crash due to a missing [...]. On Summit, I'd initialize Caliper in my app using a macro that just called the following at the start:

#define CALI_INIT \
    cali_mpi_init(); \
    cali_init();

int main(int argc, char *argv[])
{
    CALI_INIT
    CALI_CXX_MARK_FUNCTION;
    CALI_MARK_BEGIN("main_driver_init");
    // Initialize MPI.
    int num_procs, myid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    ...
    MPI_Finalize();
}

I've been working with the OLCF on possible workarounds, and they pointed me to the standard configs such as runtime-report. To try to get something up and running, I've been looking into using these standard configurations. So I've modified my code slightly to use them:

#define CALI_MPI_INIT \
    cali_mpi_init();

#define CALI_INIT(mpi_rank) \
    cali::ConfigManager mgr; \
    if (const char* env_p = std::getenv("CALI_CONFIG")) { mgr.add(env_p); if (mpi_rank == 0) { std::cout << env_p << std::endl; }} \
    if (mgr.error() && mpi_rank == 0) { std::cerr << "ConfigManager: " << mgr.error_msg() << std::endl; } \
    mgr.start();

#define CALI_FINALIZE \
    mgr.flush(); \
    mgr.stop();

int main(int argc, char *argv[])
{
    CALI_MPI_INIT
    // Initialize MPI.
    int num_procs, myid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    // Start the ConfigManager once the MPI rank is known.
    CALI_INIT(myid)
    CALI_CXX_MARK_FUNCTION;
    CALI_MARK_BEGIN("main_driver_init");
    ...
    CALI_FINALIZE
    MPI_Finalize();
}

The code does appear to be running and providing me with timing metrics. However, on my very simple 1-node test with 8 MPI ranks, I'm seeing two sets of metrics in the Caliper output when using the following input:

export CALI_CONFIG="runtime-report(aggregate_across_ranks=true,calc.inclusive=true,profile.mpi,output=stderr)"
srun -N 1 -n 8 -c 7 --threads-per-core=1 --cpu-bind=threads --gpus-per-task=1 ./mechanics -opt ./options_frontier.toml

As I've only ever used that other input file, where this never occurred, I'm not sure whether this is expected behavior or whether there is a way to fix it. Any help would be appreciated, as I'm trying to put together some timings for reports.
-
Hi @rcarson3, you don't strictly need your own ConfigManager instance in this case. Caliper has an internal ConfigManager that runs the configuration provided in the CALI_CONFIG environment variable. Here you're creating another ConfigManager instance that runs the same configuration, which is why you're essentially getting the same output twice. So you should use either CALI_CONFIG or your own ConfigManager. If you use your own ConfigManager instance, you'd typically use some application-specific way, e.g. a command-line argument, to pass in the runtime-report... configuration string. Hope this helps!
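
For anyone who goes the "own ConfigManager" route, here is a minimal sketch of the command-line approach. The flag name `--caliper-config` and the helper `caliper_config_from_args` are hypothetical names made up for illustration and are not part of Caliper. The Caliper calls appear only in comments and mirror the ones already used in the macros above.

```cpp
#include <string>

// Hypothetical helper (not part of Caliper): returns the string that follows
// an assumed "--caliper-config" flag, or an empty string if the flag is
// absent or has no value after it.
std::string caliper_config_from_args(int argc, const char* const argv[])
{
    const std::string flag = "--caliper-config";  // assumed flag name
    for (int i = 1; i + 1 < argc; ++i) {
        if (flag == argv[i]) {
            return argv[i + 1];
        }
    }
    return std::string();
}

// Intended use inside main(), after MPI_Init. This part needs the Caliper
// headers, so it is left as a comment here:
//
//   cali::ConfigManager mgr;
//   const std::string cfg = caliper_config_from_args(argc, argv);
//   if (!cfg.empty()) { mgr.add(cfg.c_str()); }
//   if (mgr.error() && myid == 0) {
//       std::cerr << "ConfigManager: " << mgr.error_msg() << std::endl;
//   }
//   mgr.start();
//   ...
//   mgr.flush();
```

With something like this, the application would be launched with CALI_CONFIG unset and the configuration string passed after the flag instead, so only one ConfigManager runs the runtime-report configuration and the report is printed once.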