ADIOS2: Optionally write attributes only from given ranks #1542

Merged
merged 4 commits into from
Dec 5, 2023
Changes from 3 commits
23 changes: 23 additions & 0 deletions docs/source/backends/adios2.rst
Expand Up @@ -105,6 +105,11 @@ The default behavior may be restored by setting the :ref:`JSON parameter <backen
Best Practice at Large Scale
----------------------------

ADIOS2 distinguishes between "heavy" data of arbitrary size (i.e. the "actual" data) and lightweight metadata.

Heavy I/O
.........

A beneficial configuration depends heavily on:

1. Hardware: filesystem type, specific file striping, network infrastructure and available RAM on the aggregator nodes.
Expand Down Expand Up @@ -135,6 +140,24 @@ The preferred backend usually depends on the system's native software stack.
For fine-tuning at extreme scale or for exotic systems, please refer to the ADIOS2 manual and talk to your filesystem admins and the ADIOS2 authors.
Be aware that extreme-scale I/O is a research topic after all.

Metadata
........

ADIOS2 will implicitly aggregate metadata specified from parallel MPI processes.
Duplicate specification of metadata is eliminated in this process.
Unlike in HDF5, specifying metadata collectively is not required and is even detrimental to performance.
The :ref:`JSON/TOML key <backendconfig>` ``adios2.attribute_writing_ranks`` can be used to restrict attribute writing to only a select handful of ranks (most typically a single one).
The ADIOS2 backend of the openPMD-api will then ignore attributes from all other MPI ranks.

.. tip::

Treat metadata specification as a collective operation in order to retain compatibility with HDF5, and then specify ``adios2.attribute_writing_ranks = 0`` in order to achieve best performance in ADIOS2.

.. warning::

The ADIOS2 backend may also use attributes to encode openPMD groups (ref. "group table").
The ``adios2.attribute_writing_ranks`` key also applies to those attributes, i.e. group creation must then also be treated as collective (at least on the specified ranks).
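
As an illustration of the key described above, the following is a minimal sketch of how an inline JSON configuration containing ``adios2.attribute_writing_ranks`` might be assembled in user code. The helper name ``attributeWritingRanksConfig`` is hypothetical and not part of the openPMD-api; only the key name and its two accepted forms (single integer or list of integers) are taken from the documentation in this diff.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of openPMD-api): builds an inline JSON
// configuration string that sets adios2.attribute_writing_ranks.
// A single rank is emitted as an integer, any other number of ranks as a
// JSON array; the documentation accepts both forms.
std::string attributeWritingRanksConfig(std::vector<int> const &ranks)
{
    std::ostringstream json;
    json << R"({"adios2": {"attribute_writing_ranks": )";
    if (ranks.size() == 1)
    {
        json << ranks.front();
    }
    else
    {
        json << '[';
        for (std::size_t i = 0; i < ranks.size(); ++i)
        {
            json << (i == 0 ? "" : ", ") << ranks[i];
        }
        json << ']';
    }
    json << "}}";
    return json.str();
}
```

The resulting string is intended to be handed to the openPMD-api as the backend configuration when a series is opened in parallel, for example as the options argument of the ``openPMD::Series`` constructor. Note that an empty list would select no rank at all, so no attributes would be written.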

Experimental group table feature
--------------------------------

Expand Down
4 changes: 3 additions & 1 deletion docs/source/details/adios2.json
Expand Up @@ -2,6 +2,7 @@
"adios2": {
"engine": {
"type": "sst",
"preferred_flush_target": "disk",
"parameters": {
"BufferGrowthFactor": "2.0",
"QueueLimit": "2"
Expand All @@ -17,6 +18,7 @@
}
}
]
}
},
"attribute_writing_ranks": 0
}
}
7 changes: 7 additions & 0 deletions docs/source/details/adios2.toml
@@ -1,5 +1,12 @@
[adios2]
# ignore all attribute writes not issued on these ranks
# can also be a list if multiple ranks need to be given
# however rank 0 should be the most common option here
attribute_writing_ranks = 0

[adios2.engine]
type = "sst"
preferred_flush_target = "disk"

[adios2.engine.parameters]
BufferGrowthFactor = "2.0"
Expand Down
7 changes: 6 additions & 1 deletion docs/source/details/backendconfig.rst
Expand Up @@ -113,7 +113,7 @@ A full configuration of the ADIOS2 backend:
.. literalinclude:: adios2.toml
:language: toml

All keys found under ``adios2.dataset`` are applicable globally as well as per dataset, keys found under ``adios2.engine`` only globally.
All keys found under ``adios2.dataset`` are applicable globally as well as per dataset; any other keys, such as those found under ``adios2.engine``, apply only globally.
Explanation of the single keys:

* ``adios2.engine.type``: A string that is passed directly to ``adios2::IO::SetEngine`` for choosing the ADIOS2 engine to be used.
Expand Down Expand Up @@ -142,6 +142,11 @@ Explanation of the single keys:
The openPMD-api will automatically use a fallback implementation for the span-based Put() API if any operator is added to a dataset.
This workaround is enabled on a per-dataset level.
The workaround can be completely deactivated by specifying ``{"adios2": {"use_span_based_put": true}}`` or it can alternatively be activated indiscriminately for all datasets by specifying ``{"adios2": {"use_span_based_put": false}}``.
* ``adios2.attribute_writing_ranks``: A list of MPI ranks that define metadata. ADIOS2 attributes will be written only from those ranks; attribute writes issued on any other rank will be ignored. Can be either a list of integers or a single integer.

.. hint::

Specifying ``adios2.attribute_writing_ranks`` can lead to serious serialization performance improvements at large scale.
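
The rule for ``adios2.attribute_writing_ranks`` can be sketched without any dependencies as follows. This is a simplified model under stated assumptions, not the backend's implementation: the actual code in this pull request inspects an ``nlohmann::json`` value, whereas the two overloads of the hypothetical function ``writesAttributes`` below stand in for the "single integer" and "list of integers" forms.

```cpp
#include <algorithm>
#include <vector>

// Single-integer form: only the named rank keeps its attribute writes.
// (Hypothetical function name, chosen for this sketch.)
bool writesAttributes(int writingRank, int rank)
{
    return writingRank == rank;
}

// List form: a rank keeps its attribute writes if it appears in the list.
// An empty list therefore selects no rank at all.
bool writesAttributes(std::vector<int> const &writingRanks, int rank)
{
    return std::find(writingRanks.begin(), writingRanks.end(), rank) !=
        writingRanks.end();
}
```

In this model, each MPI process would evaluate the function once with its own rank and cache the result, much like the ``m_writeAttributesFromThisRank`` flag introduced in this pull request.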

Operations specified inside ``adios2.dataset.operators`` will be applied to ADIOS2 datasets in writing as well as in reading.
Beginning with ADIOS2 2.8.0, this can be used to specify decompressor settings:
Expand Down
2 changes: 2 additions & 0 deletions docs/source/details/mpi.rst
Expand Up @@ -42,6 +42,8 @@ Functionality Behavior Description
If you want to support all backends equally, treat as a collective operation.
Note that openPMD represents constant record components with attributes, thus inheriting this for ``::makeConstant``.

When treating attribute definitions as collective, we advise specifying the ADIOS2 :ref:`JSON/TOML key <backendconfig>` ``adios2.attribute_writing_ranks`` for metadata aggregation scalability, typically as ``adios2.attribute_writing_ranks = 0``.
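
The effect of combining collective attribute definitions with a single writing rank can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: ``AttributeStore`` and ``storedDefinitions`` are hypothetical names, the ranks are simulated in a loop rather than run under MPI, and neither the ADIOS2 nor the openPMD-api interface is used.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Toy per-rank attribute store (illustrative only).
struct AttributeStore
{
    bool writeFromThisRank = true;
    std::map<std::string, std::string> attributes;

    // Meant to be called on every rank with identical arguments.
    // On non-writing ranks the call returns without storing anything.
    void setAttribute(std::string const &name, std::string const &value)
    {
        if (!writeFromThisRank)
        {
            return;
        }
        attributes[name] = value;
    }
};

// Issues the same two attribute definitions on each of `numRanks` simulated
// ranks and returns how many definitions were stored across all ranks.
std::size_t storedDefinitions(int numRanks, int writingRank)
{
    std::size_t stored = 0;
    for (int rank = 0; rank < numRanks; ++rank)
    {
        AttributeStore store;
        store.writeFromThisRank = (rank == writingRank);
        store.setAttribute("author", "Jane Doe");
        store.setAttribute("software", "example");
        stored += store.attributes.size();
    }
    return stored;
}
```

User code stays identical on all ranks, which keeps it compatible with backends that require collective attribute definitions, while the number of definitions that reach metadata aggregation no longer grows with the number of ranks.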

.. [4] We usually open iterations delayed on first access. This first access is usually the ``flush()`` call after a ``storeChunk``/``loadChunk`` operation. If the first access is non-collective, an explicit, collective ``Iteration::open()`` can be used to have the files already open.
Alternatively, iterations might be accessed for the first time by immediate operations such as ``::availableChunks()``.

Expand Down
6 changes: 5 additions & 1 deletion include/openPMD/IO/ADIOS/ADIOS2IOHandler.hpp
Expand Up @@ -274,6 +274,8 @@ class ADIOS2IOHandlerImpl
return m_useGroupTable.value();
}

bool m_writeAttributesFromThisRank = true;

struct ParameterizedOperator
{
adios2::Operator op;
Expand All @@ -285,7 +287,9 @@ class ADIOS2IOHandlerImpl
json::TracingJSON m_config;
static json::TracingJSON nullvalue;

void init(json::TracingJSON config);
template <typename Callback>
void
init(json::TracingJSON config, Callback &&callbackWriteAttributesFromRank);

template <typename Key>
json::TracingJSON config(Key &&key, json::TracingJSON &cfg)
Expand Down
68 changes: 61 additions & 7 deletions src/IO/ADIOS/ADIOS2IOHandler.cpp
Expand Up @@ -79,7 +79,43 @@ ADIOS2IOHandlerImpl::ADIOS2IOHandlerImpl(
, m_engineType(std::move(engineType))
, m_userSpecifiedExtension{std::move(specifiedExtension)}
{
init(std::move(cfg));
init(
std::move(cfg),
/* callbackWriteAttributesFromRank = */
[communicator, this](nlohmann::json const &attribute_writing_ranks) {
int rank = 0;
MPI_Comm_rank(communicator, &rank);
auto throw_error = []() {
throw error::BackendConfigSchema(
{"adios2", "attribute_writing_ranks"},
"Type must be either an integer or an array of integers.");
};
if (attribute_writing_ranks.is_array())
{
m_writeAttributesFromThisRank = false;
for (auto const &val : attribute_writing_ranks)
{
if (!val.is_number())
{
throw_error();
}
if (val.get<int>() == rank)
{
m_writeAttributesFromThisRank = true;
break;
}
}
}
else if (attribute_writing_ranks.is_number())
{
m_writeAttributesFromThisRank =
attribute_writing_ranks.get<int>() == rank;
}
else
{
throw_error();
}
});
}

#endif // openPMD_HAVE_MPI
Expand All @@ -94,7 +130,7 @@ ADIOS2IOHandlerImpl::ADIOS2IOHandlerImpl(
, m_engineType(std::move(engineType))
, m_userSpecifiedExtension(std::move(specifiedExtension))
{
init(std::move(cfg));
init(std::move(cfg), [](auto const &...) {});
}

ADIOS2IOHandlerImpl::~ADIOS2IOHandlerImpl()
Expand Down Expand Up @@ -135,7 +171,9 @@ ADIOS2IOHandlerImpl::~ADIOS2IOHandlerImpl()
}
}

void ADIOS2IOHandlerImpl::init(json::TracingJSON cfg)
template <typename Callback>
void ADIOS2IOHandlerImpl::init(
json::TracingJSON cfg, Callback &&callbackWriteAttributesFromRank)
{
// allow overriding through environment variable
m_engineType =
Expand Down Expand Up @@ -181,6 +219,12 @@ void ADIOS2IOHandlerImpl::init(json::TracingJSON cfg)
: ModifiableAttributes::No;
}

if (m_config.json().contains("attribute_writing_ranks"))
{
callbackWriteAttributesFromRank(
m_config["attribute_writing_ranks"].json());
}

auto engineConfig = config(ADIOS2Defaults::str_engine);
if (!engineConfig.json().is_null())
{
Expand Down Expand Up @@ -915,6 +959,10 @@ void ADIOS2IOHandlerImpl::writeDataset(
void ADIOS2IOHandlerImpl::writeAttribute(
Writable *writable, const Parameter<Operation::WRITE_ATT> &parameters)
{
if (!m_writeAttributesFromThisRank)
{
return;
}
#if openPMD_HAS_ADIOS_2_9
switch (useGroupTable())
{
Expand Down Expand Up @@ -3033,7 +3081,11 @@ namespace detail
if (!initializedDefaults)
{
// Currently only schema 0 supported
m_IO.DefineAttribute<uint64_t>(ADIOS2Defaults::str_adios2Schema, 0);
if (m_impl->m_writeAttributesFromThisRank)
{
m_IO.DefineAttribute<uint64_t>(
ADIOS2Defaults::str_adios2Schema, 0);
}
initializedDefaults = true;
}

Expand Down Expand Up @@ -3168,7 +3220,8 @@ namespace detail
{
if (writeOnly(m_mode) &&
!m_IO.InquireAttribute<bool_representation>(
ADIOS2Defaults::str_usesstepsAttribute))
ADIOS2Defaults::str_usesstepsAttribute) &&
m_impl->m_writeAttributesFromThisRank)
{
m_IO.DefineAttribute<bool_representation>(
ADIOS2Defaults::str_usesstepsAttribute, 0);
Expand All @@ -3189,7 +3242,8 @@ namespace detail
*/
if (calledExplicitly && writeOnly(m_mode) &&
!m_IO.InquireAttribute<bool_representation>(
ADIOS2Defaults::str_usesstepsAttribute))
ADIOS2Defaults::str_usesstepsAttribute) &&
m_impl->m_writeAttributesFromThisRank)
{
m_IO.DefineAttribute<bool_representation>(
ADIOS2Defaults::str_usesstepsAttribute, 1);
Expand Down Expand Up @@ -3356,7 +3410,7 @@ namespace detail
case UseGroupTable::Yes:
#if openPMD_HAS_ADIOS_2_9
{
if (writeOnly(m_mode))
if (writeOnly(m_mode) && m_impl->m_writeAttributesFromThisRank)
{
requireActiveStep();
auto currentStepBuffered = currentStep();
Expand Down
11 changes: 9 additions & 2 deletions test/ParallelIOTest.cpp
@@ -1,6 +1,7 @@
/* Running this test in parallel with MPI requires MPI::Init.
* To guarantee a correct call to Init, launch the tests manually.
*/
#include "openPMD/IO/ADIOS/macros.hpp"
#include "openPMD/auxiliary/Environment.hpp"
#include "openPMD/auxiliary/Filesystem.hpp"
#include "openPMD/openPMD.hpp"
Expand Down Expand Up @@ -1170,10 +1171,16 @@ clevel = "1"
doshuffle = "BLOSC_BITSHUFFLE"
)END";

std::string writeConfigBP4 = R"END(
std::string writeConfigBP4 =
R"END(
[adios2]
unused = "parameter"

attribute_writing_ranks = 0
)END"
#if openPMD_HAS_ADIOS_2_9
"use_group_table = true"
#endif
R"END(
[adios2.engine]
type = "bp4"
unused = "as well"
Expand Down