Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON/TOML backend: introduce abbreviated IO modes #1493

Merged
merged 28 commits into from
Dec 16, 2024
Merged
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
3b06c09
Introduce dataset template mode to JSON backend
franzpoeschel Aug 4, 2023
0258000
Write used mode to JSON file
franzpoeschel Aug 4, 2023
a48e92e
Use Attribute::getOptional for snapshot attribute
franzpoeschel Feb 23, 2023
e6e5357
Introduce attribute mode
franzpoeschel Aug 4, 2023
0f2d33f
Add example 14_toml_template.cpp
franzpoeschel Aug 4, 2023
52e518d
Use Datatype::UNDEFINED to indicate no dataset definition in template
franzpoeschel Mar 10, 2023
e03d21d
Extend example
franzpoeschel May 19, 2022
ce3aab4
Test short attribute mode
franzpoeschel Aug 7, 2023
06d7ec1
Copy datatypeToString to JSON implementation
franzpoeschel Aug 7, 2023
b29d64d
Fix after rebase: Init JSON config in parallel mode
franzpoeschel Sep 22, 2023
76f2421
Fix after rebase: Don't erase JSON datasets when writing
franzpoeschel Sep 22, 2023
d5534cb
openpmd-pipe: use short modes for test
franzpoeschel Oct 12, 2023
7983763
Less intrusive warnings, allow disabling them
franzpoeschel Oct 12, 2023
f413d75
TOML: Use short modes by default
franzpoeschel Oct 12, 2023
429ade5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 24, 2023
c309e89
Documentation
franzpoeschel Nov 24, 2023
e5f1177
Short mode in default in openPMD >= 2.
franzpoeschel Nov 24, 2023
183c1a1
Short value by default in TOML
franzpoeschel Mar 19, 2024
395cd10
Store the openPMD version information in the IOHandler
franzpoeschel Mar 19, 2024
d4b6f88
Fixes
franzpoeschel Mar 26, 2024
afa0165
Adapt test to recent rebase
franzpoeschel Jun 7, 2024
50ea53c
toml11 4.0 compatibility
franzpoeschel Aug 5, 2024
61a84c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 15, 2024
d5b35e2
wip: cleanup
franzpoeschel Dec 11, 2024
346c37e
wip: cleanup
franzpoeschel Dec 13, 2024
be3543a
Cleanup
franzpoeschel Dec 13, 2024
7f81e62
Extensive testing
franzpoeschel Dec 13, 2024
985a59f
CI fixes
franzpoeschel Dec 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -703,6 +703,7 @@ set(openPMD_EXAMPLE_NAMES
10_streaming_read
12_span_write
13_write_dynamic_configuration
14_toml_template
)
set(openPMD_PYTHON_EXAMPLE_NAMES
2_read_serial
Expand Down Expand Up @@ -1327,6 +1328,9 @@ if(openPMD_BUILD_TESTING)
${openPMD_RUNTIME_OUTPUT_DIRECTORY}/openpmd-pipe \
--infile ../samples/git-sample/thetaMode/data_%T.bp \
--outfile ../samples/git-sample/thetaMode/data%T.json \
--outconfig ' \
json.attribute.mode = \"short\" \n\
json.dataset.mode = \"template_no_warn\"' \
"
WORKING_DIRECTORY ${openPMD_RUNTIME_OUTPUT_DIRECTORY}
)
Expand Down
37 changes: 32 additions & 5 deletions docs/source/backends/json.rst
Original file line number Diff line number Diff line change
Expand Up @@ -38,20 +38,47 @@ when working with the JSON backend.
Datasets and groups have the same namespace, meaning that there may not be a subgroup
and a dataset with the same name contained in one group.

Any **openPMD dataset** is a JSON object with three keys:
Datasets
........

* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
* ``datatype``: A string describing the type of the stored data.
* ``data`` A nested array storing the actual data in row-major manner.
Datasets can be stored in two modes, either as actual datasets or as dataset templates.
The mode is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.dataset.mode`` (resp. ``toml.dataset.mode``) with possible values ``["dataset", "template"]`` (default: ``"dataset"``).

Stored as an actual dataset, an **openPMD dataset** is a JSON object with three JSON keys:

* ``datatype`` (required): A string describing the type of the stored data.
* ``data`` (required): A nested array storing the actual data in row-major manner.
The data needs to be consistent with the fields ``datatype`` and ``extent``.
Checking whether this key points to an array can be (and is internally) used to distinguish groups from datasets.
* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.

Stored as a **dataset template**, an openPMD dataset is represented by three JSON keys:

* ``datatype`` (required): As above.
* ``extent`` (required): A list of integers, describing the extent of the dataset.
This replaces the ``data`` key from the non-template representation.
* ``attributes``: As above.

**Attributes** are stored as a JSON object with a key for each attribute.
This mode stores only the dataset metadata.
Chunk load/store operations are ignored.

Attributes
..........

In order to avoid name clashes, attributes are generally stored within a separate subgroup ``attributes``.

Attributes can be stored in two formats.
The format is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.attribute.mode`` (resp. ``toml.attribute.mode``) with possible values ``["long", "short"]`` (default: ``"long"`` for JSON in openPMD 1.*, ``"short"`` otherwise, i.e. generally in openPMD 2.*, but always in TOML).

Attributes in **long format** store the datatype explicitly, by representing attributes as JSON objects.
Every such attribute is itself a JSON object with two keys:

* ``datatype``: A string describing the type of the value.
* ``value``: The actual value of type ``datatype``.

Attributes in **short format** are stored as just the simple value corresponding with the attribute.
Since JSON/TOML values are pretty-printed into a human-readable format, byte-level type details can be lost when reading those values again later on (e.g. the distinction between different integer types).

TOML File Format
----------------

Expand Down
23 changes: 19 additions & 4 deletions docs/source/details/backendconfig.rst
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,8 @@ The key ``rank_table`` allows specifying the creation of a **rank table**, used
Configuration Structure per Backend
-----------------------------------

Please refer to the respective backends' documentations for further information on their configuration.

.. _backendconfig-adios2:

ADIOS2
Expand Down Expand Up @@ -231,8 +233,21 @@ The parameters eligible for being passed to flush calls may be configured global

.. _backendconfig-other:

Other backends
^^^^^^^^^^^^^^
JSON/TOML
^^^^^^^^^

Do currently not read the configuration string.
Please refer to the respective backends' documentations for further information on their configuration.
A full configuration of the JSON backend:

.. literalinclude:: json.json
:language: json

The TOML backend is configured analogously, replacing the ``"json"`` key with ``"toml"``.

All keys found under ``json.dataset`` are applicable globally as well as per dataset.
Explanation of the single keys:

* ``json.dataset.mode`` / ``toml.dataset.mode``: One of ``"dataset"`` (default) or ``"template"``.
In "dataset" mode, the dataset will be written as an n-dimensional (recursive) array, padded with nulls (JSON) or zeroes (TOML) for missing values.
In "template" mode, only the dataset metadata (type, extent and attributes) are stored and no chunks can be written or read (i.e. write/read operations will be skipped).
* ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default in openPMD 1.*) or ``"short"`` (default in openPMD 2.* and generally in TOML).
The long format explicitly encodes the attribute type in the dataset on disk, the short format only writes the actual attribute as a JSON/TOML value, requiring readers to recover the type.
10 changes: 10 additions & 0 deletions docs/source/details/json.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{
"json": {
"dataset": {
"mode": "template"
},
"attribute": {
"mode": "short"
}
}
}
111 changes: 111 additions & 0 deletions examples/14_toml_template.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
#include <openPMD/openPMD.hpp>

std::string backendEnding()
{
auto extensions = openPMD::getFileExtensions();
if (auto it = std::find(extensions.begin(), extensions.end(), "toml");
it != extensions.end())
{
return *it;
}
else
{
// Fallback for buggy old NVidia compiler
return "json";
}
}

void write()
{
std::string config = R"(
{
"iteration_encoding": "variable_based",
"json": {
"dataset": {"mode": "template"},
"attribute": {"mode": "short"}
},
"toml": {
"dataset": {"mode": "template"},
"attribute": {"mode": "short"}
}
}
)";

openPMD::Series writeTemplate(
"../samples/tomlTemplate." + backendEnding(),
openPMD::Access::CREATE,
config);
auto iteration = writeTemplate.writeIterations()[0];

openPMD::Dataset ds{openPMD::Datatype::FLOAT, {5, 5}};

auto temperature =
iteration.meshes["temperature"][openPMD::RecordComponent::SCALAR];
temperature.resetDataset(ds);

auto E = iteration.meshes["E"];
E["x"].resetDataset(ds);
E["y"].resetDataset(ds);
/*
* Don't specify datatype and extent for this one to indicate that this
* information is not yet known.
*/
E["z"].resetDataset({});

ds.extent = {10};

auto electrons = iteration.particles["e"];
electrons["position"]["x"].resetDataset(ds);
electrons["position"]["y"].resetDataset(ds);
electrons["position"]["z"].resetDataset(ds);

electrons["positionOffset"]["x"].resetDataset(ds);
electrons["positionOffset"]["y"].resetDataset(ds);
electrons["positionOffset"]["z"].resetDataset(ds);
electrons["positionOffset"]["x"].makeConstant(3.14);
electrons["positionOffset"]["y"].makeConstant(3.14);
electrons["positionOffset"]["z"].makeConstant(3.14);

ds.dtype = openPMD::determineDatatype<uint64_t>();
electrons.particlePatches["numParticles"][openPMD::RecordComponent::SCALAR]
.resetDataset(ds);
electrons
.particlePatches["numParticlesOffset"][openPMD::RecordComponent::SCALAR]
.resetDataset(ds);
electrons.particlePatches["offset"]["x"].resetDataset(ds);
electrons.particlePatches["offset"]["y"].resetDataset(ds);
electrons.particlePatches["offset"]["z"].resetDataset(ds);
electrons.particlePatches["extent"]["x"].resetDataset(ds);
electrons.particlePatches["extent"]["y"].resetDataset(ds);
electrons.particlePatches["extent"]["z"].resetDataset(ds);
}

void read()
{
/*
* The config is entirely optional, these things are also detected
* automatically when reading
*/

// std::string config = R"(
// {
// "iteration_encoding": "variable_based",
// "toml": {
// "dataset": {"mode": "template"},
// "attribute": {"mode": "short"}
// }
// }
// )";
Comment on lines +90 to +98

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.

openPMD::Series read(
"../samples/tomlTemplate." + backendEnding(),
openPMD::Access::READ_LINEAR);
read.parseBase();
openPMD::helper::listSeries(read);
}

int main()
{
write();
read();
}
28 changes: 25 additions & 3 deletions include/openPMD/Dataset.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -41,18 +41,40 @@ class Dataset
public:
enum : std::uint64_t
{
JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max()
/**
* Setting one dimension of the extent as JOINED_DIMENSION means that
* the extent along that dimension will be defined by the sum of all
* parallel processes' contributions.
* Only one dimension can be joined. For store operations, the offset
* should be an empty array and the extent should give the actual
* extent of the chunk (i.e. the number of joined elements along the
* joined dimension, equal to the global extent in all other
* dimensions). For more details, refer to
* docs/source/usage/workflow.rst.
*/
JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max(),
/**
* Some backends (i.e. JSON and TOML in template mode) support the
* creation of dataset with undefined datatype and extent.
* The extent should be given as {UNDEFINED_EXTENT} for that.
*/
UNDEFINED_EXTENT = std::numeric_limits<std::uint64_t>::max() - 1
};

Dataset(Datatype, Extent, std::string options = "{}");

/**
* @brief Constructor that sets the datatype to undefined.
*
* Helpful for resizing datasets, since datatypes need not be given twice.
* Helpful for:
*
* 1. Resizing datasets, since datatypes need not be given twice.
* 2. Initializing datasets as undefined, as used by template mode in the
* JSON/TOML backend. In this case, the default (undefined) specification
* for the Extent may be used.
*
*/
Dataset(Extent);
Dataset(Extent = {UNDEFINED_EXTENT});

Dataset &extend(Extent newExtent);

Expand Down
1 change: 1 addition & 0 deletions include/openPMD/IO/AbstractIOHandler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,7 @@ class AbstractIOHandler
{
friend class Series;
friend class ADIOS2IOHandlerImpl;
friend class JSONIOHandlerImpl;
friend class detail::ADIOS2File;

private:
Expand Down
1 change: 1 addition & 0 deletions include/openPMD/IO/JSON/JSONIOHandler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@

#include "openPMD/IO/AbstractIOHandler.hpp"
#include "openPMD/IO/JSON/JSONIOHandlerImpl.hpp"
#include "openPMD/auxiliary/JSON_internal.hpp"

#if openPMD_HAVE_MPI
#include <mpi.h>
Expand Down
Loading
Loading