Built site for gh-pages
Quarto GHA Workflow Runner committed Jun 19, 2024
1 parent 140f523 commit 7b4d737
Showing 24 changed files with 431 additions and 170 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
Original file line number Diff line number Diff line change
@@ -1 +1 @@
9f6d0f22
446ef2c6
156 changes: 116 additions & 40 deletions _tex/index.tex
@@ -269,7 +269,7 @@ \section{Introduction}\label{sec-intro}
can help this mode of standards development thrive and reach its full
potential.

\section{Use cases}\label{use-cases}
\section{Use cases}\label{sec-use-cases}

To understand how OSS development practices affect the development of
data and metadata standards, it is informative to demonstrate this
@@ -364,12 +364,12 @@ \subsection{Community science}\label{community-science}
the science.

\section{Opportunities and risks for open-source
standards}\label{sec-opportunities}
standards}\label{sec-challenges}

At the same time, these tools and practices are associated with risks
that need to be mitigated.

\subsection{Flexibility vs.~stability}\label{flexibility-vs.-stability}
\subsection{Flexibility vs.~Stability}\label{flexibility-vs.-stability}

One of the defining characteristics of OSS is its dynamism and its rapid
evolution. Because OSS can be used by anyone and, in most cases,
@@ -433,33 +433,68 @@ \subsection{Cross-domain funding gaps}\label{cross-domain-funding-gaps}
Data standardization investment is justified if the standard is
generalizable beyond any specific science domain. However, while the use
cases are based in domain sciences, data standardization is seen as a data
infrastructure and not a science investment. Moreover due to how science
research funding works, scientists lack incentives to work across
domains, or work on infrastructure problems.
infrastructure and not a science investment. Moreover, due to how
science research funding works, scientists lack incentives to work
across domains or to work on infrastructure problems.

\subsection{Data instrumentation
issues}\label{data-instrumentation-issues}

Data for scientific observations are often generated by proprietary
instrumentation due to commercialization or other profit driven
incentives. There islack of regulatory oversight to adhere to available
standards or evolve Significant data transformation is required to get
data to a state that is amenable to standards, if available. If not
available, there is lack of incentive to set aside investment or
resources to invest in establishing data standards.
instrumentation due to commercialization or other profit-driven
incentives. There is a lack of regulatory oversight to ensure adherence
to available standards or to evolve them. Significant data
transformation is required to get data to a state that is amenable to
standards, if available. If standards are not available, there is a lack
of incentive to set aside resources to invest in establishing them.

\subsubsection{Harnessing new computing paradigms and
technologies}\label{harnessing-new-computing-paradigms-and-technologies}

Open-source standards development faces the challenges of adapting to
new computing paradigms and technologies. Cloud computing provides a
particularly stark set of opportunities and challenges. On the one hand,
cloud computing offers practical solutions for many challenges of
contemporary data-driven research. For example, the scalability of cloud
resources addresses some of the challenges of the scale of data that is
produced by instruments in many fields. The cloud also makes data access
relatively straightforward, because of the ability to determine data
access permissions in a granular fashion. On the other hand, cloud
computing requires reinstrumenting many data formats. This is because
cloud data access patterns are fundamentally different from the ones
that are used in local POSIX-style file systems. Suspicion of cloud
computing comes in two different flavors. The first comes from
researchers and administrators who may be wary of the costs associated
with cloud computing, and especially of the difficulty of predicting
these costs. Projects such as NSF's CloudBank seek to mitigate some of
these concerns by providing an additional layer of transparency into
cloud costs (Norman et al. 2021). The other type of objection relates to
the fact that cloud
computing services, by their very nature, are closed ecosystems that
resist portability and interoperability. Some aspects of the services
are always going to remain hidden and privy only to the cloud computing
service provider. In this respect, cloud computing runs afoul of some of
the appealing aspects of OSS. That said, the development of ``cloud
native'' standards can provide significant benefits in terms of the
research that can be conducted. For example, NOAA plans to use cloud
computing for integration across the multiple disparate datasets that it
collects to build knowledge graphs that can be queried by researchers to
answer questions that can only be answered through this integration.
Putting all the data ``in one place'' should help with that. Adapting
data standards to the cloud has driven the development of new file
formats. A salient example is the Zarr format (Miles et al. 2024),
which supports random access into array-based datasets stored in cloud
object storage, facilitating scalable and parallelized computing on
these data. Indeed, data standards such as NWB (neuroscience) and OME
(microscopy) now use Zarr as a backend for cloud-based storage. In other
cases, file formats that were once not straightforward to use in the
cloud, such as HDF5 and TIFF, have been adapted to cloud use (e.g.,
through the Cloud Optimized GeoTIFF format).

\subsection{Sustainability}\label{sustainability}

\subsection{The importance of automated
validation}\label{the-importance-of-automated-validation}

\subsection{Harnessing new computing paradigms and
technologies}\label{harnessing-new-computing-paradigms-and-technologies}

Open-source standards development faces the challenges of adapting to
new technologies The development of standards that are well-Cloud
computing provides

\section{Cross-sector interactions}\label{sec-cross-sector}

The importance of standards stems not only from discussions within
@@ -569,12 +604,12 @@ \subsection{Industry}\label{industry}
device vendors and researchers.

\section{Recommendations for open-source data and metadata
standards}\label{recommendations-for-open-source-data-and-metadata-standards}
standards}\label{sec-recommendations}

To conclude this report, we propose the following recommendations:

\subsection{Funding or Grantmaking
entities:}\label{funding-or-grantmaking-entities}
\subsection{Policy-making and Funding
entities:}\label{policy-making-and-funding-entities}

\subsubsection{Fund Data Standards
Development}\label{fund-data-standards-development}
@@ -588,20 +623,33 @@ \subsubsection{Fund Data Standards
The OSS model is seen as a particularly promising avenue for an
investment of resources, because it builds on previously-developed
procedures and technical infrastructure and because it provides avenues
for community input along the way. The clarity offered by procedures for
enhancement proposals and semantic versioning schemes adopted in
standards development offer avenues for a range of stakeholders to
propose to funding bodies well-defined contributions to large and
field-wide standards efforts.

\subsubsection{Invest in Data Stewards Recognize data stewards as a
distinct role
in}\label{invest-in-data-stewards-recognize-data-stewards-as-a-distinct-role-in}

research and science investment. Set up programs for training for data
stewards and invest in career paths that encourage this role. Initial
proposals for the curriculum and scope of the role have already been
proposed (e.g., in (Mons 2018))
for democratization of development processes and for community input
along the way. The clarity offered by procedures for enhancement
proposals and semantic versioning schemes adopted in standards
development offers avenues for a range of stakeholders to propose to
funding bodies well-defined contributions to large and field-wide
standards efforts (e.g., Pestilli et al. 2021).

\subsubsection{Invest in Data Stewards}\label{invest-in-data-stewards}

Advancing the development and adoption of open-source standards requires
the dissemination of knowledge to researchers in a variety of fields,
but this dissemination itself may not be enough without the fostering of
specialized expertise. Therefore, it is important to recognize
\emph{data stewards} as a distinct role in research. To truly support
experts whose role will be to develop, maintain, and facilitate the
adoption and use of open-source standards, it will be necessary to set
up programs for training for data stewards and invest in career paths
that encourage this role. Initial proposals for the curriculum and scope
of the role have already been made (e.g., Mons 2018). In addition, for
these individuals to make the best use of open-source standards, they
will need to be facile in the methodology of OSS. This does not mean
that they need
to become software engineers -- though there may be some overlap with
the role of research software engineers (Connolly et al. 2023) -- but
rather that they need to become familiar with those parts of the OSS
development life-cycle that are useful for development of open-source
standards.

\subsubsection{Review Data Standards
Pathways}\label{review-data-standards-pathways}
@@ -629,14 +677,14 @@ \subsubsection{Establish Governance}\label{establish-governance}
\subsubsection{Program Manage Cross Sector
alliances}\label{program-manage-cross-sector-alliances}

Encourage cross sector and cross domain alliances that can impact
Encourage cross-sector and cross-domain alliances that can impact
successful standards creation. Invest in robust program management of
these alliances to align pace and create incentives (for instance via
Open Source Program Office / OSPO efforts). Similar to program officers
at funding agencies, standards evolution needs sustained program
management efforts. Multi-company partnerships should include strategic
initiatives for
standard establishment
e.g.~\href{https://www.pistoiaalliance.org/news/press-release-pistoia-alliance-launches-idmp-1-0/}{Pistoiaalliance}.
standard establishment, e.g., the
\href{https://www.pistoiaalliance.org/news/press-release-pistoia-alliance-launches-idmp-1-0/}{Pistoia Alliance}.

\subsubsection{Curriculum Development}\label{curriculum-development}

@@ -646,7 +694,7 @@ \subsubsection{Curriculum Development}\label{curriculum-development}
\subsection{Science and Technology
Communities:}\label{science-and-technology-communities}

\subsubsection{User Driven Development}\label{user-driven-development}
\subsubsection{User-Driven Development}\label{user-driven-development}

Standards should be needs-driven and developed in close collaboration
with users. Changes and enhancements should be in response to community
@@ -718,6 +766,13 @@ \section*{References}\label{references}
et al. 2023. {``Data Preservation in High Energy Physics.''} \emph{The
European Physical Journal C} 83 (9): 795.

\bibitem[\citeproctext]{ref-Connolly2023Software}
Connolly, Andrew, Joseph Hellerstein, Naomi Alterman, David Beck, Rob
Fatland, Ed Lazowska, Vani Mandava, and Sarah Stone. 2023. {``{Software}
{Engineering} {Practices} in {Academia}: Promoting the
3Rs---{Readability}, {Resilience}, and {Reuse}.''} \emph{Harvard Data
Science Review} 5 (2).

\bibitem[\citeproctext]{ref-Gorgolewski2016BIDS}
Gorgolewski, Krzysztof J, Tibor Auer, Vince D Calhoun, R Cameron
Craddock, Samir Das, Eugene P Duff, Guillaume Flandin, et al. 2016.
@@ -729,6 +784,12 @@ \section*{References}\label{references}
Koch, Christof, and R Clay Reid. 2012. {``Observatories of the Mind.''}
\url{http://dx.doi.org/10.1038/483397a}.

\bibitem[\citeproctext]{ref-zarr}
Miles, Alistair, jakirkham, M Bussonnier, Josh Moore, Dimitri
Papadopoulos Orfanos, Davis Bennett, David Stansby, et al. 2024.
{``Zarr-Developers/Zarr-Python: V3.0.0-Alpha.''} Zenodo.
\url{https://doi.org/10.5281/zenodo.11592827}.

\bibitem[\citeproctext]{ref-Mons2018DataStewardshipBook}
Mons, Barend. 2018. \emph{Data Stewardship for Open Science:
Implementing FAIR Principles}. 1st ed. Vol. 1. Milton: CRC Press.
@@ -739,10 +800,25 @@ \section*{References}\label{references}
{LEADERSHIP} {IN} {AI}: A Plan for Federal Engagement in Developing
Technical Standards and Related Tools.''}

\bibitem[\citeproctext]{ref-Norman2021CloudBank}
Norman, Michael, Vince Kellen, Shava Smallen, Brian DeMeulle, Shawn
Strande, Ed Lazowska, Naomi Alterman, et al. 2021. {``{CloudBank:
Managed Services to Simplify Cloud Access for Computer Science Research
and Education}.''} In \emph{Practice and Experience in Advanced Research
Computing}. PEARC '21. New York, NY, USA: Association for Computing
Machinery. \url{https://doi.org/10.1145/3437359.3465586}.

\bibitem[\citeproctext]{ref-Nosek2019CultureChange}
Nosek, Brian. n.d. {``Strategy for Culture Change.''}
\url{https://www.cos.io/blog/strategy-for-culture-change}.

\bibitem[\citeproctext]{ref-pestilli2021community}
Pestilli, Franco, Russ Poldrack, Ariel Rokem, Theodore Satterthwaite,
Franklin Feingold, Eugene Duff, Cyril Pernet, Robert Smith, Oscar
Esteban, and Matt Cieslak. 2021. {``A Community-Driven Development of
the Brain Imaging Data Standard (BIDS) to Describe Macroscopic Brain
Connections.''} \emph{OSF}.

\bibitem[\citeproctext]{ref-Poldrack2024BIDS}
Poldrack, Russell A, Christopher J Markiewicz, Stefan Appelhoff, Yoni K
Ashar, Tibor Auer, Sylvain Baillet, Shashank Bansal, et al. 2024. {``The
77 changes: 77 additions & 0 deletions _tex/references.bib
@@ -1,3 +1,80 @@
@software{zarr,
author = {Alistair Miles and
jakirkham and
M Bussonnier and
Josh Moore and
Dimitri Papadopoulos Orfanos and
Davis Bennett and
David Stansby and
Joe Hamman and
James Bourbeau and
Andrew Fulton and
Gregory Lee and
Ryan Abernathey and
Norman Rzepka and
Zain Patel and
Mads R. B. Kristensen and
Sanket Verma and
Saransh Chopra and
Matthew Rocklin and
AWA BRANDON AWA and
Max Jones and
Martin Durant and
Elliott Sales de Andrade and
Vincent Schut and
raphael dussin and
Shivank Chaudhary and
Chris Barnes and
Juan Nunez-Iglesias and
shikharsg},
title = {zarr-developers/zarr-python: v3.0.0-alpha},
month = jun,
year = 2024,
publisher = {Zenodo},
version = {v3.0.0-alpha},
doi = {10.5281/zenodo.11592827},
url = {https://doi.org/10.5281/zenodo.11592827}
}

@inproceedings{Norman2021CloudBank,
author = {Norman, Michael and Kellen, Vince and Smallen, Shava and DeMeulle, Brian and Strande, Shawn and Lazowska, Ed and Alterman, Naomi and Fatland, Rob and Stone, Sarah and Tan, Amanda and Yelick, Katherine and Van Dusen, Eric and Mitchell, James},
title = {{CloudBank: Managed Services to Simplify Cloud Access for Computer Science Research and Education}},
year = {2021},
isbn = {9781450382922},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3437359.3465586},
doi = {10.1145/3437359.3465586},
abstract = {CloudBank is a cloud access entity founded to enable the computer science research and education communities to harness the profound computational potential of public clouds. By delivering a set of managed services designed to alleviate common points of friction associated with cloud adoption, Cloudbank serves as an integrated service provider to the research and education community. These services include front-line help desk support, cloud solution consulting, training, account management, cost monitoring and optimization support, and automated billing. CloudBank has a multi-cloud pay-per-use billing model and aims to serve the spectrum of cloud users from novice to advanced.},
booktitle = {Practice and Experience in Advanced Research Computing},
articleno = {45},
numpages = {4},
keywords = {Cloud Computing},
location = {Boston, MA, USA},
series = {PEARC '21}
}

@article{Connolly2023Software,
author = {Connolly, Andrew and Hellerstein, Joseph and Alterman, Naomi and Beck, David and Fatland, Rob and Lazowska, Ed and Mandava, Vani and Stone, Sarah},
journal = {Harvard Data Science Review},
number = {2},
year = {2023},
month = {apr 27},
note = {https://hdsr.mitpress.mit.edu/pub/f0f7h5cu},
publisher = {},
title = {
{Software} {Engineering} {Practices} in {Academia}: Promoting the 3Rs---{Readability}, {Resilience}, and {Reuse}},
volume = {5},
}

@article{pestilli2021community,
title={A community-driven development of the Brain Imaging Data Standard (BIDS) to describe macroscopic brain connections},
author={Pestilli, Franco and Poldrack, Russ and Rokem, Ariel and Satterthwaite, Theodore and Feingold, Franklin and Duff, Eugene and Pernet, Cyril and Smith, Robert and Esteban, Oscar and Cieslak, Matt},
journal={OSF},
year={2021}
}

@MISC{Nosek2019CultureChange,
title = "Strategy for Culture Change",
author = "Nosek, Brian",
Binary file modified index.docx
Binary file not shown.