Built site for gh-pages
Quarto GHA Workflow Runner committed Jun 19, 2024
1 parent 382738c commit 140f523
Showing 29 changed files with 334 additions and 239 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
31072696
9f6d0f22
214 changes: 123 additions & 91 deletions _tex/index.tex
@@ -211,7 +211,7 @@ \section{Abstract}\label{abstract}
open-source models carry unique risks that need to be incorporated into
the process.

\section{Introduction}\label{introduction}
\section{Introduction}\label{sec-intro}

Data-intensive discovery has become an important mode of knowledge
production across many research fields and it is having a significant
@@ -257,93 +257,17 @@ \section{Introduction}\label{introduction}
a variety of stakeholders to guide the evolution of the software to take
their needs and interests into account.

The present report seeks to explore how OSS processes and tools have
affected the development of data and metadata standards. The report will
triangulate common features of a variety of use cases; it will identify
some of the challenges and pitfalls of this mode of standards
development; and it will make recommendations for future developments
and policies that can help this mode of standards development thrive and
reach its full potential.

\section{Opportunities and risks for open-source
standards}\label{opportunities-and-risks-for-open-source-standards}

Data and metadata standards that adopt tools and practices of OSS
(``open-source standards'' henceforth) stand to reap many of the
benefits that the OSS model has provided in the development of other
technologies. At the same time, these tools and practices are associated
with risks that need to be mitigated.

\subsection{Flexibility vs.~stability}\label{flexibility-vs.-stability}

One of the defining characteristics of OSS is its dynamism and its rapid
evolution. Because OSS can be used by anyone and, in most cases,
contributions can be made by anyone, innovations flow into OSS in a
bottom-up fashion from user/developers. Pathways to contribution by
members of the community are often well-defined: both from the technical
perspective (e.g., through a pull request on GitHub, or other similar
mechanisms), as well as from the social perspective (e.g., whether
contributors need to accept certain licensing conditions through a
contributor licensing agreement) and the socio-technical perspective
(e.g., how many people need to review a contribution, what are the
timelines for a contribution to be reviewed and accepted, what are the
release cycles of the software that make the contribution available to a
broader community of users, etc.). Similarly, open-source standards may
also find themselves addressing use cases and solutions that were not
originally envisioned through bottom-up contributions of members of a
research community to which the standard pertains. However, while this
dynamism provides an avenue for flexibility it also presents a source of
tension. This is because data and metadata standards apply to already
existing datasets, and changes may affect the compliance of these
existing datasets.

\subsection{Mismatches between standards developers and user
communities}\label{mismatches-between-standards-developers-and-user-communities}

There is an inherent gap in both interest and ability to engage with the
technical details undergirding standards and their development between
the core developers of the standard and their users. In extreme cases,
these interests may even be at odds, as developers implement
sophisticated mechanisms to automate the creation of the standard or
advocate for more technically advanced mechanisms for evolving the
standard, leaving potential users sidelined in the development of the
standard, and limiting their ability to provide feedback about the
practical implications of changes to the standards.

\subsection{Unclear pathways for standards
success}\label{unclear-pathways-for-standards-success}

Standards typically develop organically through sustained and persistent
efforts from dedicated groups of data practitioneers. These include
scientists and the broader ecosystem of data curators and users. However
there is no playbook on the structure and components of a data standard,
or the pathway that moves a data implementation to a data standard. As a
result, data standardization lacks formal avenues for research grants.

\subsection{Cross domain funding gaps}\label{cross-domain-funding-gaps}

Data standardization investment is justified if the standard is
generalizable beyond any specific science domain. However while the use
cases are domain sciences based, data standardization is seen as a data
infrastructure and not a science investment. Moreover due to how science
research funding works, scientists lack incentives to work across
domains, or work on infrastructure problems.

\subsection{Data instrumentation
issues}\label{data-instrumentation-issues}

Data for scientific observations are often generated by proprietary
instrumentation due to commercialization or other profit driven
incentives. There islack of regulatory oversight to adhere to available
standards or evolve Significant data transformation is required to get
data to a state that is amenable to standards, if available. If not
available, there is lack of incentive to set aside investment or
resources to invest in establishing data standards.

\subsection{Sustainability}\label{sustainability}

\subsection{The importance of automated
validation}\label{the-importance-of-automated-validation}
technologies. The present report explores how OSS processes and tools have
affected the development of data and metadata standards. The report will
triangulate common features of a variety of use cases; it will identify
some of the challenges and pitfalls of this mode of standards
development, with a particular focus on cross-sector interactions; and
it will make recommendations for future developments and policies that
can help this mode of standards development thrive and reach its full
potential.

\section{Use cases}\label{use-cases}

@@ -392,6 +316,10 @@ \subsection{High-energy physics (HEP)}\label{high-energy-physics-hep}
standards are provided by funders that require data management plans
that specify how the data is shared.

\subsection{Earth sciences}\label{earth-sciences}

XXX

\subsection{Neuroscience}\label{neuroscience}

In contrast to astronomy and HEP, Neuroscience has traditionally been a
@@ -416,19 +344,123 @@ \subsection{Neuroscience}\label{neuroscience}
success to the adoption of OSS development mechanisms (Poldrack et al.
2024). For example, small changes to the standard are managed through
the GitHub pull request mechanism; larger changes are managed through a
a BIDS Enhancement Proposal (BEP) process that is directly inspired by
the Python programming language community's Python Enhancement Proposal
procedure, which used to introduce new ideas into the language. Though
BIDS Enhancement Proposal (BEP) process that is directly inspired by the
Python programming language community's Python Enhancement Proposal
procedure, which is used to introduce new ideas into the language. Though
the BEP mechanism takes a slightly different technical approach, it
tries to emulate the open-ended and community-driven aspects of Python
development to accept contributions from a wide range of stakeholders
and tap a broad base of expertise.

\subsection{Automated discovery}\label{automated-discovery}

\subsection{Citizen science}\label{citizen-science}
\subsection{Community science}\label{community-science}

Another interesting use case for open-source standards is
community/citizen science. This approach, which has grown Here,
standards are needed to facilitate interactions between an in-group of
expert researchers who generate and curate data and a broader set of
out-group enthusiasts who would like to make meaningful contributions to
the science.

\section{Opportunities and risks for open-source
standards}\label{sec-opportunities}

At the same time, these tools and practices are associated with risks
that need to be mitigated.

\subsection{Flexibility vs.~stability}\label{flexibility-vs.-stability}

One of the defining characteristics of OSS is its dynamism and its rapid
evolution. Because OSS can be used by anyone and, in most cases,
contributions can be made by anyone, innovations flow into OSS in a
bottom-up fashion from user/developers. Pathways to contribution by
members of the community are often well-defined: both from the technical
perspective (e.g., through a pull request on GitHub, or other similar
mechanisms), as well as from the social perspective (e.g., whether
contributors need to accept certain licensing conditions through a
contributor licensing agreement) and the socio-technical perspective
(e.g., how many people need to review a contribution, what are the
timelines for a contribution to be reviewed and accepted, what are the
release cycles of the software that make the contribution available to a
broader community of users, etc.). Similarly, open-source standards may
also find themselves addressing use cases and solutions that were not
originally envisioned through bottom-up contributions of members of a
research community to which the standard pertains. However, while this
dynamism provides an avenue for flexibility it also presents a source of
tension. This is because data and metadata standards apply to already
existing datasets, and changes may affect the compliance of these
existing datasets. Similarly, analysis technology stacks that are
developed based on an existing version of a standard have to adapt to
the introduction of new ideas and changes into a standard. Dynamic
changes of this sort therefore risk causing a loss of faith in the
standard by a user community, and migration away from the standard.
Similarly, if a standard evolves too rapidly, users may choose to stick
to an outdated version of a standard for a long time, creating strains
on the community of developers and maintainers of a standard who will
need to accommodate long deprecation cycles.

\subsection{Mismatches between standards developers and user
communities}\label{mismatches-between-standards-developers-and-user-communities}

There is an inherent gap in both interest and ability to engage with the
technical details undergirding standards and their development between
the core developers of the standard and their users. In extreme cases,
these interests may even be at odds, as developers implement
sophisticated mechanisms to automate the creation and validation of the
standard or advocate for more technically advanced mechanisms for
evolving the standard. These advanced capabilities offer more robust
development practices and consistency in cases where the standards are
complex and elaborate. On the other hand, they may end up leaving
potential users sidelined in the development of the standard, and
limiting their ability to provide feedback about the practical
implications of changes to the standards.

\subsection{Unclear pathways for standards
success}\label{unclear-pathways-for-standards-success}

Standards typically develop organically through sustained and persistent
efforts from dedicated groups of data practitioners. These include
scientists and the broader ecosystem of data curators and users.
However, there is no playbook on the structure and components of a data
standard, or the pathway that moves a data implementation to a data
standard. As a result, data standardization lacks formal avenues for
success and recognition, for example through dedicated research grants
(see also Section~\ref{sec-cross-sector}).

\subsection{Cross-domain funding gaps}\label{cross-domain-funding-gaps}

Data standardization investment is justified if the standard is
generalizable beyond any specific science domain. However, while the use
cases are based in the domain sciences, data standardization is seen as a data
infrastructure and not a science investment. Moreover, due to how science
research funding works, scientists lack incentives to work across
domains, or work on infrastructure problems.

\subsection{Data instrumentation
issues}\label{data-instrumentation-issues}

Data for scientific observations are often generated by proprietary
instrumentation due to commercialization or other profit-driven
incentives. There is a lack of regulatory oversight to adhere to available
standards or evolve Significant data transformation is required to get
data to a state that is amenable to standards, if available. If not
available, there is lack of incentive to set aside investment or
resources to invest in establishing data standards.

\subsection{Sustainability}\label{sustainability}

\subsection{The importance of automated
validation}\label{the-importance-of-automated-validation}

\subsection{Harnessing new computing paradigms and
technologies}\label{harnessing-new-computing-paradigms-and-technologies}

Open-source standards development faces the challenges of adapting to
new technologies The development of standards that are well-Cloud
computing provides

\section{Cross-sector interactions}\label{cross-sector-interactions}
\section{Cross-sector interactions}\label{sec-cross-sector}

The importance of standards stems not only from discussions within
research fields about how research can best be conducted to take
@@ -531,7 +563,7 @@ \subsection{Industry}\label{industry}
proprietary/closed formats of data can create difficulty at various
transition points: from one instrument vendor to another, from data
producer to downstream recipient/user, etc. On the other hand, in some
cases cross-sector collaborations with commercial entities may pave the
cases, cross-sector collaborations with commercial entities may pave the
way to robust and useful standards. One example is the DICOM standard,
which is maintained by working groups that encompass commercial imaging
device vendors and researchers.
Binary file modified index.docx
Binary file not shown.