Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 7 changed files with 20 additions and 377 deletions.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,263 +1,27 @@ | ||
--- | ||
# Ensure that this title is the same as the one in `myst.yml` | ||
title: A Numerical Perspective to Terraforming a Desert | ||
title: "Echostack: A flexible and scalable open-source software toolbox for echosounder data processing" | ||
abstract: | | ||
Water column sonar data collected by echosounders are essential for marine ecosystem research, allowing the detection, classification, and quantification of fish and zooplankton from many different ocean observing platforms. However, broad usage of these data has been hindered by the lack of software tools that allow intuitive and transparent data access, processing, and interpretation. We address this gap by developing Echostack, a toolbox of open-source packages leveraging distributed computing and cloud-interfacing libraries in the scientific Python ecosystem. These tools can be used individually or orchestrated together, which we will demonstrate in an end-to-end workflow. | ||
Water column sonar data collected by echosounders are essential for marine ecosystem research, allowing the detection, classification, and quantification of fish and zooplankton from many different ocean observing platforms. However, broad usage of these data has been hindered by the lack of software tools that allow intuitive and transparent data access, processing, and interpretation. We address this gap by developing Echostack, a toolbox of open-source packages leveraging distributed computing and cloud-interfacing libraries in the scientific Python ecosystem. These tools can be used individually or orchestrated together, which we demonstrate in example use cases in a common application scenario for a fisheries acoustic-trawl survey. | ||
--- | ||
|
||
## Introduction | ||
|
||
Twelve hundred years ago — in a galaxy just across the hill... | ||
Echosounders are high-frequency sonar systems optimized for sensing fish and zooplankton in the ocean. By transmitting sounds and analyzing the returning echoes, fishery and ocean scientists use echosounders to “image” the distribution and infer the abundance of these animals in the water column [@Medwin1998] [@fig:echo_data A]. As a remote sensing tool, echosounders are uniquely suitable for efficient, continuous biological monitoring across time and space, especially when compared to net trawls that are labor-intensive and discrete in nature, or optical imaging techniques that are limited in range due to the strong absorption of light in water [REF]. | ||
|
||
This document should be rendered with MyST Markdown [mystmd.org](https://mystmd.org), | ||
which is a markdown variant inspired by reStructuredText. This uses the `mystmd` | ||
CLI for scientific writing which can be [downloaded here](https://mystmd.org/guide/quickstart). | ||
When you have installed `mystmd`, run `myst start` in this folder and | ||
follow the link for a live preview, any changes to this file will be | ||
reflected immediately. | ||
In recent years, echosounders have been installed widely on many ocean observing platforms [@fig:echo_data B], resulting in a deluge of data accumulating at an unprecedented speed from all corners of the ocean. These extensive datasets contain crucial information that can help scientists better understand marine ecosystems and their response to the changing climate. However, the volume of the data (100s of GBs to TBs [REF]) and the complexity of the problem (e.g., how large-scale ocean processes drive changes in acoustically observed marine biota [REF]) naturally call for a paradigm shift in the data analysis workflow. | ||
|
||
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum sapien | ||
tortor, bibendum et pretium molestie, dapibus ac ante. Nam odio orci, interdum | ||
sit amet placerat non, molestie sed dui. Pellentesque eu quam ac mauris | ||
tristique sodales. Fusce sodales laoreet nulla, id pellentesque risus convallis | ||
eget. Nam id ante gravida justo eleifend semper vel ut nisi. Phasellus | ||
adipiscing risus quis dui facilisis fermentum. Duis quis sodales neque. Aliquam | ||
ut tellus dolor. Etiam ac elit nec risus lobortis tempus id nec erat. Morbi eu | ||
purus enim. Integer et velit vitae arcu interdum aliquet at eget purus. Integer | ||
quis nisi neque. Morbi ac odio et leo dignissim sodales. Pellentesque nec nibh | ||
nulla. Donec faucibus purus leo. Nullam vel lorem eget enim blandit ultrices. | ||
Ut urna lacus, scelerisque nec pellentesque quis, laoreet eu magna. Quisque ac | ||
justo vitae odio tincidunt tempus at vitae tortor. | ||
|
||
## Bibliographies, citations and block quotes | ||
|
||
Bibliography files and DOIs are automatically included and picked up by `mystmd`. | ||
These can be added using pandoc-style citations `[@doi:10.1109/MCSE.2007.55]` | ||
which fetches the citation information automatically and creates: [@doi:10.1109/MCSE.2007.55]. | ||
Additionally, you can use any key in the BibTeX file using `[@citation-key]`, | ||
as in [@hume48] (which literally is `[@hume48]` in accordance with | ||
the `hume48` cite-key in the associated `mybib.bib` file). | ||
Read more about [citations in the MyST documentation](https://mystmd.org/guide/citations). | ||
|
||
If you wish to have a block quote, you can just indent the text, as in: | ||
|
||
> When it is asked, What is the nature of all our reasonings concerning matter of fact? the proper answer seems to be, that they are founded on the relation of cause and effect. When again it is asked, What is the foundation of all our reasonings and conclusions concerning that relation? it may be replied in one word, experience. But if we still carry on our sifting humor, and ask, What is the foundation of all conclusions from experience? this implies a new question, which may be of more difficult solution and explication. | ||
> | ||
> -- @hume48 | ||
Other typography information can be found in the [MyST documentation](https://mystmd.org/guide/typography). | ||
|
||
### DOIs in bibliographies | ||
|
||
In order to include a DOI in your bibliography, add the DOI to your bibliography | ||
entry as a string. For example: | ||
|
||
```{code-block} bibtex | ||
:emphasize-lines: 7 | ||
:linenos: | ||
@book{hume48, | ||
author = "David Hume", | ||
year = {1748}, | ||
title = "An enquiry concerning human understanding", | ||
address = "Indianapolis, IN", | ||
publisher = "Hackett", | ||
doi = "10.1017/CBO9780511808432", | ||
} | ||
``` | ||
|
||
### Citing software and websites | ||
|
||
Any paper relying on open-source software would surely want to include citations. | ||
Often you can find a citation in BibTeX format via a web search. | ||
Authors of software packages may even publish guidelines on how to cite their work. | ||
|
||
For convenience, citations to common packages such as | ||
Jupyter [@jupyter], | ||
Matplotlib [@matplotlib], | ||
NumPy [@numpy], | ||
pandas [@pandas1; @pandas2], | ||
scikit-learn [@sklearn1; @sklearn2], and | ||
SciPy [@scipy] | ||
are included in this paper's `.bib` file. | ||
|
||
In this paper we not only terraform a desert using the package terradesert [@terradesert], we also catch a sandworm with it. | ||
To cite a website, the following BibTeX format plus any additional tags necessary for specifying the referenced content is recommended. | ||
If you are citing a team, ensure that the author name is wrapped in additional braces `{Team Name}`, so it is not treated as an author's first and last names. | ||
|
||
```{code-block} bibtex | ||
:emphasize-lines: 2 | ||
:linenos: | ||
@misc{terradesert, | ||
author = {{TerraDesert Team}}, | ||
title = {Code for terraforming a desert}, | ||
year = {2000}, | ||
url = {https://terradesert.com/code/}, | ||
note = {Accessed 1 Jan. 2000} | ||
} | ||
``` | ||
|
||
## Source code examples | ||
|
||
No paper would be complete without some source code. | ||
Code highlighting is completed if the name is given: | ||
|
||
```python | ||
def sum(a, b): | ||
"""Sum two numbers.""" | ||
|
||
return a + b | ||
``` | ||
|
||
Use the `{code-block}` directive if you are getting fancy with line numbers or emphasis. For example, line-numbers in `C` looks like: | ||
|
||
```{code-block} c | ||
:linenos: true | ||
int main() { | ||
for (int i = 0; i < 10; i++) { | ||
/* do something */ | ||
} | ||
return 0; | ||
} | ||
``` | ||
|
||
Or a snippet from the above code, starting at the correct line number, and emphasizing a line: | ||
|
||
```{code-block} c | ||
:linenos: true | ||
:lineno-start: 2 | ||
:emphasize-lines: 3 | ||
for (int i = 0; i < 10; i++) { | ||
/* do something */ | ||
} | ||
``` | ||
|
||
You can read more about code formatting in the [MyST documentation](https://mystmd.org/guide/code). | ||
|
||
## Figures, Equations and Tables | ||
|
||
It is well known that Spice grows on the planet Dune [@Atr03]. | ||
Test some maths, for example $e^{\pi i} + 3 \delta$. | ||
Or maybe an equation on a separate line: | ||
|
||
```{math} | ||
g(x) = \int_0^\infty f(x) dx | ||
``` | ||
|
||
or on multiple, aligned lines: | ||
|
||
```{math} | ||
\begin{aligned} | ||
g(x) &= \int_0^\infty f(x) dx \\ | ||
&= \ldots | ||
\end{aligned} | ||
``` | ||
|
||
The area of a circle and volume of a sphere are given as | ||
|
||
```{math} | ||
:label: circarea | ||
A(r) = \pi r^2. | ||
``` | ||
|
||
```{math} | ||
:label: spherevol | ||
V(r) = \frac{4}{3} \pi r^3 | ||
``` | ||
|
||
We can then refer back to Equation {ref}`circarea` or | ||
{ref}`spherevol` later. | ||
The `{ref}` role is another way to cross-reference in your document, which may be familiar to users of Sphinx. | ||
See complete documentation on [cross-references](https://mystmd.org/guide/cross-references). | ||
|
||
Mauris purus enim, volutpat non dapibus et, gravida sit amet sapien. In at | ||
consectetur lacus. Praesent orci nulla, blandit eu egestas nec, facilisis vel | ||
lacus. Fusce non ante vitae justo faucibus facilisis. Nam venenatis lacinia | ||
turpis. Donec eu ultrices mauris. Ut pulvinar viverra rhoncus. Vivamus | ||
adipiscing faucibus ligula, in porta orci vehicula in. Suspendisse quis augue | ||
arcu, sit amet accumsan diam. Vestibulum lacinia luctus dui. Aliquam odio arcu, | ||
faucibus non laoreet ac, condimentum eu quam. Quisque et nunc non diam | ||
consequat iaculis ut quis leo. Integer suscipit accumsan ligula. Sed nec eros a | ||
orci aliquam dictum sed ac felis. Suspendisse sit amet dui ut ligula iaculis | ||
sollicitudin vel id velit. Pellentesque hendrerit sapien ac ante facilisis | ||
lacinia. Nunc sit amet sem sem. In tellus metus, elementum vitae tincidunt ac, | ||
volutpat sit amet mauris. Maecenas[^footnote-1] diam turpis, placerat[^footnote-2] at adipiscing ac, | ||
pulvinar id metus. | ||
|
||
[^footnote-1]: On the one hand, a footnote. | ||
[^footnote-2]: On the other hand, another footnote. | ||
|
||
:::{figure} figure1.png | ||
:label: fig:stream | ||
This is the caption, sandworm vorticity based on storm location in a pleasing stream plot. Based on example in [matplotlib](https://matplotlib.org/stable/plot_types/arrays/streamplot.html). | ||
:::{figure} fig_echo_data.png | ||
:label: fig:echo_data | ||
:width: 700 px | ||
:align: center | ||
(A) Echograms at two different frequencies. Echo strength variation across frequency is useful for inferring scatterer identity. (B) The variety of ocean observing platforms with echosounders installed. | ||
::: | ||
|
||
:::{figure} figure2.png | ||
:label: fig:em | ||
This is the caption, electromagnetic signature of the sandworm based on remote sensing techniques. Based on example in [matplotlib](https://matplotlib.org/stable/plot_types/stats/hist2d.html). | ||
::: | ||
|
||
As you can see in @fig:stream and @fig:em, this is how you reference auto-numbered figures. | ||
To refer to a subfigure, use the syntax `@label [a]` in text or `[@label a]` for a parenthetical citation (i.e. @fig:stream [a] vs [@fig:stream a]). | ||
For even more control, you can simply link to figures using `[Figure %s](#label)`, the `%s` will get filled in with the number, for example [Figure %s](#fig:stream). | ||
See complete documentation on [cross-references](https://mystmd.org/guide/cross-references). | ||
|
||
```{list-table} This is the caption for the materials table. | ||
:label: tbl:materials | ||
:header-rows: 1 | ||
* - Material | ||
- Units | ||
* - Stone | ||
- 3 | ||
* - Water | ||
- 12 | ||
* - Cement | ||
- {math}`\alpha` | ||
``` | ||
|
||
We show the different quantities of materials required in | ||
@tbl:materials. | ||
|
||
Unfortunately, markdown can be difficult for defining tables, so if your table is more complex you can try embedding HTML: | ||
|
||
:::{table} Area Comparisons (written in html) | ||
:label: tbl:areas-html | ||
|
||
<table> | ||
<tr><th rowspan="2">Projection</th><th colspan="3" align="center">Area in square miles</th></tr> | ||
<tr><th align="right">Large Horizontal Area</th><th align="right">Large Vertical Area</th><th align="right">Smaller Square Area</th></tr> | ||
<tr><td>Albers Equal Area </td><td align="right"> 7,498.7 </td><td align="right"> 10,847.3 </td><td align="right">35.8</td></tr> | ||
<tr><td>Web Mercator </td><td align="right"> 13,410.0 </td><td align="right"> 18,271.4 </td><td align="right">63.0</td></tr> | ||
<tr><td>Difference </td><td align="right"> 5,911.3 </td><td align="right"> 7,424.1 </td><td align="right">27.2</td></tr> | ||
<tr><td>Percent Difference </td><td align="right"> 44% </td><td align="right"> 41% </td><td align="right">43%</td></tr> | ||
</table> | ||
::: | ||
|
||
or if you prefer LaTeX you can try `tabular` or `longtable` environments: | ||
|
||
```{raw} latex | ||
\begin{table*} | ||
\begin{longtable*}{|l|r|r|r|} | ||
\hline | ||
\multirow{2}{*}{\bf Projection} & \multicolumn{3}{c|}{\bf Area in square miles} \\ | ||
\cline{2-4} | ||
& \textbf{Large Horizontal Area} & \textbf{Large Vertical Area} & \textbf{Smaller Square Area} \\ | ||
\hline | ||
Albers Equal Area & 7,498.7 & 10,847.3 & 35.8 \\ | ||
Web Mercator & 13,410.0 & 18,271.4 & 63.0 \\ | ||
Difference & 5,911.3 & 7,424.1 & 27.2 \\ | ||
Percent Difference & 44\% & 41\% & 43\% \\ | ||
\hline | ||
\end{longtable*} | ||
\caption{Area Comparisons (written in LaTeX) \label{tbl:areas-tex}} | ||
\end{table*} | ||
``` | ||
|
||
Perhaps we want to end off with a quote by Lao Tse[^footnote-3]: | ||
It is crucial to have software tools that are developed and shared openly, scalable in response to data size and computing platforms, easily interoperable with diverse analysis tools and different types of oceanographic data, and straightforward to reproduce to facilitate iterative modeling, parameterization, and mining of the data. These requirements are difficult for conventional echosounder data analysis workflows to meet, because those workflows rely heavily on manual analysis in mostly closed-source software packages designed to be used through a graphical user interface (GUI) on a single computer [REF]. Similarly, rather than continuing to store data in manufacturer-specific binary formats, making echosounder data widely available in a standardized, machine-readable format will expand the use of these data beyond the original applications in fisheries surveys and specific research cruises. | ||
|
||
> Muddy water, let stand, becomes clear. | ||
In this paper, we introduce Echostack, an open-source Python software toolbox aimed at addressing these needs by providing the fisheries acoustics and ocean sciences communities with a suite of open tools for intuitive and transparent access, organization, processing, and visualization of these data. Echostack is a domain-specific adaptation of the Pandata stack of Python libraries [REF] that streamlines the composition and execution of common echosounder data workflows, thereby allowing researchers to focus on the key interpretive stage of scientific data analysis. While individual researchers could use Pandata tools directly for the same functionalities, we took a modular approach and created the Echostack libraries to: 1) enable and facilitate broader code reuse, especially for routine processing steps that are often common across echosounder data workflows [REF], 2) streamline the interlacing of domain-specific operations with more general computational tools, such as machine learning (ML) libraries, and 3) provide a friendlier on-ramp for researchers who are not already familiar with the scientific Python software ecosystem but possess domain expertise and can quickly benefit from “out-of-the-box” capabilities such as native cloud access and distributed computing support. | ||
|
||
[^footnote-3]: $\mathrm{e^{-i\pi}}$ | ||
Below, we will discuss what the Echostack tools aim to achieve in Design considerations, outline the functionalities of individual libraries in The Echostack packages, demonstrate how Echostack tools can be leveraged in two example use cases, and conclude the discussion by looking forward in the Future work section. |