Concept: Pipelines and Command Lists #714

Merged
merged 8 commits into from
Dec 8, 2024
Changes from 3 commits
8 changes: 8 additions & 0 deletions concepts/pipelines/.meta/config.json
@@ -0,0 +1,8 @@
{
  "authors": [
    "glennj"
  ],
  "contributors": [
  ],
  "blurb": "Compose more complex bash commands with pipelines and command lists"
}
1 change: 1 addition & 0 deletions concepts/pipelines/about.md
@@ -0,0 +1 @@
# TODO
195 changes: 195 additions & 0 deletions concepts/pipelines/introduction.md
@@ -0,0 +1,195 @@
# Pipelines and Command Lists

We have seen how to write simple commands, where a command is followed by arguments.
Now we will see how to make more complex commands by composing simple commands.

## I/O

Before we start, a quick intro to input/output.

Processes have "standard I/O channels".

* A process can consume _input_ on "stdin".
* A process can emit _output_ on "stdout".
* A process can emit _error output_ on "stderr".

The `tr` command is a very pure example of this.
All it does is read text from its stdin, perform character transliterations, and print the resulting text to stdout.
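
As a small illustration of `tr` reading stdin and writing stdout, the sketch below feeds it a here-document (one way to supply stdin without typing at the terminal) and captures what it prints.
The sample text and the variable name `upper` are made up for this example.

```bash
# tr reads "hello world" on stdin and writes the transliterated text to stdout.
# The command substitution $(...) captures that stdout in a variable.
upper=$(tr 'a-z' 'A-Z' <<'EOF'
hello world
EOF
)

echo "$upper"    # prints: HELLO WORLD
```
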

We will see more about manipulating stdio channels later in the syllabus.

## Pipelines

This is one of the "killer features" of shell programming.
Pipelines allow you to create sophisticated transformations on a stream of text.

To produce a sorted list of users:

```bash
cat /etc/passwd | cut -d : -f 1 | sort
```

The pipe symbol (`|`) connects the output of one command to the input of another.
`cut` reads the output of `cat`, and `sort` reads the output of `cut`.
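
To try the same pipeline without depending on the contents of your own `/etc/passwd`, here is a minimal sketch that uses a few made-up lines in the same colon-separated format.
The variable names `passwd_sample` and `users` are assumptions for illustration only.

```bash
# Made-up sample data in /etc/passwd format (not real accounts).
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# printf emits the lines, cut keeps the first colon-separated field,
# and sort orders the resulting names.
users=$(printf '%s\n' "$passwd_sample" | cut -d : -f 1 | sort)

echo "$users"
# prints:
# alice
# daemon
# root
```
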
Member

Do you want to take a step back and mention how many commands read from STDIN (and/or a file) and write to STDOUT? Introduce STDIN/STDOUT?

Contributor Author

Right, I'll do a "sneak preview" of I/O, with a promise of more to come in a later concept.


~~~~exercism/advanced
* By default, each command in a pipeline runs in a separate subshell.
(A subshell is a child process that is a copy of the currently running shell.)

* All the commands in a pipeline execute in parallel.

* There is a performance cost to running pipelines.
If you find yourself with long pipelines of similar commands, consider combining them in a single command.
For example, pipelines using multiple instances of `grep`, `cut`, `sed`, `awk`, and `tr` can generally be combined into a single `awk` command for efficiency.

* The exit status of a pipeline is the exit status of the last command in the pipeline.
However, there is a shell setting that can control this.
With the "pipefail" setting enabled (`set -o pipefail`), the pipeline's exit status is the exit status of the _**last** command that exited non-zero_, or zero if all commands succeeded.
~~~~
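
The effect of "pipefail" on a pipeline's exit status can be sketched with the `true` and `false` commands.
The variable names are made up for this example, and the setting is enabled inside a subshell only so that it does not leak into the rest of the script.

```bash
# Default behaviour: the status comes from the last command (true).
false | true
default_status=$?

# With pipefail: the status comes from the rightmost failing command (false).
pipefail_status=0
( set -o pipefail; false | true ) || pipefail_status=$?

echo "default: $default_status, pipefail: $pipefail_status"
# prints: default: 0, pipefail: 1
```
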

## Command Lists

A command list is a sequence of pipelines separated by `;` (or newline), `&&` or `||`.

* `A; B` is a command list where `B` executes after `A` has completed.
* `A && B`, where `B` executes only if `A` succeeds (exits with status zero).
* `A || B`, where `B` executes only if `A` fails (exits with status non-zero).

The exit status of a command list is the exit status of the last command that was executed.
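
A minimal sketch of these rules, using `true` and `false` as stand-ins for commands that succeed and fail (the variable `list_status` is an assumption for illustration):

```bash
true  && echo "1: runs, because true succeeded"
false && echo "2: skipped, because false failed"
true  || echo "3: skipped, because true succeeded"
false || echo "4: runs, because false failed"

# Here `true` never runs, so the last command executed is `false`
# and its status becomes the status of the whole list.
false && true
list_status=$?
echo "list status: $list_status"    # prints: list status: 1
```

Only lines 1 and 4 are printed, followed by the list status.
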

The `&&` and `||` operators can be chained so that the next command conditionally executes based on the status of the preceding commands.
For example:

```bash
A && B && C && D || E
```

* B executes if A succeeds,
* C executes if A and B succeed,
* D executes if A and B and C succeed,
* E executes if **any** of A, B, C or D fails.
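
One way to watch this chain in action is to replace A through E with a small helper that records its name and returns a chosen status.
The `step` function and the `trace` variables below are hypothetical, written only for this sketch.

```bash
# Hypothetical helper: append the label ($1) to $trace, then return status $2.
step () {
    trace="$trace$1"
    return "$2"
}

# C fails: D is skipped and E runs.
trace=""
step A 0 && step B 0 && step C 1 && step D 0 || step E 0
trace_when_c_fails=$trace

# Everything succeeds: E never runs.
trace=""
step A 0 && step B 0 && step C 0 && step D 0 || step E 0
trace_when_all_succeed=$trace

echo "$trace_when_c_fails $trace_when_all_succeed"    # prints: ABCE ABCD
```
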

~~~~exercism/caution
Use these logical operators sparingly.
They can quickly lead to unreadable code, or logic that is hard to comprehend.

For example, do you think these are the same?

```bash
if A; then B; else C; fi
```

```bash
A && B || C
```

They differ in when C is executed.

* In the first snippet (the if statement), C will only execute if A fails.
* In the second snippet, C executes if A fails _or if A succeeds but B fails_!
~~~~
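
The difference can be observed with stand-in functions where A succeeds and B fails.
The definitions of `A`, `B` and `C` and the variable names are assumptions made up for this sketch.

```bash
A () { return 0; }      # A succeeds
B () { return 1; }      # B fails
C () { c_ran=yes; }     # C records that it ran

# if statement: A succeeded, so the else branch is never taken.
# The trailing "|| true" only discards B's failing status, so the
# snippet also runs cleanly under `set -e`.
c_ran=no
if A; then B; else C; fi || true
c_ran_with_if=$c_ran

# command list: A succeeded but B failed, so C runs anyway.
c_ran=no
A && B || C
c_ran_with_list=$c_ran

echo "if: $c_ran_with_if, list: $c_ran_with_list"    # prints: if: no, list: yes
```
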

### Uses of Command Lists

Here are a couple of examples where command lists can simplify bash code.

#### Reading blocks of lines from a file

Suppose you have a data file that describes triangles by the lengths of their three sides, with each length on its own line.

```bash
$ cat triangle.dat
3
4
5
9
12
14
```

You can use a while loop where the condition is three separate read commands:

```bash
while read a && read b && read c; do
    if is_pythagorean "$a" "$b" "$c"; then
        echo "$a:$b:$c is pythagorean"
    else
        echo "$a:$b:$c is not pythagorean"
    fi
done < triangle.dat
```

Assuming `is_pythagorean` is a command that determines if the three sides satisfy the Pythagorean equation, the output would be:

```none
3:4:5 is pythagorean
9:12:14 is not pythagorean
```
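
The text leaves `is_pythagorean` undefined.
One possible definition is sketched below; it is an assumption, not part of the original example, and it expects integer sides with the longest side given last.

```bash
# Hypothetical implementation: succeeds when a^2 + b^2 equals c^2.
# Assumes integer arguments, with the hypotenuse as the third one.
is_pythagorean () {
    [ "$(( $1 * $1 + $2 * $2 ))" -eq "$(( $3 * $3 ))" ]
}

is_pythagorean 3 4 5   && echo "3:4:5 is pythagorean"
is_pythagorean 9 12 14 || echo "9:12:14 is not pythagorean"
```

Because the function's exit status is simply the status of the test, it can be used directly as the condition of an `if` or in a command list.
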

#### Assertions

Many programming languages have a form of assertion where an exception is thrown if some condition fails:

```
assert(x == 5, "x must be 5");
```

We can use an OR operator in bash to simulate that function:

```bash
die () {
    echo "$*" >&2
    exit 1
}

[[ $x -eq 5 ]] || die "x must be equal to 5"
[[ $y -gt 5 ]] || die "y must be greater than 5"
```
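
To see what a failed assertion looks like from the caller's side, the sketch below wraps one in a function and runs it inside a command substitution.
The function `require_five` and the variables `status` and `output` are hypothetical names for this illustration.

```bash
die () {
    echo "$*" >&2
    exit 1
}

# Hypothetical function guarded by an assertion on its first argument.
require_five () {
    [ "$1" -eq 5 ] || die "x must be equal to 5"
    echo "x is fine"
}

# A command substitution runs in a subshell, so the `exit` inside die
# ends only that subshell, not this script.
status=0
output=$(require_five 4 2>&1) || status=$?

echo "status=$status output=$output"
# prints: status=1 output=x must be equal to 5
```
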

## Style Considerations

Long command lists become hard to read quite quickly.
Liberal use of newlines can help a lot.

Consider this example where a word is added to an array if two conditions are met.

```bash
[[ "$word" != "$topic" ]] && [[ "$key" == "$(sorted "$topic")" ]] && anagrams+=("$candidate")
```

Bash allows you to add a newline after a pipe or a logical operator.

```bash
[[ "$word" != "$topic" ]] &&
    [[ "$key" == "$(sorted "$topic")" ]] &&
    anagrams+=("$candidate")
```

However, the operator can be easy to miss at the end of the line.
Using a _line continuation_ means you can put the operator first, which makes it more obvious that the list is being continued:

```bash
[[ "$word" != "$topic" ]] \
    && [[ "$key" == "$(sorted "$topic")" ]] \
    && anagrams+=("$candidate")
```

~~~~exercism/note
A _line continuation_ is the two-character sequence "backslash" and "newline" (`\` + `\n`).
When bash sees that sequence, it is simply removed from the code, thereby _continuing_ the current line with the next line.
Take care to not allow any spaces between the backslash and the newline.
~~~~

Here's another example:

```bash
printf "%s\n" "${numbers[@]}" | bc --mathlib | sort --general-numeric-sort
```

or

```bash
printf "%s\n" "${numbers[@]}" \
    | bc --mathlib \
    | sort --general-numeric-sort
```
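
A quick way to convince yourself that the layout does not change the behaviour is to run a pipeline in both forms and compare the results.
The sample numbers and variable names below are made up, and `sort -n` with `head` stands in for the commands above so the sketch does not depend on `bc` being installed.

```bash
# One-line form.
one_line=$(printf '%s\n' 3 1 2 | sort -n | head -n 1)

# Same pipeline, split with line continuations and leading pipes.
continued=$(printf '%s\n' 3 1 2 \
    | sort -n \
    | head -n 1)

echo "$one_line $continued"    # prints: 1 1
```
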
10 changes: 10 additions & 0 deletions concepts/pipelines/links.json
@@ -0,0 +1,10 @@
[
  {
    "url": "https://www.gnu.org/software/bash/manual/bash.html#Pipelines",
    "description": "Pipelines"
  },
  {
    "url": "https://www.gnu.org/software/bash/manual/bash.html#Lists",
    "description": "Lists of Commands"
  }
]
5 changes: 5 additions & 0 deletions config.json
@@ -1243,6 +1243,11 @@
"uuid": "ae9f3e82-bcdd-4c09-9788-bc543235fd52",
"slug": "looping",
"name": "Looping"
},
{
"uuid": "f64a17aa-cbdb-49de-b50c-f3bf14a4e03d",
"slug": "pipelines",
"name": "Pipelines and Command Lists"
}
],
"key_features": [