Replace date-parser with log-parser (regular expressions) #1148

Merged
merged 17 commits into from
Jul 30, 2022
95 changes: 26 additions & 69 deletions concepts/regular-expressions/introduction.md
@@ -1,95 +1,52 @@
# Introduction

Regular expressions (regex) are a powerful tool for working with strings in Elixir. Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions). String patterns representing the regular expression's meaning are first compiled then used for matching all or part of a string.
## Regular Expressions

In Elixir, the most common way to create regular expressions is using the `~r` sigil. Sigils provide _syntactic sugar_ shortcuts for common tasks in Elixir. To match a _string literal_, we can use the string itself as a pattern following the sigil.
Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions), similarly to other popular languages like Java, JavaScript, or Ruby.

```elixir
~r/test/
```

The `=~/2` operator is useful to perform a regex match on a string to return a `boolean` result.
The `Regex` module offers functions for working with regular expressions. Some of the `String` module functions accept regular expressions as arguments as well.

```elixir
"this is a test" =~ ~r/test/
# => true
```
~~~~exercism/note
This exercise assumes that you already know regular expression syntax, including character classes, quantifiers, groups, and captures.
Contributor

Should we offer a link or two for those who don't?
To be honest, I just googled learning regex and found plenty of material, but there are also a bunch of links in the hints we could reuse.

Member Author

~~~~

Two notes about using sigils:
### Sigils

- many different delimiters may be used depending on your requirements rather than `/`
- string patterns are already _escaped_, when writing the pattern as a string not using a regex, you will have to _escape_ backslashes (`\`)

## Character classes

Matching a range of characters using square brackets `[]` defines a _character class_. This will match any one character to the characters in the class. You can also specify a range of characters like `a-z`, as long as the start and end represent a contiguous range of code points.
The most common way to create regular expressions is using the `~r` sigil.

```elixir
regex = ~r/[a-z][ADKZ][0-9][!?]/
"jZ5!" =~ regex
# => true
"jB5?" =~ regex
# => false
~r/test/
```

_Shorthand character classes_ make the pattern more concise. For example:

- `\d` short for `[0-9]` (any digit)
- `\w` short for `[A-Za-z0-9_]` (any 'word' character)
- `\s` short for `[ \t\r\n\f]` (any whitespace character)

When a _shorthand character class_ is used outside of a sigil, it must be escaped: `"\\d"`
Note that all Elixir sigils support [different kinds of delimiters][sigils], not only `/`.

## Alternations
### Matching

_Alternations_ use `|` as a special character to denote matching one _or_ another
The `=~/2` operator can be used to perform a regex match that returns a `boolean` result. Alternatively, there are also `match?/2` functions in the `Regex` module as well as the `String` module.

```elixir
regex = ~r/cat|bat/
"bat" =~ regex
# => true
"cat" =~ regex
"this is a test" =~ ~r/test/
# => true
```

## Quantifiers

_Quantifiers_ allow for a repeating pattern in the regex. They affect the group preceding the quantifier.

- `{N, M}` where `N` is the minimum number of repetitions, and `M` is the maximum
- `{N,}` match `N` or more repetitions
- `{0,}` may also be written as `*`: match zero-or-more repetitions
- `{1,}` may also be written as `+`: match one-or-more repetitions
- `{,N}` match up to `N` repetitions

## Groups

Round brackets `()` are used to denote _groups_ and _captures_. The group may also be _captured_ in some instances to be returned for use. In Elixir, these may be named or un-named. Captures are named by appending `?<name>` after the opening parenthesis. Groups function as a single unit, like when followed by _quantifiers_.

```elixir
regex = ~r/(h)at/
Regex.replace(regex, "hat", "\\1op")
# => "hop"

regex = ~r/(?<letter_b>b)/
Regex.scan(regex, "blueberry", capture: :all_names)
# => [["b"], ["b"]]
String.match?("Alice has 7 apples", ~r/\d{2}/)
# => false
```

## Anchors
### Capturing

_Anchors_ are used to tie the regular expression to the beginning or end of the string to be matched:
If a simple boolean check is not enough, use the `Regex.run/3` function to get a list of all captures (or `nil` if there was no match). The first element in the returned list is always the part of the string that matched the whole regular expression, and the following elements are the matched groups.
Contributor

I think we should add an example, something like

Regex.run(~r/test number (\d+)/, "This is test number 256")
# => ["test number 256", "256"]

Member Author

Done: 394686e (#1148)

It also made me realize that the initial description was wrong :) the first element isn't always the whole input string, only the part of it that was a match for the regex.


- `^` anchors to the beginning of the string
- `$` anchors to the end of the string
### Modifiers

Because the `~r` is a shortcut for `"pattern" |> Regex.escape() |> Regex.compile!()`, you may also use string interpolation to dynamically build a regular expression pattern:
The behavior of a regular expression can be modified by appending special flags. When using a sigil to create a regular expression, add the modifiers after the second delimiter.

Common modifiers are:
- `i` - makes the match case-insensitive.
- `u` - enables Unicode-specific patterns like `\p` and causes character classes like `\w`, `\s` etc. to also match Unicode.

```elixir
anchor = "$"
regex = ~r/end of the line#{anchor}/
"end of the line?" =~ regex
# => false
"end of the line" =~ regex
"this is a TEST" =~ ~r/test/i
# => true
```

[sigils]: https://hexdocs.pm/elixir/syntax-reference.html#sigils
26 changes: 19 additions & 7 deletions config.json
@@ -219,13 +219,9 @@
"slug": "date-parser",
"name": "Date Parser",
"uuid": "57198686-71c9-4f38-973a-a111435560e7",
"concepts": [
"regular-expressions"
],
"prerequisites": [
"strings"
],
"status": "active"
"concepts": [],
"prerequisites": [],
"status": "deprecated"
},
{
"slug": "rpg-character-sheet",
@@ -650,6 +646,22 @@
"typespecs"
],
"status": "beta"
},
{
"slug": "log-parser",
"name": "Log Parser",
"uuid": "708fa8b8-59b9-43e7-be40-74759c3cc9a4",
"concepts": [
"regular-expressions"
],
"prerequisites": [
"strings",
"lists",
"pattern-matching",
"nil",
"if"
],
"status": "beta"
}
],
"practice": [
48 changes: 48 additions & 0 deletions exercises/concept/log-parser/.docs/hints.md
@@ -0,0 +1,48 @@
# Hints

## General

- Review regular expression patterns from the introduction. Remember, when creating the pattern from a string, you must escape some characters.
- Read about the [`Regex` module][regex-docs] in the documentation.
- Read about the [regular expression sigil][sigils-regex] in the Getting Started guide.
- Check out this website about regular expressions: [Regular-Expressions.info][website-regex-info].
- Check out this website about regular expressions: [Rex Egg - The world's most tyrannosauical regex tutorial][website-rexegg].
- Check out this website about regular expressions: [RegexOne - Learn Regular Expressions with simple, interactive exercises.][website-regexone].
- Check out this website about regular expressions: [Regular Expressions 101 - an online regex sandbox][website-regex-101].
- Check out this website about regular expressions: [RegExr - an online regex sandbox][website-regexr].

## 1. Identify garbled log lines

- Use the [`r` sigil][sigil-r] to create a regular expression.
- There is [an operator][match-operator] that can be used to check a string against a regular expression. There is also a [`Regex` function][regex-match] and a [`String` function][string-match] that can do the same.

## 2. Split the log line

- There is a [`Regex` function][regex-split] as well as a [`String` function][string-split] that can split a string into a list of strings based on a regular expression.
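
As a rough illustration of splitting on a pattern rather than a fixed string, here is a minimal sketch. The input and the separator pattern are made up for this example and are unrelated to the log format in the exercise.

```elixir
# Hypothetical input: items separated by a comma or a semicolon,
# optionally followed by whitespace.
separator = ~r/[,;]\s*/
input = "apples, pears;bananas"

# String.split/2 accepts a regular expression as the splitting pattern.
via_string = String.split(input, separator)
# => ["apples", "pears", "bananas"]

# Regex.split/2 does the same, with the regex as the first argument.
via_regex = Regex.split(separator, input)
# => ["apples", "pears", "bananas"]
```

Both functions return a list of the pieces between matches. Which one to use is mostly a matter of which argument you want to pipe.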

## 3. Remove artifacts from log

- There is a [`Regex` function][regex-replace] as well as a [`String` function][string-replace] that can change a part of a string that matches a given regular expression to a different string.
- There is a [modifier][regex-modifiers] that can make the whole regular expression case-insensitive.
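
To show how regex-based replacement behaves, here is a small sketch. The strings and patterns are invented for illustration and are not the artifacts described in the instructions.

```elixir
# Hypothetical input: replace every run of digits with a placeholder.
masked = String.replace("order 66 shipped, order 7 pending", ~r/\d+/, "#")
# => "order # shipped, order # pending"

# Regex.replace/3 takes the regex first and also replaces all matches by default.
squeezed = Regex.replace(~r/\s+/, "too    many   spaces", " ")
# => "too many spaces"

# The `i` modifier makes the pattern match regardless of letter case.
relabeled = Regex.replace(~r/error/i, "Error: disk ERROR", "warning")
# => "warning: disk warning"
```

Passing an empty string as the replacement removes the matched text and leaves everything around it untouched.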

## 4. Tag lines with user names

- There is a [`Regex` function][regex-run] that runs a regular expression against a string and returns all captures.

[regex-docs]: https://hexdocs.pm/elixir/Regex.html
[sigils-regex]: https://elixir-lang.org/getting-started/sigils.html#regular-expressions
[website-regex-info]: https://www.regular-expressions.info
[website-rexegg]: https://www.rexegg.com/
[website-regexone]: https://regexone.com/
[website-regex-101]: https://regex101.com/
[website-regexr]: https://regexr.com/
[sigil-r]: https://hexdocs.pm/elixir/Kernel.html#sigil_r/2
[match-operator]: https://hexdocs.pm/elixir/Kernel.html#=~/2
[regex-match]: https://hexdocs.pm/elixir/Regex.html#match?/2
[string-match]: https://hexdocs.pm/elixir/String.html#match?/2
[regex-split]: https://hexdocs.pm/elixir/Regex.html#split/3
[string-split]: https://hexdocs.pm/elixir/String.html#split/3
[regex-replace]: https://hexdocs.pm/elixir/Regex.html#replace/4
[string-replace]: https://hexdocs.pm/elixir/String.html#replace/4
[regex-modifiers]: https://hexdocs.pm/elixir/Regex.html#module-modifiers
[regex-run]: https://hexdocs.pm/elixir/Regex.html#run/3
74 changes: 74 additions & 0 deletions exercises/concept/log-parser/.docs/instructions.md
@@ -0,0 +1,74 @@
# Instructions

After a recent security review you have been asked to clean up the organization's archived log files.

## 1. Identify garbled log lines

You need some idea of how many log lines in your archive do not comply with current standards.
You believe that a simple test reveals whether a log line is valid.
To be considered valid a line should begin with one of the following strings:

- [DEBUG]
- [INFO]
- [WARNING]
- [ERROR]

Implement the `valid_line?/1` function to return `true` if the log line is valid.

```elixir
LogParser.valid_line?("[ERROR] Network Failure")
# => true

LogParser.valid_line?("Network Failure")
# => false
```

## 2. Split the log line

Shortly after starting the log parsing project, you realize that one application's logs aren't split into lines like the others. In this project, what should have been separate lines is instead on a single line, connected by fancy arrows such as `<--->` or `<*~*~>`.

In fact, any string that has a first character of `<`, a last character of `>`, and any combination of the following characters `~`, `*`, `=`, and `-` in between can be used as a separator in this project's logs.

Implement the `split_line/1` function that takes a line and returns a list of strings.

```elixir
LogParser.split_line("[INFO] Start.<*>[INFO] Processing...<~~~>[INFO] Success.")
# => ["[INFO] Start.", "[INFO] Processing...", "[INFO] Success."]
```

## 3. Remove artifacts from log

You have found that some upstream processing of the logs has been scattering the text "end-of-line" followed by a line number (without an intervening space) throughout the logs.

Implement the `remove_artifacts/1` function to take a string, remove all occurrences of the end-of-line text, and return a clean log line.

Lines not containing end-of-line text should be returned unmodified.

Just remove the end-of-line string. Do not attempt to adjust the whitespace.

```elixir
LogParser.remove_artifacts("[WARNING] end-of-line23033 Network Failure end-of-line27")
# => "[WARNING]  Network Failure "
```

## 4. Tag lines with user names

You have noticed that some of the log lines include sentences that refer to users.
These sentences always contain the string `"User"`, followed by one or more whitespace characters, and then a user name.
You decide to tag such lines.

Implement a function `tag_with_user_name/1` that processes log lines:

- Lines that do not contain the string `"User"` remain unchanged.
- For lines that contain the string `"User"`, prefix the line with `[USER]` followed by the user name.

```elixir
LogParser.tag_with_user_name("[INFO] User Alice created a new project")
# => "[USER] Alice [INFO] User Alice created a new project"
```

You can assume that:

- Each occurrence of the string `"User"` is followed by one or more whitespace characters and the user name.
- There is at most one occurrence of the string `"User"` on each line.
- User names are non-empty strings that do not contain whitespace.
52 changes: 52 additions & 0 deletions exercises/concept/log-parser/.docs/introduction.md
@@ -0,0 +1,52 @@
# Introduction

## Regular Expressions

Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions), similarly to other popular languages like Java, JavaScript, or Ruby.

The `Regex` module offers functions for working with regular expressions. Some of the `String` module functions accept regular expressions as arguments as well.

~~~~exercism/note
This exercise assumes that you already know regular expression syntax, including character classes, quantifiers, groups, and captures.
~~~~

### Sigils

The most common way to create regular expressions is using the `~r` sigil.

```elixir
~r/test/
```

Note that all Elixir sigils support [different kinds of delimiters][sigils], not only `/`.
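
A different delimiter is mostly useful when the pattern itself contains `/`. The sketch below uses a made-up URL pattern to compare the two spellings. The variable names and the URL are assumptions for illustration only.

```elixir
# With `/` as the delimiter, every slash inside the pattern must be escaped.
with_slashes = ~r/^https?:\/\/[^\/]+\/docs/

# With `|` as the delimiter, the slashes can be written as-is.
with_pipes = ~r|^https?://[^/]+/docs|

url = "https://hexdocs.pm/docs"

slashes_match = url =~ with_slashes
# => true
pipes_match = url =~ with_pipes
# => true
```

Whichever delimiter you choose then becomes the character that needs escaping inside the pattern.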

### Matching

The `=~/2` operator can be used to perform a regex match that returns a `boolean` result. Alternatively, there are also `match?/2` functions in the `Regex` module as well as the `String` module.

```elixir
"this is a test" =~ ~r/test/
# => true

String.match?("Alice has 7 apples", ~r/\d{2}/)
# => false
```

### Capturing

If a simple boolean check is not enough, use the `Regex.run/3` function to get a list of all captures (or `nil` if there was no match). The first element in the returned list is always the part of the string that matched the whole regular expression, and the following elements are the matched groups.
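
A minimal sketch of `Regex.run/3` with one capture group, showing the returned list for a match and the `nil` result when nothing matches. The input strings are illustrative only.

```elixir
regex = ~r/test number (\d+)/

# One capture group: the result holds the overall match, then the group.
captures = Regex.run(regex, "This is test number 256")
# => ["test number 256", "256"]

# No match at all returns nil instead of a list.
no_match = Regex.run(regex, "This is not a test")
# => nil

# Pattern matching on the result is a convenient way to pull out a group.
[_whole_match, number] = captures
# number is now "256"
```

Because the no-match case is `nil`, it is usually handled with a `case` or `if` before destructuring the list.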

### Modifiers

The behavior of a regular expression can be modified by appending special flags. When using a sigil to create a regular expression, add the modifiers after the second delimiter.

Common modifiers are:
- `i` - makes the match case-insensitive.
- `u` - enables Unicode-specific patterns like `\p` and causes character classes like `\w`, `\s` etc. to also match Unicode.

```elixir
"this is a TEST" =~ ~r/test/i
# => true
```
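
The `u` modifier can be sketched the same way. The word below is an arbitrary non-ASCII example chosen for illustration, and modifiers can be combined by listing them one after another.

```elixir
# With `u`, `\w` also matches letters outside the ASCII range.
unicode_word = String.match?("zażółć", ~r/^\w+$/u)
# => true

# Modifiers can be combined, here case-insensitive plus Unicode.
combined = String.match?("ZAŻÓŁĆ", ~r/^zażółć$/iu)
# => true
```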

[sigils]: https://hexdocs.pm/elixir/syntax-reference.html#sigils
4 changes: 4 additions & 0 deletions exercises/concept/log-parser/.formatter.exs
@@ -0,0 +1,4 @@
# Used by "mix format"
[
inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
]
24 changes: 24 additions & 0 deletions exercises/concept/log-parser/.gitignore
@@ -0,0 +1,24 @@
# The directory Mix will write compiled artifacts to.
/_build/

# If you run "mix test --cover", coverage assets end up here.
/cover/

# The directory Mix downloads your dependencies sources to.
/deps/

# Where third-party dependencies like ExDoc output generated docs.
/doc/

# Ignore .fetch files in case you like to edit your project deps locally.
/.fetch

# If the VM crashes, it generates a dump, let's ignore it too.
erl_crash.dump

# Also ignore archive artifacts (built via "mix archive.build").
*.ez

# Ignore package tarball (built via "mix hex.build").
log-parser-*.tar

21 changes: 21 additions & 0 deletions exercises/concept/log-parser/.meta/config.json
@@ -0,0 +1,21 @@
{
"authors": [
"angelikatyborska"
],
"files": {
"solution": [
"lib/log_parser.ex"
],
"test": [
"test/log_parser_test.exs"
],
"exemplar": [
".meta/exemplar.ex"
]
},
"language_versions": ">=1.10",
"forked_from": [
"go/parsing-log-files"
],
"blurb": "Learn about regular expressions by parsing logs."
}