Replace date-parser with log-parser (regular expressions) #1148
Merged
17 commits
- af23032 Add exercise boilerplate (angelikatyborska)
- e959308 Fork go's exercise (angelikatyborska)
- 0ff6249 Update config (angelikatyborska)
- 134e8d0 Write hints (angelikatyborska)
- 342a93f Prepare boilerplate solution (angelikatyborska)
- 0df8ca4 Write simpler introduction (angelikatyborska)
- 7827640 Give up on interpolated regexes (angelikatyborska)
- 14b4244 Fix gitignore (angelikatyborska)
- b97fd3c Fix config (angelikatyborska)
- b07703a Fix formatting config (angelikatyborska)
- 531d46e Hints need to be a list (angelikatyborska)
- 6c175fb Friendlier tone (angelikatyborska)
- f76d149 Fix headling levels in concept introduction (angelikatyborska)
- 876b208 Update blurb (angelikatyborska)
- bc51846 Merge branch 'main' into replace-regex-exercise (angelikatyborska)
- a133b42 Add tutorial links to assumed knowledge note (angelikatyborska)
- 394686e Add code example for capturing (angelikatyborska)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,97 +1,57 @@ | ||
# Introduction | ||
|
||
Regular expressions (regex) are a powerful tool for working with strings in Elixir. Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions). String patterns representing the regular expression's meaning are first compiled then used for matching all or part of a string. | ||
Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions), similarly to other popular languages like Java, JavaScript, or Ruby. | ||
|
||
In Elixir, the most common way to create regular expressions is using the `~r` sigil. Sigils provide _syntactic sugar_ shortcuts for common tasks in Elixir. To match a _string literal_, we can use the string itself as a pattern following the sigil. | ||
The `Regex` module offers functions for working with regular expressions. Some of the `String` module functions accept regular expressions as arguments as well. | ||
|
||
```elixir | ||
~r/test/ | ||
``` | ||
|
||
The `=~/2` operator is useful to perform a regex match on a string to return a `boolean` result. | ||
|
||
```elixir | ||
"this is a test" =~ ~r/test/ | ||
# => true | ||
``` | ||
~~~~exercism/note | ||
This exercise assumes that you already know regular expression syntax, including character classes, quantifiers, groups, and captures. | ||
|
||
Two notes about using sigils: | ||
If you need to refresh your regular expression knowledge, check out one of these sources: [Regular-Expressions.info][website-regex-info], [Rex Egg][website-rexegg], [RegexOne][website-regexone], [Regular Expressions 101][website-regex-101], [RegExr][website-regexr]. | |
~~~~ | ||
|
||
- many different delimiters may be used depending on your requirements rather than `/` | ||
- string patterns are already _escaped_, when writing the pattern as a string not using a regex, you will have to _escape_ backslashes (`\`) | ||
## Sigils | ||
|
||
## Character classes | ||
|
||
Matching a range of characters using square brackets `[]` defines a _character class_. This will match any one character to the characters in the class. You can also specify a range of characters like `a-z`, as long as the start and end represent a contiguous range of code points. | ||
The most common way to create regular expressions is using the `~r` sigil. | ||
|
||
```elixir | ||
regex = ~r/[a-z][ADKZ][0-9][!?]/ | ||
"jZ5!" =~ regex | ||
# => true | ||
"jB5?" =~ regex | ||
# => false | ||
~r/test/ | ||
``` | ||
|
||
_Shorthand character classes_ make the pattern more concise. For example: | ||
|
||
- `\d` short for `[0-9]` (any digit) | ||
- `\w` short for `[A-Za-z0-9_]` (any 'word' character) | ||
- `\s` short for `[ \t\r\n\f]` (any whitespace character) | ||
Note that all Elixir sigils support [different kinds of delimiters][sigils], not only `/`. | ||
|
||
When a _shorthand character class_ is used outside of a sigil, it must be escaped: `"\\d"` | ||
## Matching | ||
|
||
## Alternations | ||
|
||
_Alternations_ use `|` as a special character to denote matching one _or_ another | ||
The `=~/2` operator can be used to perform a regex match that returns a `boolean` result. Alternatively, there are also `match?/2` functions in the `Regex` module as well as the `String` module. | |
|
||
```elixir | ||
regex = ~r/cat|bat/ | ||
"bat" =~ regex | ||
# => true | ||
"cat" =~ regex | ||
"this is a test" =~ ~r/test/ | ||
# => true | ||
``` | ||
|
||
## Quantifiers | ||
|
||
_Quantifiers_ allow for a repeating pattern in the regex. They affect the group preceding the quantifier. | ||
|
||
- `{N, M}` where `N` is the minimum number of repetitions, and `M` is the maximum | ||
- `{N,}` match `N` or more repetitions | ||
- `{0,}` may also be written as `*`: match zero-or-more repetitions | ||
- `{1,}` may also be written as `+`: match one-or-more repetitions | ||
- `{,N}` match up to `N` repetitions | ||
String.match?("Alice has 7 apples", ~r/\d{2}/) | ||
# => false | ||
``` | ||
|
||
## Groups | ||
## Capturing | ||
|
||
Round brackets `()` are used to denote _groups_ and _captures_. The group may also be _captured_ in some instances to be returned for use. In Elixir, these may be named or un-named. Captures are named by appending `?<name>` after the opening parenthesis. Groups function as a single unit, like when followed by _quantifiers_. | ||
If a simple boolean check is not enough, use the `Regex.run/3` function to get a list of all captures (or `nil` if there was no match). The first element in the returned list is always a match for the whole regular expression, and the following elements are matched groups. | ||
|
||
```elixir | ||
regex = ~r/(h)at/ | ||
Regex.replace(regex, "hat", "\\1op") | ||
# => "hop" | ||
|
||
regex = ~r/(?<letter_b>b)/ | ||
Regex.scan(regex, "blueberry", capture: :all_names) | ||
# => [["b"], ["b"]] | ||
Regex.run(~r/(\d) apples/, "Alice has 7 apples") | ||
# => ["7 apples", "7"] | ||
``` | ||
|
||
## Anchors | ||
|
||
_Anchors_ are used to tie the regular expression to the beginning or end of the string to be matched: | ||
## Modifiers | ||
|
||
- `^` anchors to the beginning of the string | ||
- `$` anchors to the end of the string | ||
The behavior of a regular expression can be modified by appending special flags. When using a sigil to create a regular expression, add the modifiers after the second delimiter. | ||
|
||
## Interpolation | ||
|
||
Because the `~r` is a shortcut for `"pattern" |> Regex.escape() |> Regex.compile!()`, you may also use string interpolation to dynamically build a regular expression pattern: | ||
Common modifiers are: | ||
- `i` - makes the match case-insensitive. | ||
- `u` - enables Unicode-specific patterns like `\p` and causes character classes like `\w`, `\s` etc. to also match Unicode. | |
|
||
```elixir | ||
anchor = "$" | ||
regex = ~r/end of the line#{anchor}/ | ||
"end of the line?" =~ regex | ||
# => false | ||
"end of the line" =~ regex | ||
"this is a TEST" =~ ~r/test/i | ||
# => true | ||
``` | ||
|
||
[sigils]: https://hexdocs.pm/elixir/syntax-reference.html#sigils |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# Hints | ||
|
||
## General | ||
|
||
- Review regular expression patterns from the introduction. Remember, when creating the pattern from a string, you must escape some characters. | |
- Read about the [`Regex` module][regex-docs] in the documentation. | ||
- Read about the [regular expression sigil][sigils-regex] in the Getting Started guide. | ||
- Check out this website about regular expressions: [Regular-Expressions.info][website-regex-info]. | ||
- Check out this website about regular expressions: [Rex Egg - The world's most tyrannosauical regex tutorial][website-rexegg]. | ||
- Check out this website about regular expressions: [RegexOne - Learn Regular Expressions with simple, interactive exercises][website-regexone]. | ||
- Check out this website about regular expressions: [Regular Expressions 101 - an online regex sandbox][website-regex-101]. | ||
- Check out this website about regular expressions: [RegExr - an online regex sandbox][website-regexr]. | ||
|
||
## 1. Identify garbled log lines | ||
|
||
- Use the [`~r` sigil][sigil-r] to create a regular expression. | |
- There is [an operator][match-operator] that can be used to check a string against a regular expression. There is also a [`Regex` function][regex-match] and a [`String` function][string-match] that can do the same. | |
|
||
## 2. Split the log line | ||
|
||
- There is a [`Regex` function][regex-split] as well as a [`String` function][string-split] that can split a string into a list of strings based on a regular expression. | ||
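
As a rough illustration of both functions (the comma-separated input is an assumed example, unrelated to the exercise's separators), splitting on a pattern might look like this:

```elixir
# Split on a comma followed by any amount of whitespace.
Regex.split(~r/,\s*/, "red, green,blue")
# => ["red", "green", "blue"]

# The String version takes the string first and the pattern second.
String.split("red, green,blue", ~r/,\s*/)
# => ["red", "green", "blue"]
```

Note the argument order: `Regex` functions take the regular expression first, while `String` functions take the string first.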
|
||
## 3. Remove artifacts from log | ||
|
||
- There is a [`Regex` function][regex-replace] as well as a [`String` function][string-replace] that can change a part of a string that matches a given regular expression to a different string. | ||
- There is a [modifier][regex-modifiers] that can make the whole regular expression case-insensitive. | ||
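
A minimal sketch of replacing with a pattern, using made-up inputs rather than the exercise's log lines:

```elixir
# By default, every match is replaced, not only the first one.
Regex.replace(~r/\d+/, "a1b22c333", "")
# => "abc"

# The `i` modifier makes the pattern match regardless of case.
String.replace("Color, COLOR, color", ~r/color/i, "hue")
# => "hue, hue, hue"
```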
|
||
## 4. Tag lines with user names | ||
|
||
- There is a [`Regex` function][regex-run] that runs a regular expression against a string and returns all captures. | ||
|
||
[regex-docs]: https://hexdocs.pm/elixir/Regex.html | ||
[sigils-regex]: https://elixir-lang.org/getting-started/sigils.html#regular-expressions | ||
[website-regex-info]: https://www.regular-expressions.info | ||
[website-rexegg]: https://www.rexegg.com/ | ||
[website-regexone]: https://regexone.com/ | ||
[website-regex-101]: https://regex101.com/ | ||
[website-regexr]: https://regexr.com/ | ||
[sigil-r]: https://hexdocs.pm/elixir/Kernel.html#sigil_r/2 | ||
[match-operator]: https://hexdocs.pm/elixir/Kernel.html#=~/2 | ||
[regex-match]: https://hexdocs.pm/elixir/Regex.html#match?/2 | ||
[string-match]: https://hexdocs.pm/elixir/String.html#match?/2 | ||
[regex-split]: https://hexdocs.pm/elixir/Regex.html#split/3 | ||
[string-split]: https://hexdocs.pm/elixir/String.html#split/3 | ||
[regex-replace]: https://hexdocs.pm/elixir/Regex.html#replace/4 | ||
[string-replace]: https://hexdocs.pm/elixir/String.html#replace/4 | ||
[regex-modifiers]: https://hexdocs.pm/elixir/Regex.html#module-modifiers | ||
[regex-run]: https://hexdocs.pm/elixir/Regex.html#run/3 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Instructions | ||
|
||
After a recent security review you have been asked to clean up the organization's archived log files. | ||
|
||
## 1. Identify garbled log lines | ||
|
||
You need some idea of how many log lines in your archive do not comply with current standards. | ||
You believe that a simple test reveals whether a log line is valid. | ||
To be considered valid a line should begin with one of the following strings: | ||
|
||
- [DEBUG] | ||
- [INFO] | ||
- [WARNING] | ||
- [ERROR] | ||
|
||
Implement the `valid_line?/1` function to return `true` if the log line is valid. | ||
|
||
```elixir | ||
LogParser.valid_line?("[ERROR] Network Failure") | ||
# => true | ||
|
||
LogParser.valid_line?("Network Failure") | ||
# => false | ||
``` | ||
|
||
## 2. Split the log line | ||
|
||
Shortly after starting the log parsing project, you realize that one application's logs aren't split into lines like the others. In this project, what should have been separate lines are instead on a single line, connected by fancy arrows such as `<--->` or `<*~*~>`. | |
|
||
In fact, any string that has a first character of `<`, a last character of `>`, and any combination of the following characters `~`, `*`, `=`, and `-` in between can be used as a separator in this project's logs. | ||
|
||
Implement the `split_line/1` function that takes a line and returns a list of strings. | ||
|
||
```elixir | ||
LogParser.split_line("[INFO] Start.<*>[INFO] Processing...<~~~>[INFO] Success.") | ||
# => ["[INFO] Start.", "[INFO] Processing...", "[INFO] Success."] | ||
``` | ||
|
||
## 3. Remove artifacts from log | ||
|
||
You have found that some upstream processing of the logs has been scattering the text "end-of-line" followed by a line number (without an intervening space) throughout the logs. | ||
|
||
Implement the `remove_artifacts/1` function to take a string, remove all occurrences of the end-of-line text, and return a clean log line. | |
|
||
Lines not containing end-of-line text should be returned unmodified. | ||
|
||
Just remove the end-of-line string; there's no need to adjust the whitespace. | |
|
||
```elixir | ||
LogParser.remove_artifacts("[WARNING] end-of-line23033 Network Failure end-of-line27") | ||
# => "[WARNING]  Network Failure " | |
``` | ||
|
||
## 4. Tag lines with user names | ||
|
||
You have noticed that some of the log lines include sentences that refer to users. | ||
These sentences always contain the string `"User"`, followed by one or more whitespace characters, and then a user name. | ||
You decide to tag such lines. | ||
|
||
Implement a function `tag_with_user_name/1` that processes log lines: | ||
|
||
- Lines that do not contain the string `"User"` remain unchanged. | ||
- For lines that contain the string `"User"`, prefix the line with `[USER]` followed by the user name. | ||
|
||
```elixir | ||
LogParser.tag_with_user_name("[INFO] User Alice created a new project") | ||
# => "[USER] Alice [INFO] User Alice created a new project" | ||
``` | ||
|
||
You can assume that: | ||
|
||
- Each occurrence of the string `"User"` is followed by one or more whitespace characters and the user name. | |
- There is at most one occurrence of the string `"User"` on each line. | ||
- User names are non-empty strings that do not contain whitespace. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
# Introduction | ||
|
||
## Regular Expressions | ||
|
||
Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions), similarly to other popular languages like Java, JavaScript, or Ruby. | ||
|
||
The `Regex` module offers functions for working with regular expressions. Some of the `String` module functions accept regular expressions as arguments as well. | ||
|
||
~~~~exercism/note | ||
This exercise assumes that you already know regular expression syntax, including character classes, quantifiers, groups, and captures. | ||
|
||
If you need to refresh your regular expression knowledge, check out one of these sources: [Regular-Expressions.info][website-regex-info], [Rex Egg][website-rexegg], [RegexOne][website-regexone], [Regular Expressions 101][website-regex-101], [RegExr][website-regexr]. | |
~~~~ | ||
|
||
### Sigils | ||
|
||
The most common way to create regular expressions is using the `~r` sigil. | ||
|
||
```elixir | ||
~r/test/ | ||
``` | ||
|
||
Note that all Elixir sigils support [different kinds of delimiters][sigils], not only `/`. | ||
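
To illustrate why a different delimiter can be handy (the URL pattern here is an assumed example, not part of the exercise), a pattern that contains `/` needs escaping with the default delimiter but not with curly braces:

```elixir
# With the default delimiter, every `/` inside the pattern must be escaped.
String.match?("https://exercism.org", ~r/^https?:\/\//)
# => true

# With curly-brace delimiters, the same pattern needs no escaping.
String.match?("https://exercism.org", ~r{^https?://})
# => true
```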
|
||
### Matching | ||
|
||
The `=~/2` operator can be used to perform a regex match that returns a `boolean` result. Alternatively, there are also `match?/2` functions in the `Regex` module as well as the `String` module. | |
|
||
```elixir | ||
"this is a test" =~ ~r/test/ | ||
# => true | ||
|
||
String.match?("Alice has 7 apples", ~r/\d{2}/) | ||
# => false | ||
``` | ||
|
||
### Capturing | ||
|
||
If a simple boolean check is not enough, use the `Regex.run/3` function to get a list of all captures (or `nil` if there was no match). The first element in the returned list is always a match for the whole regular expression, and the following elements are matched groups. | ||
|
||
```elixir | ||
Regex.run(~r/(\d) apples/, "Alice has 7 apples") | ||
# => ["7 apples", "7"] | ||
``` | ||
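
A few more cases, sketched with the same sample sentence, show what the returned list looks like with several groups, with no match, and with the `:capture` option of `Regex.run/3`:

```elixir
# Two groups produce two extra elements after the full match.
Regex.run(~r/(\d+) (\w+)/, "Alice has 7 apples")
# => ["7 apples", "7", "apples"]

# No match at all returns nil rather than an empty list.
Regex.run(~r/(\d+) pears/, "Alice has 7 apples")
# => nil

# The :capture option can drop the full match and keep only the groups.
Regex.run(~r/(\d+) (\w+)/, "Alice has 7 apples", capture: :all_but_first)
# => ["7", "apples"]
```

Because a failed match returns `nil`, pattern matching directly on the result list only works when a match is guaranteed.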
|
||
### Modifiers | ||
|
||
The behavior of a regular expression can be modified by appending special flags. When using a sigil to create a regular expression, add the modifiers after the second delimiter. | ||
|
||
Common modifiers are: | ||
- `i` - makes the match case-insensitive. | ||
- `u` - enables Unicode-specific patterns like `\p` and causes character classes like `\w`, `\s` etc. to also match Unicode. | |
|
||
```elixir | ||
"this is a TEST" =~ ~r/test/i | ||
# => true | ||
``` | ||
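
The effect of the `u` modifier can be sketched with a non-ASCII word (the Polish word below is an arbitrary example):

```elixir
# Without `u`, `\w` only matches ASCII word characters.
String.match?("żółw", ~r/^\w+$/)
# => false

# With `u`, `\w` also matches Unicode letters.
String.match?("żółw", ~r/^\w+$/u)
# => true
```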
|
||
[sigils]: https://hexdocs.pm/elixir/syntax-reference.html#sigils | ||
[website-regex-info]: https://www.regular-expressions.info | ||
[website-rexegg]: https://www.rexegg.com/ | ||
[website-regexone]: https://regexone.com/ | ||
[website-regex-101]: https://regex101.com/ | ||
[website-regexr]: https://regexr.com/ |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Used by "mix format" | ||
[ | ||
inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"] | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# The directory Mix will write compiled artifacts to. | ||
/_build/ | ||
|
||
# If you run "mix test --cover", coverage assets end up here. | ||
/cover/ | ||
|
||
# The directory Mix downloads your dependencies sources to. | ||
/deps/ | ||
|
||
# Where third-party dependencies like ExDoc output generated docs. | ||
/doc/ | ||
|
||
# Ignore .fetch files in case you like to edit your project deps locally. | ||
/.fetch | ||
|
||
# If the VM crashes, it generates a dump, let's ignore it too. | ||
erl_crash.dump | ||
|
||
# Also ignore archive artifacts (built via "mix archive.build"). | ||
*.ez | ||
|
||
# Ignore package tarball (built via "mix hex.build"). | ||
log-parser-*.tar | ||
|
Should we offer a link or two for those who don't?
To be honest, I just googled learning regex and found plenty of material, but there are also a bunch of links in the hints we could reuse.
Done: a133b42 (#1148)