Replace date-parser with log-parser (regular expressions) #1148
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,95 +1,52 @@ | ||
# Introduction | ||
|
||
Regular expressions (regex) are a powerful tool for working with strings in Elixir. Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions). String patterns representing the regular expression's meaning are first compiled then used for matching all or part of a string. | ||
## Regular Expressions | ||
|
||
In Elixir, the most common way to create regular expressions is using the `~r` sigil. Sigils provide _syntactic sugar_ shortcuts for common tasks in Elixir. To match a _string literal_, we can use the string itself as a pattern following the sigil. | ||
Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions), similarly to other popular languages like Java, JavaScript, or Ruby. | ||
|
||
```elixir | ||
~r/test/ | ||
``` | ||
|
||
The `=~/2` operator is useful to perform a regex match on a string to return a `boolean` result. | ||
The `Regex` module offers functions for working with regular expressions. Some of the `String` module functions accept regular expressions as arguments as well. | ||
|
||
```elixir | ||
"this is a test" =~ ~r/test/ | ||
# => true | ||
``` | ||
~~~~exercism/note | ||
This exercise assumes that you already know regular expression syntax, including character classes, quantifiers, groups, and captures. | ||
~~~~ | ||
|
||
Two notes about using sigils: | ||
### Sigils | ||
|
||
- many different delimiters may be used depending on your requirements rather than `/` | ||
- string patterns are already _escaped_, when writing the pattern as a string not using a regex, you will have to _escape_ backslashes (`\`) | ||
|
||
## Character classes | ||
|
||
Matching a range of characters using square brackets `[]` defines a _character class_. This will match any one character to the characters in the class. You can also specify a range of characters like `a-z`, as long as the start and end represent a contiguous range of code points. | ||
The most common way to create regular expressions is using the `~r` sigil. | ||
|
||
```elixir | ||
regex = ~r/[a-z][ADKZ][0-9][!?]/ | ||
"jZ5!" =~ regex | ||
# => true | ||
"jB5?" =~ regex | ||
# => false | ||
~r/test/ | ||
``` | ||
|
||
_Shorthand character classes_ make the pattern more concise. For example: | ||
|
||
- `\d` short for `[0-9]` (any digit) | ||
- `\w` short for `[A-Za-z0-9_]` (any 'word' character) | ||
- `\s` short for `[ \t\r\n\f]` (any whitespace character) | ||
|
||
When a _shorthand character class_ is used outside of a sigil, it must be escaped: `"\\d"` | ||
Note that all Elixir sigils support [different kinds of delimiters][sigils], not only `/`. | ||
|
||
## Alternations | ||
### Matching | ||
|
||
_Alternations_ use `|` as a special character to denote matching one _or_ another | ||
The `=~/2` operator can be used to perform a regex match that returns a boolean result. Alternatively, there are also `match?/2` functions in the `Regex` module as well as the `String` module. | ||
|
||
```elixir | ||
regex = ~r/cat|bat/ | ||
"bat" =~ regex | ||
# => true | ||
"cat" =~ regex | ||
"this is a test" =~ ~r/test/ | ||
# => true | ||
``` | ||
|
||
## Quantifiers | ||
|
||
_Quantifiers_ allow for a repeating pattern in the regex. They affect the group preceding the quantifier. | ||
|
||
- `{N, M}` where `N` is the minimum number of repetitions, and `M` is the maximum | ||
- `{N,}` match `N` or more repetitions | ||
- `{0,}` may also be written as `*`: match zero-or-more repetitions | ||
- `{1,}` may also be written as `+`: match one-or-more repetitions | ||
- `{,N}` match up to `N` repetitions | ||
|
||
## Groups | ||
|
||
Round brackets `()` are used to denote _groups_ and _captures_. The group may also be _captured_ in some instances to be returned for use. In Elixir, these may be named or un-named. Captures are named by appending `?<name>` after the opening parenthesis. Groups function as a single unit, like when followed by _quantifiers_. | ||
|
||
```elixir | ||
regex = ~r/(h)at/ | ||
Regex.replace(regex, "hat", "\\1op") | ||
# => "hop" | ||
|
||
regex = ~r/(?<letter_b>b)/ | ||
Regex.scan(regex, "blueberry", capture: :all_names) | ||
# => [["b"], ["b"]] | ||
String.match?("Alice has 7 apples", ~r/\d{2}/) | ||
# => false | ||
``` | ||
|
||
## Anchors | ||
### Capturing | ||
|
||
_Anchors_ are used to tie the regular expression to the beginning or end of the string to be matched: | ||
If a simple boolean check is not enough, use the `Regex.run/3` function to get a list of all captures (or `nil` if there was no match). The first element in the returned list is always the part of the string that matched the whole regular expression, and the following elements are the matched groups. | ||
I think we should add an example, something like `Regex.run(~r/test number (\d+)/, "This is test number 256")`, which gives `["test number 256", "256"]`.

Done. It also made me realize that the initial description was wrong :) The first element isn't always the whole input string, only the part of it that was a match for the regex. |
||
|
||
- `^` anchors to the beginning of the string | ||
- `$` anchors to the end of the string | ||
### Modifiers | ||
|
||
Because the `~r` is a shortcut for `"pattern" |> Regex.escape() |> Regex.compile!()`, you may also use string interpolation to dynamically build a regular expression pattern: | ||
The behavior of a regular expression can be modified by appending special flags. When using a sigil to create a regular expression, add the modifiers after the second delimiter. | ||
|
||
Common modifiers are: | ||
- `i` - makes the match case-insensitive. | ||
- `u` - enables Unicode-specific patterns like `\p` and causes character classes like `\w`, `\s`, etc. to also match Unicode characters. | ||
|
||
```elixir | ||
anchor = "$" | ||
regex = ~r/end of the line#{anchor}/ | ||
"end of the line?" =~ regex | ||
# => false | ||
"end of the line" =~ regex | ||
"this is a TEST" =~ ~r/test/i | ||
# => true | ||
``` | ||
|
||
[sigils]: https://hexdocs.pm/elixir/syntax-reference.html#sigils |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# Hints | ||
|
||
## General | ||
|
||
- Review regular expression patterns from the introduction. Remember, when creating the pattern from a string, you must escape some characters. | ||
- Read about the [`Regex` module][regex-docs] in the documentation. | ||
- Read about the [regular expression sigil][sigils-regex] in the Getting Started guide. | ||
- Check out this website about regular expressions: [Regular-Expressions.info][website-regex-info]. | ||
- Check out this website about regular expressions: [Rex Egg - The world's most tyrannosauical regex tutorial][website-rexegg]. | ||
- Check out this website about regular expressions: [RegexOne - Learn Regular Expressions with simple, interactive exercises.][website-regexone]. | ||
- Check out this website about regular expressions: [Regular Expressions 101 - an online regex sandbox][website-regex-101]. | ||
- Check out this website about regular expressions: [RegExr - an online regex sandbox][website-regexr]. | ||
|
||
## 1. Identify garbled log lines | ||
|
||
- Use the [`r` sigil][sigil-r] to create a regular expression. | ||
- There is [an operator][match-operator] that can be used to check a string against a regular expression. There is also a [`Regex` function][regex-match] and a [`String` function][string-match] that can do the same. | ||
|
||
## 2. Split the log line | ||
|
||
- There is a [`Regex` function][regex-split] as well as a [`String` function][string-split] that can split a string into a list of strings based on a regular expression. | ||
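
As a minimal sketch of what splitting on a regular expression looks like, the example below uses `String.split/3` and `Regex.split/3` on a made-up input that is unrelated to the exercise's log format. The separator pattern `[,;]+` is an assumption chosen only for illustration.

```elixir
# Split on one or more commas or semicolons (a made-up input).
String.split("a,b;;c,,d", ~r/[,;]+/)
# => ["a", "b", "c", "d"]

# Regex.split/3 takes the regex first and the string second.
Regex.split(~r/[,;]+/, "a,b;;c,,d")
# => ["a", "b", "c", "d"]
```

Both calls produce the same list; which one to use is mostly a matter of whether the string or the regex reads better as the first argument in a pipeline.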
|
||
## 3. Remove artifacts from log | ||
|
||
- There is a [`Regex` function][regex-replace] as well as a [`String` function][string-replace] that can change a part of a string that matches a given regular expression to a different string. | ||
- There is a [modifier][regex-modifiers] that can make the whole regular expression case-insensitive. | ||
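
The two hints above can be sketched on inputs unrelated to the exercise. The strings and patterns here are assumptions made up for illustration; only `String.replace/4`, `Regex.replace/4`, and the `i` modifier come from the hints themselves.

```elixir
# The `i` modifier makes the pattern match "WORLD" and "world" alike.
# By default, every occurrence is replaced, not just the first one.
String.replace("Hello WORLD, hello world", ~r/world/i, "there")
# => "Hello there, hello there"

# Regex.replace/4 takes the regex first and the string second.
Regex.replace(~r/\s+/, "too    many   spaces", " ")
# => "too many spaces"
```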
|
||
## 4. Tag lines with user names | ||
|
||
- There is a [`Regex` function][regex-run] that runs a regular expression against a string and returns all captures. | ||
|
||
[regex-docs]: https://hexdocs.pm/elixir/Regex.html | ||
[sigils-regex]: https://elixir-lang.org/getting-started/sigils.html#regular-expressions | ||
[website-regex-info]: https://www.regular-expressions.info | ||
[website-rexegg]: https://www.rexegg.com/ | ||
[website-regexone]: https://regexone.com/ | ||
[website-regex-101]: https://regex101.com/ | ||
[website-regexr]: https://regexr.com/ | ||
[sigil-r]: https://hexdocs.pm/elixir/Kernel.html#sigil_r/2 | ||
[match-operator]: https://hexdocs.pm/elixir/Kernel.html#=~/2 | ||
[regex-match]: https://hexdocs.pm/elixir/Regex.html#match?/2 | ||
[string-match]: https://hexdocs.pm/elixir/String.html#match?/2 | ||
[regex-split]: https://hexdocs.pm/elixir/Regex.html#split/3 | ||
[string-split]: https://hexdocs.pm/elixir/String.html#split/3 | ||
[regex-replace]: https://hexdocs.pm/elixir/Regex.html#replace/4 | ||
[string-replace]: https://hexdocs.pm/elixir/String.html#replace/4 | ||
[regex-modifiers]: https://hexdocs.pm/elixir/Regex.html#module-modifiers | ||
[regex-run]: https://hexdocs.pm/elixir/Regex.html#run/3 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Instructions | ||
|
||
After a recent security review you have been asked to clean up the organization's archived log files. | ||
|
||
## 1. Identify garbled log lines | ||
|
||
You need some idea of how many log lines in your archive do not comply with current standards. | ||
You believe that a simple test reveals whether a log line is valid. | ||
To be considered valid a line should begin with one of the following strings: | ||
|
||
- [DEBUG] | ||
- [INFO] | ||
- [WARNING] | ||
- [ERROR] | ||
|
||
Implement the `valid_line?/1` function to return `true` if the log line is valid. | ||
|
||
```elixir | ||
LogParser.valid_line?("[ERROR] Network Failure") | ||
# => true | ||
|
||
LogParser.valid_line?("Network Failure") | ||
# => false | ||
``` | ||
|
||
## 2. Split the log line | ||
|
||
Shortly after starting the log parsing project, you realize that one application's logs aren't split into lines like the others. In this project, what should have been separate lines is instead on a single line, connected by fancy arrows such as `<--->` or `<*~*~>`. | ||
|
||
In fact, any string that has a first character of `<`, a last character of `>`, and any combination of the following characters `~`, `*`, `=`, and `-` in between can be used as a separator in this project's logs. | ||
|
||
Implement the `split_line/1` function that takes a line and returns a list of strings. | ||
|
||
```elixir | ||
LogParser.split_line("[INFO] Start.<*>[INFO] Processing...<~~~>[INFO] Success.") | ||
# => ["[INFO] Start.", "[INFO] Processing...", "[INFO] Success."] | ||
``` | ||
|
||
## 3. Remove artifacts from log | ||
|
||
You have found that some upstream processing of the logs has been scattering the text "end-of-line" followed by a line number (without an intervening space) throughout the logs. | ||
|
||
Implement the `remove_artifacts/1` function to take a string, remove all occurrences of the end-of-line text, and return a clean log line. | ||
|
||
Lines not containing end-of-line text should be returned unmodified. | ||
|
||
Just remove the end-of-line string. Do not attempt to adjust the whitespace. | ||
|
||
```elixir | ||
LogParser.remove_artifacts("[WARNING] end-of-line23033 Network Failure end-of-line27") | ||
# => "[WARNING]  Network Failure " | ||
``` | ||
|
||
## 4. Tag lines with user names | ||
|
||
You have noticed that some of the log lines include sentences that refer to users. | ||
These sentences always contain the string `"User"`, followed by one or more whitespace characters, and then a user name. | ||
You decide to tag such lines. | ||
|
||
Implement a function `tag_with_user_name/1` that processes log lines: | ||
|
||
- Lines that do not contain the string `"User"` remain unchanged. | ||
- For lines that contain the string `"User"`, prefix the line with `[USER]` followed by the user name. | ||
|
||
```elixir | ||
LogParser.tag_with_user_name("[INFO] User Alice created a new project") | ||
# => "[USER] Alice [INFO] User Alice created a new project" | ||
``` | ||
|
||
You can assume that: | ||
|
||
- Each occurrence of the string `"User"` is followed by one or more whitespace characters and the user name. | ||
- There is at most one occurrence of the string `"User"` on each line. | ||
- User names are non-empty strings that do not contain whitespace. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Introduction | ||
|
||
## Regular Expressions | ||
|
||
Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions), similarly to other popular languages like Java, JavaScript, or Ruby. | ||
|
||
The `Regex` module offers functions for working with regular expressions. Some of the `String` module functions accept regular expressions as arguments as well. | ||
|
||
~~~~exercism/note | ||
This exercise assumes that you already know regular expression syntax, including character classes, quantifiers, groups, and captures. | ||
~~~~ | ||
|
||
### Sigils | ||
|
||
The most common way to create regular expressions is using the `~r` sigil. | ||
|
||
```elixir | ||
~r/test/ | ||
``` | ||
|
||
Note that all Elixir sigils support [different kinds of delimiters][sigils], not only `/`. | ||
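
A small sketch of why an alternative delimiter can be handy: when the pattern itself contains `/`, choosing `{}` as the delimiter avoids escaping every slash. The URL pattern and the variable names are assumptions made up for this illustration.

```elixir
# With `/` as the delimiter, every literal slash has to be escaped.
with_slashes = ~r/^https?:\/\//

# With `{}` as the delimiter, slashes can be written as-is.
with_braces = ~r{^https?://}

"https://elixir-lang.org" =~ with_slashes
# => true
"https://elixir-lang.org" =~ with_braces
# => true
"ftp://example.com" =~ with_braces
# => false
```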
|
||
### Matching | ||
|
||
The `=~/2` operator can be used to perform a regex match that returns a boolean result. Alternatively, there are also `match?/2` functions in the `Regex` module as well as the `String` module. | ||
|
||
```elixir | ||
"this is a test" =~ ~r/test/ | ||
# => true | ||
|
||
String.match?("Alice has 7 apples", ~r/\d{2}/) | ||
# => false | ||
``` | ||
|
||
### Capturing | ||
|
||
If a simple boolean check is not enough, use the `Regex.run/3` function to get a list of all captures (or `nil` if there was no match). The first element in the returned list is always the part of the string that matched the whole regular expression, and the following elements are the matched groups. | ||
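
The example suggested in the review of this pull request illustrates the return value. Note that the first element is only the portion of the input that matched, not the whole input string.

```elixir
# One capture group, (\d+), so the result has two elements:
# the matched portion, followed by the captured digits.
Regex.run(~r/test number (\d+)/, "This is test number 256")
# => ["test number 256", "256"]

# No match returns nil rather than an empty list.
Regex.run(~r/test number (\d+)/, "No digits here")
# => nil
```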
|
||
### Modifiers | ||
|
||
The behavior of a regular expression can be modified by appending special flags. When using a sigil to create a regular expression, add the modifiers after the second delimiter. | ||
|
||
Common modifiers are: | ||
- `i` - makes the match case-insensitive. | ||
- `u` - enables Unicode-specific patterns like `\p` and causes character classes like `\w`, `\s`, etc. to also match Unicode characters. | ||
|
||
```elixir | ||
"this is a TEST" =~ ~r/test/i | ||
# => true | ||
``` | ||
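
The effect of the `u` modifier can be sketched with a word that contains a non-ASCII letter. The input `"josé"` is an assumption chosen for illustration.

```elixir
# Without `u`, `\w` does not match the non-ASCII letter "é",
# so the pattern cannot match the whole string.
String.match?("josé", ~r/^\w+$/)
# => false

# With `u`, `\w` also matches Unicode letters.
String.match?("josé", ~r/^\w+$/u)
# => true
```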
|
||
[sigils]: https://hexdocs.pm/elixir/syntax-reference.html#sigils |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Used by "mix format" | ||
[ | ||
inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"] | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# The directory Mix will write compiled artifacts to. | ||
/_build/ | ||
|
||
# If you run "mix test --cover", coverage assets end up here. | ||
/cover/ | ||
|
||
# The directory Mix downloads your dependencies sources to. | ||
/deps/ | ||
|
||
# Where third-party dependencies like ExDoc output generated docs. | ||
/doc/ | ||
|
||
# Ignore .fetch files in case you like to edit your project deps locally. | ||
/.fetch | ||
|
||
# If the VM crashes, it generates a dump, let's ignore it too. | ||
erl_crash.dump | ||
|
||
# Also ignore archive artifacts (built via "mix archive.build"). | ||
*.ez | ||
|
||
# Ignore package tarball (built via "mix hex.build"). | ||
log-parser-*.tar | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
{ | ||
"authors": [ | ||
"angelikatyborska" | ||
], | ||
"files": { | ||
"solution": [ | ||
"lib/log_parser.ex" | ||
], | ||
"test": [ | ||
"test/log_parser_test.exs" | ||
], | ||
"exemplar": [ | ||
".meta/exemplar.ex" | ||
] | ||
},
"language_versions": ">=1.10",
"forked_from": [
"go/parsing-log-files"
],
"blurb": "Learn about regular expressions by parsing logs."
} |
Should we offer a link or two for those who don't?
To be honest, I just googled learning regex and found plenty of material, but there are also a bunch of links in the hints we could reuse.

Done: a133b42 (#1148)