-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a tests file to document and demonstrate a few conversions? #1
Comments
Hello @DivineDominion! Thank you for the suggestion on the
The script is fairly long since a good 46% of the code is to parse a config file and set various parameters 😆
I assume you meant |
Oops, that was sloppy of me, the missing bracket was not intentional -- I meant a test for wiki links inside inline code in general, sorry! Similarly, wiki links inside code blocks should be skipped. Unit tests shouldn't be more complicated than the scripts themselves. It boils down to writing a sample string with Markdown, then a string with the expected output, then pass the Markdown to your converter function and compare results using A testing primer: https://docs.python.org/2/library/unittest.html Example from there: import unittest
class TestStringMethods(unittest.TestCase):
def test_upper(self):
self.assertEqual('foo'.upper(), 'FOO')
def test_isupper(self):
self.assertTrue('FOO'.isupper())
self.assertFalse('Foo'.isupper())
def test_split(self):
s = 'hello world'
self.assertEqual(s.split(), ['hello', 'world'])
# check that s.split fails when the separator is not a string
with self.assertRaises(TypeError):
s.split(2)
if __name__ == '__main__':
unittest.main() You would run To get your code tested, I think you'll want to refactor you def modify_links(file_obj):
logging.debug("Going to open file %s for processing now.", file)
try:
with open(file_obj, encoding="utf8") as infile:
contents = infile.read()
logging.debug("Finished processing file %s", file)
return modify_string(contents)
except EnvironmentError:
logging.exception("Unable to open file %s for reading", file)
return ""
def modify_string(str):
linelist = []
for line in iter(foo.splitlines()):
linelist.append(re.sub(r"(\[\[)((?<=\[\[).*(?=\]\]))(\]\])(?!\()", r"[\2](\2.md)", line))
# Finds references that are in style [[foo]] only by excluding links in style [[foo]](bar).
# Capture group $2 returns just foo
linelist_final = [re.sub(r"(\[\[)((?<=\[\[)\d+(?=\]\]))(\]\])(\()((?!=\().*(?=\)))(\))",
r"[\2](\2 \5.md)", line) for line in linelist]
# Finds only references in style [[foo]](bar). Capture group $2 returns foo and capture group $5
# returns bar
return "\n".join(linelist_final) |
@DivineDominion thank you for the detailed feedback. I guess I was bit too confident earlier - skipping wikilinks inside inline code was pretty easy. Skipping wikilinks inside code blocks on the other hand is proving to be much more difficult. I will continue to work on it. |
Dang, I was afraid of this :) For fenced code blocks, I think a little boolean flag can do the trick for most cases when you consume the Markdown line by line: When a line starts with three backticks, turn "isInCodeBlock = true", and ignore the remaining lines until you get to another set of backticks again. For classic code blocks with 4 spaces at the front it gets trickier, depending on whether or not the preceding line is a list item, e.g. this should not be treated as code: - This list item
is continued here Also this is not a code block: - First item
- Second item But this is: - Looks like a list but isn't You can wiggle your way towards higher accuracy with a couple more checks, though, to get it into a workable condition. The problem is that regex are context free, while Markdown is not a context free grammar: the same thing (4 spaces at the beginning of a line) mean different things in different contexts. That's what makes Markdown so hard :) (If you really think you want to do it accurately, maybe you can find a Markdown Python library that parses the Markdown and provides you with an abstract syntax tree that you can consume and where you can ignore all code blocks and only apply replacements for the remainder.) |
Hi @DivineDominion I've updated the script with a different regex engine/syntax which I believe correctly handles all recognized Markdown code syntax. Given the following The script will generate Note that:
The script will process wikilinks in :
But it will fail if the wiklink has a new line in the middle. I believe this is expected behavior for markdown file. Please try it out and let me know what you think! The tests file will eventually be added as well 😃 EDIT I have discovered that there is one wikilink scenario the regex does not handle - wikilinks that are indented by 4 or more spaces in a list. For example:
|
This looks great! Glancing at the code, I saw that you parse the input document line-by-line based on Regex at some point, and the script is fairly long (longer than a simple global String replacement! :)), so I guess it contains a couple of syntax checks. That'd be great.
My suggestion is to add a tests file, or maybe even embed the tests if the script is launched without any arguments or a
--run-tests
flag, or something. I first saw this in http://alanwsmith.com/embedding-a-test-suite-in-a-single-file-ruby-app-part-1 and found that to be a neat pattern for simple CLI scripts where I don't want to create a directory with the script and the tests separately, but still want to document how it behaves.Anyway, I'd be mostly interested in seeing how
is handled, and GitHub Flavored Markdown fenced code blocks :)
The text was updated successfully, but these errors were encountered: