BeautifulSoup logic in separate file #56

Open: wants to merge 14 commits into base: master
45 changes: 45 additions & 0 deletions smarsy/bs_helper.py
@@ -0,0 +1,45 @@
from bs4 import BeautifulSoup


class BSHelper(object):
    """
    Helper class for the BeautifulSoup library
    """
    def __init__(self, html):
        self.html = html

    @property
    def get_bs_object(self):
dkultasev marked this conversation as resolved.
        """
        Utility function:
        - Accepts html and checks its validity using the BeautifulSoup
          library: returns a BS object, or False if BeautifulSoup
          raises TypeError
        """
        try:
            soup = BeautifulSoup(self.html, 'html.parser')
        except TypeError:
            return False
        return soup

    def bs_safe_select(self, html, *args):
        """
        Utility function used to get an element from HTML and a
        tuple of selectors. Returns False if no object is found
        for the given selector
        """
        for arg in args:
            selectedElems = html.select_one(arg)

Owner:
Won't it always take the last output of select_one? It's not accumulating anything: on every iteration it re-assigns selectedElems with the new value. Or is that expected?

Collaborator Author:
That is what is expected. Each new iteration overrides the variable selectedElems.

Owner:
Then I don't get it. Is it supposed to return:

  • all objects for all found selectors?
  • the last found object?
  • the first found object?
  • any object?
  • or is it expected to be some kind of chaining action, where the result of the previous iteration is used in the next one?

Collaborator Author:
> Then I don't get it. Is it supposed to return:

We take the object and apply select_one with selector 1 to it, then apply select_one with selector 2 to the received object, and so on up to selector X; finally we return the object, or False.
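
A minimal sketch of the chaining behaviour the author describes, where each selector is applied to the element returned by the previous step. The helper name chained_select_one and the FakeNode stand-in (which maps a selector string to a child node instead of doing real CSS matching) are hypothetical and not part of this PR. The only assumption about a real bs4 Tag is that it has select_one(selector) returning an element or None.

```python
def chained_select_one(element, *selectors):
    """Hypothetical helper: apply select_one() to the previous result.

    `element` is anything with a BeautifulSoup-style select_one(selector)
    method. Returns the final element, or False as soon as a step finds
    nothing.
    """
    current = element
    for selector in selectors:
        current = current.select_one(selector)
        if current is None:
            # Stop early: calling select_one on None would raise AttributeError.
            return False
    return current


class FakeNode:
    """Hypothetical stand-in for a bs4 Tag: maps selector -> child node."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or {}

    def select_one(self, selector):
        return self.children.get(selector)


img = FakeNode('img')
td = FakeNode('td', {'img[src]': img})
root = FakeNode('root', {'[valign=top]': td})

print(chained_select_one(root, '[valign=top]', 'img[src]').name)  # img
print(chained_select_one(root, '[valign=top]', 'a[href]'))        # False
```

With bs4 installed, the issue#51 markup would presumably be handled as chained_select_one(BeautifulSoup(html, 'html.parser'), '[valign=top]', 'img[src]'). The loop in bs_safe_select calls select_one on the original html every time. This version instead feeds each result into the next call and stops at the first step that finds nothing.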

Owner:
Could you please provide a real example from the smarsy website, with the expected call and the expected result?

Collaborator Author:
From issue#51:
<TD valign=top align="left" width="120"><img src="https://smarsy.ua/images/mypage/parent_1.png"></TD>
We must find the td with valign=top and, in the received object, find img[src].

Owner:
And what would the function call be for that HTML?

Collaborator Author:
html.select_one([valign=top]).select_one('img[src]')

Owner:
Will it work? Additionally, here you are passing a single-value parameter, but your function expects an array. Please provide an example with an array.
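
On the single-value vs. array point: bs_safe_select is declared with *args, so Python packs all positional selectors into one tuple, and a list has to be unpacked with * at the call site. A minimal sketch with a hypothetical stand-in function, collect_selectors, that has the same signature shape:

```python
def collect_selectors(*selectors):
    # Hypothetical stand-in with the same *args signature as
    # bs_safe_select: Python packs every positional argument into a tuple.
    return selectors


selectors = ['[valign=top]', 'img[src]']

# Passing the selectors one by one and unpacking the list with *
# produce the same tuple.
print(collect_selectors('[valign=top]', 'img[src]'))  # ('[valign=top]', 'img[src]')
print(collect_selectors(*selectors))                   # ('[valign=top]', 'img[src]')

# Passing the list itself produces a 1-tuple that holds the list, so the
# loop in bs_safe_select would hand the whole list to select_one().
print(collect_selectors(selectors))  # (['[valign=top]', 'img[src]'],)
```

Applied to the PR's method, the array form would be written as helper.bs_safe_select(soup, *selectors), where helper and soup are placeholders for a BSHelper instance and a parsed document.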

        if selectedElems is not None:
            return selectedElems
        return False

    def bs_safe_get(self, html, attribute):
        """
        Utility function used to get the value of an attribute from
        an HTML element. Returns False if the element has no such
        attribute
        """
        element = html.get(attribute)
        if element is not None:
            return element
        return False
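
For reference, here is a standalone copy of the bs_safe_get logic, exercised with a plain dict standing in for a bs4 Tag. Both have a .get(key) that returns None for a missing key; using a dict is an assumption of this sketch, not how the PR calls it. A missing attribute yields False. An attribute that is present but empty is returned unchanged as '' rather than False. This is why the assertFalse checks on '' in tests/test_bs_helper.py pass: '' is falsy, not False.

```python
def bs_safe_get(html, attribute):
    """Standalone copy of BSHelper.bs_safe_get from this PR (self removed)."""
    element = html.get(attribute)
    if element is not None:
        return element
    return False


# A plain dict stands in for a bs4 Tag (assumption for this sketch).
tag = {'src': 'https://smarsy.ua/images/mypage/parent_1.png', 'alt': ''}

print(bs_safe_get(tag, 'src'))         # the image URL
print(bs_safe_get(tag, 'title'))       # False: attribute is missing
print(repr(bs_safe_get(tag, 'alt')))   # '': present but empty, returned as-is
```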
88 changes: 88 additions & 0 deletions tests/test_bs_helper.py
@@ -0,0 +1,88 @@
import unittest
import sys
import os

from unittest.mock import patch, PropertyMock

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__),
                                             '..')))
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__),
                                             '..', 'smarsy')))
# The following import is excluded from the linter, which complains
# that module-level imports are supposed to be at the top of the file

from smarsy.bs_helper import BSHelper  # noqa


class TestBSHelperInstance(unittest.TestCase):
    def setUp(self):
dkultasev marked this conversation as resolved.
        self.html = 'some html'
        self.source_page = BSHelper(self.html)

    def test_bshelper_instance_created(self):
        self.assertEqual(self.source_page.html, self.html)


class TestGetPageSource(unittest.TestCase):
dkultasev marked this conversation as resolved.
    @patch('smarsy.bs_helper.BeautifulSoup', new_callable=PropertyMock)
    def test_get_bs_object_called_with_expected_html(self, mocked_soup):
        html = '<tr></tr>'
        source_page = BSHelper(html)
        source_page.get_bs_object
dkultasev marked this conversation as resolved.
        mocked_soup.assert_called_with(html, 'html.parser')

    @patch('smarsy.bs_helper.BeautifulSoup', side_effect=TypeError)
    def test_get_bs_object_return_false_with_unexpected_html(
            self, mocked_soup):
        source_page = BSHelper(12345)
        self.assertFalse(source_page.get_bs_object)


class TestBsSafeSelect(unittest.TestCase):
    @patch('smarsy.bs_helper.BeautifulSoup')
    def setUp(self, mocked_soup):
        self.source_page = BSHelper('some html')
        self.mocked_soup = mocked_soup
        self.mocked_soup.select_one.return_value = 'some text'
        self.selector = 'some_tag'

    def test_bs_safe_select_return_expected_text_with_single_selector(self):
        actual = self.source_page.bs_safe_select(self.mocked_soup,
                                                 self.selector)
        self.assertEqual(actual, 'some text')
dkultasev marked this conversation as resolved.

    def test_bs_safe_select_return_expected_text_with_many_selectors(self):
        selector1, selector2, selector3 = 'some_tag1', 'some_tag2', 'some_tag3'

Owner:
I don't understand this test. It passes with any selector values and with any number of them. What is its purpose?

Collaborator Author:
Done

Owner:
Not sure what was done for this one.

Collaborator Author:
I refactored this test.
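
The owner's concern is that the many-selectors test passes for any selector values. Below is a sketch of a test that only passes if select_one is actually chained, built on unittest.mock.MagicMock. Each MagicMock's return_value is itself a distinct mock, so the test can check that the second selector was applied to the result of the first. chained_select_one is a hypothetical helper, not the PR's bs_safe_select, and is repeated here so the example runs on its own.

```python
import unittest
from unittest.mock import MagicMock


def chained_select_one(element, *selectors):
    # Hypothetical chaining helper (not the PR's code): each selector is
    # applied to the result of the previous select_one() call.
    for selector in selectors:
        element = element.select_one(selector)
        if element is None:
            return False
    return element


class TestChainedSelect(unittest.TestCase):
    def test_each_selector_is_applied_to_previous_result(self):
        root = MagicMock(name='root')
        td = root.select_one.return_value   # what the 1st step returns
        img = td.select_one.return_value    # what the 2nd step returns

        actual = chained_select_one(root, '[valign=top]', 'img[src]')

        # Fails if both selectors are applied to `root` instead of chained.
        root.select_one.assert_called_once_with('[valign=top]')
        td.select_one.assert_called_once_with('img[src]')
        self.assertIs(actual, img)

    def test_returns_false_when_a_step_finds_nothing(self):
        root = MagicMock(name='root')
        root.select_one.return_value = None

        actual = chained_select_one(root, '[valign=top]', 'img[src]')

        self.assertIs(actual, False)
```

The PR's current loop calls select_one on the same object once per selector, so root.select_one.assert_called_once_with would fail against it. That failure is what makes this test sensitive to the selector handling, unlike a mock that returns 'some text' for every call.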

        actual = self.source_page.bs_safe_select(self.mocked_soup, selector1,
                                                 selector2, selector3)
        self.assertEqual(actual, 'some text')
dkultasev marked this conversation as resolved.

    def test_bs_safe_select_return_false_when_selectedElems_is_empty(
dkultasev marked this conversation as resolved.
            self):
        self.mocked_soup.select_one.return_value = ''
        self.assertFalse(self.source_page.bs_safe_select(self.mocked_soup,
                                                         self.selector))


class TestBsSafeget(unittest.TestCase):
    @patch('smarsy.bs_helper.BeautifulSoup')
    def setUp(self, mocked_soup):
        self.source_page = BSHelper('some html')
        self.mocked_soup = mocked_soup

    def test_bs_get_called_with_expected_html_and_attribute(self):
        expected_attribute = 'some attribute'
        self.source_page.bs_safe_get(self.mocked_soup, expected_attribute)
        self.mocked_soup.get.assert_called_with(expected_attribute)

    def test_bs_safe_get_return_false_when_element_is_empty(
            self):
        self.mocked_soup.get.return_value = ''
        self.assertFalse(self.source_page.bs_safe_get(self.mocked_soup,
                                                      'some attribute'))

    def test_bs_safe_get_return_expected_text(self):
        self.mocked_soup.get.return_value = 'some text'
dkultasev marked this conversation as resolved.
        actual = self.source_page.bs_safe_get(self.mocked_soup,
                                              'some attribute')
dkultasev marked this conversation as resolved.
        self.assertEqual(actual, 'some text')

Owner:
Since you are testing that you get the output of the get function, you should use self.mocked_soup.get.return_value in the assert instead of hardcoding 'some text'.
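
Here is a sketch of the owner's suggestion: configure the mocked get once, then assert against self.mocked_soup.get.return_value rather than repeating the literal. bs_safe_get below is a standalone copy of the PR's method (self removed), so the example runs without the smarsy package. The test class name is illustrative.

```python
import unittest
from unittest.mock import MagicMock


def bs_safe_get(html, attribute):
    # Standalone copy of BSHelper.bs_safe_get from this PR (self removed).
    element = html.get(attribute)
    if element is not None:
        return element
    return False


class TestBsSafeGetSketch(unittest.TestCase):
    def setUp(self):
        self.mocked_soup = MagicMock()
        self.mocked_soup.get.return_value = 'some text'

    def test_bs_safe_get_return_expected_text(self):
        actual = bs_safe_get(self.mocked_soup, 'some attribute')
        self.mocked_soup.get.assert_called_once_with('some attribute')
        # Compare against the mock's configured value, not a repeated literal.
        self.assertEqual(actual, self.mocked_soup.get.return_value)
```

The expected value is defined in one place (setUp), so changing the fixture cannot leave a stale literal behind in the assert.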

Owner:
This is not done.

Collaborator Author:
Did you see the latest commit?

def test_bs_safe_get_return_expected_text(self):
    actual = self.source_page.bs_safe_get(self.mocked_soup, self.expected_attribute)
    self.assertEqual(actual, self.expected_text)