Merge pull request #168 from chezou/black

Introduce black, isort, nox
chezou · Jul 27, 2019 · d6c65d3
2 parents a58b552 + 0c35203

Showing 17 changed files with 509 additions and 375 deletions.
diff --git a/.gitignore b/.gitignore
@@ -35,6 +35,7 @@ pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
+.nox/
 .coverage
 .coverage.*
 .cache

diff --git a/.travis.yml b/.travis.yml
@@ -7,11 +7,10 @@ python:
 before_install:
 - pip install --upgrade setuptools
 install:
-- pip install tox
-- pip install tox-travis
-- pip install coverage coveralls
+- pip install nox
+- pip install .
 script:
-- tox -r
+- nox
 deploy:
   provider: pypi
   user: chezou

diff --git a/README.md b/README.md
@@ -26,14 +26,21 @@ I confirmed working on macOS and Ubuntu. But some people confirm it works on Win
 
 ## Install
 
-```
+```bash
 pip install tabula-py
 ```
 
-If you want to become a contributor, you can install dependency for development of tabula-py as follows:
+If you want to become a contributor, you can install the development dependencies after cloning the repo as follows:
 
+```bash
+pip install -e ".[dev,test]"
+pip install nox
 ```
-pip install -r requirements.txt -c constraints.txt
+
+To run the tests and the linter, run the nox command.
+
+```bash
+nox .
 ```
 
 ## Example
@@ -78,22 +85,23 @@ This instruction is originally written by [@lahoffm](https://github.com/lahoffm)
   - Example: 1, '1-2,3', 'all' or [1,2]. Default is 1
 - guess (bool, optional):
   - Guess the portion of the page to analyze per page. Default `True`
+  - Note that as of tabula-java 1.0.3, the guess option is independent of the lattice and stream options, so you can use guess together with lattice or stream.
 - area (`list` of `float`, optional):
   - Portion of the page to analyze(top,left,bottom,right).
-  - Example: [269.875, 12.75, 790.5, 561]  or [[12.1,20.5,30.1,50.2],[1.0,3.2,10.5,40.2]]. Default is entire page
+  - Example: `[269.875, 12.75, 790.5, 561]`  or `[[12.1,20.5,30.1,50.2],[1.0,3.2,10.5,40.2]]`. Default is entire page
 - relative_area (bool, optional):
   - If all area values are between 0-100 (inclusive) and preceded by '%', input will be taken as % of actual height or width of the page. Default `False`.
 - lattice (bool, optional):
-  - [`spreadsheet` option is deprecated] Force PDF to be extracted using lattice-mode extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet).
+  - (`spreadsheet` option is deprecated) Force PDF to be extracted using lattice-mode extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet).
 - stream (bool, optional):
-  - [`nospreadsheet` option is deprecated] Force PDF to be extracted using stream-mode extraction (if there are no ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
+  - (`nospreadsheet` option is deprecated) Force PDF to be extracted using stream-mode extraction (if there are no ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
 - password (bool, optional):
   - Password to decrypt document. Default is empty
 - silent (bool, optional):
   - Suppress all stderr output.
 - columns (list, optional):
   - X coordinates of column boundaries.
-  - Example: [10.1, 20.2, 30.3]
+  - Example: `[10.1, 20.2, 30.3]`
 - output_format (str, optional):
   - Format for output file or extracted object.
   - For `read_pdf()`: `json`, `dataframe`
@@ -106,7 +114,7 @@ This instruction is originally written by [@lahoffm](https://github.com/lahoffm)
 - pandas_options (`dict`, optional):
   - Set pandas options like `{'header': None}`.
 - multiple_tables (bool, optional):
-  - (Experimental) Extract multiple tables.  If used with multiple pages (e.g. `pages='all'`) will extract separate tables from each page.
+  - Extract multiple tables.  If used with multiple pages (e.g. `pages='all'`) will extract separate tables from each page.
   - This option uses JSON as an intermediate format, so if tabula-java output format will change, this option doesn't work.
 - user_agent (str, optional)
   - Set a custom user-agent when download a pdf from a url. Otherwise it uses the default urllib.request user-agent
@@ -124,7 +132,7 @@ You can check whether tabula-py can call `java` from Python process with `tabula
 
 If you've installed `tabula`, it will be conflict the namespace. You should install `tabula-py` after removing `tabula`.
 
-```
+```bash
 pip uninstall tabula
 pip install tabula-py
 ```
@@ -137,15 +145,15 @@ pip install tabula-py
 
 Yes. You can use `options` argument as following. The format is same as cli of tabula-java.
 
-```py
+```python
 read_pdf(file_path, options="--columns 10.1,20.2,30.3")
 ```
 
 ### How can I ignore useless area?
 
 In short, you can extract with `area` and `spreadsheet` option.
 
-```py
+```python
 In [4]: tabula.read_pdf('./table.pdf', spreadsheet=True, area=(337.29, 226.49, 472.85, 384.91))
 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
 Out[4]:
@@ -161,7 +169,7 @@ Out[4]:
 8          F    E   E4    R    4
 ```
 
-*How to use `area` option*
+#### How to use `area` option
 
 According to tabula-java wiki, there is a explain how to specify the area:
 https://github.com/tabulapdf/tabula-java/wiki/Using-the-command-line-tabula-extractor-tool#grab-coordinates-of-the-table-you-want
@@ -171,14 +179,14 @@ For example, using macOS's preview, I got area information of this [PDF](https:/
 ![image](https://cloud.githubusercontent.com/assets/916653/22047470/b201de24-dd6a-11e6-9cfc-7bc73e33e3b2.png)
 
 
-```
+```bash
 java -jar ./target/tabula-1.0.1-jar-with-dependencies.jar -p all -a $y1,$x1,$y2,$x2 -o $csvfile $filename
 ```
 
 given
 
-```
-Note the left, top, height, and width parameters and calculate the following:
+```python
+# Note the left, top, height, and width parameters and calculate the following:
 
 y1 = top
 x1 = left
@@ -188,7 +196,7 @@ x2 = left + width
 
 I confirmed with tabula-java:
 
-```
+```bash
 java -jar ./tabula/tabula-1.0.1-jar-with-dependencies.jar -a "337.29,226.49,472.85,384.91" table.pdf
 ```
 
@@ -263,6 +271,10 @@ You can help by:
 - [@CurtLH](https://github.com/CurtLH)
 - [@nikhilgk](https://github.com/nikhilgk)
 - [@krassowski](https://github.com/krassowski)
+- [@alexandreio](https://github.com/alexandreio)
+- [@rmnevesLH](https://github.com/rmnevesLH)
+- [@red-bin](https://github.com/red-bin)
+- [@Gallaecio](https://github.com/Gallaecio)
 
 ### Another support
 

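Note on the `area` arithmetic in the README hunk above: the conversion from a selection rectangle (left, top, width, height) to the `top,left,bottom,right` order that `-a` and `area=` expect can be sketched in a few lines of Python. This is only an illustration; the helper names `rect_to_area` and `area_to_cli_arg` are hypothetical and are not part of tabula-py, and `y2 = top + height` is assumed by symmetry with the `x2 = left + width` line visible in the hunk.

```python
def rect_to_area(left, top, width, height):
    """Convert a selection rectangle into (y1, x1, y2, x2).

    The result is ordered top, left, bottom, right, matching the
    order documented for the `area` option.
    """
    y1 = top
    x1 = left
    y2 = top + height  # assumed; mirrors x2 = left + width
    x2 = left + width
    return (y1, x1, y2, x2)


def area_to_cli_arg(area):
    """Join the four coordinates with commas, as passed to `-a`."""
    return ",".join(str(value) for value in area)


if __name__ == "__main__":
    # Arbitrary example rectangle, not taken from a real PDF.
    area = rect_to_area(left=10, top=20, width=30, height=40)
    print(area)
    print(area_to_cli_arg(area))
```

The tuple returned by `rect_to_area` has the same shape as the `area=(337.29, 226.49, 472.85, 384.91)` value shown in the README example, and the joined string has the same shape as the quoted argument given to `-a` on the tabula-java command line.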
diff --git a/constraints.txt b/constraints.txt
@@ -5,6 +5,7 @@ attrs==19.1.0
 backcall==0.1.0
 black==19.3b0
 Click==7.0
+colorlog==3.2.0
 decorator==4.4.0
 distro==1.4.0
 entrypoints==0.3
@@ -17,6 +18,7 @@ isort==4.3.21
 jedi==0.14.1
 mccabe==0.6.1
 more-itertools==7.2.0
+nox==2019.5.30
 numpy==1.17.0
 packaging==19.0
 pandas==0.25.0

diff --git a/noxfile.py b/noxfile.py
@@ -0,0 +1,17 @@
+import nox
+
+
+@nox.session
+def lint(session):
+    lint_tools = ["black", "isort", "flake8"]
+    targets = ["tabula", "tests", "noxfile.py"]
+    session.install(*lint_tools)
+    session.run("flake8", *targets)
+    session.run("black", "--diff", "--check", *targets)
+    session.run("isort", "--check-only")
+
+
+@nox.session
+def tests(session):
+    session.install(".[test]")
+    session.run("pytest", "-v")
diff --git a/setup.cfg b/setup.cfg
@@ -1,6 +1,15 @@
-[wheel]
-universal = 1
-
 [flake8]
-ignore = F401
-max-line-length = 200
+ignore = E203, W503
+max-line-length = 88
+exclude =
+    .git,
+    __pycache__,
+    build,
+    dist,
+    .venv,
+    tabula/__init__.py
+
+[isort]
+line_length=88
+multi_line_output=3
+include_trailing_comma=True
diff --git a/setup.py b/setup.py
@@ -1,54 +1,47 @@
-from setuptools import setup
-from setuptools import find_packages
 import os
 
+from setuptools import find_packages, setup
+
 
 def read_file(filename):
-    filepath = os.path.join(
-        os.path.dirname(os.path.dirname(__file__)), filename)
+    filepath = os.path.join(os.path.dirname(os.path.dirname(__file__)), filename)
     if os.path.exists(filepath):
         return open(filepath).read()
     else:
-        return ''
+        return ""
 
 
 about = {}
-with open(os.path.join(os.path.dirname(__file__), 'tabula', '__version__.py')) as f:
+with open(os.path.join(os.path.dirname(__file__), "tabula", "__version__.py")) as f:
     exec(f.read(), about)
 
-with open(os.path.join(os.path.dirname(__file__), 'README.md')) as f:
-    about['__long_description__'] = f.read()
+with open(os.path.join(os.path.dirname(__file__), "README.md")) as f:
+    about["__long_description__"] = f.read()
 
 
 setup(
-    name=about['__title__'],
-    version=about['__version__'],
-    description=about['__description__'],
-    long_description=about['__long_description__'],
+    name=about["__title__"],
+    version=about["__version__"],
+    description=about["__description__"],
+    long_description=about["__long_description__"],
     long_description_content_type="text/markdown",
-    author=about['__author__'],
-    author_email=about['__author_email__'],
-    maintainer=about['__maintainer__'],
-    maintainer_email=about['__maintainer_email__'],
-    license=about['__license__'],
-    url=about['__url__'],
+    author=about["__author__"],
+    author_email=about["__author_email__"],
+    maintainer=about["__maintainer__"],
+    maintainer_email=about["__maintainer_email__"],
+    license=about["__license__"],
+    url=about["__url__"],
     classifiers=[
-        'Development Status :: 4 - Beta',
-        'Topic :: Text Processing :: General',
-        'License :: OSI Approved :: MIT License',
-        'Programming Language :: Python :: 3.7',
-        'Programming Language :: Python :: 3.6',
-        'Programming Language :: Python :: 3.5',
+        "Development Status :: 4 - Beta",
+        "Topic :: Text Processing :: General",
+        "License :: OSI Approved :: MIT License",
+        "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.6",
+        "Programming Language :: Python :: 3.5",
     ],
     include_package_data=True,
     packages=find_packages(),
-    keywords=['data frame', 'pdf', 'table'],
-    install_requires=[
-        'pandas',
-        'numpy',
-        'distro',
-    ],
-    extras_require={
-        'dev': ['pytest', 'flake8', 'black', 'isort']
-    },
+    keywords=["data frame", "pdf", "table"],
+    install_requires=["pandas", "numpy", "distro"],
+    extras_require={"dev": ["pytest", "flake8", "black", "isort"], "test": ["pytest"]},
 )
diff --git a/tabula/__init__.py b/tabula/__init__.py
@@ -1,6 +1,8 @@
-from .wrapper import read_pdf
-from .wrapper import read_pdf_with_template
-from .wrapper import convert_into
-from .wrapper import convert_into_by_batch
-from .util import environment_info
 from .__version__ import __version__
+from .util import environment_info
+from .wrapper import (
+    convert_into,
+    convert_into_by_batch,
+    read_pdf,
+    read_pdf_with_template,
+)
diff --git a/tabula/__version__.py b/tabula/__version__.py
@@ -1,9 +1,9 @@
-__title__ = 'tabula-py'
-__version__ = '1.3.1'
-__license__ = 'MIT License'
-__description__ = 'Simple wrapper for tabula-java, read tables from PDF into DataFrame'
-__author__ = 'Aki Ariga'
-__author_email__ = 'chezou@gmail.com'
-__maintainer__ = 'Aki Ariga'
-__maintainer_email__ = 'chezou@gmail.com'
-__url__ = 'https://github.com/chezou/tabula-py'
+__title__ = "tabula-py"
+__version__ = "1.3.1"
+__license__ = "MIT License"
+__description__ = "Simple wrapper for tabula-java, read tables from PDF into DataFrame"
+__author__ = "Aki Ariga"
+__author_email__ = "chezou@gmail.com"
+__maintainer__ = "Aki Ariga"
+__maintainer_email__ = "chezou@gmail.com"
+__url__ = "https://github.com/chezou/tabula-py"
diff --git a/tabula/errors/__init__.py b/tabula/errors/__init__.py
@@ -3,7 +3,7 @@
 
 class CSVParseError(ParserError):
     def __init__(self, message, cause):
-        super(CSVParseError, self).__init__(message + ', caused by ' + repr(cause))
+        super(CSVParseError, self).__init__(message + ", caused by " + repr(cause))
         self.cause = cause