SEC EDGAR Parser based on Python 3
This is a tool for parsing XBRL files filed with the SEC. Its focus is on parsing XBRL XML files so that the data they contain is easier to access. The idea is to give you a tool for writing the code you want, rather than a rigid tool that implements a fixed workflow.
In addition, it is not intended to be a tool for scraping SEC EDGAR, since how you do the scraping varies a lot with your needs and is relatively easy to do yourself (though it could be added later if you want).
The repository was originally forked from https://github.com/tooksoi/ScraXBRL, but I soon found out that we had very different approaches and objectives, so the code in the two repositories quickly became completely different, and nothing is taken from ScraXBRL.
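To illustrate what parsing an XBRL instance involves, here is a minimal sketch that treats every element carrying a `contextRef` attribute as a fact and collects its concept, context, unit and value. It is not py-sec-xbrl's API: `split_tag` and `extract_facts` are hypothetical names, and it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies (lxml's `etree` offers a largely compatible interface).

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split an ElementTree '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return None, tag


def extract_facts(xml_bytes):
    """Return one dict per element carrying a contextRef (i.e. per fact)."""
    root = ET.fromstring(xml_bytes)
    facts = []
    for elem in root.iter():
        context_ref = elem.get("contextRef")
        if context_ref is None:
            # contexts, units, schemaRef, etc. are not facts
            continue
        namespace, concept = split_tag(elem.tag)
        facts.append({
            "concept": concept,
            "namespace": namespace,
            "context": context_ref,
            "unit": elem.get("unitRef"),
            "decimals": elem.get("decimals"),
            "value": (elem.text or "").strip(),
        })
    return facts
```

You would call it with the raw bytes of an instance document, e.g. `extract_facts(open(path, "rb").read())`. Values are kept as strings; converting them to numbers using `decimals` and the unit is left to the caller, and tuple facts are not handled.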
Features:
- Parse the main XBRL XML file to extract data
- Identify the main XBRL file within its XBRL package
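One way to identify the main file, sketched here under the assumption that the package has been extracted into a single directory: an XBRL instance document has `xbrl` in the `http://www.xbrl.org/2003/instance` namespace as its root element, whereas linkbases (`_cal.xml`, `_def.xml`, `_lab.xml`, `_pre.xml`) have a `linkbase` root and schemas are `.xsd` files. This heuristic is not necessarily how py-sec-xbrl does it, `root_tag` and `find_instance_files` are hypothetical names, and it uses the standard library's ElementTree rather than lxml.

```python
import os
import xml.etree.ElementTree as ET

# Root element of an XBRL 2.1 instance document, in ElementTree "{namespace}local" form
XBRL_INSTANCE_ROOT = "{http://www.xbrl.org/2003/instance}xbrl"


def root_tag(path):
    """Return the root element's tag, stopping at the first 'start' event."""
    with open(path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("start",)):
            return elem.tag  # the first 'start' event is always the root element
    return None


def find_instance_files(directory):
    """Return paths of .xml files in `directory` whose root is an XBRL instance."""
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.lower().endswith(".xml") or not os.path.isfile(path):
            continue  # skip .xsd schemas, .htm documents, subdirectories, etc.
        try:
            if root_tag(path) == XBRL_INSTANCE_ROOT:
                matches.append(path)
        except ET.ParseError:
            continue  # not well-formed XML, so not a usable instance
    return matches
```

Checking the root element instead of relying only on file-name suffixes avoids depending on any particular naming convention, and stopping at the first start tag keeps the check cheap even for large files.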
Current version: v0.2
Dependencies: listed in the requirements.txt file; currently only the lxml library
Installation:
pip install py-sec-xbrl
Usage:
- Get some XBRL XML files (see the documentation if you don't have any yet)
- See test-parse.py, modify the path to the XML file, and run it; it's really easy
More detailed documentation can be found here: doc
Two priorities for the moment:
- Better formatting of extracted XBRL data
- More advanced extracting functionalities (notably on the segments & calculations)
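For a sense of what segment extraction involves, the sketch below reads each context's period and the explicit dimension members (`xbrldi:explicitMember`, from the XBRL Dimensions specification) found under its `segment`. It is an illustration under assumptions, not how the library does or will do it: `parse_contexts` is a hypothetical name, typed members and `scenario` dimensions are ignored, and it uses the standard library's ElementTree rather than lxml.

```python
import xml.etree.ElementTree as ET

NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "xbrldi": "http://xbrl.org/2006/xbrldi",
}


def parse_contexts(xml_bytes):
    """Map each context id to its period and explicit dimension members."""
    root = ET.fromstring(xml_bytes)
    contexts = {}
    for ctx in root.findall("xbrli:context", NS):
        period = ctx.find("xbrli:period", NS)
        # A period is either an instant or a startDate/endDate pair
        instant = period.findtext("xbrli:instant", namespaces=NS)
        start = period.findtext("xbrli:startDate", namespaces=NS)
        end = period.findtext("xbrli:endDate", namespaces=NS)
        # Only explicit members under entity/segment are read; typed members
        # and scenario dimensions are ignored in this sketch
        dims = {}
        for member in ctx.iterfind(".//xbrli:segment/xbrldi:explicitMember", NS):
            dims[member.get("dimension")] = (member.text or "").strip()
        contexts[ctx.get("id")] = {
            "instant": instant,
            "start": start,
            "end": end,
            "dimensions": dims,
        }
    return contexts
```

Joining this mapping with facts on their `contextRef` would let you tell apart, for example, a consolidated revenue figure from a per-segment one.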