Extra forms parsing #29

gpolydatas · 2024-12-22T23:57:11Z

Forms 3 and 4 are now parsed. The changes were pushed to the form parser, and the template was added to the JSON.

eloukas · 2024-12-24T12:23:59Z

Couple things to address first:

Right now, both extract_items.py and item_lists.py contain the item lists for the new forms. Please keep them only in item_lists.py.
Apart from that, extract_items.py has no other changes in the PR right now.
Also, at a later stage, for your contribution to be merged, we will need some output pictures of the code working, a description of the changes, plus some unit tests for the code. Let's focus on the first two (2) now so as not to be overwhelmed.

You can see the file differences (old vs. new version) yourself at https://github.com/nlpaueb/edgar-crawler/pull/29/files

gpolydatas · 2024-12-24T13:41:11Z

Not really sure what's happening here:
The code is substantially longer than the 800 lines that appear in the main branch of your repo.

The code definitely has additions, as those 2 functions did not exist before.

The listed items also do not exist anywhere else, apart from being imported at the top.

This is where I see those changes/additions:
https://github.com/gpolydatas/edgar-crawler/blob/extra_forms_parsing/extract_items.py

eloukas · 2024-12-28T17:14:18Z

@gpolydatas I can now see some changes, nice!

Could you please give us some screenshots & markdown showing the expected output of the code when run?
From a quick skim of the code, I see that you do the extraction based on the XML/XBRL tags of the filings? If that's correct, which filing years have you tested on? I am asking because, if I remember correctly, XBRL filings appeared after 2010, and inline XBRL filings started appearing (mainly) after 2018. If so, sadly, this XML-based extraction will not work for earlier filings. And, as you might see, parsing for the other form types (10-K, 8-K, 10-Q) works based on regular expressions on strings, and this actually works for both modern filings (even if they have inline XBRL) and older filings (pure .txt files).
Based on the above, could you also describe at a high level what changes you made in the code? This helps open-source software maintainers get a better sense of what you did without having to read the code of each PR in detail.
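
To make the contrast above concrete, here is a minimal sketch of regex-on-strings extraction, assuming item headings of the form "Item N." It is not the edgar-crawler implementation: `strip_tags` and `split_items` are hypothetical helpers, and the two sample filings are invented. It only shows why stripping markup first lets the same regular expression handle both a pure .txt filing and an HTML/inline XBRL one.

```python
import re


def strip_tags(raw: str) -> str:
    """Hypothetical helper: drop markup and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)  # remove HTML / inline XBRL tags
    return re.sub(r"\s+", " ", text).strip()


def split_items(raw: str) -> dict:
    """Hypothetical helper: map item number -> text up to the next heading."""
    text = strip_tags(raw)
    # Assumed heading shape: "Item 1.", "ITEM 1A:", and so on.
    heading = re.compile(r"\bItem\s+(\d+[A-Z]?)\s*[.:]", re.IGNORECASE)
    matches = list(heading.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # A section ends where the next heading starts (or at end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).upper()] = text[m.end():end].strip()
    return sections


# Invented samples: an older plain-text filing and a tagged modern one.
old_txt = (
    "ITEM 1. Business\nWe make widgets.\n"
    "ITEM 1A. Risk Factors\nWidgets may break."
)
inline_xbrl = (
    '<div><b>Item 1.</b> Business <ix:nonNumeric name="x">We make widgets.'
    "</ix:nonNumeric></div><div><b>Item 1A.</b> Risk Factors "
    "Widgets may break.</div>"
)

print(split_items(old_txt))
print(split_items(inline_xbrl))
```

Both calls yield the same sections, because the tags are gone before matching. Real filings are messier (a table of contents repeats the headings, for example), so this is only an illustration of the idea, not a drop-in parser.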

gpolydatas added 5 commits December 7, 2024 12:40

added parsing of forms 3,4,SC13G,SC13D/A

eaad4dd

removing personal ingo

fd12dab

restoring accidentally deleted previous functionality

cf0e884

removing unnecessary comment

b5ce500

form4 fixes

3ff5bea

gpolydatas added 2 commits December 24, 2024 13:18

test push

7b23c4e

Merge branch 'main' into extra_forms_parsing

cac9bd4

Update extract_items.py

b2f4189
