An interface for collecting and parsing Federal Register documents
Clone the code from GitHub.
The module requires a file named `config.py` in the root project directory. The config file must define a variable named `data_dir`, which points to the root directory where the data will be saved.
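A minimal `config.py` might look like this (the path shown is just a placeholder):

```python
# config.py
# data_dir points to the root directory where all downloaded and parsed
# data will be stored. Replace the placeholder path with your own.
data_dir = '/path/to/frdocs_data'
```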
The module collects data from two sources:
- `data_collection/download_meta.py` downloads metadata describing Federal Register documents from the federalregister.gov API. Raw metadata is saved in annual zipped JSON files in `data_dir/meta` (see the sketch after this list).
- `data_collection/download_xml.py` downloads the text of daily Federal Register documents from govinfo.gov. Raw XML files are saved in `data_dir/xml`.
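As a spot-check of the raw metadata downloads, a sketch like the following lists what one annual archive contains. The file name `2020.zip` and the assumption that each archive member is a JSON array of document records are guesses; check `data_dir/meta` for the actual layout.

```python
import json
import os
import zipfile

from config import data_dir

# Hypothetical file name; the actual naming scheme may differ.
meta_path = os.path.join(data_dir, 'meta', '2020.zip')

with zipfile.ZipFile(meta_path) as zf:
    for name in zf.namelist():
        with zf.open(name) as f:
            # Assumes each member is a JSON array of document records.
            records = json.load(f)
        print(name, len(records))
```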
`data_collection/compile_parsed.py` builds parsed versions of the documents, in which the XML is converted into Pandas data tables. These files are saved as pickled dataframes in `data_dir/parsed` and are named by document number, which must be extracted from the XML itself (and occasionally contains errors). The XML files sometimes contain duplicate printings of the same document, but each document appears only once in the parsed directory.
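Each parsed document can also be read directly from disk. A minimal sketch, assuming the pickles carry a `.pkl` extension (the document number below is a placeholder):

```python
import os
import pandas as pd

from config import data_dir

# Placeholder document number; files in data_dir/parsed are named by the
# document number extracted from the XML. The .pkl extension is an assumption.
doc_number = '2020-12345'
df = pd.read_pickle(os.path.join(data_dir, 'parsed', f'{doc_number}.pkl'))
```

In practice, `frdocs.load_parsed()` (described below) is the more convenient entry point.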
The complete dataset can be downloaded from scratch or updated to the latest available data by running `update.py`.
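For example (assuming `update.py` is run directly as a script):

```bash
python update.py
```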
The complete dataset is approximately 20 GB in size.
The module provides three functions for accessing the compiled data:

- `frdocs.load_info_df()` loads all document metadata as a single dataframe.
- `frdocs.iter_parsed()` iteratively loads the available parsed documents.
- `frdocs.load_parsed()` loads a single parsed document.
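A usage sketch, assuming `load_parsed()` is keyed by document number (the number shown is a placeholder, and what `iter_parsed()` yields per document is an assumption to verify against the source):

```python
import frdocs

# Load the metadata for all documents into one dataframe.
info_df = frdocs.load_info_df()
print(info_df.shape)

# Stream parsed documents one at a time (useful given the ~20 GB dataset).
for parsed in frdocs.iter_parsed():
    ...  # process each parsed document
    break

# Load a single parsed document; the document number here is a placeholder.
parsed = frdocs.load_parsed('2020-12345')
```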