Skip to content

mek32390/parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 

Repository files navigation

parser

Parser is a python3 script that parses information from html files into csv files. It reads a default config.ini file to initialize its starting settings. From there, command line flags or calls to the public apply_config function.

Below is a description of the script usage:

Flags: -v toggles verbose mode. Can be placed anywhere in the command line arguments.

-l [file.txt] sets a logger file to store verbose messages.
   Can be placed anywhere in the command line arguments.
    
-c [directory] Sets parser to dump csv files
   into the specified directory. Can be placed anywhere
   in the command line argument.
    
-d [directories[]] Sets parser to search for html files
   in the specified directories. Must be placed before and
   arguments.

Command Line Usage: python3 parser.py [arguments[.html|.csv] Arguments and can be given in any order. Data parsed form html files will be written into the next given csv file, or one will created based on the config file.

External Application Usage: Call parser.Parser(config) constructor. Should be able to have any number of parser instances at one time, however this has not been stress tested. Parser public methods: close_parser: Cleans the parser environment set_logger: Sets the parser's logger apply_config: Applies a new configuration file to the parser parse_files: Parses the html files given parse_and_write: Parses and html and writes it to a csv file set_csv_dump: sets the directory that parser will place csv files toggle_verbose: Toggles verbose mode get_config: Returns an instance of ConfigParser that represents the current state of the Parser Public functions: default_config: Gets a copy of the default and working config object get_entries: Returns a list of <keyword, value> dictionaries for the given parsing package get_parsingPackage: Returns a parsing package that can me used to parse an html file

Config.ini file: The parser sets its default settings based on the given config.ini file. When using a .ini to initialize the parser, if certain values are not set the parser will not allow the user to continue.

Required Fields:
    CSV_DUMP in  [VALUES]
        Path to the directory where created csv files should be placed
    FILENAME_BASE in [VALUES]
        The parsed keyword to base the name of the csv file on.
    parsed_keywords in [PARSED_KEYWORDS] 
        Determined the keywords to be parsed
    written_keywords in [WRITTEN_KEYWORDS]
        Determines the keywords and their order to be written to the csv files
    [ENTRY_RESTRICTIONS] must be defined
        Determines any restrictions that should be made when adding a row to a
        csv file. A key with a "~" will restrict an row from having the listed 
        values. A key without it will restrict rows without any of the listed
        values.
        Format:
            <keyword>:  <allowed val>
                        <allowed val>
            ~<keyword>: <disallowed val>
                        <disallowed val>
                        
 Optional fields:
    VERBOSE in [VALUES]
        Sets the default status of verbose mode.
    LOGGER in [VALUES]
        Sets the name of the verbosity logger. A value
        of "~None" will disable the logger.
    ERROR_LOG in [VALUES]
        Not yet implemented.

Known Issues: - Parser logging functionality is limited. - If findSchema functions in fileParser.py finds two schemas in schemas.py that have the same number of "hits", it is impossible to predict which schema it will use to parse the html file.

Other Files: fileParser Usage: - fileParses uses an instance of a ParsingPackage object to parse and html file. -Public fileParser Functions: -make_parsingPackage: return an instance of ParsingPackage that can be used to parse a file -can_parse: Returns true if an html file can be parsed -parse: Returns a collection on <Keyword, value> dictionaries from a ParsingPackage instance. schemas Usage: - There are no public schemas.py functions. - To create a new Schema: - add a function with a name that starts with SchemaDefinition_ to the schemas.py module. - add variable to store a schema in the beginning.
- If you want to inherit functionality from another schema, call the SchemaDefinition_ for that schema. - Define a function called _ for any keyword that you want this schema to be able to parse. definitions in sub-schemas will overwrite definitions in inherited super-schemas. - Set your schema variable to the value returned by calling the function that is returned by the getSchemaCreator function. -Note that getSchemaCreator takes the value returned by a call to locals() in the SchemaDefinition as a parameter. - return your schema variable. - All schemas must have a definition for _contents and _validation. - This definition can be inherited by another schema.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages