Configuration File
This is the basic configuration module. It contains two singleton classes:
- CrawlerConfig
- JsonConfig
Because both are singleton classes, they need a special initialisation, described below. All getters return deep copies of the stored objects, so modifying a returned value never changes the configuration itself.
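To illustrate why the getters hand out deep copies rather than references, here is a minimal sketch using a hypothetical class (`DeepCopyGetterSketch`, not part of the module). A shallow copy would still share the nested per-section dicts; `copy.deepcopy` copies them too, so callers cannot alter the stored configuration by accident.

```python
import copy


class DeepCopyGetterSketch:
    """Hypothetical stand-in; not the real CrawlerConfig."""

    def __init__(self, data):
        # Internal state that callers should never mutate directly.
        self._data = data

    def config(self):
        # deepcopy also copies the nested per-section dicts.
        return copy.deepcopy(self._data)


store = DeepCopyGetterSketch({"Section": {"option": "value"}})
snapshot = store.config()
snapshot["Section"]["option"] = "changed"  # only the copy is modified
print(store.config()["Section"]["option"])  # value
```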
CrawlerConfig parses a normal cfg file with sections and options. JsonConfig parses a JSON file with the following format, where "base_urls" holds a list of objects, each with a "url" key:
```json
{
  "base_urls" : [
    {
      "url": "http://examp.le"
    }
  ]
}
```
Import CrawlerConfig as early as possible:
```python
from config import CrawlerConfig
```
First instantiation: the class must be set up only once, at the very beginning of the program. After that, calling setup again is unnecessary and results in a warning.
```python
cfg = CrawlerConfig.get_instance()
cfg.setup(<FILEPATH>)
```
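The get_instance()/setup() pattern can be sketched as follows. `SingletonConfigSketch` is a hypothetical class, not the module's implementation; in particular, ignoring the second setup() call after warning is an assumption, since the text above only says that a repeated setup results in a warning.

```python
import warnings


class SingletonConfigSketch:
    """Hypothetical illustration of the get_instance()/setup() pattern."""

    _instance = None

    def __init__(self):
        self._filepath = None

    @classmethod
    def get_instance(cls):
        # Lazily create the one shared instance on the first request.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def setup(self, filepath):
        # Assumption: only the first call is honoured; later calls just warn.
        if self._filepath is not None:
            warnings.warn("setup() already called, ignoring " + filepath)
            return
        self._filepath = filepath


first = SingletonConfigSketch.get_instance()
first.setup("first.cfg")
second = SingletonConfigSketch.get_instance()
print(first is second)  # True

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    second.setup("other.cfg")
print(len(caught))  # 1
```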
Further usage (in any module executed after the first instantiation):
```python
cfg = CrawlerConfig.get_instance()
```
get_instance(): returns the single instance of the config class. Because CrawlerConfig is a singleton, CrawlerConfig.get_instance() is the correct way to obtain it instead of calling the constructor.
setup(filepath): the basic setup of the config file; reads the file and parses it into the internal object.
config(): returns a deep copy of the whole configuration as a two-dimensional dict.
```python
config = cfg.config()
# config[<section>][<option>] == <value>
```
section(section): returns a copy of a single section as a one-dimensional dict.
```python
section = cfg.section(<section>)
# section[<option>] == <value>
```
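A minimal sketch of how a cfg file with sections and options could be turned into the two-dimensional dict returned by config() and the one-dimensional dict returned by section(), assuming Python's standard `configparser`. The class `CfgSketch` and the section and option names are hypothetical, and this sketch keeps every value as a string; the real module may convert values differently.

```python
import configparser
import copy


class CfgSketch:
    """Hypothetical sketch of config()/section(); not the real CrawlerConfig."""

    def __init__(self, text):
        parser = configparser.ConfigParser()
        parser.read_string(text)
        # Build {section: {option: value}}; values stay plain strings here.
        self._config = {name: dict(parser[name]) for name in parser.sections()}

    def config(self):
        # Two-dimensional dict, deep-copied.
        return copy.deepcopy(self._config)

    def section(self, name):
        # One-dimensional dict for a single section, deep-copied.
        return copy.deepcopy(self._config[name])


cfg = CfgSketch("[Section1]\noption_a = 1\n\n[Section2]\noption_b = text\n")
print(cfg.config())  # {'Section1': {'option_a': '1'}, 'Section2': {'option_b': 'text'}}
print(cfg.section("Section2"))  # {'option_b': 'text'}
```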
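set_section(section): sets the current section from which options are read.
option(option): returns the value of an option in the current section. Requires set_section to be called first.
```python
cfg.set_section(<section>)
option = cfg.option(<option>)
# option == <value>
```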
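The set_section()/option() pair keeps a "current section" as state. A hypothetical sketch (`SectionCursorSketch` is illustrative only); raising `RuntimeError` when no section has been set is an assumption, as the text above only states that set_section must be called first.

```python
import copy


class SectionCursorSketch:
    """Hypothetical sketch of set_section()/option(); not the real API."""

    def __init__(self, config):
        self._config = config
        self._current = None

    def set_section(self, name):
        # Remember which section later option() calls read from.
        self._current = name

    def option(self, name):
        # Assumption: fail loudly if no section has been selected yet.
        if self._current is None:
            raise RuntimeError("call set_section() before option()")
        return copy.deepcopy(self._config[self._current][name])


cursor = SectionCursorSketch({"Section1": {"option_a": "1"}})
cursor.set_section("Section1")
print(cursor.option("option_a"))  # 1
```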
Import JsonConfig as early as possible:
```python
from config import JsonConfig
```
First instantiation: the class must be set up only once, at the very beginning of the program. After that, calling setup again is unnecessary and results in a warning.
```python
json = JsonConfig.get_instance()
json.setup(<FILEPATH>)
```
Further usage (in any module executed after the first instantiation):
```python
json = JsonConfig.get_instance()
```
get_instance(): returns the single instance of the JSON config class. Because JsonConfig is a singleton, JsonConfig.get_instance() is the correct way to obtain it instead of calling the constructor.
setup(filepath): the basic setup of the JSON file; reads the file and parses it into the internal object.
config(): returns a deep copy of the whole parsed JSON config file.
```python
json_config = json.config()
```
load_json(filepath): loads the JSON file located at filepath and overwrites all previously loaded values. It should normally not be used; it is only meant for switching between files.
```python
json.load_json("../test.json")
```
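load_json() replaces the parsed content instead of merging it. A hypothetical sketch (`JsonConfigSketch` and the `extra` key are illustrative, not the module's code) showing that a key present only in the first file disappears after loading the second:

```python
import copy
import json
import os
import tempfile


class JsonConfigSketch:
    """Hypothetical sketch of load_json(); not the module's actual code."""

    def __init__(self):
        self._json_object = {}

    def load_json(self, filepath):
        # Replace, do not merge: values from the previous file are dropped.
        with open(filepath, encoding="utf-8") as handle:
            self._json_object = json.load(handle)

    def config(self):
        return copy.deepcopy(self._json_object)


with tempfile.TemporaryDirectory() as tmp:
    first_path = os.path.join(tmp, "first.json")
    second_path = os.path.join(tmp, "second.json")
    with open(first_path, "w", encoding="utf-8") as handle:
        json.dump({"base_urls": [], "extra": 1}, handle)
    with open(second_path, "w", encoding="utf-8") as handle:
        json.dump({"base_urls": []}, handle)

    sketch = JsonConfigSketch()
    sketch.load_json(first_path)
    sketch.load_json(second_path)

print(sketch.config())  # {'base_urls': []}
```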
get_url_array(): returns all URLs listed under "base_urls > url" in the file as a list.
```python
print(json.get_url_array())
# Prints something like [u"http://examp.le", u"http://te.st"]
```
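The URL extraction can be sketched as a list comprehension, assuming `base_urls` holds a list of objects that each carry a `url` key. `get_url_array_sketch` is a hypothetical helper operating on an already parsed JSON object; under Python 3 the strings print without the `u` prefix.

```python
import json


def get_url_array_sketch(json_object):
    """Hypothetical: collect the "url" of every entry under "base_urls"."""
    return [entry["url"] for entry in json_object["base_urls"]]


raw = '{"base_urls": [{"url": "http://examp.le"}, {"url": "http://te.st"}]}'
print(get_url_array_sketch(json.loads(raw)))
# ['http://examp.le', 'http://te.st']
```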