
Configuration File

movabo edited this page Oct 21, 2016 · 1 revision

config.py

This is the basic configuration module. It contains two singleton classes:

  • CrawlerConfig
  • JsonConfig

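Because both are singleton classes, they are not created with a normal constructor call: you obtain the shared instance with get_instance() and initialise it once with setup() (see Usage below). All getters of both classes return deep copies of the stored objects, so modifying a returned value does not change the configuration itself.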
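The singleton pattern and the deep-copy getters can be illustrated with a minimal sketch. The class name SingletonConfig and its _data attribute are hypothetical; this is not the code of config.py, only the general pattern the paragraph above describes.

```python
import copy


class SingletonConfig:
    """Hypothetical sketch of a singleton with deep-copy getters."""

    _instance = None  # the one shared instance, created lazily

    def __init__(self):
        # Internal state; callers only ever receive deep copies of it.
        self._data = {}

    @classmethod
    def get_instance(cls):
        # Create the instance on first access and reuse it afterwards.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def config(self):
        # A deep copy, so changes made by the caller never reach self._data.
        return copy.deepcopy(self._data)


first = SingletonConfig.get_instance()
second = SingletonConfig.get_instance()
print(first is second)  # True
```

Returning deep copies costs some memory and time per call, but it means no module can accidentally change the configuration seen by every other module.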

CrawlerConfig parses a standard cfg (INI-style) file made up of sections and options. JsonConfig parses a JSON file in the following format:

{
  "base_urls" : {
    "url": "http://examp.le"
  }
}

CrawlerConfig

Usage

Import it as early as possible:

from config import CrawlerConfig

First instantiation: the class must be set up exactly once, at the very beginning of the program. After that, repeating this step is unnecessary and results in a warning.

cfg = CrawlerConfig.get_instance()
cfg.setup(<FILEPATH>)

Further usage (in any file that runs after the first instantiation):

cfg = CrawlerConfig.get_instance()
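The "set up once, warn on repeat" rule can be sketched as follows. CrawlerConfigSketch is a hypothetical stand-in built on the standard configparser module; the page does not say whether the real class ignores or re-reads the file on a second setup() call, so this sketch assumes it keeps the existing configuration and only warns.

```python
import configparser
import warnings


class CrawlerConfigSketch:
    """Hypothetical sketch: a singleton whose setup() warns when repeated."""

    _instance = None

    def __init__(self):
        self._parser = None  # filled in by the first setup() call

    @classmethod
    def get_instance(cls):
        # Every caller receives the same shared object.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def setup(self, filepath):
        # A second call is redundant: warn and keep the existing configuration
        # (assumed behaviour for this sketch).
        if self._parser is not None:
            warnings.warn("setup() was already called; ignoring repeated call.")
            return
        parser = configparser.ConfigParser()
        parser.read(filepath)
        self._parser = parser
```

After the first setup(), any module can call CrawlerConfigSketch.get_instance() and receive the already configured object without touching the file again.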

Methods

get_instance()

Returns the single instance of CrawlerConfig. Because this is a singleton class, CrawlerConfig.get_instance() is the correct way to obtain it instead of calling the constructor.

setup(filepath)

Performs the basic setup: reads the config file at filepath and parses it into the internal object.

config()

Returns a deep copy of the parsed configuration as a two-dimensional dict, indexed first by section and then by option.

config = cfg.config()

# config[<section>][<option>] == <value>

section(section)

Returns a deep copy of a single section as a one-dimensional dict, indexed by option.

section = cfg.section(<section>)
# section[<option>] == <value>
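The shapes returned by config() and section() can be sketched with the standard configparser module. ConfigGettersSketch is hypothetical; it keeps values as the raw strings configparser returns (and, like configparser, lower-cases option names), whereas the real CrawlerConfig may convert values differently.

```python
import configparser
import copy


class ConfigGettersSketch:
    """Hypothetical stand-in for CrawlerConfig's config() and section() getters."""

    def __init__(self, text):
        parser = configparser.ConfigParser()
        parser.read_string(text)
        # Two-dimensional dict: {section: {option: value}}.
        # configparser lower-cases option names and returns values as strings.
        self._config = {name: dict(parser[name]) for name in parser.sections()}

    def config(self):
        # Deep copy of the whole two-dimensional dict.
        return copy.deepcopy(self._config)

    def section(self, section):
        # Deep copy of one section: {option: value}.
        return copy.deepcopy(self._config[section])


cfg = ConfigGettersSketch("[Crawler]\ndelay = 2\n")
print(cfg.config())  # {'Crawler': {'delay': '2'}}
print(cfg.section("Crawler")["delay"])  # 2
```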

set_section(section)

Selects the current section; subsequent option() calls read from it.

option(option)

Returns the value of option in the current section. Requires set_section() to have been called first.

cfg.set_section(<section>)
option = cfg.option(<option>)

# option == <value>
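The stateful set_section()/option() pair can be sketched on top of configparser. SectionCursor is a hypothetical name, and raising KeyError for an unknown section or RuntimeError when no section is selected are assumptions of this sketch, not documented behaviour of CrawlerConfig.

```python
import configparser


class SectionCursor:
    """Hypothetical sketch of set_section()/option() on top of configparser."""

    def __init__(self, text):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)
        self._current = None  # no section selected yet

    def set_section(self, section):
        # Remember which section later option() calls should read from.
        if not self._parser.has_section(section):
            raise KeyError(section)
        self._current = section

    def option(self, option):
        # option() only makes sense once a section has been selected.
        if self._current is None:
            raise RuntimeError("call set_section() before option()")
        return self._parser.get(self._current, option)


cfg = SectionCursor("[Crawler]\ndelay = 2\n")
cfg.set_section("Crawler")
print(cfg.option("delay"))  # 2
```

Keeping the selected section as state saves repeating the section name on every lookup, at the cost of an ordering requirement between the two calls.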

JsonConfig

Usage

Import it as early as possible:

from config import JsonConfig

First instantiation: the class must be set up exactly once, at the very beginning of the program. After that, repeating this step is unnecessary and results in a warning.

json = JsonConfig.get_instance()
json.setup(<FILEPATH>)

Further usage (in any file that runs after the first instantiation):

json = JsonConfig.get_instance()

Methods

get_instance()

Returns the single instance of JsonConfig. Because this is a singleton class, JsonConfig.get_instance() is the correct way to obtain it instead of calling the constructor.

setup(filepath)

Performs the basic setup: reads the JSON file at filepath and parses it into the internal object.

config()

Returns a deep copy of the entire parsed JSON config file.

json_config = json.config()

load_json(filepath)

Loads the JSON file located at filepath, overwriting all previously loaded values. This should normally not be needed; it is only meant for switching between files.

json.load_json("../test.json")
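The "overwrites all values" behaviour of load_json() can be sketched with the standard json module. JsonConfigSketch is hypothetical; the point it shows is that a second load replaces the stored dict instead of merging into it, so keys missing from the new file disappear.

```python
import copy
import json


class JsonConfigSketch:
    """Hypothetical sketch of load_json() replacing all previously loaded values."""

    def __init__(self):
        self._config = {}

    def load_json(self, filepath):
        # Replace, rather than merge, whatever was loaded before.
        with open(filepath, encoding="utf-8") as handle:
            self._config = json.load(handle)

    def config(self):
        # Deep copy, matching the behaviour of the other getters.
        return copy.deepcopy(self._config)
```

In this sketch the instance would be named something like json_cfg rather than json, so that it does not shadow the standard json module that the class itself relies on.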

get_url_array()

Returns all urls listed under the "base_urls" > "url" entries of the file, as a list.

print(json.get_url_array())

# Prints something like [u"http://examp.le", u"http://te.st"]
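Collecting the urls can be sketched as a small function over the parsed JSON. get_url_array here is a hypothetical free function, not the JsonConfig method itself; because the example file at the top of this page shows "base_urls" as a single object while the method returns several urls, the sketch assumes "base_urls" may hold either one {"url": ...} object or a list of them.

```python
import json


def get_url_array(data):
    """Hypothetical helper: collect every "url" under "base_urls".

    Accepts either a single {"url": ...} object, as in the example file
    format at the top of this page, or a list of such objects.
    """
    entries = data.get("base_urls", [])
    if isinstance(entries, dict):
        entries = [entries]  # normalise the single-object form to a list
    return [entry["url"] for entry in entries if "url" in entry]


raw = '{"base_urls": [{"url": "http://examp.le"}, {"url": "http://te.st"}]}'
print(get_url_array(json.loads(raw)))  # ['http://examp.le', 'http://te.st']
```

Under Python 3 the strings print without the u prefix shown in the Python 2 output above.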