
Data Feeds YML

Veronica Valeros edited this page Jul 29, 2023 · 3 revisions

Collectress reads its feeds from a YML file in which each feed has three keys: the feed name (name), the feed organisation (org) and the feed URL (url). The organisation and the feed name are combined with the download date to form the output file name: YYYY_MM_DD_organisation_feedname.txt.gz. All feeds are Gzip compressed on download.
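The naming scheme can be illustrated with a minimal Python sketch. The function name output_file_name is hypothetical, and the .txt.gz suffix is taken from the example output listing on this page rather than from the Collectress source, so treat it as an assumption.

```python
from datetime import date


def output_file_name(org: str, name: str, day: date) -> str:
    """Build YYYY_MM_DD_organisation_feedname.txt.gz for one feed.

    Hypothetical helper: the ".txt.gz" suffix mirrors the example
    output on this page and is not confirmed against Collectress itself.
    """
    return f"{day:%Y_%m_%d}_{org}_{name}.txt.gz"


example_name = output_file_name("stratosphere", "AIP-Alpha-latest.csv", date(2023, 7, 29))
print(example_name)
```

The date is passed in explicitly so that the same call always yields the same name, which keeps the helper easy to test.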

Example data_feeds.yml

The following is an example of a data_feeds.yml that contains the daily blocklists generated by the Stratosphere Laboratory:

feeds:
  - name: AIP-Alpha-latest.csv
    org: stratosphere
    url: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Alpha-latest.csv
  - name: AIP-Alpha7-latest.csv
    org: stratosphere
    url: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Alpha7-latest.csv
  - name: AIP-Prioritize_Consistent-latest.csv
    org: stratosphere
    url: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Prioritize_Consistent-latest.csv
  - name: AIP-Prioritize_New-latest.csv
    org: stratosphere
    url: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Prioritize_New-latest.csv
  - name: AIP_blacklist_for_IPs_seen_last_24_hours.csv
    org: stratosphere
    url: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP_blacklist_for_IPs_seen_last_24_hours.csv
  - name: AIP_historical_blacklist_prioritized_by_newest_attackers.csv
    org: stratosphere
    url: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP_historical_blacklist_prioritized_by_newest_attackers.csv
  - name: AIP_historical_blacklist_prioritized_by_repeated_attackers.csv
    org: stratosphere
    url: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP_historical_blacklist_prioritized_by_repeated_attackers.csv
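Before running a collection, it can help to check that every entry carries the three keys. The sketch below assumes the YML file has already been loaded into a Python dict (for example with a YAML parser); validate_feeds and REQUIRED_KEYS are hypothetical names and are not part of Collectress.

```python
REQUIRED_KEYS = ("name", "org", "url")


def validate_feeds(config: dict) -> list:
    """Return the list of feeds, or raise ValueError if an entry is incomplete."""
    feeds = config.get("feeds")
    if not isinstance(feeds, list):
        raise ValueError("expected a top-level 'feeds' list")
    for index, feed in enumerate(feeds):
        if not isinstance(feed, dict):
            raise ValueError(f"feed #{index} is not a mapping")
        # A key that is absent or empty counts as missing.
        missing = [key for key in REQUIRED_KEYS if not feed.get(key)]
        if missing:
            raise ValueError(f"feed #{index} is missing: {', '.join(missing)}")
    return feeds


# The first entry of the example above, as it would look once parsed.
example_config = {
    "feeds": [
        {
            "name": "AIP-Alpha-latest.csv",
            "org": "stratosphere",
            "url": "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Alpha-latest.csv",
        }
    ]
}
valid_feeds = validate_feeds(example_config)
```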

Example feeds output

An example of the output generated by this input data_feeds.yml is shown below:

└── 2023
    └── 07
        └── 29
            ├── 2023_07_29_stratosphere_AIP-Alpha7-latest.csv.txt.gz
            ├── 2023_07_29_stratosphere_AIP-Alpha-latest.csv.txt.gz
            ├── 2023_07_29_stratosphere_AIP_blacklist_for_IPs_seen_last_24_hours.csv.txt.gz
            ├── 2023_07_29_stratosphere_AIP_historical_blacklist_prioritized_by_newest_attackers.csv.txt.gz
            ├── 2023_07_29_stratosphere_AIP_historical_blacklist_prioritized_by_repeated_attackers.csv.txt.gz
            ├── 2023_07_29_stratosphere_AIP-Prioritize_Consistent-latest.csv.txt.gz
            └── 2023_07_29_stratosphere_AIP-Prioritize_New-latest.csv.txt.gz
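The year/month/day layout shown above can be reproduced with the standard library. This is a sketch under stated assumptions, not the Collectress implementation: dated_output_path and write_gzip are hypothetical helpers, and the payload is a placeholder rather than real feed data.

```python
import gzip
import tempfile
from datetime import date
from pathlib import Path


def dated_output_path(root: Path, file_name: str, day: date) -> Path:
    """Place file_name under root/YYYY/MM/DD, as in the tree above."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / file_name


def write_gzip(path: Path, payload: bytes) -> None:
    """Create the parent directories and store payload Gzip compressed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as handle:
        handle.write(payload)


# Round trip in a temporary directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    target = dated_output_path(
        Path(tmp),
        "2023_07_29_stratosphere_AIP-Alpha-latest.csv.txt.gz",
        date(2023, 7, 29),
    )
    write_gzip(target, b"placeholder feed body\n")
    restored = gzip.decompress(target.read_bytes())
```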

Example log.json output

An example of the log.json when all feed downloads are successful:

{
  "message": "2023-07-29 collectress download summary",
  "timestamp": "2023-07-29T11:24:04.920035",
  "total_feeds_processed": 7,
  "total_feeds_not_modified": 0,
  "total_feeds_success": 7,
  "total_feeds_failed": 0,
  "feeds_not_modified": [],
  "feeds_successful": [
    "AIP-Alpha-latest.csv",
    "AIP-Alpha7-latest.csv",
    "AIP-Prioritize_Consistent-latest.csv",
    "AIP-Prioritize_New-latest.csv",
    "AIP_blacklist_for_IPs_seen_last_24_hours.csv",
    "AIP_historical_blacklist_prioritized_by_newest_attackers.csv",
    "AIP_historical_blacklist_prioritized_by_repeated_attackers.csv"
  ],
  "feeds_failed": [],
  "total_data_downloaded_bytes": 4359904,
  "total_runtime_seconds": 1.011902093887329,
  "success_rate": 100,
  "error_rate": 0
}

An example of the log.json when the feeds were found not to be modified, so no download took place:

{
  "message": "2023-07-29 collectress download summary",
  "timestamp": "2023-07-29T11:25:03.475469",
  "total_feeds_processed": 7,
  "total_feeds_not_modified": 7,
  "total_feeds_success": 0,
  "total_feeds_failed": 0,
  "feeds_not_modified": [
    "AIP-Alpha-latest.csv",
    "AIP-Alpha7-latest.csv",
    "AIP-Prioritize_Consistent-latest.csv",
    "AIP-Prioritize_New-latest.csv",
    "AIP_blacklist_for_IPs_seen_last_24_hours.csv",
    "AIP_historical_blacklist_prioritized_by_newest_attackers.csv",
    "AIP_historical_blacklist_prioritized_by_repeated_attackers.csv"
  ],
  "feeds_successful": [],
  "feeds_failed": [],
  "total_data_downloaded_bytes": 0,
  "total_runtime_seconds": 0.2106630802154541,
  "success_rate": 100,
  "error_rate": 0
}
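In both summaries, each total matches the length of its list, and total_feeds_processed equals the sum of the three totals. A small Python sketch can check this when the log is consumed by another script. The function name summary_is_consistent is hypothetical, and the invariant is inferred from the two examples on this page rather than from a documented guarantee.

```python
import json

# Each counter paired with the list it is expected to describe.
COUNTER_TO_LIST = (
    ("total_feeds_not_modified", "feeds_not_modified"),
    ("total_feeds_success", "feeds_successful"),
    ("total_feeds_failed", "feeds_failed"),
)


def summary_is_consistent(summary: dict) -> bool:
    """Check that counters match list lengths and add up to the processed total."""
    counts_match = all(
        summary[counter] == len(summary[names]) for counter, names in COUNTER_TO_LIST
    )
    total_matches = summary["total_feeds_processed"] == sum(
        summary[counter] for counter, _ in COUNTER_TO_LIST
    )
    return counts_match and total_matches


# Trimmed-down summary with two feeds, for illustration only.
sample = json.loads("""
{
  "total_feeds_processed": 2,
  "total_feeds_not_modified": 2,
  "total_feeds_success": 0,
  "total_feeds_failed": 0,
  "feeds_not_modified": ["AIP-Alpha-latest.csv", "AIP-Alpha7-latest.csv"],
  "feeds_successful": [],
  "feeds_failed": []
}
""")
sample_ok = summary_is_consistent(sample)
```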

Example etag_cache.json output

An example of the etag_cache.json file, which stores the ETags of the files that have been downloaded:

{
  "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Alpha-latest.csv": {
    "etag": "\"336a6-60193d8818b05\"",
    "feed_name": "AIP-Alpha-latest.csv",
    "feed_organization": "stratosphere",
    "download_date": "2023-07-29T11:24:03.969029"
  },
  "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Alpha7-latest.csv": {
    "etag": "\"cf288-60193d89d6fbf\"",
    "feed_name": "AIP-Alpha7-latest.csv",
    "feed_organization": "stratosphere",
    "download_date": "2023-07-29T11:24:04.053091"
  },
  "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Prioritize_Consistent-latest.csv": {
    "etag": "\"fef2b-60193d8ccdc8c\"",
    "feed_name": "AIP-Prioritize_Consistent-latest.csv",
    "feed_organization": "stratosphere",
    "download_date": "2023-07-29T11:24:04.280724"
  },
  "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Prioritize_New-latest.csv": {
    "etag": "\"7a45b-60193d8b915fa\"",
    "feed_name": "AIP-Prioritize_New-latest.csv",
    "feed_organization": "stratosphere",
    "download_date": "2023-07-29T11:24:04.472105"
  },
  "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP_blacklist_for_IPs_seen_last_24_hours.csv": {
    "etag": "\"336a6-60193d8818b05\"",
    "feed_name": "AIP_blacklist_for_IPs_seen_last_24_hours.csv",
    "feed_organization": "stratosphere",
    "download_date": "2023-07-29T11:24:04.573827"
  },
  "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP_historical_blacklist_prioritized_by_newest_attackers.csv": {
    "etag": "\"7a45b-60193d8b915fa\"",
    "feed_name": "AIP_historical_blacklist_prioritized_by_newest_attackers.csv",
    "feed_organization": "stratosphere",
    "download_date": "2023-07-29T11:24:04.650502"
  },
  "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP_historical_blacklist_prioritized_by_repeated_attackers.csv": {
    "etag": "\"fef2b-60193d8ccdc8c\"",
    "feed_name": "AIP_historical_blacklist_prioritized_by_repeated_attackers.csv",
    "feed_organization": "stratosphere",
    "download_date": "2023-07-29T11:24:04.775467"
  }
}
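A cache of this shape is typically used for conditional HTTP requests: the stored ETag is sent in an If-None-Match header, and a server that still holds the same version answers 304 Not Modified with no body. The Python sketch below shows only the cache handling and makes no network call. The helpers conditional_headers and record_download are hypothetical, and whether Collectress works exactly this way is an assumption.

```python
from datetime import datetime


def conditional_headers(cache: dict, url: str) -> dict:
    """Return an If-None-Match header when an ETag is cached for url."""
    entry = cache.get(url)
    return {"If-None-Match": entry["etag"]} if entry else {}


def record_download(cache: dict, url: str, etag: str, feed_name: str,
                    feed_organization: str, when: datetime) -> None:
    """Store an entry with the same fields as the etag_cache.json example."""
    cache[url] = {
        "etag": etag,
        "feed_name": feed_name,
        "feed_organization": feed_organization,
        "download_date": when.isoformat(),
    }


feed_url = "https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP-Alpha-latest.csv"
etag_cache = {}

# First run: nothing cached, so the request would be unconditional.
first_headers = conditional_headers(etag_cache, feed_url)

record_download(etag_cache, feed_url, '"336a6-60193d8818b05"',
                "AIP-Alpha-latest.csv", "stratosphere",
                datetime(2023, 7, 29, 11, 24, 3, 969029))

# Second run: the cached ETag is offered back to the server.
second_headers = conditional_headers(etag_cache, feed_url)
```

Keying the cache by URL, as in the example file, means a feed can be renamed in data_feeds.yml without losing its cached ETag.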