diff --git a/.n6-version b/.n6-version
index 1454f6e..fdc6698 100644
--- a/.n6-version
+++ b/.n6-version
@@ -1 +1 @@
-4.0.1
+4.4.0
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 030de68..c3f6949 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,11 +1,257 @@
 # Changelog
 
-Starting with the 4.0.0 release, all notable changes to the
-[code of _n6_](https://github.com/CERT-Polska/n6) are continuously
-documented here.
+*Note: some features of this document's layout were inspired by
+[Keep a Changelog](https://keepachangelog.com/).*
 
-Significant features of this document's format are inspired by
-[Keep a Changelog](https://keepachangelog.com/).
+
+## [4.4.0] - 2023-11-23
+
+### Features and Notable Changes
+
+#### Data Pipeline and External Communication
+
+- [data sources, config] Added *parser* for the `shadowserver.msmq` source.
+
+- [data sources, config] Removed support for the following data sources:
+  `blueliv.map` and `darklist-de.bl` (removed both *collectors* and
+  *parsers*!) as well as `shadowserver.modbus` (removed just this
+  channel's *parser*).
+
+- [data sources] The *parsers* for the `dataplane.*` sources have been
+  changed to support the current data format (there was a need to change
+  the delimiter and the row parsing mechanism...).
+
+- [data sources] The *collector* for the `abuse-ch.ssl-blacklist` source
+  (implemented in `n6datasources.collectors.abuse_ch` as the class named
+  `AbuseChSslBlacklistCollector`) used to support the legacy state format
+  related to the value of the `row_time_legacy_state_key` attribute -- it
+  is no longer supported, as `_BaseAbuseChDownloadingTimeOrderedRowsCollector`
+  (the local base class) no longer makes use of that attribute. *Note:*
+  these changes are relevant and breaking *only* if you need to load a
+  *collector state* in a very old format -- almost certainly you do *not*.
+
+- [data sources] A new processing mechanism has been added to
+  numerous existing *parsers* for `shadowserver.*` sources (by
+  enhancing the `_BaseShadowserverParser` class, defined in the
+  module `n6datasources.parsers.shadowserver`) -- concerning events
+  categorized as `"amplifier"`. The mechanism is activated when a
+  `CVE-...`-like-regex-based match is found in the `tag` field of
+  the input data -- then the *parser*, apart from yielding an event
+  (hereinafter referred to as a *basic* event) with `category` set to
+  `"amplifier"`, also yields an *extra* event -- which is identical to the
+  *basic* one, except that its `category` is set to `"vulnerable"` and its
+  `name` is set to the regex-matched value (which is, basically, the CVE
+  identifier). Because of that, `name` and `category` should no longer be
+  declared as a *parser*'s `constant_items`, so now `_BaseShadowserverParser`
+  provides support for `additional_standard_items` (which is a *parser*
+  class's attribute similar to `constant_items`). For relevant *parser*
+  classes, the `name` and `category` items have been moved from their
+  `constant_items` to their `additional_standard_items`. (See the sketch
+  after the next entry.)
+
+- [data sources] Now the generic `*.misp` *collector* supports loading
+  state also in its legacy Python-2-specific format.
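The *extra*-event mechanism described in the `_BaseShadowserverParser` entry above can be summarized with a minimal sketch (illustrative only: the exact regex and helper names used by `n6datasources.parsers.shadowserver` are assumptions here, not the actual implementation):

```python
import copy
import re

# An assumed CVE-matching pattern -- the real one may differ.
CVE_ID_REGEX = re.compile(r'CVE-\d{4}-\d{4,}', re.ASCII | re.IGNORECASE)

def yield_events_for_amplifier_row(basic_event: dict, tag: str):
    """Yield the *basic* event and, on a CVE-like match in `tag`, an *extra* one."""
    yield basic_event                          # `category` is "amplifier"
    match = CVE_ID_REGEX.search(tag)
    if match:
        extra_event = copy.deepcopy(basic_event)
        extra_event['category'] = 'vulnerable'
        extra_event['name'] = match.group(0)   # i.e., the CVE identifier
        yield extra_event
```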
+
+- [data sources, data pipeline, lib] A new restriction (implemented in
+  `n6lib.data_spec.fields`, concerning the `IPv4FieldForN6` and
+  `AddressFieldForN6` classes) is that, from now on, the zero IP address
+  (`0.0.0.0`) is *neither* a valid component IP within a *record dict's*
+  `address` (i.e., its items' `ip`) or `enriched` (i.e., keys in the
+  mapping being its second item), *nor* a valid value of a *record dict's*
+  `dip`. Note that this restriction regards most of the *n6* pipeline
+  components, especially data *parsers* (via the machinery of
+  `n6lib.record_dict.RecordDict` *et consortes*...).
+
+- [data pipeline] The name of the AMQP input queue declared by `n6enrich`
+  has been changed (!) from `enrichement` to `enrichment`.
+
+- [data pipeline] The `n6enrich` pipeline component (implemented in
+  `n6datapipeline.enrich`): from now on, the zero IP address (`0.0.0.0`),
+  irrespective of its exact formatting (i.e., regardless of whether some
+  octets are formatted with redundant leading zeros), is no longer taken
+  into account when IPs are extracted from `url`s, and when `fqdn`s are
+  resolved to IPs.
+
+- [data pipeline, config] From now on, when `n6recorder`, during its
+  activity (i.e., within `Recorder.input_callback()`...), encounters
+  an exception which represents a *database/DB API error* (i.e., an
+  instance of a `MySQLdb.MySQLError` subclass, possibly wrapped in
+  (an) SQLAlchemy-specific exception(s)...) whose *error code* (i.e.,
+  `.args[0]` being an `int`, if any) indicates a *fatal condition* --
+  then a `SystemExit()` is raised, so that the AMQP input message is
+  requeued and the `n6recorder` executable script exits with a non-zero
+  status. The set of *error codes* which are considered *fatal* (i.e.,
+  which trigger this behavior) is configurable -- by setting the
+  `fatal_db_api_error_codes` configuration option in the `recorder`
+  section; by default, that set includes only one value: `1021`
+  (i.e., the `ER_DISK_FULL` code -- see:
+  https://mariadb.com/kb/en/mariadb-error-codes/). (See the sketch
+  after the next entry.)
+
+- [Portal, REST API, Stream API, data pipeline, lib] A *security-related*
+  behavioral fix has been applied to the *event access rights* and *event
+  ownership* machinery (implemented in `n6lib.auth_api`...): from now on,
+  *IP-network-based access or ownership criteria* (those stored in the
+  `criteria_ip_network` and `inside_filter_ip_network` tables of Auth DB)
+  referring to networks that contain the zero IP address (`0.0.0.0`) are
+  translated to IP address ranges whose lower bound is `0.0.0.1` (in other
+  words, `0.0.0.0` is excluded). Thanks to that, *events without `ip` are
+  no longer erroneously considered as matching* such IP-network-based
+  criteria. In practice, *from the security point of view*, the fix is
+  most important when it comes to REST API and Portal; for the other
+  involved components, i.e., `n6filter` and `n6anonymizer`/Stream API,
+  the security risk was rather small or non-existent. *Note:* as the fix
+  is related to `n6filter`, it regards *also* values of `min_ip` in the
+  `inside_criteria` part of the JSON returned by the Portal API's endpoint
+  `/info/config`; they are displayed by the Portal's GUI in the *Account
+  information* view, below the *IP network filter* label -- as IP ranges'
+  lower bounds.
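The `n6recorder` *fatal error* behavior described in the `[data pipeline, config]` entry above boils down to the following pattern -- a condensed, self-contained sketch based on the `Recorder.input_callback()` and `Recorder._get_db_api_error_code()` changes visible further down in this diff (here as plain functions, with a hardcoded default set of fatal codes):

```python
from MySQLdb import MySQLError
from sqlalchemy.exc import SQLAlchemyError

FATAL_DB_API_ERROR_CODES = frozenset({1021})   # 1021 == ER_DISK_FULL

def get_db_api_error_code(exc):
    # Unwrap SQLAlchemy's wrappers to get at the underlying DB API error...
    while isinstance(exc, SQLAlchemyError):
        orig = getattr(exc, 'orig', None)
        if orig is exc:
            break
        exc = orig
    if isinstance(exc, MySQLError):
        exc_args = getattr(exc, 'args', None)
        if isinstance(exc_args, tuple) and exc_args and isinstance(exc_args[0], int):
            return exc_args[0]
    return None

def handle_input_message(process_message, *args):
    try:
        process_message(*args)   # stand-in for `Recorder._input_callback()`
    except Exception as exc:
        error_code = get_db_api_error_code(exc)
        if error_code in FATAL_DB_API_ERROR_CODES:
            # The AMQP input message gets requeued and the executable
            # script exits with a non-zero status.
            raise SystemExit(f'Fatal DB API error code: {error_code!r}') from exc
        raise
```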
+
+- [Portal, REST API, lib] A behavioral fix related to the one described
+  above (yet, this time, not related to security) has been applied to the
+  procedure of translation of *the `ip.net` request parameter* to the
+  corresponding fragment of Event DB queries (see: the `ip_net_query()`
+  method of `n6lib.db_events.n6NormalizedData`...): from now on, each
+  value that refers to a network which contains the zero IP address
+  (`0.0.0.0`) is translated to an IP address range whose lower bound is
+  `0.0.0.1` (in other words, `0.0.0.0` is excluded); thanks to that,
+  *events with no `ip`* are no longer erroneously included in such cases.
+
+- [Portal, REST API, lib] A new restriction (implemented in
+  `n6lib.data_spec.fields`, concerning the `IPv4FieldForN6` and
+  `AddressFieldForN6` classes) is that the zero IP address (`0.0.0.0`) is
+  no longer a valid value of the `ip` and `dip` request parameters
+  received by REST API's endpoints and analogous Portal API's endpoints.
+  Also, regarding the Portal's GUI, the front-end validation part related
+  to the *IP* search parameter has been appropriately adjusted.
+
+- [Portal, REST API, lib] The mechanism of result data cleaning
+  (implemented as a part of certain non-public code invoked in
+  `n6lib.data_spec.N6DataSpec.clean_result_dict()`) has been enhanced in
+  such a way that the `address` field of *cleaned result dicts* no longer
+  includes any items with `ip` equal to the zero IP address (`0.0.0.0`),
+  i.e., they are filtered out even if they appear in some Event DB records
+  (as they could in the case of legacy data). Note that it is complemented
+  by the already existing mechanism of removing from *raw result dicts*
+  any `ip` and `dip` fields whose values are equal to the zero IP address
+  (see: `n6lib.db_events.make_raw_result_dict()`...).
+
+#### Setup, Configuration and CLI
+
+- [data sources, data pipeline, config, docker/etc] Added, fixed, changed
+  and removed several config prototype (`*.conf`) files in the directories:
+  `N6DataSources/n6datasources/data/conf/`,
+  `N6DataPipeline/n6datapipeline/data/conf/` and
+  `etc/n6/`. *Note:* for some of them, manual adjustments in the user's
+  actual configuration files are required (see the relevant comments in
+  those files...).
+
+- [setup, lib] `N6Lib`'s dependencies: changed the version of `dnspython`
+  from `1.16` to `2.4`. Also, added a new dependency, `importlib_resources`,
+  with version locked as `>=5.12, <5.13`.
+
+- [setup, data pipeline] `N6DataPipeline`'s dependencies: temporarily
+  locked the version of `intelmq` as `<3.2`.
+
+#### Developers-Relevant-Only Matters
+
+- [data pipeline] `n6datapipeline.enrich.Enricher`: renamed the
+  `url_to_fqdn_or_ip()` method to `url_to_hostname()`, and changed its
+  interface regarding its return values: now the method always returns
+  either a non-empty `str` or `None` (a distilled sketch appears a few
+  entries below).
+
+- [lib] `n6lib.common_helpers` and `n6sdk.encoding_helpers`: renamed the
+  `try_to_normalize_surrogate_pairs_to_proper_codepoints()` function to
+  `replace_surrogate_pairs_with_proper_codepoints()`.
+
+- [lib] Removed three functions from `n6lib.common_helpers`:
+  `is_ipv4()`, `is_pure_ascii()` and `lower_if_pure_ascii()`.
+
+- [lib] `n6lib.db_events`: removed `IPAddress`'s constant attributes
+  `NONE` and `NONE_STR` (instead of them use the `n6lib.const`'s constants
+  `LACK_OF_IPv4_PLACEHOLDER_AS_INT` and `LACK_OF_IPv4_PLACEHOLDER_AS_STR`).
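Regarding the `url_to_fqdn_or_ip()` → `url_to_hostname()` rename mentioned in the first entry of this subsection: the new contract (a non-empty `str` or `None`, never an empty string) is visible in the `n6datapipeline/enrich.py` hunk further down in this diff; distilled to a stand-alone sketch:

```python
import urllib.parse
from typing import Optional

def url_to_hostname(url: str) -> Optional[str]:
    parsed_url = urllib.parse.urlparse(url)
    if parsed_url.netloc.endswith(':'):
        # URL is probably wrong -- something like: "http://http://..."
        return None
    hostname = parsed_url.hostname
    if not hostname:   # covers both None and the empty string
        return None
    return hostname
```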
+
+- [lib] `n6lib.record_dict`: removed `RecordDict`'s constant attribute
+  `setitem_key_to_target_key` (together with some internal *experimental*
+  mechanism based on it...).
+
+- [lib] `n6lib.generate_test_events`: several changes and enhancements
+  regarding the `RandomEvent` class have been made, including some
+  modifications regarding its *configuration specification*... Also, the
+  configuration-related stuff has been factored out to a new mixin class,
+  `RandomEventGeneratorConfigMixin`.
+
+- [lib] `n6lib.url_helpers`: changed `normalize_url()`'s signature and
+  behavior...
+
+- [tests] `n6datasources.tests.parsers._parser_test_mixin.ParserTestMixin`
+  (and inheriting *parser* test classes): added checking of *raw format
+  version tags* in parser tests (using `ParserTestMixin`'s attribute
+  `PARSER_RAW_FORMAT_VERSION_TAG`...).
+
+### Less Notable Changes and Fixes
+
+- [data sources] Added the missing `re.ASCII` flag to regex definitions
+  in a few parsers: `sblam.spam`, `spamhaus.drop` and `spamhaus.edrop`
+  (note: before these fixes, the lack of that flag made the affected
+  regexes match too broadly...).
+
+- [data sources, config] Restored, in the `ShadowserverMailCollector` section
+  of the `N6DataSources/n6datasources/data/conf/60_shadowserver.conf` config
+  prototype file, the (mistakenly deleted) `"Poland Netcore/Netis Router
+  Vulnerability Scan":"netis"` item of the `subject_to_channel` mapping.
+
+- [data pipeline] `n6enrich`: fixed a few bugs concerning extraction of
+  a domain name (to become `fqdn`) or an IP address (to become `ip` in
+  `address`...) from the hostname part of `url`. For certain (rather
+  uncommon) cases of malformed or untypical URLs, those bugs caused whole
+  events to be rejected, or (*only* for some cases and *only* if
+  `__debug__` was false, i.e., when Python's *assertion-removal
+  optimization* mode was in effect) caused the resultant event's
+  `enriched` field to erroneously include the `"fqdn"` marker even though
+  `fqdn` was *not* successfully extracted from `url`.
+
+- [data pipeline] Fixed `n6anonymizer`: now the
+  `_get_result_dicts_and_output_body()` method of
+  `n6datapipeline.aux.anonymizer.Anonymizer` returns
+  objects of the proper type (`bytes`).
+
+- [Admin Panel] Fixed a *RIPE search*-related bug in Admin Panel (in
+  `N6AdminPanel/n6adminpanel/static/lookup_api_handler.js` -- in the
+  `RipePopupBase._getListsOfSeparatePersonOrOrgData()` function, where
+  the initial empty list was inadvertently added to `resultList`, leading
+  to duplicate data entries in certain cases; now a new `currentList` is
+  added to `resultList` only when a valid separator has been encountered
+  and the list contains data, which prevents both the addition of an
+  empty initial list and the duplication of the first data set).
+
+- [lib, Admin Panel] Added an `org`-key-based search feature to the
+  `n6lib.ripe_api_client.RIPEApiClient`, enabling it to perform additional
+  searches when encountering the `org` key. The enhancement allows for the
+  retrieval and integration of organization-specific results into the
+  existing data set, broadening the overall search capabilities (and,
+  consequently, improving UX for users of Admin Panel, which makes use of
+  `RIPEApiClient`).
+
+- [lib] `n6lib.common_helpers`: from now on, the
+  `ip_network_tuple_to_min_max_ip()` function (also available
+  via `n6sdk.encoding_helpers`) accepts an optional flag argument,
+  `force_min_ip_greater_than_zero`.
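To make the semantics of that new flag concrete, here is a minimal sketch built on the standard `ipaddress` module (the argument and return conventions assumed below are for illustration only and may differ from those of the real `n6lib.common_helpers.ip_network_tuple_to_min_max_ip()`):

```python
import ipaddress

def ip_network_tuple_to_min_max_ip(ip_network_tuple,
                                   force_min_ip_greater_than_zero=False):
    ip_str, prefixlen = ip_network_tuple
    network = ipaddress.IPv4Network(f'{ip_str}/{prefixlen}')
    min_ip = int(network.network_address)
    max_ip = int(network.broadcast_address)
    if force_min_ip_greater_than_zero and min_ip == 0:
        min_ip = 1   # i.e., 0.0.0.1 -- the zero IP address is excluded
    return min_ip, max_ip

# For a network containing 0.0.0.0, the lower bound becomes 0.0.0.1
# (cf. the security-related entries earlier in this changelog):
assert ip_network_tuple_to_min_max_ip(
    ('0.0.0.0', 8), force_min_ip_greater_than_zero=True) == (1, 16777215)
```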
+ +- [lib] `n6lib.common_helpers`: added the `as_str_with_minimum_esc()` + function (also available via `n6sdk.encoding_helpers`). + +- [lib] `n6lib.const`: added the + `LACK_OF_IPv4_PLACEHOLDER_AS_INT` (equal to `0`) and + `LACK_OF_IPv4_PLACEHOLDER_AS_STR` (equal to `"0.0.0.0"`) constants. + +- [lib, tests] `n6lib.unit_test_helpers`: added to `TestCaseMixin` a new + helper method, `raise_exc()`. + +- [docker/etc] Replaced expired test/example certificates. + +- [data sources, data pipeline, portal, setup, config, cli, lib, tests, docker/etc, docs] + Various additions, fixes, changes, enhancements as well as some cleanups, + and code modernization/refactoring... + +- [lib] Various additions, changes and removals regarding *experimental* code. ## [4.0.1] - 2023-06-03 @@ -48,8 +294,8 @@ Among others: much faster, and `n6aggregator`'s memory consumption has been considerably reduced; -- also, many minor improvements, a bunch of fixes, some refactorization - and various cleanups have been made. +- also, many minor improvements, a bunch of fixes, some refactoring and + various cleanups have been made. Note that some of the changes are *not* backwards-compatible. @@ -73,7 +319,7 @@ Note that some of the changes are *not* backwards-compatible. - in the *n6 REST API:* API-key-based authentication - and many, many more improvements, a bunch of fixes, as well as - some refactorization, removals and cleanups... + some refactoring, removals and cleanups... Note that many of the changes are *not* backwards-compatible. @@ -83,6 +329,19 @@ Also, note that most of the main elements of *n6* -- namely: Python-3-only (more precisely: are compatible with CPython 3.9). +## [Consecutive updates of 2.0 series...] + +[...] + + +## [2.0.0] - 2018-06-22 + +**This is the first public release of *n6*.** + + +[4.4.0]: https://github.com/CERT-Polska/n6/compare/v4.0.1...v4.4.0 [4.0.1]: https://github.com/CERT-Polska/n6/compare/v4.0.0...v4.0.1 [4.0.0]: https://github.com/CERT-Polska/n6/compare/v3.0.0...v4.0.0 [3.0.0]: https://github.com/CERT-Polska/n6/compare/v2.0.6a2-dev1...v3.0.0 +[Consecutive updates of 2.0 series...]: https://github.com/CERT-Polska/n6/compare/v2.0.0...v2.0.6a2-dev1 +[2.0.0]: https://github.com/CERT-Polska/n6/tree/v2.0.0 diff --git a/N6AdminPanel/n6adminpanel/static/lookup_api_handler.js b/N6AdminPanel/n6adminpanel/static/lookup_api_handler.js index c0b32f3..e6eff5f 100644 --- a/N6AdminPanel/n6adminpanel/static/lookup_api_handler.js +++ b/N6AdminPanel/n6adminpanel/static/lookup_api_handler.js @@ -209,7 +209,7 @@ class RipePopupBase extends PopupBase { _buildResultTable(data) { // the script requests only one value at the time from RIPE API // client, so the first item of result list is being handled - const resultList = this._getListsOfSeparatePersonData(data[0]); + const resultList = this._getListsOfSeparatePersonOrOrgData(data[0]); // every item of `resultList` will be used to create separate table for (const result of resultList) { const resultTable = document.createElement('table'); @@ -235,31 +235,35 @@ class RipePopupBase extends PopupBase { } } - _getListsOfSeparatePersonData(items) { - const resultList = []; - let currentList = []; - resultList.push(currentList); - for (const item of items) { - const [key, val] = item; - // single list-item containing both empty strings serves - // as a separator between each result set - if (!key && !val) { - if (currentList.length) resultList.push(currentList); + _getListsOfSeparatePersonOrOrgData(items) { + const resultList = []; + let currentList = []; + 
for (const item of items) { + const [key, val] = item; + // single list-item containing both empty strings serves + // as a separator between each result set + if (!key && !val) { + if (currentList.length) { + resultList.push(currentList); currentList = []; - } else { - // each item of the new list will be a 2-element list, - // where its first item is a key, and second - a list - // of values; there will be no key duplicates - const elIndex = currentList.findIndex(el => el[0] === key); - const valList = Array.isArray(val) ? val : [val]; - if (elIndex !== -1) { - currentList[elIndex][1].push(...valList); - } else { - currentList.push([key, valList]); - } } + continue; + } + + const elIndex = currentList.findIndex(el => el[0] === key); + const valList = Array.isArray(val) ? val : [val]; + if (elIndex !== -1) { + currentList[elIndex][1].push(...valList); + } else { + currentList.push([key, valList]); } - return resultList; + } + + if (currentList.length) { + resultList.push(currentList); + } + + return resultList; } } diff --git a/N6BrokerAuthApi/n6brokerauthapi/__init__.py b/N6BrokerAuthApi/n6brokerauthapi/__init__.py index 5fda5da..286b7c8 100644 --- a/N6BrokerAuthApi/n6brokerauthapi/__init__.py +++ b/N6BrokerAuthApi/n6brokerauthapi/__init__.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019-2021 NASK. All rights reserved. +# Copyright (c) 2019-2023 NASK. All rights reserved. """ This package provides a REST API implementation intended to cooperate @@ -51,7 +51,7 @@ def _get_auth_manager_maker(self): return auth_manager_maker -# (see: https://github.com/rabbitmq/rabbitmq-auth-backend-http#what-must-my-web-server-do) +# (see: https://github.com/rabbitmq/rabbitmq-server/tree/main/deps/rabbitmq_auth_backend_http#what-must-my-web-server-do) # noqa RESOURCES = [ HttpResource( resource_id='user', diff --git a/N6DataPipeline/n6datapipeline/aux/anonymizer.py b/N6DataPipeline/n6datapipeline/aux/anonymizer.py index 4cc3ca6..0f9d8c6 100644 --- a/N6DataPipeline/n6datapipeline/aux/anonymizer.py +++ b/N6DataPipeline/n6datapipeline/aux/anonymizer.py @@ -1,4 +1,4 @@ -# Copyright (c) 2015-2021 NASK. All rights reserved. +# Copyright (c) 2015-2023 NASK. All rights reserved. """ Anonymizer -- performs validation and anonymization of event data @@ -22,10 +22,10 @@ class Anonymizer(LegacyQueuedBase): - # note: here `resource` denotes a *Stream API resource*: + # Note: here `resource` denotes a *Stream API resource*: # "inside" (corresponding to the "inside" access zone) or # "threats" (corresponding to the "threats" access zone) - # -- see the _get_resource_to_org_ids() method below + # -- see the `_get_resource_to_org_ids()` method below. 
OUTPUT_RK_PATTERN = '{resource}.{category}.{anon_source}' input_queue = { @@ -52,13 +52,13 @@ def __init__(self, **kwargs): self.data_spec = N6DataSpec() def input_callback(self, routing_key, body, properties): - # NOTE: we do not need to use n6lib.record_dict.RecordDict here, + # Note: we do not need to use `n6lib.record_dict.RecordDict` here, # because: - # * previous components (such as filter) have already done the - # preliminary validation (using RecordDict's mechanisms); - # * we are doing the final validation anyway using - # N6DataSpec.clean_result_dict() (below -- in the - # _get_result_dicts_and_output_body() method) + # * previous components (such as `n6filter`) have already performed + # the preliminary validation (using `RecordDict`'s mechanisms); + # * in this component we are doing the final validation anyway + # using `N6DataSpec.clean_result_dict()` (below -- in the + # `_get_result_dicts_and_output_body()` method). event_data = json.loads(body) with self.setting_error_event_info(event_data): event_type = routing_key.split('.', 1)[0] @@ -145,7 +145,7 @@ def _get_result_dicts_and_output_body(self, raw_result_dict = { k: v for k, v in event_data.items() if (k in self.data_spec.all_result_keys and - # eliminating empty `address` and `client` sequences + # Eliminating empty `address` and `client` sequences # (as the data spec will not accept them empty): not (k in ('address', 'client') and not v))} cleaned_result_dict = self.data_spec.clean_result_dict( @@ -154,11 +154,12 @@ def _get_result_dicts_and_output_body(self, full_access=False, opt_primary=False) cleaned_result_dict['type'] = event_type - # note: the output body will be a cleaned result dict, - # being an ordinary dict (not a RecordDict instance), - # with the 'type' item added, serialized to a string - # using n6sdk.pyramid_commons.renderers.data_dict_to_json() - output_body = data_dict_to_json(cleaned_result_dict) + # Note: the output body will be a *cleaned result dict*, + # being an ordinary `dict` (not a `RecordDict` instance), + # with the 'type' item added, then serialized to a `str` + # using `n6sdk.pyramid_commons.renderers.data_dict_to_json()`, + # and then encoded to `bytes`. 
+ output_body = data_dict_to_json(cleaned_result_dict).encode('ascii') return ( raw_result_dict, cleaned_result_dict, @@ -175,7 +176,7 @@ def _get_result_dicts_and_output_body(self, '; '.join( '`{0}` org ids: {1}'.format( r, - ', '.join(map(repr, resource_to_org_ids[r])) or 'none') + ', '.join(map(ascii, resource_to_org_ids[r])) or 'none') for r in sorted(resource_to_org_ids))) raise @@ -218,8 +219,8 @@ def _publish_output_data(self, '* skipped for the org ids: {1}; ' '* done for the org ids: {2}'.format( r, - ', '.join(map(repr, resource_to_org_ids[r])) or 'none', - ', '.join(map(repr, done_resource_to_org_ids[r])) or 'none') + ', '.join(map(ascii, resource_to_org_ids[r])) or 'none', + ', '.join(map(ascii, done_resource_to_org_ids[r])) or 'none') for r in sorted(resource_to_org_ids))) raise else: diff --git a/N6DataPipeline/n6datapipeline/data/conf/00_global.conf b/N6DataPipeline/n6datapipeline/data/conf/00_global.conf index 5de931d..bc8c7db 100644 --- a/N6DataPipeline/n6datapipeline/data/conf/00_global.conf +++ b/N6DataPipeline/n6datapipeline/data/conf/00_global.conf @@ -1,16 +1,19 @@ [rabbitmq] -host=localhost +host = rabbit # `url` is a deprecated (and generally not used) legacy alias for `host` -url=%(host)s -port=5671 +url = %(host)s +port = 5671 # if you want to use SSL, the `ssl` option must be set to 1 and the # following options must be set to appropriate file paths: -ssl=0 -path_to_cert=~/cert -ssl_ca_certs=%(path_to_cert)s/testca/cacert.pem -ssl_certfile=%(path_to_cert)s/client/cert.pem -ssl_keyfile=%(path_to_cert)s/client/key.pem +ssl = 1 +path_to_cert = ~/certs +ssl_ca_certs = %(path_to_cert)s/n6-CA/cacert.pem +ssl_certfile = %(path_to_cert)s/cert.pem +ssl_keyfile = %(path_to_cert)s/key.pem # AMQP heartbeat interval for most of the components -heartbeat_interval=30 +heartbeat_interval = 30 + +# AMQP heartbeat interval for parser components +heartbeat_interval_parsers = 600 diff --git a/N6DataPipeline/n6datapipeline/data/conf/21_recorder.conf b/N6DataPipeline/n6datapipeline/data/conf/21_recorder.conf index d195212..4bf3ff0 100644 --- a/N6DataPipeline/n6datapipeline/data/conf/21_recorder.conf +++ b/N6DataPipeline/n6datapipeline/data/conf/21_recorder.conf @@ -10,3 +10,10 @@ ;echo = 0 ;wait_timeout = 28800 + +# Which database API exceptions' error codes should be considered *fatal*, +# i.e., should make the n6recorder script requeue the AMQP input message +# and then immediately exit with a non-zero status (by default, only one +# error code is considered *fatal*: 1021 which represents the *disk full* +# condition -- see: https://mariadb.com/kb/en/mariadb-error-codes/). +;fatal_db_api_error_codes = 1021, diff --git a/N6DataPipeline/n6datapipeline/enrich.py b/N6DataPipeline/n6datapipeline/enrich.py index 394f184..de820aa 100644 --- a/N6DataPipeline/n6datapipeline/enrich.py +++ b/N6DataPipeline/n6datapipeline/enrich.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. 
import collections import os @@ -10,8 +10,13 @@ from geoip2 import database, errors from n6datapipeline.base import LegacyQueuedBase -from n6lib.common_helpers import replace_segment, is_ipv4 +from n6lib.common_helpers import ( + ipv4_to_int, + ipv4_to_str, + replace_segment, +) from n6lib.config import ConfigMixin +from n6lib.const import LACK_OF_IPv4_PLACEHOLDER_AS_INT from n6lib.log_helpers import get_logger, logging_configured from n6lib.record_dict import RecordDict from n6sdk.addr_helpers import IPv4Container @@ -25,7 +30,7 @@ class Enricher(ConfigMixin, LegacyQueuedBase): input_queue = { 'exchange': 'event', 'exchange_type': 'topic', - 'queue_name': 'enrichement', + 'queue_name': 'enrichment', 'accepted_event_types': [ 'event', 'bl', @@ -126,39 +131,43 @@ def _extract_ip_or_fqdn(self, data): ip_from_url = fqdn_from_url = None url = data.get('url') if url is not None: - _fqdn_or_ip = self.url_to_fqdn_or_ip(url) - # ^ note: the returned _fqdn_or_ip *can* be an empty string - ## but it should not be None; added the following condition for debug - if _fqdn_or_ip is None: - LOGGER.error( - '_fqdn_or_ip is None, source: %a, url: %a', - data['source'], - url) - if is_ipv4(_fqdn_or_ip): - ip_from_url = _fqdn_or_ip - elif _fqdn_or_ip: - fqdn_from_url = _fqdn_or_ip + hostname = self.url_to_hostname(url) + if hostname is not None: + try: + ip_from_url = ipv4_to_str(hostname) + except ValueError: + # Note: FQDN validation + normalization will be done in + # `_maybe_set_fqdn()` (see below) by `RecordDict`'s stuff. + fqdn_from_url = hostname return ip_from_url, fqdn_from_url def _maybe_set_fqdn(self, fqdn_from_url, data, enriched_keys): if data.get('fqdn') is None and fqdn_from_url: data['fqdn'] = fqdn_from_url - enriched_keys.append('fqdn') + # (the value might be rejected by `RecordDict.adjust_fqdn()`) + if 'fqdn' in data: + enriched_keys.append('fqdn') def _maybe_set_address_ips(self, ip_from_url, data, ip_to_enriched_address_keys): if not data.get('address'): if data.get('fqdn') is None: - if ip_from_url: + if ip_from_url and not self._is_no_ip_placeholder(ip_from_url): data['address'] = [{'ip': ip_from_url}] ip_to_enriched_address_keys[ip_from_url].append('ip') elif not data.get('_do_not_resolve_fqdn_to_ip'): _address = [] for ip in self.fqdn_to_ip(data.get('fqdn')): - _address.append({'ip': ip}) - ip_to_enriched_address_keys[ip].append('ip') + if not self._is_no_ip_placeholder(ip): + _address.append({'ip': ip}) + ip_to_enriched_address_keys[ip].append('ip') if _address: data['address'] = _address + def _is_no_ip_placeholder(self, ip): + # (note: anyway, it would be rejected by `RecordDict`'s + # `adjust_enrich()` and `adjust_address()`) + return ipv4_to_int(ip) == LACK_OF_IPv4_PLACEHOLDER_AS_INT + def _filter_out_excluded_ips(self, data, ip_to_enriched_address_keys): assert 'address' in data if self.excluded_ips: @@ -237,29 +246,37 @@ def _final_sanity_assertions(self, data): for addr in data.get('address', ())} assert all( name in data - for name in enriched_keys) + for name in enriched_keys), enriched_keys assert all( set(addr_keys).issubset(ip_to_addr[ip]) - for ip, addr_keys in ip_to_enriched_address_keys.items()) + for ip, addr_keys in ip_to_enriched_address_keys.items()), ( + ip_to_enriched_address_keys, ip_to_addr) # # Resolution helpers - def url_to_fqdn_or_ip(self, url): + def url_to_hostname(self, url): + assert isinstance(url, str) parsed_url = urllib.parse.urlparse(url) if parsed_url.netloc.endswith(':'): # URL is probably wrong -- something like: "http://http://..." 
- return '' - return parsed_url.hostname + return None + hostname = parsed_url.hostname + if hostname is None or hostname == '': + return None + assert isinstance(hostname, str) and hostname + return hostname def fqdn_to_ip(self, fqdn): try: - dns_result = self._resolver.query(fqdn, 'A') + dns_result = self._resolver.resolve(fqdn, 'A', search=True) except DNSException: return [] ip_set = set() - for i in dns_result: - ip_set.add(str(i)) + for res in dns_result: + ip = str(res) + ip_normalized = ipv4_to_str(ip) # (typically unnecessary, but does not hurt...) + ip_set.add(ip_normalized) return sorted(ip_set) def ip_to_asn(self, ip): diff --git a/N6DataPipeline/n6datapipeline/recorder.py b/N6DataPipeline/n6datapipeline/recorder.py index 132c8b0..51ea678 100644 --- a/N6DataPipeline/n6datapipeline/recorder.py +++ b/N6DataPipeline/n6datapipeline/recorder.py @@ -13,6 +13,7 @@ import MySQLdb.cursors import sqlalchemy.event +from MySQLdb import MySQLError from sqlalchemy import ( create_engine, text as sqla_text, @@ -24,7 +25,12 @@ ) from n6datapipeline.base import LegacyQueuedBase -from n6lib.common_helpers import str_to_bool +from n6lib.common_helpers import ( + ascii_str, + make_exc_ascii_str, + replace_segment, + str_to_bool, +) from n6lib.config import Config from n6lib.data_backend_api import ( N6DataBackendAPI, @@ -44,11 +50,6 @@ RecordDict, BLRecordDict, ) -from n6lib.common_helpers import ( - ascii_str, - make_exc_ascii_str, - replace_segment, -) LOGGER = get_logger(__name__) @@ -115,6 +116,13 @@ class Recorder(LegacyQueuedBase): _MIN_WAIT_TIMEOUT = 3600 _MAX_WAIT_TIMEOUT = _DEFAULT_WAIT_TIMEOUT = 28800 + _DEFAULT_FATAL_DB_API_ERROR_CODES = ( + # A *string* (not a sequence of strings!) that contains DB API + # error codes, comma-separated (the string will be converted + # with `n6lib.config.Config.BASIC_CONVERTERS['list_of_int']`). 
+ '1021,' # <- ER_DISK_FULL (see: https://mariadb.com/kb/en/mariadb-error-codes/) + ) + input_queue = { "exchange": "event", "exchange_type": "topic", @@ -139,6 +147,7 @@ def __init__(self, **kwargs): LOGGER.info("Recorder Start") config = Config(required={"recorder": ("uri",)}) self.config = config["recorder"] + self._fatal_db_api_error_codes = self._get_fatal_db_api_error_codes_from_config() self.record_dict = None self.records = None self.routing_key = None @@ -157,6 +166,11 @@ def __init__(self, **kwargs): self.HANDLE_EVENT = 1 super(Recorder, self).__init__(**kwargs) + def _get_fatal_db_api_error_codes_from_config(self): + conv = Config.BASIC_CONVERTERS['list_of_int'] + return frozenset(conv( + self.config.get('fatal_db_api_error_codes', self._DEFAULT_FATAL_DB_API_ERROR_CODES))) + def _setup_db(self): wait_timeout = int(self.config.get("wait_timeout", self._DEFAULT_WAIT_TIMEOUT)) wait_timeout = min(max(wait_timeout, self._MIN_WAIT_TIMEOUT), self._MAX_WAIT_TIMEOUT) @@ -446,6 +460,17 @@ def get_truncated_rk(rk, parts): return '.'.join(parts_rk) def input_callback(self, routing_key, body, properties): + try: + self._input_callback(routing_key, body, properties) + except Exception as exc: + error_code = self._get_db_api_error_code(exc) + if error_code in self._fatal_db_api_error_codes: + raise SystemExit( + f'Fatal DB API error code: {error_code!a} ' + f'(from {make_exc_ascii_str(exc)})') from exc + raise + + def _input_callback(self, routing_key, body, properties): """ Channel callback method """ # first let's try ping mysql server self.ping_connection() @@ -472,6 +497,21 @@ def input_callback(self, routing_key, body, properties): LOGGER.debug("properties: %a", properties) #LOGGER.debug("body: %a", body) + @staticmethod + def _get_db_api_error_code(exc): + while isinstance(exc, SQLAlchemyError): + orig = getattr(exc, 'orig', None) + if orig is exc: + break + exc = orig + if isinstance(exc, MySQLError): + exc_args = getattr(exc, 'args', None) + if isinstance(exc_args, tuple) and exc_args: + error_code = exc_args[0] + if isinstance(error_code, int): + return error_code + return None + def json_to_record(self, rows): """ Deserialize json to record db.append. diff --git a/N6DataPipeline/n6datapipeline/tests/test_anonymizer.py b/N6DataPipeline/n6datapipeline/tests/test_anonymizer.py index 2bc0b8a..01a399c 100644 --- a/N6DataPipeline/n6datapipeline/tests/test_anonymizer.py +++ b/N6DataPipeline/n6datapipeline/tests/test_anonymizer.py @@ -1,4 +1,4 @@ -# Copyright (c) 2015-2021 NASK. All rights reserved. +# Copyright (c) 2015-2023 NASK. All rights reserved. import datetime import json @@ -36,7 +36,7 @@ def setUp(self): self.event_type = 'bl-update' self.event_data = {'some...': 'content...', 'id': 'some id...'} self.routing_key = self.event_type + '.filtered.*.*' - self.body = json.dumps(self.event_data) + self.body = json.dumps(self.event_data).encode('ascii') self.resource_to_org_ids = {} self.mock = MagicMock(__class__=Anonymizer) diff --git a/N6DataPipeline/n6datapipeline/tests/test_enrich.py b/N6DataPipeline/n6datapipeline/tests/test_enrich.py index 091293a..03f4e0b 100644 --- a/N6DataPipeline/n6datapipeline/tests/test_enrich.py +++ b/N6DataPipeline/n6datapipeline/tests/test_enrich.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2022 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. 
import datetime import hashlib @@ -76,7 +76,7 @@ def setUp(self, config_mock, *args): Enricher._setup_dnsresolver = unittest.mock.MagicMock() self.enricher = Enricher() self.enricher._resolver = unittest.mock.MagicMock() - self.enricher._resolver.query = unittest.mock.MagicMock(return_value=["127.0.0.1"]) + self.enricher._resolver.resolve = unittest.mock.MagicMock(return_value=["127.0.0.1"]) def test__ip_to_asn__called_or_not(self): """ @@ -98,28 +98,30 @@ def test__ip_to_cc__called_or_not(self): def test__enrich__with_fqdn_given(self): data = self.enricher.enrich(RecordDict({"fqdn": "cert.pl"})) - self.enricher._resolver.query.assert_called_once_with("cert.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("cert.pl", "A", search=True) return data def test__enrich__with_fqdn_given__resolved_to_various_ips_with_duplicates(self): - self.enricher._resolver.query.return_value = [ + self.enricher._resolver.resolve.return_value = [ '2.2.2.2', + '0.0.0.0', # equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` '127.0.0.1', '13.1.2.3', - '1.1.1.1', - '127.0.0.1', # duplicate + '001.1.000001.1', # with redundant 0s in octets (see #8860...) + '127.00000.0000.1', # with redundant 0s in octets, effectively duplicate of 127.0.0.1 '13.1.2.3', # duplicate '12.11.10.9', + '0.000.0.00', # *equivalent* to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` '13.1.2.3', # duplicate '1.0.1.1', ] data = self.enricher.enrich(RecordDict({"fqdn": "cert.pl"})) - self.enricher._resolver.query.assert_called_once_with("cert.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("cert.pl", "A", search=True) return data def test__enrich__with_url_given(self): data = self.enricher.enrich(RecordDict({"url": "http://www.nask.pl/asd"})) - self.enricher._resolver.query.assert_called_once_with("www.nask.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("www.nask.pl", "A", search=True) return data def test__enrich__with_ip_url_given(self): @@ -127,20 +129,29 @@ def test__enrich__with_ip_url_given(self): def test__enrich__with_ip_url_given__with_nodns_flag(self): return self.enricher.enrich(RecordDict({ - "url": "http://192.168.0.1/asd", + # A detail unrelated to the main purpose of this test + # method: here the `url` item contains, as the URL's + # hostname, a non-canonical form of IP (with redundant + # leading zeros; see: #8860...) -- to be normalized + # everywhere else (in `address` and `enriched`), except + # in the `url`. See also two more focused test methods + # added to `TestEnricherWithFullConfig`: + # `test__enrich__with_ip_url_given__when_ip_contains_zero_octet_with_redundant_0()` and + # `test__enrich__with_ip_url_given__when_ip_contains_nonzero_octet_with_redundant_0()`. 
+ "url": "http://192.0168.0000.1/asd", "_do_not_resolve_fqdn_to_ip": True})) def test__enrich__with_fqdn_and_url_given(self): data = self.enricher.enrich(RecordDict({"fqdn": "cert.pl", "url": "http://www.nask.pl/asd"})) - self.enricher._resolver.query.assert_called_once_with("cert.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("cert.pl", "A", search=True) return data def test__enrich__with_fqdn_and_ip_url_given(self): data = self.enricher.enrich(RecordDict({ "fqdn": "cert.pl", "url": "http://192.168.0.1/asd"})) - self.enricher._resolver.query.assert_called_once_with("cert.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("cert.pl", "A", search=True) return data def test__enrich__with_address_and_fqdn_given(self): @@ -180,7 +191,7 @@ def test__enrich__with_excluded_ips_config__without_any_ip_to_exclude(self): self._prepare_config_for_excluded_ips(['2.2.2.2', '3.3.3.3']) self.enricher.excluded_ips = self.enricher._get_excluded_ips() data = self.enricher.enrich(RecordDict({"url": "http://www.nask.pl/asd"})) - self.enricher._resolver.query.assert_called_once_with("www.nask.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("www.nask.pl", "A", search=True) return data # helper methods @@ -293,7 +304,7 @@ def test__enrich__with_fqdn_given__with_nodns_flag(self): data = self.enricher.enrich(RecordDict({ "fqdn": "cert.pl", "_do_not_resolve_fqdn_to_ip": True})) - self.assertFalse(self.enricher._resolver.query.called) + self.assertFalse(self.enricher._resolver.resolve.called) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": ([], {}), "fqdn": "cert.pl", @@ -310,9 +321,9 @@ def test__enrich__with_fqdn_given__resolved_to_various_ips_with_duplicates(self) "13.1.2.3": ["asn", "cc", "ip"], "2.2.2.2": ["asn", "cc", "ip"]}), "fqdn": "cert.pl", - "address": [{"ip": '1.0.1.1', # note: *removed IP duplicates* and - "asn": '1234', # *ordered* by IP (textually) - "cc": 'PL'}, + "address": [{"ip": '1.0.1.1', # note: *removed IP duplicates* + anything equivalent to + "asn": '1234', # `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR`; + "cc": 'PL'}, # and then *ordered* by IP (textually) {"ip": '1.1.1.1', "asn": '1234', "cc": 'PL'}, @@ -343,7 +354,7 @@ def test__enrich__with_url_given__with_nodns_flag(self): data = self.enricher.enrich(RecordDict({ "url": "http://www.nask.pl/asd", "_do_not_resolve_fqdn_to_ip": True})) - self.assertFalse(self.enricher._resolver.query.called) + self.assertFalse(self.enricher._resolver.resolve.called) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": (["fqdn"], {}), "url": "http://www.nask.pl/asd", @@ -357,15 +368,33 @@ def test__enrich__with_wrong_url_given(self): "enriched": ([], {}), "url": "http://http://www.nask.pl/asd"})) + def test__enrich__with_url_given__when_url_does_not_contain_hostname(self): + # (see: #8860) + data = self.enricher.enrich(RecordDict({"url": "file:///asd", "source": "a.b"})) + self.assertEqual(self.enricher._resolver.mock_calls, []) + self.assertEqualIncludingTypes(data, RecordDict({ + "enriched": ([], {}), + "url": "file:///asd", + "source": "a.b"})) + + def test__enrich__with_url_given__when_url_contains_unusable_hostname(self): + # (see: #8860) + # The URL's hostname is neither an acceptable domain name nor a valid IP. 
+ data = self.enricher.enrich(RecordDict({"url": "http://111.222.333.444.555/asd"})) + self.assertEqual(self.enricher._resolver.mock_calls, []) + self.assertEqualIncludingTypes(data, RecordDict({ + "enriched": ([], {}), + "url": "http://111.222.333.444.555/asd"})) + def test__enrich__with_fqdn_not_resolved(self): - self.enricher._resolver.query = unittest.mock.MagicMock(side_effect=DNSException) + self.enricher._resolver.resolve = unittest.mock.MagicMock(side_effect=DNSException) data = self.enricher.enrich(RecordDict({"fqdn": "cert.pl"})) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": ([], {}), "fqdn": "cert.pl"})) def test__enrich__with_fqdn_from_url_not_resolved(self): - self.enricher._resolver.query = unittest.mock.MagicMock(side_effect=DNSException) + self.enricher._resolver.resolve = unittest.mock.MagicMock(side_effect=DNSException) data = self.enricher.enrich(RecordDict({"url": "http://www.nask.pl/asd"})) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": (["fqdn"], {}), @@ -381,13 +410,47 @@ def test__enrich__with_ip_url_given(self): "asn": '1234', "cc": 'PL'}]})) + def test__enrich__with_ip_url_given__when_ip_contains_zero_octet_with_redundant_0(self): + # (see: #8860) + data = self.enricher.enrich(RecordDict({"url": "http://192.168.00.1/asd"})) + self.assertEqualIncludingTypes(data, RecordDict({ + "enriched": ([], {"192.168.0.1": ["asn", "cc", "ip"]}), + "url": "http://192.168.00.1/asd", + "address": [{"ip": '192.168.0.1', + "asn": '1234', + "cc": 'PL'}]})) + + def test__enrich__with_ip_url_given__when_ip_contains_nonzero_octet_with_redundant_0(self): + # (see: #8860) + data = self.enricher.enrich(RecordDict({"url": "http://192.0168.0.1/asd"})) + self.assertEqualIncludingTypes(data, RecordDict({ + "enriched": ([], {"192.168.0.1": ["asn", "cc", "ip"]}), + "url": "http://192.0168.0.1/asd", + "address": [{"ip": '192.168.0.1', + "asn": '1234', + "cc": 'PL'}]})) + + def test__enrich__with_ip_url_given__when_ip_is_lack_of_ip_placeholder(self): + # The IP in the URL is equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR`. + data = self.enricher.enrich(RecordDict({"url": "http://0.0.0.0/asd"})) + self.assertEqualIncludingTypes(data, RecordDict({ + "enriched": ([], {}), + "url": "http://0.0.0.0/asd"})) + + def test__enrich__with_ip_url_given__when_ip_is_lack_of_ip_placeholder_unnormalized(self): + # The IP in the URL is *equivalent* to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR`. 
+ data = self.enricher.enrich(RecordDict({"url": "http://000.0.0000.00/asd"})) + self.assertEqualIncludingTypes(data, RecordDict({ + "enriched": ([], {}), + "url": "http://000.0.0000.00/asd"})) + def test__enrich__with_ip_url_given__with_nodns_flag(self): data = super(TestEnricherWithFullConfig, self).test__enrich__with_ip_url_given__with_nodns_flag() self.assertEqualIncludingTypes(data, RecordDict({ # (here the '_do_not_resolve_fqdn_to_ip' flag did *not* change behaviour) "enriched": ([], {"192.168.0.1": ["asn", "cc", "ip"]}), - "url": "http://192.168.0.1/asd", + "url": "http://192.0168.0000.1/asd", "address": [{"ip": '192.168.0.1', "asn": '1234', "cc": 'PL'}], @@ -408,7 +471,7 @@ def test__enrich__with_fqdn_and_url_given__with_nodns_flag(self): "fqdn": "cert.pl", "url": "http://www.nask.pl/asd", "_do_not_resolve_fqdn_to_ip": True})) - self.assertFalse(self.enricher._resolver.query.called) + self.assertFalse(self.enricher._resolver.resolve.called) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": ([], {}), "url": "http://www.nask.pl/asd", @@ -497,21 +560,21 @@ def test__fqdn_to_ip__called(self): self.enricher.enrich(data) self.enricher.fqdn_to_ip.assert_called_with("cert.pl") - def test__url_to_fqdn_or_ip__called(self): - """Test if url_to_fqdn_or_ip is called if data does not contain address and fqdn""" + def test__url_to_hostname__called(self): + """Test if url_to_hostname is called if data does not contain address and fqdn""" data = RecordDict({"url": "http://www.cert.pl"}) data.update(self.COMMON_DATA) - self.enricher.url_to_fqdn_or_ip = unittest.mock.MagicMock(return_value="www.cert.pl") + self.enricher.url_to_hostname = unittest.mock.MagicMock(return_value="www.cert.pl") self.enricher.enrich(data) - self.enricher.url_to_fqdn_or_ip.assert_called_with("http://www.cert.pl") + self.enricher.url_to_hostname.assert_called_with("http://www.cert.pl") - def test__url_to_fqdn_or_ip__called_for_ip_url(self): - """Test if url_to_fqdn_or_ip is called if data does not contain address and fqdn""" + def test__url_to_hostname__called_for_ip_url(self): + """Test if url_to_hostname is called if data does not contain address and fqdn""" data = RecordDict({"url": "http://192.168.0.1"}) data.update(self.COMMON_DATA) - self.enricher.url_to_fqdn_or_ip = unittest.mock.MagicMock(return_value="192.168.0.1") + self.enricher.url_to_hostname = unittest.mock.MagicMock(return_value="192.168.0.1") self.enricher.enrich(data) - self.enricher.url_to_fqdn_or_ip.assert_called_with("http://192.168.0.1") + self.enricher.url_to_hostname.assert_called_with("http://192.168.0.1") def test_adding_asn_cc_if_asn_not_valid_and_cc_is_valid(self): """Test if asn/cc are (maybe) added""" @@ -620,7 +683,7 @@ def test__fqdn_to_ip__not_called(self): self.assertFalse(self.enricher.fqdn_to_ip.called) def test_routing_key_modified(self): - """Test if routing key after enrichement is set to "enriched.*" + """Test if routing key after enrichment is set to "enriched.*" when publishing to output queue""" self.enricher.publish_output = unittest.mock.MagicMock() data = RecordDict({ @@ -655,7 +718,7 @@ def test__enrich__with_excluded_ips_config__with_some_ip_to_exclude__1(self): "address": [{'ip': "127.0.0.1"}]})) # the 'data' field is present, so FQDN will not be resolved # to IP addresses - self.assertFalse(self.enricher._resolver.query.called) + self.assertFalse(self.enricher._resolver.resolve.called) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": (["fqdn"], {}), "url": "http://www.nask.pl/asd", @@ -665,7 +728,7 @@ 
def test__enrich__with_excluded_ips_config__with_some_ip_to_exclude__2(self): self._prepare_config_for_excluded_ips(['127.0.0.1', '2.2.2.2', '3.3.3.3']) self.enricher.excluded_ips = self.enricher._get_excluded_ips() data = self.enricher.enrich(RecordDict({"url": "http://www.nask.pl/asd"})) - self.enricher._resolver.query.assert_called_once_with("www.nask.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("www.nask.pl", "A", search=True) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": (["fqdn"], {}), "url": "http://www.nask.pl/asd", @@ -814,7 +877,7 @@ def test__ip_to_cc__called_or_not(self): def test__enrich__with_fqdn_given(self): data = self.enricher.enrich(RecordDict({"fqdn": "cert.pl"})) - self.enricher._resolver.query.assert_called_once_with("cert.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("cert.pl", "A", search=True) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": ([], {"127.0.0.1": ["cc", "ip"]}), "fqdn": "cert.pl", @@ -832,9 +895,9 @@ def test__enrich__with_fqdn_given__resolved_to_various_ips_with_duplicates(self) "13.1.2.3": ["cc", "ip"], "2.2.2.2": ["cc", "ip"]}), "fqdn": "cert.pl", - "address": [{"ip": '1.0.1.1', # note: *removed IP duplicates* and - "cc": 'PL'}, # *ordered* by IP (textually) - {"ip": '1.1.1.1', + "address": [{"ip": '1.0.1.1', # note: *removed IP duplicates* + anything equivalent to + "cc": 'PL'}, # `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR`; + {"ip": '1.1.1.1', # and then *ordered* by IP (textually) "cc": 'PL'}, {"ip": '12.11.10.9', "cc": 'PL'}, @@ -868,7 +931,7 @@ def test__enrich__with_ip_url_given__with_nodns_flag(self): self.assertEqualIncludingTypes(data, RecordDict({ # (here the '_do_not_resolve_fqdn_to_ip' flag did *not* change behaviour) "enriched": ([], {"192.168.0.1": ["cc", "ip"]}), - "url": "http://192.168.0.1/asd", + "url": "http://192.0168.0000.1/asd", "address": [{"ip": '192.168.0.1', "cc": 'PL'}], "_do_not_resolve_fqdn_to_ip": True})) @@ -1028,7 +1091,7 @@ def test__ip_to_cc__called_or_not(self): def test__enrich__with_fqdn_given(self): data = self.enricher.enrich(RecordDict({"fqdn": "cert.pl"})) - self.enricher._resolver.query.assert_called_once_with("cert.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("cert.pl", "A", search=True) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": ([], {"127.0.0.1": ["asn", "ip"]}), "fqdn": "cert.pl", @@ -1046,9 +1109,9 @@ def test__enrich__with_fqdn_given__resolved_to_various_ips_with_duplicates(self) "13.1.2.3": ["asn", "ip"], "2.2.2.2": ["asn", "ip"]}), "fqdn": "cert.pl", - "address": [{"ip": '1.0.1.1', # note: *removed IP duplicates* and - "asn": '1234'}, # *ordered* by IP (textually) - {"ip": '1.1.1.1', + "address": [{"ip": '1.0.1.1', # note: *removed IP duplicates* + anything equivalent to + "asn": '1234'}, # `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR`; + {"ip": '1.1.1.1', # and then *ordered* by IP (textually) "asn": '1234'}, {"ip": '12.11.10.9', "asn": '1234'}, @@ -1082,7 +1145,7 @@ def test__enrich__with_ip_url_given__with_nodns_flag(self): self.assertEqualIncludingTypes(data, RecordDict({ # (here the '_do_not_resolve_fqdn_to_ip' flag did *not* change behaviour) "enriched": ([], {"192.168.0.1": ["asn", "ip"]}), - "url": "http://192.168.0.1/asd", + "url": "http://192.0168.0000.1/asd", "address": [{"ip": '192.168.0.1', "asn": '1234'}], "_do_not_resolve_fqdn_to_ip": True})) @@ -1241,7 +1304,7 @@ def test__ip_to_cc__called_or_not(self): def test__enrich__with_fqdn_given(self): data = 
self.enricher.enrich(RecordDict({"fqdn": "cert.pl"})) - self.enricher._resolver.query.assert_called_once_with("cert.pl", "A") + self.enricher._resolver.resolve.assert_called_once_with("cert.pl", "A", search=True) self.assertEqualIncludingTypes(data, RecordDict({ "enriched": ([], {"127.0.0.1": ["ip"]}), "fqdn": "cert.pl", @@ -1258,9 +1321,9 @@ def test__enrich__with_fqdn_given__resolved_to_various_ips_with_duplicates(self) "13.1.2.3": ["ip"], "2.2.2.2": ["ip"]}), "fqdn": "cert.pl", - "address": [{"ip": '1.0.1.1'}, # note: *removed IP duplicates* and - {"ip": '1.1.1.1'}, # *ordered* by IP (textually) - {"ip": '12.11.10.9'}, + "address": [{"ip": '1.0.1.1'}, # note: *removed IP duplicates* + anything equiv. to + {"ip": '1.1.1.1'}, # `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR`; + {"ip": '12.11.10.9'}, # and then *ordered* by IP (textually) {"ip": '127.0.0.1'}, {"ip": '13.1.2.3'}, {"ip": '2.2.2.2'}]})) @@ -1286,7 +1349,7 @@ def test__enrich__with_ip_url_given__with_nodns_flag(self): self.assertEqualIncludingTypes(data, RecordDict({ # (here the '_do_not_resolve_fqdn_to_ip' flag did *not* change behaviour) "enriched": ([], {"192.168.0.1": ["ip"]}), - "url": "http://192.168.0.1/asd", + "url": "http://192.0168.0000.1/asd", "address": [{"ip": '192.168.0.1'}], "_do_not_resolve_fqdn_to_ip": True})) diff --git a/N6DataPipeline/n6datapipeline/tests/test_filter.py b/N6DataPipeline/n6datapipeline/tests/test_filter.py index 130fb6b..4f243c9 100644 --- a/N6DataPipeline/n6datapipeline/tests/test_filter.py +++ b/N6DataPipeline/n6datapipeline/tests/test_filter.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. import json import unittest @@ -139,9 +139,9 @@ def reset_body(self, d): d['address'][0]['asn'] = '1' d['address'][1]['asn'] = '1' d['address'][2]['asn'] = '1' - d['address'][0]['ip'] = '0.0.0.0' - d['address'][1]['ip'] = '0.0.0.1' - d['address'][2]['ip'] = '0.0.0.2' + d['address'][0]['ip'] = '0.0.0.1' + d['address'][1]['ip'] = '0.0.0.2' + d['address'][2]['ip'] = '1.0.0.0' return d def test__get_client_and_urls_matched__1(self): @@ -404,7 +404,7 @@ def test__get_client_and_urls_matched__single_ip(self): '77.4.0.0', '1.2.3.4', '10.20.30.40', - '0.0.0.0', + '0.0.0.1', '255.255.255.255', ]: data['address'][0]['ip'] = other_ip diff --git a/N6DataPipeline/setup.py b/N6DataPipeline/setup.py index 9e5b0bb..f9b87fb 100644 --- a/N6DataPipeline/setup.py +++ b/N6DataPipeline/setup.py @@ -56,7 +56,7 @@ def list_console_scripts(): n6_version = get_n6_version('.n6-version') -requirements = ['n6sdk==' + n6_version, 'n6lib==' + n6_version, 'intelmq'] +requirements = ['n6sdk==' + n6_version, 'n6lib==' + n6_version, 'intelmq<3.2'] console_scripts = list_console_scripts() setup( diff --git a/N6DataSources/console_scripts b/N6DataSources/console_scripts index ebe4b86..a6f431f 100644 --- a/N6DataSources/console_scripts +++ b/N6DataSources/console_scripts @@ -7,9 +7,6 @@ n6collector_abusechurlhauspayloadsamples = n6datasources.collectors.abuse_ch:Abu n6collector_abusechurlhauspayloadsurls = n6datasources.collectors.abuse_ch:AbuseChUrlhausPayloadsUrlsCollector_main n6collector_abusechurlhausurls = n6datasources.collectors.abuse_ch:AbuseChUrlhausUrlsCollector_main -## blueliv.* -n6collector_bluelivmap = n6datasources.collectors.blueliv:BluelivMapCollector_main - ## cert_pl.* n6collector_certplshield = n6datasources.collectors.cert_pl:CertPlShieldCollector_main @@ -19,9 +16,6 @@ n6collector_cesnetczwarden = 
n6datasources.collectors.cesnet_cz:CesnetCzWardenCo ## dan-tv.* n6collector_dantvtor = n6datasources.collectors.dan_tv:DanTvTorCollector_main -## darklist-de.* -n6collector_darklistdebl = n6datasources.collectors.darklist_de:DarklistDeBlCollector_main - ## dataplane.* n6collector_dataplanednsrd = n6datasources.collectors.dataplane:DataplaneDnsrdCollector_main n6collector_dataplanednsrdany = n6datasources.collectors.dataplane:DataplaneDnsrdanyCollector_main @@ -90,9 +84,6 @@ n6parser_abusechsslblacklist201902 = n6datasources.parsers.abuse_ch:AbuseChSslBl n6parser_abusechurlhauspayloadsurls = n6datasources.parsers.abuse_ch:AbuseChUrlhausPayloadsUrlsParser_main n6parser_abusechurlhausurls202001 = n6datasources.parsers.abuse_ch:AbuseChUrlhausUrls202001Parser_main -## blueliv.* -n6parser_bluelivmap = n6datasources.parsers.blueliv:BluelivMapParser_main - ## cert_pl.* n6parser_certplshield = n6datasources.parsers.cert_pl:CertPlShieldParser_main @@ -102,9 +93,6 @@ n6parser_cesnetczwarden = n6datasources.parsers.cesnet_cz:CesnetCzWardenParser_m ## dan-tv.* n6parser_dantvtor = n6datasources.parsers.dan_tv:DanTvTorParser_main -## darklist-de.* -n6parser_darklistdebl = n6datasources.parsers.darklist_de:DarklistDeBlParser_main - ## dataplane.* n6parser_dataplanednsrd = n6datasources.parsers.dataplane:DataplaneDnsrdParser_main n6parser_dataplanednsrdany = n6datasources.parsers.dataplane:DataplaneDnsrdanyParser_main @@ -161,9 +149,9 @@ n6parser_shadowserverldap201412 = n6datasources.parsers.shadowserver:Shadowserve n6parser_shadowserverldaptcp202204 = n6datasources.parsers.shadowserver:ShadowserverLdapTcp202204Parser_main n6parser_shadowservermdns201412 = n6datasources.parsers.shadowserver:ShadowserverMdns201412Parser_main n6parser_shadowservermemcached201412 = n6datasources.parsers.shadowserver:ShadowserverMemcached201412Parser_main -n6parser_shadowservermodbus202203 = n6datasources.parsers.shadowserver:ShadowserverModbus202203Parser_main n6parser_shadowservermongodb201412 = n6datasources.parsers.shadowserver:ShadowserverMongodb201412Parser_main n6parser_shadowservermqtt202204 = n6datasources.parsers.shadowserver:ShadowserverMqtt202204Parser_main +n6parser_shadowservermsmq202308 = n6datasources.parsers.shadowserver:ShadowserverMsmq202308Parser_main n6parser_shadowservermssql201412 = n6datasources.parsers.shadowserver:ShadowserverMssql201412Parser_main n6parser_shadowservernatpmp201412 = n6datasources.parsers.shadowserver:ShadowserverNatpmp201412Parser_main n6parser_shadowservernetbios201412 = n6datasources.parsers.shadowserver:ShadowserverNetbios201412Parser_main @@ -198,13 +186,13 @@ n6parser_spamhausbots = n6datasources.parsers.spamhaus:SpamhausBotsParser_main n6parser_spamhausdrop = n6datasources.parsers.spamhaus:SpamhausDropParser_main n6parser_spamhausedrop202303 = n6datasources.parsers.spamhaus:SpamhausEdrop202303Parser_main -## stopforum.* -n6parser_stopforumspam = n6datasources.parsers.stopforum:StopForumSpamParser_main - ## spam404_com.* n6parser_spam404comscamlist = n6datasources.parsers.spam404_com:Spam404ComScamListParser_main n6parser_spam404comscamlistbl = n6datasources.parsers.spam404_com:Spam404ComScamListBlParser_main +## stopforum.* +n6parser_stopforumspam = n6datasources.parsers.stopforum:StopForumSpamParser_main + ## zoneh.* n6parser_zonehrss = n6datasources.parsers.zoneh:ZonehRssParser_main diff --git a/N6DataSources/n6datasources/collectors/abuse_ch.py b/N6DataSources/n6datasources/collectors/abuse_ch.py index e163781..ca30932 100644 --- 
a/N6DataSources/n6datasources/collectors/abuse_ch.py +++ b/N6DataSources/n6datasources/collectors/abuse_ch.py @@ -54,23 +54,8 @@ class _BaseAbuseChDownloadingTimeOrderedRowsCollector(BaseDownloadingTimeOrderedRowsCollector): - row_time_legacy_state_key = None time_field_index = None - def load_state(self): - state = super().load_state() - if self.row_time_legacy_state_key and self.row_time_legacy_state_key in state: - # got `state` in a legacy form - row_time = self.normalize_row_time(state[self.row_time_legacy_state_key]) - state = { - # note: one or a few rows (those containing this "boundary" - # time value) will be duplicated, but we can live with that - self._NEWEST_ROW_TIME_STATE_KEY: row_time, - self._NEWEST_ROWS_STATE_KEY: set(), - self._ROWS_COUNT_KEY: None, - } - return state - def pick_raw_row_time(self, row): return extract_field_from_csv_row(row, column_index=self.time_field_index).strip() @@ -116,7 +101,6 @@ class AbuseChSslBlacklistCollector(_BaseAbuseChDownloadingTimeOrderedRowsCollect raw_format_version_tag = '201902' - row_time_legacy_state_key = 'time' time_field_index = 0 def get_source(self, **processed_data) -> str: @@ -278,17 +262,9 @@ class AbuseChUrlhausPayloadSamplesCollector(StatefulCollectorMixin, def __init__(self, **kwargs): super().__init__(**kwargs) - self._samples_per_run: int = self._get_samples_per_run() - self._state: Optional[dict] = None self._today: datetime.date = datetime.datetime.today().date() - self._payload_sample_and_headers_pairs: FilePagedSequence = FilePagedSequence( - page_size=self.config['max_samples_in_memory'], - ) - - def run_collection(self) -> None: - with self._payload_sample_and_headers_pairs: - super().run_collection() + self._state: dict = self.load_state() def _get_samples_per_run(self) -> int: if not 0 < self.config['samples_per_run'] <= 1000: @@ -304,12 +280,11 @@ def make_default_state(self) -> dict: def obtain_input_pile(self) -> Optional[Sequence[tuple[bytes, dict]]]: LOGGER.info("%s's main activity started", self.__class__.__name__) - self._state = self.load_state() self._maintain_state() all_recent_payload_summaries = self._fetch_recent_payload_summaries() - payload_sample_and_headers_pairs = self._payload_sample_and_headers_pairs + max_samples_in_memory = self.config['max_samples_in_memory'] + payload_sample_and_headers_pairs = FilePagedSequence(page_size=max_samples_in_memory) payload_sample_and_headers_pairs: MutableSequence[tuple[bytes, dict]] - assert not payload_sample_and_headers_pairs for payload_summary in all_recent_payload_summaries: if len(payload_sample_and_headers_pairs) >= self._samples_per_run: assert len(payload_sample_and_headers_pairs) == self._samples_per_run diff --git a/N6DataSources/n6datasources/collectors/blueliv.py b/N6DataSources/n6datasources/collectors/blueliv.py deleted file mode 100644 index 6cdf488..0000000 --- a/N6DataSources/n6datasources/collectors/blueliv.py +++ /dev/null @@ -1,63 +0,0 @@ -# Copyright (c) 2015-2023 NASK. All rights reserved. - -""" -Collector: `blueliv.map`. 
-""" - -import json -from urllib.parse import urljoin - -from n6datasources.collectors.base import ( - BaseDownloadingCollector, - BaseSimpleCollector, - add_collector_entry_point_functions, -) -from n6lib.config import combined_config_spec -from n6lib.log_helpers import get_logger - - -LOGGER = get_logger(__name__) - - -class BluelivMapCollector(BaseDownloadingCollector, BaseSimpleCollector): - - raw_type = 'blacklist' - content_type = 'application/json' - - config_spec_pattern = combined_config_spec(''' - [BluelivMapCollector] - base_url :: str - endpoint_name :: str - token :: str - ''') - - - def get_source(self, **processed_data): - return 'blueliv.map' - - def obtain_data_body(self, **kwargs): - base_url = self.config['base_url'] - endpoint_name = self.config['endpoint_name'] - token = self.config['token'] - - # TODO: verify these headers; some of them might be obsolete - # (see: #8706) - headers = { - 'Content-Type': 'application/json', - "Authorization": f"bearer {token}", - "User-Agent": 'SDK v2', - "X-API-CLIENT": f"{token}", - 'Accept-Encoding': 'gzip, deflate', - } - url = urljoin(base_url, endpoint_name) - endpoint_key = endpoint_name.split('/')[-1] - raw_response = json.loads(self.download(method='GET', - url=url, - custom_request_headers=headers)) - response = raw_response.get(endpoint_key) - if response: - return json.dumps(response).encode('utf-8') - return None - - -add_collector_entry_point_functions(__name__) diff --git a/N6DataSources/n6datasources/collectors/darklist_de.py b/N6DataSources/n6datasources/collectors/darklist_de.py deleted file mode 100644 index 2942a5e..0000000 --- a/N6DataSources/n6datasources/collectors/darklist_de.py +++ /dev/null @@ -1,32 +0,0 @@ -# Copyright (c) 2020-2023 NASK. All rights reserved. - -from n6datasources.collectors.base import ( - BaseDownloadingCollector, - BaseSimpleCollector, - add_collector_entry_point_functions, -) -from n6lib.config import combined_config_spec -from n6lib.log_helpers import get_logger - - -LOGGER = get_logger(__name__) - - -class DarklistDeBlCollector(BaseDownloadingCollector, BaseSimpleCollector): - - config_spec_pattern = combined_config_spec(''' - [{collector_class_name}] - url :: str - ''') - - raw_type = 'blacklist' - content_type = 'text/plain' - - def obtain_data_body(self) -> bytes: - return self.download(self.config['url']) - - def get_source(self, **processed_data): - return 'darklist-de.bl' - - -add_collector_entry_point_functions(__name__) diff --git a/N6DataSources/n6datasources/collectors/misp.py b/N6DataSources/n6datasources/collectors/misp.py index 5f935c4..111c221 100644 --- a/N6DataSources/n6datasources/collectors/misp.py +++ b/N6DataSources/n6datasources/collectors/misp.py @@ -20,6 +20,7 @@ ) from urllib.parse import urljoin +from dateutil.tz import gettz from pymisp import PyMISP import requests @@ -37,6 +38,10 @@ Config, ConfigSection, ) +from n6lib.datetime_helpers import ( + ReactionToProblematicTime, + datetime_with_tz_to_utc, +) from n6lib.log_helpers import get_logger from n6lib.typing_helpers import ( Jsonable, @@ -341,6 +346,81 @@ def make_default_state(self) -> StateDict: 'already_processed_sample_ids': set(), } + # (Py2-to-Py3-state-transition-related) + def get_py2_pickle_load_kwargs(self): + # We need to use the `latin1` encoding to be able to unpickle + # any Py2-pickled `datetime` objects (see: #8278 and #8717). 
+ return dict(encoding='latin1') + + # (Py2-to-Py3-state-transition-related) + def adjust_state_from_py2_pickle(self, py2_state: dict) -> StateDict: + self._check_py2_state_keys(py2_state) + self._check_py2_state_value_types(py2_state) + self._check_py2_state_datetime_values(py2_state) + return { + # Differences between Py3's vs. Py2's state dicts: + # * `datetime` objects are still naive (timezone-unaware), + # but whereas in Py2 they represented local time, in Py3 + # they represent UTC time; + # * in Py3, a `set` (instead of a `list`) of `int` numbers + # is used to store identifiers of processed samples; + # * state dict keys are different. + 'events_last_proc_datetime': self.__as_utc(py2_state['events_publishing_datetime']), + 'samples_last_proc_datetime': self.__as_utc(py2_state['samples_publishing_datetime']), + 'already_processed_sample_ids': set(py2_state['last_published_samples']), + } + + def _check_py2_state_keys(self, py2_state: dict) -> None: + if py2_state.keys() != { + 'events_publishing_datetime', + 'samples_publishing_datetime', + 'last_published_samples', + }: + raise NotImplementedError( + f"unexpected set of Py2 state keys: " + f"{', '.join(map(ascii, py2_state.keys()))}") + + def _check_py2_state_value_types(self, py2_state: dict) -> None: + if not isinstance(py2_state['events_publishing_datetime'], datetime): + raise NotImplementedError( + f"unexpected {type(py2_state['events_publishing_datetime'])=!a}") + if not isinstance(py2_state['samples_publishing_datetime'], datetime): + raise NotImplementedError( + f"unexpected {type(py2_state['samples_publishing_datetime'])=!a}") + if not isinstance(py2_state['last_published_samples'], list): + raise NotImplementedError( + f"unexpected {type(py2_state['last_published_samples'])=!a}") + if not all(isinstance(sample_id, int) + for sample_id in py2_state['last_published_samples']): + raise NotImplementedError( + f"unexpected non-int value(s) found in " + f"{py2_state['last_published_samples']=!a}") + + def _check_py2_state_datetime_values(self, py2_state: dict) -> None: + if py2_state['events_publishing_datetime'].tzinfo is not None: + raise NotImplementedError( + f"unexpected non-None tzinfo of " + f"{py2_state['events_publishing_datetime']=!a}") + if py2_state['samples_publishing_datetime'].tzinfo is not None: + raise NotImplementedError( + f"unexpected non-None tzinfo of " + f"{py2_state['samples_publishing_datetime']=!a}") + + @staticmethod + def __as_utc(naive_local_dt: datetime) -> datetime: + # Let's obtain the local timezone (hopefully, the same in which + # `naive_local_dt` was made in Py2 using `datetime.now()`...). 
+ tz = gettz() + if tz is None: + raise RuntimeError('could not get the local timezone') + naive_utc_dt = datetime_with_tz_to_utc( + naive_local_dt, + tz, # (<- its DST, if any, will be applied as appropriate) + on_ambiguous_time=ReactionToProblematicTime.PICK_THE_EARLIER, + on_non_existent_time=ReactionToProblematicTime.PICK_THE_EARLIER) + assert naive_utc_dt.tzinfo is None + return naive_utc_dt + # * Activity phase #1: preparations and event collection: def _make_misp_client(self) -> PyMISP: diff --git a/N6DataSources/n6datasources/data/conf/60_blueliv.conf b/N6DataSources/n6datasources/data/conf/60_blueliv.conf deleted file mode 100644 index 8432129..0000000 --- a/N6DataSources/n6datasources/data/conf/60_blueliv.conf +++ /dev/null @@ -1,8 +0,0 @@ -[BluelivMapCollector] -base_url = https://freeapi.blueliv.com/v1/ -endpoint_name = /crimeserver/last -token = -download_retries = 0 - -[BluelivMapParser] -prefetch_count = 1 diff --git a/N6DataSources/n6datasources/data/conf/60_cesnet_cz.conf b/N6DataSources/n6datasources/data/conf/60_cesnet_cz.conf index 45f598f..d0fddc6 100644 --- a/N6DataSources/n6datasources/data/conf/60_cesnet_cz.conf +++ b/N6DataSources/n6datasources/data/conf/60_cesnet_cz.conf @@ -1,13 +1,15 @@ -# collectors +# collector [CesnetCzWardenCollector] url = https://warden-hub.cesnet.cz/warden3/getEvents?nocat=Test -cert_file_path = /path/to/cert.pem -key_file_path = /path/to/key.pem download_retries = 1 +# The following options need to be customized in your actual configuration file. +cert_file_path = +key_file_path = -# parsers + +# parser [CesnetCzWardenParser] prefetch_count = 1 diff --git a/N6DataSources/n6datasources/data/conf/60_darklist_de.conf b/N6DataSources/n6datasources/data/conf/60_darklist_de.conf deleted file mode 100644 index e8ecece..0000000 --- a/N6DataSources/n6datasources/data/conf/60_darklist_de.conf +++ /dev/null @@ -1,11 +0,0 @@ -# collector - -[DarklistDeBlCollector] -url = https://darklist.de/raw.php -download_retries = 3 - - -# parser - -[DarklistDeBlParser] -prefetch_count = 1 diff --git a/N6DataSources/n6datasources/data/conf/60_misp.conf b/N6DataSources/n6datasources/data/conf/60_misp.conf index 436fdb6..d98092d 100644 --- a/N6DataSources/n6datasources/data/conf/60_misp.conf +++ b/N6DataSources/n6datasources/data/conf/60_misp.conf @@ -74,7 +74,7 @@ days_for_first_run = 15 # A standard collector-state-loading-and-saving-related setting; # its default value value should be OK in nearly all cases. 
-;state_dir = ~/.n6state :: path +;state_dir = ~/.n6state diff --git a/N6DataSources/n6datasources/data/conf/60_shadowserver.conf b/N6DataSources/n6datasources/data/conf/60_shadowserver.conf index 29e9612..b02dcb6 100644 --- a/N6DataSources/n6datasources/data/conf/60_shadowserver.conf +++ b/N6DataSources/n6datasources/data/conf/60_shadowserver.conf @@ -16,71 +16,62 @@ item_url_pattern = (https?://dl.shadowserver.org/[?a-zA-Z0-9_-]+) # the corresponding *source channels* (note that this collector # collaborates with multiple parsers...): subject_to_channel = { - "Poland Sinkhole HTTP Referrer": "sinkhole-referrer", - "Poland Sinkhole HTTP Drone": "sinkhole-drone", - "Poland Spam URL": "spam-url", - "Poland Drone": "drone", - "Poland Sandbox URL": "sandbox-url", - "Poland Command and Control": "cnc", - "Poland Microsoft Sinkhole": "sinkhole-microsoft", + "Poland Accessible Android Debug Bridge": "adb", + "Poland Accessible Apple Filing Protocol": "afp", + "Poland Accessible AMQP": "amqp", + "Poland Accessible Apple Remote Desktop": "ard", + "Poland Open Chargen": "chargen", + "Poland Accessible Cisco Smart Install": "cisco-smart-install", + "Poland Accessible CoAP": "coap", "Poland Compromised Website": "compromised-website", - "Poland Open Proxy": "open-proxy", - "Poland DNS Open Resolvers": "open-resolver", - "Poland IPMI": "ipmi", - "Poland Chargen": "chargen", - "Poland Netbios": "netbios", - "Poland NTP Version": "ntp-version", - "Poland Accessible SMB Service": "smb", - "Poland SNMP": "snmp", - "Poland SSDP": "ssdp", - "Poland QOTD": "qotd", - "Poland Netcore/Netis Router Vulnerability Scan": "netis", - "Poland SSLv3/Poodle Vulnerable Servers": "ssl-poodle", - "Poland Open Redis Server": "redis", - "Poland Open Memcached Server": "memcached", - "Poland Accessible/Open MongoDB Service": "mongodb", - "Poland Vulnerable NAT-PMP Systems": "natpmp", - "Poland Open MS-SQL Server Resolution Service": "mssql", - "Poland Open Elasticsearch Server": "elasticsearch", - "Poland SSL/Freak Vulnerable Servers": "ssl-freak", - "Poland NTP Monitor": "ntp-monitor", - "Poland Open Portmapper Scan": "portmapper", - "Poland SYNful Knock": "synfulknock", - "Poland Open mDNS Servers": "mdns", - "Poland Accessible XDMCP Service": "xdmcp", - "Poland Open DB2 Discovery Service": "db2", - "Poland Accessible RDP Services": "rdp", - "Poland Open TFTP Servers": "tftp", - "Poland ISAKMP Vulnerability Scan": "isakmp", - "Poland Accessible Telnet Service": "telnet", "Poland Accessible CWMP": "cwmp", - "Poland Open LDAP Services": "ldap", - "Poland IPv6 Sinkhole HTTP Drone": "ipv6-sinkhole-drone", - "Poland Accessible VNC Service": "vnc", - "Poland Sinkhole HTTP Events": "sinkhole-http", - "Poland Sinkhole Events": "sinkhole", "Poland Darknet Events": "darknet", - "Poland Accessible Modbus": "modbus", - "Poland Accessible ICS": "ics", - "Poland Accessible CoAP": "coap", - "Poland Accessible Ubiquiti Discovery Service": "ubiquiti", - "Poland Accessible Apple Remote Desktop": "ard", - "Poland Accessible MS-RDPEUDP": "rdpeudp", + "Poland Open DB2 Discovery Service": "db2", "Poland Accessible DVR DHCPDiscover": "dvr-dhcpdiscover", - "Poland Vulnerable HTTP": "http", + "Poland Open Elasticsearch Server": "elasticsearch", + "Poland Vulnerable Exchange Server": "exchange", "Poland Accessible FTP Service": "ftp", - "Poland Open MQTT": "mqtt", + "Poland Accessible Hadoop Service": "hadoop", + "Poland Vulnerable HTTP": "http", + "Poland Accessible ICS": "ics", + "Poland Open IPMI": "ipmi", + "Poland Open IPP": "ipp", + "Poland 
Vulnerable ISAKMP": "isakmp", + "Poland Open LDAP Services": "ldap", "Poland Open LDAP (TCP) Services": "ldap-tcp", - "Poland Accessible Rsync Service": "rsync", + "Poland Open mDNS": "mdns", + "Poland Open Memcached Server": "memcached", + "Poland Open MongoDB Service": "mongodb", + "Poland Open MQTT": "mqtt", + "Poland Accessible MSMQ Service": "msmq", + "Poland Open MS-SQL Server Resolution Service": "mssql", + "Poland Vulnerable NAT-PMP Systems": "natpmp", + "Poland Open Netbios": "netbios", + "Poland Netcore/Netis Router Vulnerability Scan": "netis", + "Poland NTP Monitor": "ntp-monitor", + "Poland NTP Version": "ntp-version", + "Poland DNS Open Resolvers": "open-resolver", + "Poland Open Portmapper Scan": "portmapper", + "Poland Open QOTD": "qotd", "Poland Accessible Radmin": "radmin", - "Poland Accessible Android Debug Bridge": "adb", - "Poland Accessible Apple Filing Protocol": "afp", - "Poland Accessible Cisco Smart Install": "cisco-smart-install", - "Poland Open IPP": "ipp", - "Poland Accessible Hadoop Service": "hadoop", - "Poland Vulnerable Exchange Server": "exchange", + "Poland Accessible RDP": "rdp", + "Poland Accessible MS-RDPEUDP": "rdpeudp", + "Poland Open Redis Server": "redis", + "Poland Accessible Rsync Service": "rsync", + "Poland Sandbox URL": "sandbox-url", + "Poland Sinkhole Events": "sinkhole", + "Poland Sinkhole HTTP Events": "sinkhole-http", + "Poland Accessible SMB Service": "smb", "Poland Vulnerable SMTP": "smtp", - "Poland Accessible AMQP": "amqp", + "Poland Open SNMP": "snmp", + "Poland Open SSDP": "ssdp", + "Poland SSL/Freak Vulnerable Servers": "ssl-freak", + "Poland SSLv3/Poodle Vulnerable Servers": "ssl-poodle", + "Poland Accessible Telnet Service": "telnet", + "Poland Open TFTP Servers": "tftp", + "Poland Accessible Ubiquiti Discovery Service": "ubiquiti", + "Poland Accessible VNC Service": "vnc", + "Poland Accessible XDMCP Service": "xdmcp", } # A Python dictionary that maps, for all corresponding parsers, @@ -113,9 +104,9 @@ channel_to_raw_format_version_tag = { "ldap-tcp": "202204", "mdns": "201412", "memcached": "201412", - "modbus": "202203", "mongodb": "201412", "mqtt": "202204", + "msmq": "202308", "mssql": "201412", "natpmp": "201412", "netbios": "201412", @@ -230,15 +221,15 @@ prefetch_count = 1 [ShadowserverMemcached201412Parser] prefetch_count = 1 -[ShadowserverModbus202203Parser] -prefetch_count = 1 - [ShadowserverMongodb201412Parser] prefetch_count = 1 [ShadowserverMqtt202204Parser] prefetch_count = 1 +[ShadowserverMsmq202308Parser] +prefetch_count = 1 + [ShadowserverMssql201412Parser] prefetch_count = 1 diff --git a/N6DataSources/n6datasources/data/conf/60_spam404_com.conf b/N6DataSources/n6datasources/data/conf/60_spam404_com.conf index 2ecda0b..b7dfff2 100644 --- a/N6DataSources/n6datasources/data/conf/60_spam404_com.conf +++ b/N6DataSources/n6datasources/data/conf/60_spam404_com.conf @@ -1,4 +1,4 @@ -# collectors +# collector [Spam404ComScamListBlCollector] url = https://raw.githubusercontent.com/Dawsey21/Lists/master/main-blacklist.txt diff --git a/N6DataSources/n6datasources/data/conf/60_spamhaus.conf b/N6DataSources/n6datasources/data/conf/60_spamhaus.conf index 82d5dca..2ba7917 100644 --- a/N6DataSources/n6datasources/data/conf/60_spamhaus.conf +++ b/N6DataSources/n6datasources/data/conf/60_spamhaus.conf @@ -2,9 +2,11 @@ [SpamhausBotsCollector] url = https://cert-data.spamhaus.org/api/bots? +download_retries = 1 + +# The following options need to be customized in your actual configuration file. 
cert = api_key = -download_retries = 1 [SpamhausDropCollector] url = https://www.spamhaus.org/drop/drop.txt diff --git a/N6DataSources/n6datasources/parsers/base.py b/N6DataSources/n6datasources/parsers/base.py index 505bb1a..2606dfc 100644 --- a/N6DataSources/n6datasources/parsers/base.py +++ b/N6DataSources/n6datasources/parsers/base.py @@ -672,7 +672,7 @@ def get_output_message_id(self, parsed): Make the id of the output message (aka `id`). Args/kwargs: - `parsed` (dict): + `parsed` (a RecordDict instance): As yielded by parse(). Returns: @@ -745,7 +745,7 @@ def iter_output_id_base_items(self, parsed): Generate items to become the base for the output message id. Args/kwargs: - `parsed` (dict): + `parsed` (a RecordDict instance): As yielded by parse(). Yields: diff --git a/N6DataSources/n6datasources/parsers/blueliv.py b/N6DataSources/n6datasources/parsers/blueliv.py deleted file mode 100644 index 555ec4b..0000000 --- a/N6DataSources/n6datasources/parsers/blueliv.py +++ /dev/null @@ -1,69 +0,0 @@ -# Copyright (c) 2015-2023 NASK. All rights reserved. - -""" -Parser: `blueliv.map`. -""" - -import datetime -import json - -from n6datasources.parsers.base import ( - BlackListParser, - add_parser_entry_point_functions, -) -from n6lib.log_helpers import get_logger -from n6lib.datetime_helpers import parse_iso_datetime_to_utc - - -LOGGER = get_logger(__name__) - - -CATEGORIES = { - 'MALWARE': 'malurl', - 'C_AND_C': 'cnc', - 'BACKDOOR': 'backdoor', - 'EXPLOIT_KIT': 'malurl', - 'PHISHING': 'phish', -} - -IGNORED_TYPES = ['TOR_IP'] - -NAMES = {'MALWARE': 'binary', 'EXPLOIT_KIT': 'exploit-kit'} - -EXPIRES_DAYS = 2 - - -class BluelivMapParser(BlackListParser): - - default_binding_key = "blueliv.map" - constant_items = { - 'restriction': 'public', - 'confidence': 'low', - '_do_not_resolve_fqdn_to_ip': True, - } - - def parse(self, data): - raw_events = json.loads(data['raw']) - for event in raw_events: - with self.new_record_dict(data) as parsed: - parsed['time'] = data['properties.timestamp'] - parsed['url'] = event['url'] - category = CATEGORIES.get(event['type']) - if not category: - if event['type'] not in IGNORED_TYPES: - LOGGER.warning('Unknown type received: %a. The event will be ignored.', - event['type']) - continue - parsed['category'] = category - if parsed['category'] == 'malurl': - parsed['name'] = NAMES[event['type']] - try: - parsed['address'] = {'ip': event['ip']} - except KeyError: - LOGGER.warning("No ip in data") - parsed['expires'] = (parse_iso_datetime_to_utc(data['properties.timestamp']) + - datetime.timedelta(days=EXPIRES_DAYS)) - yield parsed - - -add_parser_entry_point_functions(__name__) diff --git a/N6DataSources/n6datasources/parsers/darklist_de.py b/N6DataSources/n6datasources/parsers/darklist_de.py deleted file mode 100644 index ce8ba62..0000000 --- a/N6DataSources/n6datasources/parsers/darklist_de.py +++ /dev/null @@ -1,69 +0,0 @@ -# Copyright (c) 2020-2023 NASK. All rights reserved. - -""" -Parser: `darklist-de.bl`. -""" - -import csv -import re -from datetime import timedelta - -from n6datasources.parsers.base import ( - BlackListParser, - add_parser_entry_point_functions, -) -from n6lib.log_helpers import get_logger - - -LOGGER = get_logger(__name__) - -EXPIRES_DAYS = 7 - -DARKLIST_DE_IP_REGEX = re.compile(r''' - ( - (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) # IP - (/\d{1,2})? 
# network - ) - ''', re.VERBOSE) - -# Sample datetime from source: `21.04.2020 08:15` -DARKLIST_DE_DATETIME_REGEX = re.compile(r''' - (\d{1,2}\.\d{1,2}\.\d{4} # date - [ ] - \d{2}:\d{2}) # time - ''', re.VERBOSE) - - -class DarklistDeBlParser(BlackListParser): - - default_binding_key = "darklist-de.bl" - constant_items = { - "restriction": "public", - "confidence": "low", - "category": "scanning", - } - - bl_current_time_regex_group = 1 - bl_current_time_format = '%d.%m.%Y %H:%M' - bl_current_time_regex = DARKLIST_DE_DATETIME_REGEX - - def parse(self, data): - raw_events = csv.reader(data['csv_raw_rows'], quotechar='"') - for record in raw_events: - ip_record = DARKLIST_DE_IP_REGEX.search("".join(record)) - if not ip_record: - continue - with self.new_record_dict(data) as parsed: - parsed['address'] = {'ip': ip_record.group(2)} - # simple duck check to see if we might have ip - # in cidr notation (full validation will be - # done by record dict) - if ip_record.group(3): - parsed['ip_network'] = ip_record.group(1) - darklist_time = self.get_bl_current_time_from_data(data, parsed) - parsed['time'] = darklist_time - parsed['expires'] = darklist_time + timedelta(days=EXPIRES_DAYS) - yield parsed - - -add_parser_entry_point_functions(__name__) diff --git a/N6DataSources/n6datasources/parsers/dataplane.py b/N6DataSources/n6datasources/parsers/dataplane.py index 06a354b..2724709 100644 --- a/N6DataSources/n6datasources/parsers/dataplane.py +++ b/N6DataSources/n6datasources/parsers/dataplane.py @@ -34,14 +34,12 @@ class _DataplaneBaseParser(BlackListParser): name_item = None def parse(self, data): - rows = csv.reader(data['csv_raw_rows'], delimiter=',', quotechar='"') + rows = csv.reader(data['csv_raw_rows'], delimiter='|', quotechar='"') for row in rows: - row = row[0] - if row.startswith("#"): + if row[0].startswith("#"): continue - # fields: "ASN", "ASname", "ipaddr", "lastseen", "category" - _, _, ip, lastseen, _ = strip_fields(split_csv_row(row, delimiter="|")) + _, _, ip, lastseen, _ = strip_fields(row) # we skip rows with invalid IP address if not self._is_ip_valid(ip): diff --git a/N6DataSources/n6datasources/parsers/sblam.py b/N6DataSources/n6datasources/parsers/sblam.py index c8fd15f..d0f36cc 100644 --- a/N6DataSources/n6datasources/parsers/sblam.py +++ b/N6DataSources/n6datasources/parsers/sblam.py @@ -25,7 +25,7 @@ (\d{4}-\d{1,2}-\d{1,2} # date [ ] \d{2}:\d{2}:\d{2}) # time - ''', re.VERBOSE) + ''', re.ASCII | re.VERBOSE) class SblamSpamParser(BlackListParser): diff --git a/N6DataSources/n6datasources/parsers/shadowserver.py b/N6DataSources/n6datasources/parsers/shadowserver.py index cf081c3..305e199 100644 --- a/N6DataSources/n6datasources/parsers/shadowserver.py +++ b/N6DataSources/n6datasources/parsers/shadowserver.py @@ -28,9 +28,9 @@ * `shadowserver.ldap` * `shadowserver.mdns` * `shadowserver.memcached` -* `shadowserver.modbus` * `shadowserver.mongodb` * `shadowserver.mqtt` +* `shadowserver.msmq` * `shadowserver.mssql` * `shadowserver.natpmp` * `shadowserver.netbios` @@ -63,6 +63,7 @@ import csv import datetime +import re from collections.abc import MutableMapping from n6datasources.parsers.base import ( @@ -93,8 +94,9 @@ def _handle_address_field(parsed, address_mapping, row): class _BaseShadowserverParser(_ShadowserverAddressFieldsMixin, BaseParser): + """ - Abstract class parsers. + Base class for most of the Shadowserver parsers. 
`parse()` method uses dictionary `n6_field_to_data_key_mapping` to translate field names contained in data from collector @@ -106,6 +108,8 @@ class _BaseShadowserverParser(_ShadowserverAddressFieldsMixin, BaseParser): If 'time' is to be mapped to `data['properties.timestamp']`, do not include 'time' key in `n6_field_to_data_key_mapping`. + *** + *Important*: an "address" field is a special field, containing other linked fields: "ip", "asn" and "cc", where the "ip" field is mandatory. If an "address" field maps to a single @@ -114,20 +118,57 @@ class _BaseShadowserverParser(_ShadowserverAddressFieldsMixin, BaseParser): to all keys, that should be assigned to "address" field, e.g.: 'address': {'ip': value1, 'asn': value2, 'cc': value3} - This class can be easily used - if the data is to be parsed in simple way, - i.e.: `parsed['n6_key'] = row['source_key']`. - If at least one field must be parsed - non-standard way (e.g. parsed['proto'] = 'tcp'), - extend `parse()` method or use standard inheriting - from BaseParser and implement your own `parse()` method. + + *Important*: Handling of `amplifier` category events **and** + CVE-related data in the **tag** field. + + The base parser behaves typically for most events, however, in + situations when the event category is `amplifier` and + the `SHADOWSERVER_TAG_CVE_REGEX` match (later called a "CVE + pattern/match") is found within the "tag" field of the data, a + special handling mechanism is initiated. CVE matches are identified + by checking if the content of the "tag" field adheres to the CVE + pattern. + + When these conditions (the category being `amplifier` and a CVE match + in the "tag" field) are met, the parser generates two separate + events. The first event preserves the original category and data. + The second event has the CVE match as its name and its + category is updated to `vulnerable`. + + This dual event generation occurs automatically when the specified + conditions are met. This feature is built to work seamlessly with + Shadowserver parsers/data, including those that do not currently + have a "tag" field but may introduce one in the future. + + *** + + This class can be easily used if the data is to be parsed in simple + way, i.e.: `parsed['n6_key'] = row['source_key']`. If at least one + field must be parsed non-standard way (e.g. parsed['proto'] = + 'tcp') or if a different behavior is desired when encountering a CVE + pattern/match in the `amplifier` category, extend `parse()` method + or use standard inheriting from BaseParser and implement your own + `parse()` method. 
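
Concatenated over its adjacent string literals, the pattern defined just below condenses to `\b(cve|CVE)-\d{4}-\d+` under `re.ASCII`; a short sketch of the matching-and-renaming step described above (the `tag` value is hypothetical):

```python
import re

TAG_CVE_REGEX = re.compile(r'\b(cve|CVE)-\d{4}-\d+', re.ASCII)

tag = 'iot;CVE-2023-12345'            # hypothetical input field value
match = TAG_CVE_REGEX.search(tag)
if match is not None:
    # The extra event gets category='vulnerable' and the lowercased
    # CVE identifier as its name:
    name = match.group().lower()      # -> 'cve-2023-12345'
```
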
""" + # example of shadowserver cve tag's value: + # `cve-2000-1234567` + SHADOWSERVER_TAG_CVE_REGEX = re.compile( + r'\b(cve|CVE)' + r'-' + r'\d{4}' + r'-' + r'\d+', + re.ASCII) + + additional_standard_items = {} n6_field_to_data_key_mapping = NotImplemented delimiter = ',' quotechar = '"' def parse(self, data): + self._verify_items_are_unique() rows = csv.DictReader( data['csv_raw_rows'], delimiter=self.delimiter, @@ -135,19 +176,46 @@ def parse(self, data): ) for row in rows: with self.new_record_dict(data) as parsed: - for n6_field, data_field in self.n6_field_to_data_key_mapping.items(): - if n6_field == 'address': - self._handle_address_field(parsed, data_field, row) - else: - value = row.get(data_field) - if value: - parsed[n6_field] = value - if 'time' not in self.n6_field_to_data_key_mapping: - parsed['time'] = data.get('properties.timestamp') + self._populate_parsed(row, data, parsed) yield parsed + if 'amplifier' in (self.constant_items.get('category'), + self.additional_standard_items.get('category')): + tag = row.get('tag') + if tag is None: + continue + cve_match = self.SHADOWSERVER_TAG_CVE_REGEX.search(tag) + if cve_match is not None: + cve_id = cve_match.group() + with self.new_record_dict(data) as parsed_vuln: + self._populate_parsed(row, data, parsed_vuln) + parsed_vuln['category'] = 'vulnerable' + parsed_vuln['name'] = cve_id.lower() + yield parsed_vuln + + def _populate_parsed(self, row, data, parsed): + parsed.update(self.additional_standard_items) + for n6_field, data_field in self.n6_field_to_data_key_mapping.items(): + if n6_field == 'address': + self._handle_address_field(parsed, data_field, row) + else: + value = row.get(data_field) + if value: + parsed[n6_field] = value + if 'time' not in self.n6_field_to_data_key_mapping: + parsed['time'] = data.get('properties.timestamp') + + def _verify_items_are_unique(self): + shared_keys = set(self.constant_items).intersection(self.additional_standard_items) + if shared_keys: + raise ValueError( + f"The following key(s): {list(shared_keys)} exist(s) in both " + f"`additional_standard_items` and `constant_items`. " + f"Fix it manually.") + class ShadowserverFtp202204Parser(BaseParser): + """ Due to a different parsing logic, this parser does not inherit from the `ShadowserverBasicParserBaseClass`. 
@@ -255,9 +323,13 @@ class ShadowserverIpmi201412Parser(_BaseShadowserverParser): class ShadowserverChargen201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.chargen.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'chargen', } @@ -273,9 +345,13 @@ class ShadowserverChargen201412Parser(_BaseShadowserverParser): class ShadowserverNetbios201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.netbios.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'netbios', } @@ -309,6 +385,7 @@ class ShadowserverNetis201412Parser(_BaseShadowserverParser): class ShadowserverNtpVersion201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.ntp-version.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', @@ -365,9 +442,13 @@ class ShadowserverSnmp201412Parser(_BaseShadowserverParser): class ShadowserverQotd201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.qotd.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'qotd', } @@ -383,9 +464,13 @@ class ShadowserverQotd201412Parser(_BaseShadowserverParser): class ShadowserverSsdp201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.ssdp.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'ssdp', } @@ -536,9 +621,13 @@ class ShadowserverNatpmp201412Parser(_BaseShadowserverParser): class ShadowserverMssql201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.mssql.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'mssql', } @@ -619,6 +708,9 @@ class ShadowserverPortmapper201412Parser(_BaseShadowserverParser): constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'portmapper', } @@ -630,7 +722,7 @@ class ShadowserverPortmapper201412Parser(_BaseShadowserverParser): } def parse(self, data): - parsed_gen = super(ShadowserverPortmapper201412Parser, self).parse(data) + parsed_gen = super().parse(data) for item in parsed_gen: item['proto'] = 'udp' yield item @@ -639,9 +731,13 @@ def parse(self, data): class ShadowserverMdns201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.mdns.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'mdns', } @@ -654,7 +750,7 @@ class ShadowserverMdns201412Parser(_BaseShadowserverParser): } def parse(self, data): - parsed_gen = super(ShadowserverMdns201412Parser, self).parse(data) + parsed_gen = super().parse(data) for item in parsed_gen: if item['proto'].lower() != 'udp': LOGGER.warning('Protocol is different from UDP - %r', item['proto']) @@ -664,9 +760,13 @@ def parse(self, data): class ShadowserverXdmcp201412Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.xdmcp.201412' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'xdmcp', } @@ -679,7 
+779,7 @@ class ShadowserverXdmcp201412Parser(_BaseShadowserverParser): } def parse(self, data): - parsed_gen = super(ShadowserverXdmcp201412Parser, self).parse(data) + parsed_gen = super().parse(data) for item in parsed_gen: if item['proto'].lower() != 'udp': LOGGER.warning('Protocol is different from UDP - %r', item['proto']) @@ -891,27 +991,6 @@ class ShadowserverDarknet202203Parser(_BaseShadowserverParser): } -class ShadowserverModbus202203Parser(_BaseShadowserverParser): - - default_binding_key = 'shadowserver.modbus.202203' - constant_items = { - 'restriction': 'need-to-know', - 'confidence': 'medium', - 'category': 'vulnerable', - 'name': 'modbus', - } - - n6_field_to_data_key_mapping = { - 'time': 'timestamp', - 'address': 'ip', - 'dport': 'port', - 'proto': 'protocol', - 'vendor': 'vendor', - 'revision': 'revision', - 'product_code': 'product_code', - } - - class ShadowserverIcs202204Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.ics.202204' @@ -938,9 +1017,13 @@ class ShadowserverIcs202204Parser(_BaseShadowserverParser): class ShadowserverCoap202204Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.coap.202204' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'coap', } @@ -956,9 +1039,13 @@ class ShadowserverCoap202204Parser(_BaseShadowserverParser): class ShadowserverUbiquiti202204Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.ubiquiti.202204' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'ubiquiti', } @@ -974,9 +1061,13 @@ class ShadowserverUbiquiti202204Parser(_BaseShadowserverParser): class ShadowserverArd202204Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.ard.202204' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'ard', } @@ -992,9 +1083,13 @@ class ShadowserverArd202204Parser(_BaseShadowserverParser): class ShadowserverRdpeudp202204Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.rdpeudp.202204' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'rdpeudp', } @@ -1010,9 +1105,13 @@ class ShadowserverRdpeudp202204Parser(_BaseShadowserverParser): class ShadowserverDvrDhcpdiscover202204Parser(_BaseShadowserverParser): default_binding_key = 'shadowserver.dvr-dhcpdiscover.202204' + constant_items = { 'restriction': 'need-to-know', 'confidence': 'medium', + } + + additional_standard_items = { 'category': 'amplifier', 'name': 'dvr-dhcpdiscover', } @@ -1272,4 +1371,22 @@ class ShadowserverAmqp202204Parser(_BaseShadowserverParser): } +class ShadowserverMsmq202308Parser(_BaseShadowserverParser): + + default_binding_key = 'shadowserver.msmq.202308' + constant_items = { + 'restriction': 'need-to-know', + 'confidence': 'medium', + 'category': 'vulnerable', + 'name': 'msmq', + } + + n6_field_to_data_key_mapping = { + 'time': 'timestamp', + 'address': 'ip', + 'dport': 'port', + 'proto': 'protocol', + } + + add_parser_entry_point_functions(__name__) diff --git a/N6DataSources/n6datasources/parsers/spamhaus.py b/N6DataSources/n6datasources/parsers/spamhaus.py index 38244cc..25597dc 100644 --- a/N6DataSources/n6datasources/parsers/spamhaus.py +++ b/N6DataSources/n6datasources/parsers/spamhaus.py @@ 
-37,10 +37,10 @@ class _BaseSpamhausBlacklistParser(BlackListParser): EXPIRES_DAYS = 2 - bl_current_time_regex = re.compile(r"(?:Last-Modified:[ ]*)" + bl_current_time_regex = re.compile(r"Last-Modified:[ ]*" r"(?P\w{3},[ ]*\d{1,2}[ ]*\w{3}[ ]*" r"\d{4}[ ]*(\d{2}:?){3}[ ]*GMT)", - re.VERBOSE | re.IGNORECASE) + re.ASCII | re.IGNORECASE) bl_current_time_format = "%a, %d %b %Y %H:%M:%S GMT" def parse(self, data): diff --git a/N6DataSources/n6datasources/tests/collectors/test_abuse_ch.py b/N6DataSources/n6datasources/tests/collectors/test_abuse_ch.py index 7eac842..69f8c72 100644 --- a/N6DataSources/n6datasources/tests/collectors/test_abuse_ch.py +++ b/N6DataSources/n6datasources/tests/collectors/test_abuse_ch.py @@ -302,44 +302,6 @@ def cases(cls): }, ) - yield param( - config_content=''' - [AbuseChSslBlacklistCollector] - row_count_mismatch_is_fatal = False - url=https://www.example.com - download_retries=5 - ''', - initial_state={ - # legacy form of state - 'time': '2019-08-20 02:00:00', - }, - orig_data=cls.EXAMPLE_ORIG_DATA, - expected_publish_output_calls=[ - call( - # routing_key - 'abuse-ch.ssl-blacklist.201902', - - # body - ( - b'2019-08-20 02:00:00,f0a0k0e0d0s0h0a010000000000a0a0a00000000,ExampleName3\n' - b'2019-08-20 03:00:00,f0a0k0e0d0s0h0a010000000000a0a0a00000000,ExampleName4\n' - b'2019-08-20 03:00:00,f0a0k0e0d0s0h0a010000000000a0a0a00000000,ExampleName5' - ), - - # prop_kwargs - cls.DEFAULT_PROP_KWARGS, - ), - ], - expected_saved_state={ - 'newest_row_time': '2019-08-20 03:00:00', - 'newest_rows': { - '2019-08-20 03:00:00,f0a0k0e0d0s0h0a010000000000a0a0a00000000,ExampleName5', - '2019-08-20 03:00:00,f0a0k0e0d0s0h0a010000000000a0a0a00000000,ExampleName4' - }, - 'rows_count': 6 - }, - ) - @foreach(cases) def test(self, **kwargs): self._perform_test(**kwargs) diff --git a/N6DataSources/n6datasources/tests/collectors/test_blueliv.py b/N6DataSources/n6datasources/tests/collectors/test_blueliv.py deleted file mode 100644 index be284eb..0000000 --- a/N6DataSources/n6datasources/tests/collectors/test_blueliv.py +++ /dev/null @@ -1,163 +0,0 @@ -# Copyright (c) 2023 NASK. All rights reserved. 
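
Both regex tightenings above (here and in the Sblam parser earlier) add `re.ASCII`, which pins `\d`, `\w` and friends to their ASCII meanings; without it, Python 3 regexes happily match Unicode digits. Dropping `re.VERBOSE` also appears behavior-neutral here, since the pattern contains no unescaped whitespace or `#`. A tiny demonstration of what `re.ASCII` changes:

```python
import re

assert re.fullmatch(r'\d{4}', '2023') is not None
assert re.fullmatch(r'\d{4}', '٢٠٢٣') is not None           # Arabic-Indic digits match \d
assert re.fullmatch(r'\d{4}', '٢٠٢٣', re.ASCII) is None     # re.ASCII restricts \d to 0-9
```
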
- -import json - -from unittest.mock import call -from unittest_expander import ( - expand, - foreach, - param, - paramseq, -) - -from n6datasources.collectors.base import BaseDownloadingCollector -from n6datasources.collectors.blueliv import BluelivMapCollector -from n6datasources.tests.collectors._collector_test_helpers import BaseCollectorTestCase -from n6lib.unit_test_helpers import ( - AnyInstanceOf, - AnyMatchingRegex, -) - - -@expand -class TestBluelivMapCollector(BaseCollectorTestCase): - - COLLECTOR_CLASS = BluelivMapCollector - - EXPECTED_PROP_KWARGS = { - 'timestamp': AnyInstanceOf(int), - 'message_id': AnyMatchingRegex(r'\A[0-9a-f]{32}\Z'), - 'type': 'blacklist', - 'content_type': 'application/json', - 'headers': {}, - } - - def _perform_test(self, - config_content, - orig_data, - expected_download_calls, - expected_publish_output_calls, - **kwargs): - - download_mock = self.patch_object( - BaseDownloadingCollector, - 'download', - return_value=orig_data) - collector = self.prepare_collector( - self.COLLECTOR_CLASS, - config_content=config_content) - - collector.run_collection() - self.assertEqual(download_mock.mock_calls, expected_download_calls) - self.assertEqual(self.publish_output_mock.mock_calls, expected_publish_output_calls) - - @paramseq - def cases(cls): - yield param( - config_content=''' - [BluelivMapCollector] - base_url = https://example.com/ - token = a1a1a1a1a1a1a1a1a1 - endpoint_name = some-endpoint/name1 - download_retries = 1 - ''', - orig_data=json.dumps( - { - "name1": [ - { - "url": "http://example-url1.com", - "type": "MALWARE", - "country": "ES", - "status": "ONLINE", - "latitude": 1.1, - "longitude": -1.1, - "ip": "1.1.1.1", - "updatedAt": "2023-03-29T12:00:02+0000", - "firstSeenAt": "2022-12-30T10:10:10+0000", - "lastSeenAt": "2023-03-29T11:55:52+0000", - }, - ], - }, sort_keys=True).encode('utf-8'), - expected_download_calls=[ - call( - method='GET', - url='https://example.com/some-endpoint/name1', - custom_request_headers={ - 'Content-Type': 'application/json', - 'Authorization': 'bearer a1a1a1a1a1a1a1a1a1', - 'User-Agent': 'SDK v2', - 'X-API-CLIENT': 'a1a1a1a1a1a1a1a1a1', - 'Accept-Encoding': 'gzip, deflate', - }, - ), - ], - expected_publish_output_calls=[ - call( - # routing_key - 'blueliv.map', - - # body - json.dumps([ - { - "url": "http://example-url1.com", - "type": "MALWARE", - "country": "ES", - "status": "ONLINE", - "latitude": 1.1, - "longitude": -1.1, - "ip": "1.1.1.1", - "updatedAt": "2023-03-29T12:00:02+0000", - "firstSeenAt": "2022-12-30T10:10:10+0000", - "lastSeenAt": "2023-03-29T11:55:52+0000", - }, - ], sort_keys=True).encode('utf-8'), - - cls.EXPECTED_PROP_KWARGS, - ), - ], - ).label('ok case') - - yield param( - config_content=''' - [BluelivMapCollector] - base_url = https://example.com/ - token = a1a1a1a1a1a1a1a1a1 - endpoint_name = some-endpoint/name1 - download_retries = 1 - ''', - orig_data=json.dumps( - { - "wrong_endpoint": [ - { - "url": "http://example-url1.com", - "type": "MALWARE", - "country": "ES", - "status": "ONLINE", - "latitude": 1.1, - "longitude": -1.1, - "ip": "1.1.1.1", - "updatedAt": "2023-03-29T12:00:02+0000", - "firstSeenAt": "2022-12-30T10:10:10+0000", - "lastSeenAt": "2023-03-29T11:55:52+0000" - }, - ], - }, sort_keys=True).encode('utf-8'), - expected_download_calls=[ - call( - method='GET', - url='https://example.com/some-endpoint/name1', - custom_request_headers={ - 'Content-Type': 'application/json', - 'Authorization': 'bearer a1a1a1a1a1a1a1a1a1', - 'User-Agent': 'SDK v2', - 'X-API-CLIENT': 
'a1a1a1a1a1a1a1a1a1', - 'Accept-Encoding': 'gzip, deflate', - }, - ), - ], - expected_publish_output_calls=[], - ).label('No key specified with `endpoint_name` option') - - @foreach(cases) - def test(self, **kwargs): - self._perform_test(**kwargs) diff --git a/N6DataSources/n6datasources/tests/collectors/test_darklist_de.py b/N6DataSources/n6datasources/tests/collectors/test_darklist_de.py deleted file mode 100644 index 4674c20..0000000 --- a/N6DataSources/n6datasources/tests/collectors/test_darklist_de.py +++ /dev/null @@ -1,87 +0,0 @@ -# Copyright (c) 2020-2023 NASK. All rights reserved. - -from unittest.mock import ( - ANY, - call, -) - -from unittest_expander import ( - expand, - foreach, - param, - paramseq, -) - -from n6datasources.collectors.darklist_de import DarklistDeBlCollector -from n6datasources.collectors.base import BaseDownloadingCollector -from n6datasources.tests.collectors._collector_test_helpers import BaseCollectorTestCase - - -@expand -class TestDarklistDeBlCollector(BaseCollectorTestCase): - - COLLECTOR_CLASS = DarklistDeBlCollector - - def _perform_test(self, - config_content, - orig_data, - expected_output, - **kwargs): - - self.patch_object(BaseDownloadingCollector, 'download', return_value=orig_data) - collector = self.prepare_collector( - self.COLLECTOR_CLASS, - config_content=config_content) - - collector.run_collection() - - self.assertEqual(self.publish_output_mock.mock_calls, expected_output) - - - @paramseq - def cases(): - yield param( - config_content=''' - [DarklistDeBlCollector] - url=https://www.example.com - download_retries=1 - ''', - orig_data=( - b"# darklist.de - blacklisted raw IPs\n" - b"# generated on 21.04.2020 08:15\n" - b"\n" - b"1.1.1.0/24\n" - b"2.2.2.0/24\n" - b"3.3.3.0/24\n" - b"4.4.4.4\n" - b"5.5.5.5\n" - b"6.6.6.6\n" - ), - expected_output=[ - call( - 'darklist-de.bl', - ( - b"# darklist.de - blacklisted raw IPs\n" - b"# generated on 21.04.2020 08:15\n" - b"\n" - b"1.1.1.0/24\n" - b"2.2.2.0/24\n" - b"3.3.3.0/24\n" - b"4.4.4.4\n" - b"5.5.5.5\n" - b"6.6.6.6\n" - ), - { - 'timestamp': ANY, - 'message_id': ANY, - 'type': 'blacklist', - 'content_type': 'text/plain', - 'headers': {}, - }, - ) - ], - ) - - @foreach(cases) - def test(self, **kwargs): - self._perform_test(**kwargs) diff --git a/N6DataSources/n6datasources/tests/collectors/test_misp.py b/N6DataSources/n6datasources/tests/collectors/test_misp.py index 6029750..41df111 100644 --- a/N6DataSources/n6datasources/tests/collectors/test_misp.py +++ b/N6DataSources/n6datasources/tests/collectors/test_misp.py @@ -1,10 +1,12 @@ # Copyright (c) 2017-2023 NASK. All rights reserved. 
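
The DST-transition cases in the tests below pin down what *pick the earlier occurrence* means for the `__as_utc()` conversion; in stdlib terms it corresponds to `fold=0`. A sketch reproducing the summer-to-winter case with only `dateutil` and the standard library (assuming, as the tests do, that `Europe/Warsaw` is the local zone):

```python
import datetime
from dateutil.tz import gettz

tz = gettz('Europe/Warsaw')

# 02:45:01 on 2023-10-29 occurred twice (clocks went back from 03:00 CEST
# to 02:00 CET); fold=0 picks the earlier occurrence, i.e. CEST (UTC+2):
naive_local = datetime.datetime(2023, 10, 29, 2, 45, 1)
aware = naive_local.replace(tzinfo=tz, fold=0)
naive_utc = aware.astimezone(datetime.timezone.utc).replace(tzinfo=None)
assert naive_utc == datetime.datetime(2023, 10, 29, 0, 45, 1)
```
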
import contextlib +import copy import datetime import inspect import json import re +import unittest from collections.abc import ( Callable, Generator, @@ -19,6 +21,7 @@ from urllib.parse import urlsplit import requests +from dateutil.tz import gettz from unittest_expander import ( expand, foreach, @@ -38,6 +41,7 @@ AnyInstanceOf, AnyMatchingRegex, JSONWhoseContentIsEqualTo, + TestCaseMixin, ) @@ -1362,3 +1366,153 @@ def test__do_publish_sample(self): 'samples_last_proc_datetime': datetime.datetime(2017, 2, 9, 12), 'already_processed_sample_ids': {314159}, } + + +@expand +class TestMispCollector_adjust_state_from_py2_pickle(TestCaseMixin, unittest.TestCase): # noqa + + def setUp(self): + self.collector = object.__new__(MispCollector) + self.patch('n6datasources.collectors.misp.gettz', lambda: gettz('Europe/Warsaw')) + + + _EXAMPLE_PY2_STATE = { + 'events_publishing_datetime': datetime.datetime(2024, 1, 14, 22, 30, 59), + 'samples_publishing_datetime': datetime.datetime(2024, 1, 14, 22, 30, 59), + 'last_published_samples': [1, 2, 3], + } + _EXAMPLE_PY3_STATE = { + 'events_last_proc_datetime': datetime.datetime(2024, 1, 14, 21, 30, 59), + 'samples_last_proc_datetime': datetime.datetime(2024, 1, 14, 21, 30, 59), + 'already_processed_sample_ids': {1, 2, 3}, + } + + @foreach( + param( + py2_state=_EXAMPLE_PY2_STATE, + expected_py3_state=_EXAMPLE_PY3_STATE, + ).label('Winter time'), + + param( + py2_state=_EXAMPLE_PY2_STATE | { + 'events_publishing_datetime': datetime.datetime(2024, 3, 31, 3, 30, 59), + 'samples_publishing_datetime': datetime.datetime(2024, 3, 31, 1, 45, 1), + }, + expected_py3_state=_EXAMPLE_PY3_STATE | { + 'events_last_proc_datetime': datetime.datetime(2024, 3, 31, 1, 30, 59), + 'samples_last_proc_datetime': datetime.datetime(2024, 3, 31, 0, 45, 1), + }, + ).label('Winter-to-summer-time transition (DST start)'), + + param( + py2_state=_EXAMPLE_PY2_STATE | { + 'events_publishing_datetime': datetime.datetime(2023, 8, 14, 13, 30, 59), + 'samples_publishing_datetime': datetime.datetime(2023, 8, 14, 1, 45, 1), + }, + expected_py3_state=_EXAMPLE_PY3_STATE | { + 'events_last_proc_datetime': datetime.datetime(2023, 8, 14, 11, 30, 59), + 'samples_last_proc_datetime': datetime.datetime(2023, 8, 13, 23, 45, 1), + }, + ).label('Summer time'), + + param( + py2_state=_EXAMPLE_PY2_STATE | { + 'events_publishing_datetime': datetime.datetime(2023, 10, 29, 3, 30, 59), + 'samples_publishing_datetime': datetime.datetime(2023, 10, 29, 2, 45, 1), + }, + expected_py3_state=_EXAMPLE_PY3_STATE | { + 'events_last_proc_datetime': datetime.datetime(2023, 10, 29, 2, 30, 59), + 'samples_last_proc_datetime': datetime.datetime(2023, 10, 29, 0, 45, 1), + }, + ).label('Summer-to-winter-time transition (DST end)'), + ) + def test_ok(self, py2_state, expected_py3_state): + py2_state = copy.deepcopy(py2_state) # (<- just defensive programming) + + py3_state = self.collector.adjust_state_from_py2_pickle(py2_state) + + assert py3_state == expected_py3_state + + + _EX = _EXAMPLE_PY2_STATE + + @foreach( + param( + py2_state=_EX | {'illegal_key': 42}, + expected_error_msg=( + "unexpected set of Py2 state keys: " + "'events_publishing_datetime', 'samples_publishing_datetime', " + "'last_published_samples', 'illegal_key'" + ), + ), + param( + py2_state={ + k: v for k, v in _EX.items() + if k != 'samples_publishing_datetime' + }, + expected_error_msg=( + "unexpected set of Py2 state keys: " + "'events_publishing_datetime', 'last_published_samples'" + ), + ), + param( + py2_state=_EXAMPLE_PY3_STATE, + expected_error_msg=( 
+ "unexpected set of Py2 state keys: " + "'events_last_proc_datetime', 'samples_last_proc_datetime', " + "'already_processed_sample_ids'" + ), + ), + param( + py2_state=_EX | {'events_publishing_datetime': 42}, + expected_error_msg=( + "unexpected type(py2_state['events_publishing_datetime'])=" + ), + ), + param( + py2_state=_EX | {'samples_publishing_datetime': '2023-10-29T02:45:01'}, + expected_error_msg=( + "unexpected type(py2_state['samples_publishing_datetime'])=" + ), + ), + param( + py2_state=_EX | {'last_published_samples': {1, 2, 3}}, + expected_error_msg=( + "unexpected type(py2_state['last_published_samples'])=" + ), + ), + param( + py2_state=_EX | {'last_published_samples': [1, '2', 3]}, + expected_error_msg=( + "unexpected non-int value(s) found in " + "py2_state['last_published_samples']=[1, '2', 3]" + ), + ), + param( + py2_state=_EX | { + 'events_publishing_datetime': datetime.datetime(2024, 1, 14, 22, 30, 59, + tzinfo=datetime.timezone.utc), + }, + expected_error_msg=( + "unexpected non-None tzinfo of py2_state['events_publishing_datetime']=" + "datetime.datetime(2024, 1, 14, 22, 30, 59, tzinfo=datetime.timezone.utc)" + ), + ), + param( + py2_state=_EX | { + 'samples_publishing_datetime': datetime.datetime(2024, 1, 14, 22, 30, 59, + tzinfo=datetime.timezone.utc), + }, + expected_error_msg=( + "unexpected non-None tzinfo of py2_state['samples_publishing_datetime']=" + "datetime.datetime(2024, 1, 14, 22, 30, 59, tzinfo=datetime.timezone.utc)" + ), + ), + ) + def test_error_for_unexpected_py2_state_content(self, py2_state, expected_error_msg): + py2_state = copy.deepcopy(py2_state) # (<- just defensive programming) + with self.assertRaises(NotImplementedError) as exc_context: + + self.collector.adjust_state_from_py2_pickle(py2_state) + + assert str(exc_context.exception) == expected_error_msg diff --git a/N6DataSources/n6datasources/tests/parsers/_parser_test_mixin.py b/N6DataSources/n6datasources/tests/parsers/_parser_test_mixin.py index 64548d5..ba092af 100644 --- a/N6DataSources/n6datasources/tests/parsers/_parser_test_mixin.py +++ b/N6DataSources/n6datasources/tests/parsers/_parser_test_mixin.py @@ -73,12 +73,20 @@ class ParserTestMixin(TestCaseMixin): def test_basics(self): self.assertIn(self.PARSER_BASE_CLASS, self.PARSER_CLASS.__bases__) - source_from_default_binding_key = '.'.join(self.PARSER_CLASS.default_binding_key. - split(".")[:2]) + splitted_default_binding_key = self.PARSER_CLASS.default_binding_key.split(".") + source_from_default_binding_key = '.'.join(splitted_default_binding_key[:2]) self.assertEqual(source_from_default_binding_key, self.PARSER_SOURCE) self.assertEqual(self.PARSER_CLASS.constant_items, self.PARSER_CONSTANT_ITEMS) + # Check if default_binding_key have version tag in it + # default binding key looks like this: + # - .. e.g. 
'abuse-ch.feodotracker.202110' + # - with RAW_FORMAT_VERSION_TAG only present when there is more than one version of given parser + if len(splitted_default_binding_key) == 3: + # assert that PARSER_RAW_FORMAT_VERSION_TAG is correct + self.assertEqual(self.PARSER_RAW_FORMAT_VERSION_TAG, + splitted_default_binding_key[2]) def test__input_callback(self): # (We want to make `LegacyQueuedBase.__new__()`'s stuff isolated diff --git a/N6DataSources/n6datasources/tests/parsers/test_abuse_ch.py b/N6DataSources/n6datasources/tests/parsers/test_abuse_ch.py index 471938d..82654ad 100644 --- a/N6DataSources/n6datasources/tests/parsers/test_abuse_ch.py +++ b/N6DataSources/n6datasources/tests/parsers/test_abuse_ch.py @@ -15,6 +15,7 @@ class TestAbuseChFeodotracker202110Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'abuse-ch.feodotracker' + PARSER_RAW_FORMAT_VERSION_TAG = '202110' PARSER_CLASS = AbuseChFeodoTracker202110Parser PARSER_BASE_CLASS = BaseParser PARSER_CONSTANT_ITEMS = { @@ -25,26 +26,26 @@ class TestAbuseChFeodotracker202110Parser(ParserTestMixin, unittest.TestCase): def cases(self): yield ( - b'2019-05-27 13:36:27,0.0.0.0,447,online,2019-05-28,TrickBot\n' + b'2019-05-27 13:36:27,0.0.0.1,447,online,2019-05-28,TrickBot\n' b'this, is, one, very, wrong, line\n' - b'2019-05-25 01:30:36,0.0.0.0,443,online,2019-05-27,Heodo\n' - b'2019-05-16 19:43:27,0.0.0.0,8080,online,2019-05-22,Heodo\n', + b'2019-05-25 01:30:36,0.0.0.1,443,online,2019-05-27,Heodo\n' + b'2019-05-16 19:43:27,0.0.0.1,8080,online,2019-05-22,Heodo\n', [ { 'name': 'trickbot', - 'address': [{'ip': '0.0.0.0'}], + 'address': [{'ip': '0.0.0.1'}], 'dport': 447, 'time': '2019-05-27 13:36:27', }, { 'name': 'heodo', - 'address': [{'ip': '0.0.0.0'}], + 'address': [{'ip': '0.0.0.1'}], 'dport': 443, 'time': '2019-05-25 01:30:36', }, { 'name': 'heodo', - 'address': [{'ip': '0.0.0.0'}], + 'address': [{'ip': '0.0.0.1'}], 'dport': 8080, 'time': '2019-05-16 19:43:27', }, @@ -57,9 +58,10 @@ def cases(self): ) -class TestAbuseChSSLBlacklists201902Parser(ParserTestMixin, unittest.TestCase): +class TestAbuseChSslBlacklists201902Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'abuse-ch.ssl-blacklist' + PARSER_RAW_FORMAT_VERSION_TAG = '201902' PARSER_CLASS = AbuseChSslBlacklist201902Parser PARSER_BASE_CLASS = BaseParser PARSER_CONSTANT_ITEMS = { @@ -101,6 +103,7 @@ def cases(self): class TestAbuseChUrlhausUrls202001Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'abuse-ch.urlhaus-urls' + PARSER_RAW_FORMAT_VERSION_TAG = '202001' PARSER_CLASS = AbuseChUrlhausUrls202001Parser PARSER_BASE_CLASS = BaseParser PARSER_CONSTANT_ITEMS = { diff --git a/N6DataSources/n6datasources/tests/parsers/test_base.py b/N6DataSources/n6datasources/tests/parsers/test_base.py index 7e0462a..e07679d 100644 --- a/N6DataSources/n6datasources/tests/parsers/test_base.py +++ b/N6DataSources/n6datasources/tests/parsers/test_base.py @@ -35,6 +35,7 @@ from n6lib.common_helpers import ( FilePagedSequence, PlainNamespace, + as_unicode, ipv4_to_str, ) from n6lib.config import ( @@ -1406,6 +1407,73 @@ def _parsed_content_and_expected_hash_base_cases_for__get_output_message_id(): b"{\"'\": \"'\", '\"': '\"', '\"\\'': '\"\\'', '\\'\"': '\\'\"', '\\\\': '\\\\', '\\x00': '\\n'}," # noqa b"{\"'\": \"'\", '\"': '\"', '\"\\'': '\"\\'', '\\'\"': '\\'\"', '\\\\': '\\\\', '\\x00': '\\n'}" # noqa ), + # containing the 'name' key + ( + { + 'source': 'foo.bar', + 'category': 'bots', + # pure ASCII str: + 'name': '\x01 ????.\x00tralala: ?', + }, + b'category,bots\n' + 
b'name,\x01 ????.\x00tralala: ?\n' + b'source,foo.bar' + ), + ( + { + 'source': 'foo.bar', + 'category': 'bots', + # pure ASCII bytes: + 'name': b'\x01 ????.\x00tralala: ?', + }, + b'category,bots\n' + b'name,\x01 ????.\x00tralala: ?\n' + b'source,foo.bar' + ), + ( + { + 'source': 'foo.bar', + 'category': 'cnc', + # non-ASCII str: + 'name': '\x01 żółć.\x00tralala: \U0010ffff', + }, + b'category,cnc\n' + b'name,\x01 \xc5\xbc\xc3\xb3\xc5\x82\xc4\x87.\x00tralala: \xf4\x8f\xbf\xbf\n' + b'source,foo.bar' + ), + ( + { + 'source': 'foo.bar', + 'category': 'cnc', + # non-ASCII (UTF-8) bytes: + 'name': b'\x01 \xc5\xbc\xc3\xb3\xc5\x82\xc4\x87.\x00tralala: \xf4\x8f\xbf\xbf', + }, + b'category,cnc\n' + b'name,\x01 \xc5\xbc\xc3\xb3\xc5\x82\xc4\x87.\x00tralala: \xf4\x8f\xbf\xbf\n' + b'source,foo.bar' + ), + ( + { + 'category': 'other', + 'source': b'foo.bar', + # non-ASCII str containing a surrogate: + 'name': '\x01 ŻÓŁĆ.\x00tralala: \udcdd', + }, + b'category,other\n' + b'name,\x01 \xc5\xbb\xc3\x93\xc5\x81\xc4\x86.\x00tralala: \xed\xb3\x9d\n' + b'source,foo.bar' + ), + ( + { + 'category': 'other', + 'source': b'foo.bar', + # non-ASCII (UTF-8-like) bytes containing a surrogate + # -- not a valid `name` => *not* being set! + 'name': b'\x01 \xc5\xbb\xc3\x93\xc5\x81\xc4\x86.\x00tralala: \xed\xb3\x9d', + }, + b'category,other\n' + b'source,foo.bar' + ), ] @foreach(_parsed_content_and_expected_hash_base_cases_for__get_output_message_id) @@ -1415,6 +1483,8 @@ class _RecordDict(RecordDict): optional_keys = RecordDict.optional_keys | {'key1', 'key2'} parser = BaseParser.__new__(BaseParser) record_dict = _RecordDict(parsed_content) + if 'name' in record_dict: + assert record_dict['name'] == as_unicode(parsed_content['name']) expected_result = hashlib.md5(expected_hash_base, usedforsecurity=False).hexdigest() result = parser.get_output_message_id(record_dict) @@ -1481,22 +1551,45 @@ class _RecordDict(RecordDict): @foreach( param( - parsed_content={'source': 'provider.channel'}, - expected_result=[('source', 'provider.channel')] + parsed_content={'category': 'cnc'}, + expected_result=[('category', 'cnc')] + ), + param( + parsed_content={ + 'source': 'provider.channel', + 'category': 'bots', + }, + expected_result=[ + ('category', 'bots'), + ('source', 'provider.channel'), + ] ), param( - parsed_content={'source': 'provider.channel', - 'time': '2023-01-10 11:12:13', - '_do_not_resolve_fqdn_to_ip': True, - '_group': 'whatever'}, + parsed_content={ + 'source': 'provider.channel', + 'time': '2023-01-10 11:12:13', + 'category': 'other', + '_do_not_resolve_fqdn_to_ip': True, + '_group': 'whatever', + }, expected_result=[ - ('source', 'provider.channel'), - ('time', '2023-01-10 11:12:13') + ('category', 'other'), + ('source', 'provider.channel'), + ('time', '2023-01-10 11:12:13') ] ) ) - def test__iter_output_id_base_items(self, parsed_content, expected_result): + @foreach( + param(add_nonascii_name=False), + param(add_nonascii_name=True), + ) + def test__iter_output_id_base_items(self, parsed_content, expected_result, add_nonascii_name): parsed = RecordDict(parsed_content) + if add_nonascii_name: + input_name = 20 * 'zażółć - jaźń!\n\x00\U0010ffff' + parsed['name'] = input_name + self.assertEqual(parsed['name'], input_name[:255]) + expected_result = expected_result + [('name', input_name[:255])] result = self.meth.iter_output_id_base_items(parsed) result_as_list = list(result) diff --git a/N6DataSources/n6datasources/tests/parsers/test_blueliv.py b/N6DataSources/n6datasources/tests/parsers/test_blueliv.py deleted file 
mode 100644 index e038bb7..0000000 --- a/N6DataSources/n6datasources/tests/parsers/test_blueliv.py +++ /dev/null @@ -1,179 +0,0 @@ -# Copyright (c) 2015-2023 NASK. All rights reserved. - -import datetime -import json -import unittest - -from n6datasources.parsers.base import BlackListParser -from n6datasources.parsers.blueliv import BluelivMapParser -from n6datasources.tests.parsers._parser_test_mixin import ParserTestMixin -from n6lib.datetime_helpers import parse_iso_datetime_to_utc -from n6lib.record_dict import BLRecordDict - - -class TestBluelivMapParser(ParserTestMixin, unittest.TestCase): - - RECORD_DICT_CLASS = BLRecordDict - - PARSER_SOURCE = 'blueliv.map' - PARSER_CLASS = BluelivMapParser - PARSER_BASE_CLASS = BlackListParser - PARSER_CONSTANT_ITEMS = { - 'restriction': 'public', - 'confidence': 'low', - '_do_not_resolve_fqdn_to_ip': True - } - message_expires = str(parse_iso_datetime_to_utc(ParserTestMixin.message_created) + - datetime.timedelta(days=2)) - - def cases(self): - yield ( - json.dumps( - [ - { - 'status': 'ONLINE', - 'url': 'http://www.example1.com', - 'country': 'US', - 'updatedAt': '2015-08-26T01:00:00+0000', - 'longitude': -11.1111, - 'firstSeenAt': '2015-05-24T04:03:22+0000', - 'lastSeenAt': '2015-08-26T00:55:02+0000', - 'ip': '1.1.1.1', - 'latitude': 11.1111, - 'type': 'MALWARE', - }, - { - 'status': 'ONLINE', - 'url': 'http://www.example2.com', - 'country': 'US', - 'updatedAt': '2015-08-26T01:00:04+0000', - 'longitude': -11.1111, - 'firstSeenAt': '2015-08-11T10:48:59+0000', - 'lastSeenAt': '2015-08-26T00:54:35+0000', - 'ip': '2.2.2.2', - 'latitude': 11.1111, - 'type': 'C_AND_C', - }, - { - 'status': 'ONLINE', - 'url': 'http://www.example3.com', - 'country': 'US', - 'updatedAt': '2015-08-26T01:00:04+0000', - 'longitude': -11.1111, - 'firstSeenAt': '2015-08-11T10:48:59+0000', - 'lastSeenAt': '2015-08-26T00:54:35+0000', - 'ip': '3.3.3.3', - 'latitude': 11.1111, - 'type': 'TOR_IP', - }, - { - 'status': 'ONLINE', - 'url': 'http://www.example4.com', - 'country': 'US', - 'updatedAt': '2015-08-26T01:00:04+0000', - 'longitude': -11.1111, - 'firstSeenAt': '2015-08-11T10:48:59+0000', - 'lastSeenAt': '2015-08-26T00:54:35+0000', - 'ip': '3.3.3.3', - 'latitude': 11.1111, - 'type': 'NEW_UNKNOWN', - }, - { - 'status': 'ONLINE', - 'url': 'http://www.example5.com', - 'country': 'US', - 'updatedAt': '2015-08-26T01:00:04+0000', - 'longitude': 11.1111, - 'firstSeenAt': '2015-08-11T10:48:59+0000', - 'lastSeenAt': '2015-08-26T00:54:35+0000', - 'ip': '3.3.3.3', - 'latitude': 38.0, - 'type': 'PHISHING', - }, - { - 'status': 'ONLINE', - 'url': 'http://www.example6.com', - 'country': 'US', - 'updatedAt': '2015-08-26T01:00:04+0000', - 'longitude': -11.1111, - 'firstSeenAt': '2015-06-01T12:00:07+0000', - 'lastSeenAt': '2015-08-26T00:55:04+0000', - 'ip': '4.4.4.4', - 'latitude': 11.1111, - 'type': 'BACKDOOR', - }, - { - 'status': 'ONLINE', - 'url': 'http://www.example7.com', - 'country': 'US', - 'updatedAt': '2015-08-26T01:00:08+0000', - 'longitude': -11.1111, - 'firstSeenAt': '2015-05-09T22:01:00+0000', - 'lastSeenAt': '2015-08-26T00:55:10+0000', - 'ip': '5.5.5.5', - 'latitude': 11.1111, - 'type': 'EXPLOIT_KIT', - }, - { - 'status': 'ONLINE', - 'url': 'http://www.example8.com', - 'lastSeenAt': '2015-09-04T13:43:21+0000', - 'firstSeenAt': '2015-08-17T08:03:23+0000', - 'updatedAt': '2015-09-04T13:49:08+0000', - 'type': 'MALWARE', - }, - ], - ).encode('utf-8'), - [ - dict( - self.get_bl_items(1, 6), - category='malurl', - name='binary', - address=[{'ip': '1.1.1.1'}], - 
url='http://www.example1.com', - time=self.message_created, - expires=self.message_expires, - ), - dict( - self.get_bl_items(2, 6), - category='cnc', - address=[{'ip': '2.2.2.2'}], - url='http://www.example2.com', - time=self.message_created, - expires=self.message_expires, - ), - dict( - self.get_bl_items(3, 6), - category='phish', - address=[{'ip': '3.3.3.3'}], - url='http://www.example5.com', - time=self.message_created, - expires=self.message_expires, - ), - dict( - self.get_bl_items(4, 6), - category='backdoor', - address=[{'ip': '4.4.4.4'}], - url='http://www.example6.com', - time=self.message_created, - expires=self.message_expires, - ), - dict( - self.get_bl_items(5, 6), - category='malurl', - name='exploit-kit', - address=[{'ip': '5.5.5.5'}], - url='http://www.example7.com', - time=self.message_created, - expires=self.message_expires, - ), - dict( - self.get_bl_items(6, 6), - category='malurl', - name='binary', - url='http://www.example8.com', - time=self.message_created, - expires=self.message_expires, - ), - ], - ) diff --git a/N6DataSources/n6datasources/tests/parsers/test_darklist_de.py b/N6DataSources/n6datasources/tests/parsers/test_darklist_de.py deleted file mode 100644 index 38fdd1d..0000000 --- a/N6DataSources/n6datasources/tests/parsers/test_darklist_de.py +++ /dev/null @@ -1,95 +0,0 @@ -# Copyright (c) 2020-2023 NASK. All rights reserved. - -import unittest - -from n6datasources.parsers.darklist_de import DarklistDeBlParser -from n6datasources.parsers.base import BlackListParser -from n6datasources.tests.parsers._parser_test_mixin import ParserTestMixin - -from n6lib.record_dict import BLRecordDict - - -class TestDarklistDeBlParser(ParserTestMixin, unittest.TestCase): - - RECORD_DICT_CLASS = BLRecordDict - - PARSER_SOURCE = 'darklist-de.bl' - PARSER_CLASS = DarklistDeBlParser - PARSER_BASE_CLASS = BlackListParser - PARSER_CONSTANT_ITEMS = { - 'restriction': 'public', - 'confidence': 'low', - 'category': 'scanning', - } - - ips_time = '2020-04-21 08:15:00' - - # This value should be changed alongside - # `EXPIRES_DAYS` variable in the parsers module - expires_time = '2020-04-28 08:15:00' - - def cases(self): - # Typical cases, we expect to yield 6 events - # (last one is not a valid IP record) - yield ( - ( - b"# darklist.de - blacklisted raw IPs\n" - b"# generated on 21.04.2020 08:15\n" - b"\n" - b"1.1.1.0/24\n" - b"2.2.2.0/24\n" - b"3.3.3.0/24\n" - b"4.4.4.4\n" - b"5.5.5.5\n" - b"6.6.6.6\n" - b"1111.1111.1111.1111 not_IP_record\n" - ), - [ - dict( - self.get_bl_items(1, 6, bl_current_time=self.ips_time), - time=self.ips_time, - address=[{'ip': "1.1.1.0"}], - ip_network='1.1.1.0/24', - expires=self.expires_time, - ), - dict( - self.get_bl_items(2, 6, bl_current_time=self.ips_time), - time=self.ips_time, - address=[{'ip': "2.2.2.0"}], - ip_network='2.2.2.0/24', - expires=self.expires_time, - ), - dict( - self.get_bl_items(3, 6, bl_current_time=self.ips_time), - time=self.ips_time, - address=[{'ip': "3.3.3.0"}], - ip_network='3.3.3.0/24', - expires=self.expires_time, - ), - dict( - self.get_bl_items(4, 6, bl_current_time=self.ips_time), - time=self.ips_time, - address=[{'ip': "4.4.4.4"}], - expires=self.expires_time, - ), - dict( - self.get_bl_items(5, 6, bl_current_time=self.ips_time), - time=self.ips_time, - address=[{'ip': "5.5.5.5"}], - expires=self.expires_time, - ), - dict( - self.get_bl_items(6, 6, bl_current_time=self.ips_time), - time=self.ips_time, - address=[{'ip': "6.6.6.6"}], - expires=self.expires_time, - ) - ], - ) - - # Invalid data - yield ( - b"# 
darklist.de - blacklisted raw IPs\n" - b"# generated on 21.04.2020 08:15\n", - ValueError - ) diff --git a/N6DataSources/n6datasources/tests/parsers/test_dataplane.py b/N6DataSources/n6datasources/tests/parsers/test_dataplane.py index 92243e2..9809ea4 100644 --- a/N6DataSources/n6datasources/tests/parsers/test_dataplane.py +++ b/N6DataSources/n6datasources/tests/parsers/test_dataplane.py @@ -39,7 +39,8 @@ def cases(self): b"# addresses seen in the current report. \n" b"#\n" b"174 | Example name 1 | 1.1.1.1 | 2021-05-20 20:17:02 | sshpwauth\n" - b"174 | Example name 2 | 2.2.2.2 | 2021-05-17 03:02:55 | telnetlogin \n" + # with a `,` in the `ASname` field + b"174 | Example, name 2 | 2.2.2.2 | 2021-05-17 03:02:55 | telnetlogin \n" b"174 | Example name 3 | wrong.ip.address | 2021-05-17 03:02:55 | telnetlogin \n", [ dict( @@ -63,7 +64,8 @@ def cases(self): b"# addresses seen in the current report. \n" b"#\n" b"174 | Example name 1 | 1.1.1.1 | 2021-05-20 20:17:02 | category\n" - b"174 | Example name 2 | 2.2.2.2 | 2021-05-17 03:02:55 | category \n" + # with a `,` in the `ASname` field + b"174 | Example, name 2 | 2.2.2.2 | 2021-05-17 03:02:55 | category \n" b"174 | Example name 3 | wrong.ip.address | 2021-05-17 03:02:55 | category \n", [ dict( diff --git a/N6DataSources/n6datasources/tests/parsers/test_malwarepatrol.py b/N6DataSources/n6datasources/tests/parsers/test_malwarepatrol.py index d89321a..be84f11 100644 --- a/N6DataSources/n6datasources/tests/parsers/test_malwarepatrol.py +++ b/N6DataSources/n6datasources/tests/parsers/test_malwarepatrol.py @@ -15,6 +15,7 @@ class TestMalwarepatrolMalurl201406Parser(ParserTestMixin, unittest.TestCase): RECORD_DICT_CLASS = BLRecordDict PARSER_SOURCE = 'malwarepatrol.malurl' + PARSER_RAW_FORMAT_VERSION_TAG = '201406' PARSER_CLASS = MalwarepatrolMalurl201406Parser PARSER_BASE_CLASS = BlackListParser PARSER_CONSTANT_ITEMS = { diff --git a/N6DataSources/n6datasources/tests/parsers/test_shadowserver.py b/N6DataSources/n6datasources/tests/parsers/test_shadowserver.py index bba7a44..01458fe 100644 --- a/N6DataSources/n6datasources/tests/parsers/test_shadowserver.py +++ b/N6DataSources/n6datasources/tests/parsers/test_shadowserver.py @@ -11,8 +11,8 @@ BlackListParser, ) from n6datasources.parsers.shadowserver import ( - ShadowserverVnc201412Parser, _BaseShadowserverParser, + ShadowserverVnc201412Parser, ShadowserverCompromisedWebsite201412Parser, ShadowserverIpmi201412Parser, ShadowserverChargen201412Parser, @@ -47,7 +47,6 @@ ShadowserverSinkholeHttp202203Parser, ShadowserverSinkhole202203Parser, ShadowserverDarknet202203Parser, - ShadowserverModbus202203Parser, ShadowserverIcs202204Parser, ShadowserverCoap202204Parser, ShadowserverUbiquiti202204Parser, @@ -68,15 +67,17 @@ ShadowserverExchange202204Parser, ShadowserverSmtp202204Parser, ShadowserverAmqp202204Parser, + ShadowserverMsmq202308Parser, ) from n6datasources.tests.parsers._parser_test_mixin import ParserTestMixin from n6lib.datetime_helpers import parse_iso_datetime_to_utc -class TestShadowserverIpmi201412Parse(ParserTestMixin, unittest.TestCase): +class TestShadowserverIpmi201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ipmi' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverIpmi201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -109,11 +110,12 @@ def cases(self): ) -class TestShadowserverCompromisedWebsiteParser(ParserTestMixin, unittest.TestCase): +class TestShadowserverCompromisedWebsite201412Parser(ParserTestMixin, 
unittest.TestCase): RECORD_DICT_CLASS = BLRecordDict PARSER_SOURCE = 'shadowserver.compromised-website' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverCompromisedWebsite201412Parser PARSER_BASE_CLASS = BlackListParser PARSER_CONSTANT_ITEMS = { @@ -236,16 +238,65 @@ def cases(self): class TestShadowserverChargen201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.chargen' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverChargen201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'chargen', } def cases(self): + yield ( + b'"timestamp","ip","protocol","port","hostname","tag","size","asn",' + b'"geo","region","city"\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2014-03-24 04:16:38","1.1.1.1","udp","19",' + b'"example.pl","cve-2000-111111","","11111","PL",' + b'"EXAMPLE_LOCATION_1","EXAMPLE_LOCATION_2"\n' + + # we have cve match in tag field -> we yield 2 events + b'"2014-03-24 04:16:38","2.2.2.2","udp","19",' + b'"example.pl","CVE-2000-222222","","22222","PL",' + b'"EXAMPLE_LOCATION_3","EXAMPLE_LOCATION_4"\n' + , + [ + dict( + category='amplifier', + name='chargen', + time='2014-03-24 04:16:38', + address=[{'ip': '1.1.1.1'}], + dport=19, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-111111', + time='2014-03-24 04:16:38', + address=[{'ip': '1.1.1.1'}], + dport=19, + proto='udp', + ), + dict( + category='amplifier', + name='chargen', + time='2014-03-24 04:16:38', + address=[{'ip': '2.2.2.2'}], + dport=19, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2014-03-24 04:16:38', + address=[{'ip': '2.2.2.2'}], + dport=19, + proto='udp', + ), + ] + ) yield ( b'"timestamp","ip","protocol","port","hostname","tag","size","asn",' b'"geo","region","city"\n' @@ -255,12 +306,13 @@ def cases(self): , [ dict( + category='amplifier', + name='chargen', time='2014-03-24 04:16:38', address=[{'ip': '1.1.1.1'}], dport=19, proto='udp', ), - ] ) @@ -268,6 +320,7 @@ def cases(self): class TestShadowserverMemcached201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.memcached' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverMemcached201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -297,9 +350,10 @@ def cases(self): ) -class TestShadowserverMongodbParser(ParserTestMixin, unittest.TestCase): +class TestShadowserverMongodb201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.mongodb' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverMongodb201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -333,6 +387,7 @@ def cases(self): class TestShadowserverNatpmp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.natpmp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverNatpmp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -362,13 +417,12 @@ def cases(self): class TestShadowserverMssql201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.mssql' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverMssql201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 
'category': 'amplifier', - 'name': 'mssql', } def cases(self): @@ -381,6 +435,8 @@ def cases(self): b'"WHATEVER","INSERTGT",2283,"\\WHATEVER\\example",310,"6.89"\n', [ dict( + category='amplifier', + name='mssql', time='2015-03-14 06:38:42', address=[{'ip': '1.1.1.1'}], dport=1434, @@ -388,18 +444,79 @@ def cases(self): ), ] ) + yield ( + b'"timestamp","ip","protocol","port","hostname","tag","version","asn","geo","region",' + b'"city","naics","sic","server_name","instance_name","tcp_port","named_pipe",' + b'"response_length","amplification"\n' + + b'"2015-03-14 06:38:42","1.1.1.1","udp",1434,"example.pl",' + b'"mssql","10.10.2500.10",11111,"PL","ExampleLoc1","ExampleLoc2",111111,222222,' + b'"WHATEVER","INSERTGT",2283,"\\WHATEVER\\example",310,"6.89"\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2015-03-14 06:38:42","2.2.2.2","udp",1434,"example.pl",' + b'"cve-2000-111111","10.10.2500.10",22222,"PL","ExampleLoc3","ExampleLoc4",333333,444444,' + b'"WHATEVER","INSERTGT",2283,"\\WHATEVER\\example",310,"6.89"\n' + + # we have cve match in tag field -> we yield 2 events + b'"2015-03-14 06:38:42","3.3.3.3","udp",1434,"example.pl",' + b'"cve-2000-222222","10.10.2500.10",33333,"PL","ExampleLoc5","ExampleLoc6",555555,666666,' + b'"WHATEVER","INSERTGT",2283,"\\WHATEVER\\example",310,"6.89"\n', + [ + dict( + category='amplifier', + name='mssql', + time='2015-03-14 06:38:42', + address=[{'ip': '1.1.1.1'}], + dport=1434, + proto='udp', + ), + dict( + category='amplifier', + name='mssql', + time='2015-03-14 06:38:42', + address=[{'ip': '2.2.2.2'}], + dport=1434, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-111111', + time='2015-03-14 06:38:42', + address=[{'ip': '2.2.2.2'}], + dport=1434, + proto='udp', + ), + dict( + category='amplifier', + name='mssql', + time='2015-03-14 06:38:42', + address=[{'ip': '3.3.3.3'}], + dport=1434, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2015-03-14 06:38:42', + address=[{'ip': '3.3.3.3'}], + dport=1434, + proto='udp', + ), + ] + ) class TestShadowserverNetbios201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.netbios' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverNetbios201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'netbios', } def cases(self): @@ -412,6 +529,8 @@ def cases(self): , [ dict( + category='amplifier', + name='netbios', time='2014-04-22 00:12:57', address=[{'ip': '1.1.1.1'}], dport=137, @@ -422,10 +541,66 @@ def cases(self): ] ) + yield ( + b'"timestamp","ip","protocol","port","hostname","tag","mac_address","asn","geo",' + b'"region","city","workgroup","machine_name","username"\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2014-04-22 00:12:57","1.1.1.1","udp",137,"example.pl","cve-2000-11111",' + b'"00-00-00-00-00-00",111111,"PL","ExampleLoc1","ExampleLoc2","WORKGROUP",' + b'"Example-ABC12345",\n' + + # we have cve match in tag field -> we yield 2 events + b'"2014-04-22 00:12:57","2.2.2.2","udp",137,"example.pl","CVE-2000-22222",' + b'"00-00-00-00-00-00",222222,"PL","ExampleLoc2","ExampleLoc3","WORKGROUP",' + b'"Example-ABC12345",\n' + , + [ + dict( + category='amplifier', + name='netbios', + time='2014-04-22 00:12:57', + address=[{'ip': '1.1.1.1'}], + dport=137, + mac_address='00-00-00-00-00-00', + 
proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-11111', + time='2014-04-22 00:12:57', + address=[{'ip': '1.1.1.1'}], + dport=137, + mac_address='00-00-00-00-00-00', + proto='udp', + ), + dict( + category='amplifier', + name='netbios', + time='2014-04-22 00:12:57', + address=[{'ip': '2.2.2.2'}], + dport=137, + mac_address='00-00-00-00-00-00', + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-22222', + time='2014-04-22 00:12:57', + address=[{'ip': '2.2.2.2'}], + dport=137, + mac_address='00-00-00-00-00-00', + proto='udp', + ), + ] + ) + class TestShadowserverNetis201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.netis' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverNetis201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -454,13 +629,14 @@ def cases(self): class TestShadowserverNtpVersion201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ntp-version' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverNtpVersion201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { - 'restriction': 'need-to-know', - 'confidence': 'medium', 'category': 'amplifier', 'name': 'ntp', + 'restriction': 'need-to-know', + 'confidence': 'medium', } def cases(self): @@ -474,6 +650,36 @@ def cases(self): b'"2.2.2.2","0xABCDEF01.23456789","0.000","0.000",,,4,"UNIX",,10\n', [ dict( + category='amplifier', + name='ntp', + time='2014-03-24 02:14:37', + address=[{'ip': '1.1.1.1'}], + dport=123, + proto='udp', + ), + ] + ) + # this source still does not provide `tag` field, but we can accept it nevertheless + yield ( + b'"timestamp","ip","protocol","port","hostname","asn","geo","region","city","version",' + b'"clk_wander","clock","error","frequency","jitter","leap","mintc","noise","offset",' + b'"peer","phase","poll","precision","processor","refid","reftime","rootdelay",' + b'"rootdispersion","stability","state","stratum","system","tai","tc","tag"\n' + b'"2014-03-24 02:14:37","1.1.1.1","udp",123,,11111,"PL","ExampleLoc1","ExampleLoc2",' + b'4,"0.000","0x01234567.89ABCDEF",,"0.000","0.000",0,3,,"0.000",,,,"-10","unknown",' + b'"2.2.2.2","0xABCDEF01.23456789","0.000","0.000",,,4,"UNIX",,10, "cve-2000-11111\n', + [ + dict( + category='amplifier', + name='ntp', + time='2014-03-24 02:14:37', + address=[{'ip': '1.1.1.1'}], + dport=123, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-11111', time='2014-03-24 02:14:37', address=[{'ip': '1.1.1.1'}], dport=123, @@ -483,16 +689,16 @@ def cases(self): ) + class TestShadowserverQotd201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.qotd' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverQotd201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'qotd', } def cases(self): @@ -503,6 +709,8 @@ def cases(self): b'"qotd","Example_Example" ??",1111,"PL",ExampleLoc","ExampleLoc"\n', [ dict( + category='amplifier', + name='qotd', time='2014-12-01 12:09:00', address=[{'ip': '1.1.1.1'}], dport=17, @@ -510,11 +718,70 @@ def cases(self): ), ] ) + yield ( + b'"timestamp","ip","protocol","port","hostname","tag","quote","asn","geo","region",' + b'"city"\n' + + b'"2014-12-01 12:09:00","1.1.1.1","udp",17,"example-host.example.com",' + b'"qotd","Example_Example" ??",1111,"PL",ExampleLoc","ExampleLoc"\n' + + # we have cve 
match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2014-12-01 12:09:00","2.2.2.2","udp",17,"example-host.example.com",' + b'"CVE-2020-222222","Example_Example" ??",2222,"PL",ExampleLoc","ExampleLoc"\n' + + # we have cve match in tag field -> we yield 2 events + b'"2014-12-01 12:09:00","3.3.3.3","udp",17,"example-host.example.com",' + b'"CVE-2020-333333","Example_Example" ??",3333,"PL",ExampleLoc","ExampleLoc"\n', + [ + dict( + category='amplifier', + name='qotd', + time='2014-12-01 12:09:00', + address=[{'ip': '1.1.1.1'}], + dport=17, + proto='udp', + ), + dict( + category='amplifier', + name='qotd', + time='2014-12-01 12:09:00', + address=[{'ip': '2.2.2.2'}], + dport=17, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2020-222222', + time='2014-12-01 12:09:00', + address=[{'ip': '2.2.2.2'}], + dport=17, + proto='udp', + ), + dict( + category='amplifier', + name='qotd', + time='2014-12-01 12:09:00', + address=[{'ip': '3.3.3.3'}], + dport=17, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2020-333333', + time='2014-12-01 12:09:00', + address=[{'ip': '3.3.3.3'}], + dport=17, + proto='udp', + ), + ] + ) class TestShadowserverRedis201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.redis' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverRedis201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -548,6 +815,7 @@ def cases(self): class TestShadowserverSmb201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.smb' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverSmb201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -602,6 +870,7 @@ def cases(self): class TestShadowserverSnmp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.snmp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverSnmp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -628,18 +897,44 @@ def cases(self): ), ] ) + # this source still does not provide `tag` field, but we can accept it nevertheless + yield ( + b'"timestamp","ip","protocol","port","hostname","sysdesc","sysname","asn","geo",' + b'"region","city","version","tag"\n' + b'"2014-03-24 04:13:12","1.1.1.1","udp","10448","1.1.1.1.example.com' + b'-example","EX-Example1234","","11111","PL",ExampleLoc","ExampleLoc","2","cve-2000-11111\n', + [ + dict( + time='2014-03-24 04:13:12', + address=[{'ip': '1.1.1.1'}], + dport=10448, + proto='udp', + sysdesc='EX-Example1234', + version='2', + ), + dict( + category='vulnerable', + name='cve-2000-11111', + time='2014-03-24 04:13:12', + address=[{'ip': '1.1.1.1'}], + dport=10448, + proto='udp', + sysdesc='EX-Example1234', + version='2', + ), + ] + ) -class TestShadowserverSsdpParser(ParserTestMixin, unittest.TestCase): +class TestShadowserverSsdp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ssdp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverSsdp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'ssdp', } def cases(self): @@ -653,6 +948,8 @@ def cases(self): b'asdasdasasadevice",,,\n', [ dict( + category='amplifier', + name='ssdp', time='2014-12-02 09:12:54', address=[{'ip': '1.1.1.1'}], dport=1900, @@ -661,11 +958,82 @@ def cases(self): ), ] ) + yield ( + 
b'"timestamp","ip","protocol","port","hostname","tag","header","asn","geo","region",' + b'"city","systime","cache_control","location","server","search_target",' + b'"unique_service_name","host","nts","nt"\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2014-12-02 09:12:54","1.1.1.1","udp",1900,"example.pl",' + b'"cve-2000-111111","HTTP/1.1 200 OK",1111,"PL","SL","EXAMPLE_CITY",,"max-age=1200","http://1.1.1.1.' + b'example.com,"qwertyuiopASDFGHJKLzxcvbnm","upnp:rootdevice","' + b'asdasdasasadevice",,,\n' + + # we have cve match in tag field -> we yield 2 events + b'"2014-12-02 09:12:54","2.2.2.2","udp",1900,"example.pl",' + b'"CVE-2000-222222","HTTP/1.1 200 OK",2222,"PL","SL","EXAMPLE_CITY",,"max-age=1200","http://2.2.2.2.' + b'example.com,"qwertyuiopASDFGHJKLzxcvbnm","upnp:rootdevice","' + b'asdasdasasadevice",,,\n' + + b'"2014-12-02 09:12:54","3.3.3.3","udp",1900,"example.pl",' + b'"ssdp","HTTP/1.1 200 OK",33333,"PL","SL","EXAMPLE_CITY",,"max-age=1200","http://3.3.3.3.' + b'example.com,"qwertyuiopASDFGHJKLzxcvbnm","upnp:rootdevice","' + b'asdasdasasadevice",,,\n', + [ + dict( + category='amplifier', + name='ssdp', + time='2014-12-02 09:12:54', + address=[{'ip': '1.1.1.1'}], + dport=1900, + proto='udp', + header='HTTP/1.1 200 OK', + ), + dict( + category='vulnerable', + name='cve-2000-111111', + time='2014-12-02 09:12:54', + address=[{'ip': '1.1.1.1'}], + dport=1900, + proto='udp', + header='HTTP/1.1 200 OK', + ), + dict( + category='amplifier', + name='ssdp', + time='2014-12-02 09:12:54', + address=[{'ip': '2.2.2.2'}], + dport=1900, + proto='udp', + header='HTTP/1.1 200 OK', + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2014-12-02 09:12:54', + address=[{'ip': '2.2.2.2'}], + dport=1900, + proto='udp', + header='HTTP/1.1 200 OK', + ), + dict( + category='amplifier', + name='ssdp', + time='2014-12-02 09:12:54', + address=[{'ip': '3.3.3.3'}], + dport=1900, + proto='udp', + header='HTTP/1.1 200 OK', + ), + ] + ) class TestShadowserverSslPoodle201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ssl-poodle' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverSslPoodle201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -710,6 +1078,7 @@ def cases(self): class TestShadowserverSandboxUrl201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.sandbox-url' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverSandboxUrl201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -759,6 +1128,7 @@ def cases(self): class TestShadowserverOpenResolver201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.open-resolver' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverOpenResolver201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -798,12 +1168,52 @@ def cases(self): ), ] ) + # this source still does not provide `tag` field, but we can accept it nevertheless + yield ( + b'"timestamp","ip","asn","geo","region","city","port","protocol",' + b'"hostname","min_amplification","dns_version","p0f_genre","p0f_detail","tag"\n' + + b'"2013-08-22 00:00:00","1.1.1.1",11111,"PL","ExampleLoc","ExampleLoc",53,"udp",' + b',"1.3810","DNS_VERSION",,,cve-2000-111111\n' + b'"2013-08-22 00:00:01","2.2.2.2",22222,"PL","ExampleLoc","ExampleLoc",53,"udp",' + b'"host.example.pl","1.3810",,,\n' + , + [ + dict( + address=[{'ip': 
'1.1.1.1'}], + dport=53, + proto='udp', + min_amplification='1.3810', + dns_version='DNS_VERSION', + time='2013-08-22 00:00:00', + ), + dict( + category='vulnerable', + name='cve-2000-111111', + address=[{'ip': '1.1.1.1'}], + dport=53, + proto='udp', + min_amplification='1.3810', + dns_version='DNS_VERSION', + time='2013-08-22 00:00:00', + ), + dict( + address=[{'ip': '2.2.2.2'}], + dport=53, + fqdn='host.example.pl', + proto='udp', + min_amplification='1.3810', + time='2013-08-22 00:00:01', + ), + ] + ) class TestShadowserverElasticsearch201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.elasticsearch' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverElasticsearch201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -844,6 +1254,7 @@ def cases(self): class TestShadowserverSslFreak201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ssl-freak' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverSslFreak201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -933,6 +1344,7 @@ def cases(self): class TestShadowserverNtpMonitor201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ntp-monitor' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverNtpMonitor201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -954,8 +1366,37 @@ def cases(self): b'"2015-09-23 06:09:27","2.2.2.2","udp",123,"2-2-2-2.example.pl",' b'1,80,22222,"PL","ExampleLoc","ExampleLoc",0,0\n' - b'"2015-09-23 06:09:46","3.3.3.3","udp",123,"example.pl",11,' - b'33333,3333,"PL","ExampleLoc","ExampleLoc",111111,222222\n' + b'"2015-09-23 06:09:46","3.3.3.3","udp",123,"example.pl",11,' + b'33333,3333,"PL","ExampleLoc","ExampleLoc",111111,222222\n' + , + [ + dict( + time='2015-09-23 06:09:24', + address=[{'ip': '1.1.1.1'}, ], + proto='udp', + dport=123, + ), + dict( + time='2015-09-23 06:09:27', + address=[{'ip': '2.2.2.2'}, ], + proto='udp', + dport=123, + ), + dict( + time='2015-09-23 06:09:46', + address=[{'ip': '3.3.3.3'}, ], + proto='udp', + dport=123, + ), + ] + ) + # this source still does not provide `tag` field, but we can accept it nevertheless + yield ( + b'"timestamp","ip","protocol","port","hostname","packets","size","asn","geo","region",' + b'"city","naics","sic","tag"\n' + + b'"2015-09-23 06:09:24","1.1.1.1","udp",123,"example.pl",' + b'80,11111,1111,"PL","ExampleLoc","ExampleLoc",111111,222222,"cve-2000-11111"\n' , [ dict( @@ -965,14 +1406,10 @@ def cases(self): dport=123, ), dict( - time='2015-09-23 06:09:27', - address=[{'ip': '2.2.2.2'}, ], - proto='udp', - dport=123, - ), - dict( - time='2015-09-23 06:09:46', - address=[{'ip': '3.3.3.3'}, ], + category='vulnerable', + name='cve-2000-11111', + time='2015-09-23 06:09:24', + address=[{'ip': '1.1.1.1'}, ], proto='udp', dport=123, ), @@ -989,13 +1426,12 @@ def cases(self): class TestShadowserverPortmapper201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.portmapper' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverPortmapper201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'portmapper', } MESSAGE_EXTRA_HEADERS = {'meta': {'mail_time': '2016-02-03 08:21:13'}} @@ -1008,24 +1444,73 @@ def cases(self): b'"portmapper",11111,"PL","ExampleLoc","ExampleLoc",0,0,"100000 2 
111/udp; 100000 2 ' b'111/udp; 100003 2 100003 1 333/udp; 100004 1 333/udp; 100011 1 222/udp; 100011 2 ' b'333/udp; 100011 1 777/udp; 100011 2 777/udp;",789,"/ 3.3.3.3;"\n' + + # no proto at all, we assume it's udp + b'"2015-10-03 04:11:31","2.2.2.2","",111,"example.net",' + b'"portmapper",11111,"PL","ExampleLoc","ExampleLoc",0,0,"100000 2 111/udp; 100000 2 ' + b'111/udp; 100003 2 100003 1 333/udp; 100004 1 333/udp; 100011 1 222/udp; 100011 2 ' + b'333/udp; 100011 1 777/udp; 100011 2 777/udp;",789,"/ 3.3.3.3;"\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2015-10-03 04:11:32","4.4.4.4","udp",111,"example.net",' + b'"cve-2000-444444",444444,"PL","ExampleLoc","ExampleLoc",0,0,"100000 2 111/udp; 100000 2 ' + b'111/udp; 100004 1 54321/udp; 100004 1 56789/udp;",,\n' - b'"2015-10-03 04:11:32","2.2.2.2","udp",111,"example.net",' - b'"portmapper",44444,"PL","ExampleLoc","ExampleLoc",0,0,"100000 2 111/udp; 100000 2 ' + # we have cve match in tag field -> we yield 2 events + b'"2015-10-03 04:11:32","5.5.5.5","udp",111,"example.net",' + b'"CVE-2000-555555",555555,"PL","ExampleLoc","ExampleLoc",0,0,"100000 2 111/udp; 100000 2 ' b'111/udp; 100004 1 54321/udp; 100004 1 56789/udp;",,\n' , [ dict( + category='amplifier', + name='portmapper', time='2015-10-03 04:11:31', address=[{'ip': '1.1.1.1'}, ], proto='udp', dport=111, ), dict( - time='2015-10-03 04:11:32', + category='amplifier', + name='portmapper', + time='2015-10-03 04:11:31', address=[{'ip': '2.2.2.2'}, ], proto='udp', dport=111, ), + dict( + category='amplifier', + name='portmapper', + time='2015-10-03 04:11:32', + address=[{'ip': '4.4.4.4'}, ], + proto='udp', + dport=111, + ), + dict( + category='vulnerable', + name='cve-2000-444444', + time='2015-10-03 04:11:32', + address=[{'ip': '4.4.4.4'}, ], + proto='udp', + dport=111, + ), + dict( + category='amplifier', + name='portmapper', + time='2015-10-03 04:11:32', + address=[{'ip': '5.5.5.5'}, ], + proto='udp', + dport=111, + ), + dict( + category='vulnerable', + name='cve-2000-555555', + time='2015-10-03 04:11:32', + address=[{'ip': '5.5.5.5'}, ], + proto='udp', + dport=111, + ), ] ) yield ( @@ -1039,13 +1524,12 @@ def cases(self): class TestShadowserverMdns201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.mdns' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverMdns201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'mdns', } MESSAGE_EXTRA_HEADERS = {'meta': {'mail_time': '2016-02-10 19:00:03'}} @@ -1060,23 +1544,70 @@ def cases(self): b'"2016-03-21 07:38:47","1.1.1.1","udp",5353,"example.com","mdns",11111,' b'"PL","Example","Example",0,0,,,,"_workstation._tcp.local.;",,,,,,,,,,,\n' - b'"2016-03-21 07:38:48","2.2.2.2","udp",5353,,"mdns",22222,"PL","ExampleLoc",' + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2016-03-21 07:38:48","2.2.2.2","udp",5353,,"cve-2000-222222",22222,"PL","ExampleLoc",' + b'"Example ExampleA",0,0,,,,"_workstation._tcp.local.; _http._tcp.local.; ' + b'_smb._tcp.local.; _qdiscover._tcp.local.;",,,,,,,,,,,\n' + + # we have cve match in tag field -> we yield 2 events + b'"2016-03-21 07:38:48","3.3.3.3","udp",5353,,"CVE-2000-333333",333333,"PL","ExampleLoc",' b'"Example ExampleA",0,0,,,,"_workstation._tcp.local.; _http._tcp.local.; ' b'_smb._tcp.local.; _qdiscover._tcp.local.;",,,,,,,,,,,\n' + + # protocol 
other than UDP (should be nevertheless acknowledged) + b'"2016-03-21 07:38:47","4.4.4.4","tcp",5353,"example.com","mdns",11111,' + b'"PL","Example","Example",0,0,,,,"_workstation._tcp.local.;",,,,,,,,,,,\n' , [ dict( + category='amplifier', + name='mdns', time='2016-03-21 07:38:47', address=[{'ip': '1.1.1.1'}], proto='udp', dport=5353, ), dict( + category='amplifier', + name='mdns', + time='2016-03-21 07:38:48', + address=[{'ip': '2.2.2.2'}, ], + proto='udp', + dport=5353, + ), + dict( + category='vulnerable', + name='cve-2000-222222', time='2016-03-21 07:38:48', address=[{'ip': '2.2.2.2'}, ], proto='udp', dport=5353, ), + dict( + category='amplifier', + name='mdns', + time='2016-03-21 07:38:48', + address=[{'ip': '3.3.3.3'}, ], + proto='udp', + dport=5353, + ), + dict( + category='vulnerable', + name='cve-2000-333333', + time='2016-03-21 07:38:48', + address=[{'ip': '3.3.3.3'}, ], + proto='udp', + dport=5353, + ), + dict( + category='amplifier', + name='mdns', + time='2016-03-21 07:38:47', + address=[{'ip': '4.4.4.4'}], + proto='tcp', + dport=5353, + ), ] ) yield ( @@ -1090,13 +1621,12 @@ def cases(self): class TestShadowserverXdmcp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.xdmcp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverXdmcp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'xdmcp', } MESSAGE_EXTRA_HEADERS = {'meta': {'mail_time': '2016-05-03 11:00:03'}} @@ -1109,29 +1639,65 @@ def cases(self): b'"xdmcp",111111,"PL","example","example",0,0,"example","example4238","0 user, ' b'load: 0.00, 0.00, 0.00",48\n' - b'"2016-07-21 02:10:43","1.1.1.1","udp",177,"1-1-1-1.example.net",' - b'"xdmcp",1111111,"PL","example","example",0,0,"example","linux-example",' + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2016-07-21 02:10:43","2.2.2.2","udp",177,"2.2.2.2.example.net",' + b'"cve-2000-222222",222222,"PL","example","example",0,0,"example","linux-example",' + b'"Linux 1.2.345-6.78-abc",49\n' + + # we have cve match in tag field -> we yield 2 events + b'"2016-07-21 02:10:43","3.3.3.3","udp",177,"3.3.3.3.example.net",' + b'"cve-2000-333333",333333,"PL","example","example",0,0,"example","linux-example",' b'"Linux 1.2.345-6.78-abc",49\n' - # 3rd case - protocol other than UDP (should be - # nevertheless acknowledged) + # protocol other than UDP (should be nevertheless acknowledged) b'"2016-07-21 02:10:43","1.1.1.1","tcp",177,"1-1-1-1.example.net",' b'"xdmcp",111111,"PL","example","example",0,0,"example","linux-example",' b'"Linux 1.2.345-6.78-abc",49\n', [ dict( + category='amplifier', + name='xdmcp', time='2016-07-21 02:10:01', address=[{'ip': '1.1.1.1'}], proto='udp', dport=177, ), dict( + category='amplifier', + name='xdmcp', time='2016-07-21 02:10:43', - address=[{'ip': '1.1.1.1'}], + address=[{'ip': '2.2.2.2'}], + proto='udp', + dport=177, + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2016-07-21 02:10:43', + address=[{'ip': '2.2.2.2'}], + proto='udp', + dport=177, + ), + dict( + category='amplifier', + name='xdmcp', + time='2016-07-21 02:10:43', + address=[{'ip': '3.3.3.3'}], + proto='udp', + dport=177, + ), + dict( + category='vulnerable', + name='cve-2000-333333', + time='2016-07-21 02:10:43', + address=[{'ip': '3.3.3.3'}], proto='udp', dport=177, ), dict( + category='amplifier', + name='xdmcp', time='2016-07-21 02:10:43', address=[{'ip': 
'1.1.1.1'}], proto='tcp', @@ -1150,6 +1716,7 @@ def cases(self): class TestShadowserverDb2201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.db2' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverDb2201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1198,6 +1765,7 @@ def cases(self): class TestShadowserverRdp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.rdp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverRdp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1288,6 +1856,7 @@ def cases(self): class TestShadowserverTftp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.tftp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverTftp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1336,6 +1905,7 @@ def cases(self): class TestShadowserverIsakmp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.isakmp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverIsakmp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1406,6 +1976,7 @@ def cases(self): class TestShadowserverTelnet201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.telnet' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverTelnet201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1450,6 +2021,7 @@ def cases(self): class TestShadowserverCwmp201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.cwmp' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverCwmp201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1498,6 +2070,7 @@ def cases(self): class TestShadowserverLdap201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ldap' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverLdap201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1599,6 +2172,7 @@ def cases(self): class TestShadowserverVnc201412Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.vnc' + PARSER_RAW_FORMAT_VERSION_TAG = '201412' PARSER_CLASS = ShadowserverVnc201412Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1639,6 +2213,7 @@ def cases(self): class TestShadowserverSinkholeHttp202203Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.sinkhole-http' + PARSER_RAW_FORMAT_VERSION_TAG = '202203' PARSER_CLASS = ShadowserverSinkholeHttp202203Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1677,8 +2252,9 @@ def cases(self): ) -class TestShadowserverSinkholeParser(ParserTestMixin, unittest.TestCase): +class TestShadowserverSinkhole202203Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.sinkhole' + PARSER_RAW_FORMAT_VERSION_TAG = '202203' PARSER_CLASS = ShadowserverSinkhole202203Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1714,6 +2290,7 @@ def cases(self): class TestShadowserverDarknet202203Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.darknet' + PARSER_RAW_FORMAT_VERSION_TAG = '202203' PARSER_CLASS = ShadowserverDarknet202203Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1747,42 
+2324,9 @@ def cases(self): ) -class TestShadowserverModbus202203Parser(ParserTestMixin, unittest.TestCase): - PARSER_SOURCE = 'shadowserver.modbus' - PARSER_CLASS = ShadowserverModbus202203Parser - PARSER_BASE_CLASS = _BaseShadowserverParser - PARSER_CONSTANT_ITEMS = { - 'restriction': 'need-to-know', - 'confidence': 'medium', - 'category': 'vulnerable', - 'name': 'modbus', - } - - def cases(self): - yield ( - b'"timestamp","ip","protocol","port","hostname","tag","asn","geo","region","city","naics",' - b'"sic","unit_id","vendor","revision","product_code","function_code","conformity_level",' - b'"object_count","response_length","raw_response","sector"\n' - b'"2022-02-20 02:30:40","1.1.1.1","tcp",1111,"1.1.1.1.example.com","modbus",11111,"PL",' - b'"EXAMPLE VOIVODESHIP","EXAMPLE CITY",,,0,"Example Company","v2.2","ABC DEF 1234",43,129,3,50,' - b'"AaaAa1a1a1aAAa1a1aaaaAAAaaaA11","Communications, Service Provider, and Hosting Service"' - , - [ - dict( - time='2022-02-20 02:30:40', - address=[{'ip': '1.1.1.1'}], - dport=1111, - proto='tcp', - vendor='Example Company', - revision='v2.2', - product_code='ABC DEF 1234', - ), - ] - ) - - class TestShadowserverIcs202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ics' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverIcs202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -1820,27 +2364,68 @@ def cases(self): class TestShadowserverCoap202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.coap' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverCoap202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'coap', } def cases(self): yield ( b'"timestamp","ip","protocol","port","hostname","tag","asn","geo","region","city","naics","sic","response"\n' - b'"2021-05-18 01:28:59","1.1.1.1","udp",1111,,"coap",11111,"PL","STATE","CITY",,,"..."' + + b'"2021-05-18 01:28:59","1.1.1.1","udp",1111,,"coap",11111,"PL","STATE","CITY",,,"..."\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2021-05-18 01:28:59","2.2.2.2","udp",2222,,"cve-2000-222222",222222,"PL","STATE","CITY",,,"..."\n' + + # we have cve match in tag field -> we yield 2 events + b'"2021-05-18 01:28:59","3.3.3.3","udp",3333,,"CVE-2000-333333",333333,"PL","STATE","CITY",,,"..."\n' , [ dict( + category='amplifier', + name='coap', time='2021-05-18 01:28:59', address=[{'ip': '1.1.1.1'}], dport=1111, proto='udp', ), + dict( + category='amplifier', + name='coap', + time='2021-05-18 01:28:59', + address=[{'ip': '2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2021-05-18 01:28:59', + address=[{'ip': '2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='amplifier', + name='coap', + time='2021-05-18 01:28:59', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-333333', + time='2021-05-18 01:28:59', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), ] ) @@ -1848,29 +2433,72 @@ def cases(self): class TestShadowserverUbiquiti202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ubiquiti' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverUbiquiti202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 
'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'ubiquiti', } def cases(self): yield ( b'"timestamp","ip","protocol","port","hostname","tag","asn","geo","region","city",' b'"naics","sic","mac","radioname","essid","modelshort","modelfull","firmware","size"\n' - b'"2021-05-18 01:19:39","1.1.1.1","udp",1111,,"ubiquiti",11111,"PL","WIELKOPOLSKIE",' - b'"EXAMPLE",,,"111111111111","test1","test2","ABC",,"AA1.aa111.v1.0.1.1111.111111.1111",123' + + b'"2021-05-18 01:19:39","1.1.1.1","udp",1111,,"ubiquiti",11111,"PL","eXampleLoc",' + b'"EXAMPLE",,,"111111111111","test1","test2","ABC",,"AA1.aa111.v1.0.1.1111.111111.1111",123\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2021-05-18 01:19:39","2.2.2.2","udp",2222,,"cve-2000-222222",222222,"PL","eXampleLoc",' + b'"EXAMPLE",,,"222222222222","test1","test2","ABC",,"AA1.aa111.v1.0.1.1111.111111.1111",123\n' + + # we have cve match in tag field -> we yield 2 events + b'"2021-05-18 01:19:39","3.3.3.3","udp",3333,,"CVE-2000-333333",333333,"PL","eXampleLoc",' + b'"EXAMPLE",,,"333333333333","test1","test2","ABC",,"AA1.aa111.v1.0.1.1111.111111.1111",123\n' , [ dict( + category='amplifier', + name='ubiquiti', time='2021-05-18 01:19:39', address=[{'ip': '1.1.1.1'}], dport=1111, proto='udp', ), + dict( + category='amplifier', + name='ubiquiti', + time='2021-05-18 01:19:39', + address=[{'ip': '2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2021-05-18 01:19:39', + address=[{'ip': '2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='amplifier', + name='ubiquiti', + time='2021-05-18 01:19:39', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-333333', + time='2021-05-18 01:19:39', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), ] ) @@ -1878,29 +2506,72 @@ def cases(self): class TestShadowserverArd202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ard' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverArd202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'ard', } def cases(self): yield ( b'"timestamp","ip","protocol","port","hostname","tag","asn","geo",' b'"region","city","naics","sic","machine_name","response_size"\n' + b'"2021-05-18 10:20:52","1.1.1.1","udp",1111,"1.1.1.1.example.com",' - b'"ard",11111,"PL","SOMEWHERE","SOME CITY",111111,,"Serwer",1006' + b'"ard",11111,"PL","SOMEWHERE","SOME CITY",111111,,"Serwer",1006\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2021-05-18 10:20:52","2.2.2.2","udp",2222,"2.2.2.2.example.com",' + b'"cve-2000-222222",22222,"PL","SOMEWHERE","SOME CITY",222222,,"Serwer",1006\n' + + # we have cve match in tag field -> we yield 2 events + b'"2021-05-18 10:20:52","3.3.3.3","udp",3333,"3.3.3.3.example.com",' + b'"CVE-2000-333333",33333,"PL","SOMEWHERE","SOME CITY",333333,,"Serwer",1006\n' , [ dict( + category='amplifier', + name='ard', time='2021-05-18 10:20:52', address=[{'ip': '1.1.1.1'}], dport=1111, proto='udp', ), + dict( + category='amplifier', + name='ard', + time='2021-05-18 10:20:52', + address=[{'ip': '2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2021-05-18 10:20:52', + address=[{'ip': 
'2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='amplifier', + name='ard', + time='2021-05-18 10:20:52', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-333333', + time='2021-05-18 10:20:52', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), ] ) @@ -1908,27 +2579,68 @@ def cases(self): class TestShadowserverRdpeudp202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.rdpeudp' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverRdpeudp202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'rdpeudp', } def cases(self): yield ( b'"timestamp","ip","protocol","port","hostname","tag","asn","geo","region","city","naics","sic","sessionid"\n' - b'"2021-05-18 13:18:31","1.1.1.1","udp",1111,"test.example.com","rdpeudp",11111,"PL","STATE","CITY",111111,,"01234567"' + + b'"2021-05-18 13:18:31","1.1.1.1","udp",1111,"test.example1.com","rdpeudp",11111,"PL","STATE","CITY",111111,,"01234567"\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2021-05-18 13:18:31","2.2.2.2","udp",2222,"test.example2.com","cve-2000-222222",22222,"PL","STATE","CITY",222222,,"01234567"\n' + + # we have cve match in tag field -> we yield 2 events + b'"2021-05-18 13:18:31","3.3.3.3","udp",3333,"test.example3.com","CVE-2000-333333",33333,"PL","STATE","CITY",333333,,"01234567"\n' , [ dict( + category='amplifier', + name='rdpeudp', time='2021-05-18 13:18:31', address=[{'ip': '1.1.1.1'}], dport=1111, proto='udp', ), + dict( + category='amplifier', + name='rdpeudp', + time='2021-05-18 13:18:31', + address=[{'ip': '2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-222222', + time='2021-05-18 13:18:31', + address=[{'ip': '2.2.2.2'}], + dport=2222, + proto='udp', + ), + dict( + category='amplifier', + name='rdpeudp', + time='2021-05-18 13:18:31', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), + dict( + category='vulnerable', + name='cve-2000-333333', + time='2021-05-18 13:18:31', + address=[{'ip': '3.3.3.3'}], + dport=3333, + proto='udp', + ), ] ) @@ -1936,13 +2648,12 @@ def cases(self): class TestShadowserverDvrDhcpdiscover202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.dvr-dhcpdiscover' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverDvrDhcpdiscover202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { 'restriction': 'need-to-know', 'confidence': 'medium', - 'category': 'amplifier', - 'name': 'dvr-dhcpdiscover', } def cases(self): @@ -1954,24 +2665,95 @@ def cases(self): b'"alarm_output_channels","remote_video_input_channels","mac_address","ipv4_address",' b'"ipv4_gateway","ipv4_subnet_mask","ipv4_dhcp_enable","ipv6_address","ipv6_link_local",' b'"ipv6_gateway","ipv6_dhcp_enable"\n' + b'"2022-04-20 13:29:33","1.1.1.1","udp",1111,"host-1-1-1-1.example.com","dvrdhcpdiscover",' - b'11111,"PL","STATE","CITY",,,,"Private","ABC","ABC1234-5DE6","1.111.111A001.0",' - b'"1234","1A111AAAAAA1AA1","ABC","Private","client.notifyDevInfo",80,22222,0,0,0,0,4,' + b'11111,"PL","STATE","CITY",,,,"Private","ABC","ABC1111-5DE6","1.111.111A001.0",' + b'"1111","1A111AAAAAA1AA1","ABC","Private","client.notifyDevInfo",80,22222,0,0,0,0,4,' b'"00:00:00:00:00:00","1.1.1.1","2.2.2.2","3.3.3.3",0,"/1",' - 
b'"0000::0000:0000:0000:0000/64",,' + b'"0000::0000:0000:0000:0000/64",,\n' + + # we have cve match in tag field -> we yield 2 events + # cve (all characters lowered) + b'"2022-04-20 13:29:33","4.4.4.4","udp",4444,"host-4-4-4-4.example.com","cve-2000-444444",' + b'4444,"PL","STATE","CITY",,,,"Private","ABC","ABC4444-5DE6","4.444.444A001.0",' + b'"4444","1A111AAAAAA1AA1","ABC","Private","client.notifyDevInfo",80,4444,0,0,0,0,4,' + b'"00:00:00:00:00:00","4.4.4.4","5.5.5.5","6.6.6.6",0,"/1",' + b'"0000::0000:0000:0000:0000/64",,\n' + + # we have cve match in tag field -> we yield 2 events + b'"2022-04-20 13:29:33","7.7.7.7","udp",7777,"host-7-7-7-7.example.com","CVE-2000-777777",' + b'11111,"PL","STATE","CITY",,,,"Private","ABC","ABC7777-5DE6","7.777.777A001.0",' + b'"7777","1A111AAAAAA1AA1","ABC","Private","client.notifyDevInfo",80,7777,0,0,0,0,4,' + b'"00:00:00:00:00:00","7.7.7.7","8.8.8.8","9.9.9.9",0,"/1",' + b'"0000::0000:0000:0000:0000/64",,\n' , [ dict( + category='amplifier', + name='dvr-dhcpdiscover', time='2022-04-20 13:29:33', address=[{'ip': '1.1.1.1'}], dport=1111, proto='udp', device_vendor='Private', device_type='ABC', - device_model='ABC1234-5DE6', + device_model='ABC1111-5DE6', device_version='1.111.111A001.0', - device_id='1234', + device_id='1111', + ), + dict( + category='amplifier', + name='dvr-dhcpdiscover', + time='2022-04-20 13:29:33', + address=[{'ip': '4.4.4.4'}], + dport=4444, + proto='udp', + device_vendor='Private', + device_type='ABC', + device_model='ABC4444-5DE6', + device_version='4.444.444A001.0', + device_id='4444', + ), + dict( + category='vulnerable', + name='cve-2000-444444', + time='2022-04-20 13:29:33', + address=[{'ip': '4.4.4.4'}], + dport=4444, + proto='udp', + device_vendor='Private', + device_type='ABC', + device_model='ABC4444-5DE6', + device_version='4.444.444A001.0', + device_id='4444', + ), + dict( + category='amplifier', + name='dvr-dhcpdiscover', + time='2022-04-20 13:29:33', + address=[{'ip': '7.7.7.7'}], + dport=7777, + proto='udp', + device_vendor='Private', + device_type='ABC', + device_model='ABC7777-5DE6', + device_version='7.777.777A001.0', + device_id='7777', + ), + dict( + category='vulnerable', + name='cve-2000-777777', + time='2022-04-20 13:29:33', + address=[{'ip': '7.7.7.7'}], + dport=7777, + proto='udp', + device_vendor='Private', + device_type='ABC', + device_model='ABC7777-5DE6', + device_version='7.777.777A001.0', + device_id='7777', ), + ] ) @@ -1979,6 +2761,7 @@ def cases(self): class TestShadowserverHttp202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.http' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverHttp202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2012,6 +2795,7 @@ def cases(self): class TestShadowserverFtp202204Parserr(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ftp' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverFtp202204Parser PARSER_BASE_CLASS = BaseParser PARSER_CONSTANT_ITEMS = { @@ -2067,6 +2851,7 @@ def cases(self): class TestShadowserverMqtt202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.mqtt' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverMqtt202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2109,6 +2894,7 @@ def cases(self): class TestShadowserverShadowserverLdapTcp202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ldap-tcp' + 
PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverLdapTcp202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2144,6 +2930,7 @@ def cases(self): class TestShadowserverRsync202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.rsync' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverRsync202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2172,6 +2959,7 @@ def cases(self): class TestShadowserverRadmin202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.radmin' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverRadmin202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2200,6 +2988,7 @@ def cases(self): class TestShadowserverAdb202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.adb' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverAdb202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2235,6 +3024,7 @@ def cases(self): class TestShadowserverAfp202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.afp' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverAfp202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2270,6 +3060,7 @@ def cases(self): class TestShadowserverCiscoSmartInstall202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.cisco-smart-install' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverCiscoSmartInstall202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2298,6 +3089,7 @@ def cases(self): class TestShadowserverIpp202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.ipp' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverIpp202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2337,6 +3129,7 @@ def cases(self): class TestShadowserverHadoop202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.hadoop' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverHadoop202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2369,6 +3162,7 @@ def cases(self): class TestShadowserverExchange202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.exchange' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverExchange202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2397,6 +3191,7 @@ def cases(self): class TestShadowserverSmtp202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.smtp' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverSmtp202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2425,6 +3220,7 @@ def cases(self): class TestShadowserverAmqp202204Parser(ParserTestMixin, unittest.TestCase): PARSER_SOURCE = 'shadowserver.amqp' + PARSER_RAW_FORMAT_VERSION_TAG = '202204' PARSER_CLASS = ShadowserverAmqp202204Parser PARSER_BASE_CLASS = _BaseShadowserverParser PARSER_CONSTANT_ITEMS = { @@ -2456,3 +3252,34 @@ def cases(self): ), ] ) + + +class TestShadowserverMsmq202308Parser(ParserTestMixin, unittest.TestCase): + + PARSER_SOURCE = 'shadowserver.msmq' + PARSER_RAW_FORMAT_VERSION_TAG = '202308' + PARSER_CLASS = ShadowserverMsmq202308Parser 
+ PARSER_BASE_CLASS = _BaseShadowserverParser + PARSER_CONSTANT_ITEMS = { + 'restriction': 'need-to-know', + 'confidence': 'medium', + 'category': 'vulnerable', + 'name': 'msmq', + } + + def cases(self): + yield ( + b'"timestamp","ip","protocol","port","hostname","tag","asn","geo","region",' + b'"city","naics","sector","response_size"\n' + b'"2023-08-10 01:01:55","1.1.1.1","tcp",1111,"example.com","msmq",11111,"PL","STATE",' + b'"EXMPL CITY",,,111' + , + [ + dict( + time='2023-08-10 01:01:55', + address=[{'ip': '1.1.1.1'}], + dport=1111, + proto='tcp', + ), + ] + ) diff --git a/N6DataSources/n6datasources/tests/parsers/test_spamhaus.py b/N6DataSources/n6datasources/tests/parsers/test_spamhaus.py index b8befe6..8f6d8e8 100644 --- a/N6DataSources/n6datasources/tests/parsers/test_spamhaus.py +++ b/N6DataSources/n6datasources/tests/parsers/test_spamhaus.py @@ -65,6 +65,7 @@ class TestSpamhausEdrop202303Parser(ParserTestMixin, unittest.TestCase): RECORD_DICT_CLASS = BLRecordDict PARSER_SOURCE = 'spamhaus.edrop' + PARSER_RAW_FORMAT_VERSION_TAG = '202303' PARSER_CLASS = SpamhausEdrop202303Parser PARSER_BASE_CLASS = _BaseSpamhausBlacklistParser PARSER_CONSTANT_ITEMS = { diff --git a/N6Lib/n6lib/__init__.py b/N6Lib/n6lib/__init__.py index 5e3f260..1cdcfd5 100644 --- a/N6Lib/n6lib/__init__.py +++ b/N6Lib/n6lib/__init__.py @@ -1,6 +1,5 @@ # Copyright (c) 2020-2023 NASK. All rights reserved. -import atexit import locale import os @@ -8,7 +7,6 @@ # (if any and if not triggered yet). import n6sdk # noqa -from n6lib.common_helpers import cleanup_src from n6lib.config import monkey_patch_configparser_to_provide_some_legacy_defaults from n6lib.log_helpers import early_Formatter_class_monkeypatching @@ -40,11 +38,6 @@ # Monkey-patch configparser.RawConfigParser (with its subclasses)... monkey_patch_configparser_to_provide_some_legacy_defaults() -# Ensure that resource files and directories extracted with -# pkg_resources stuff are removed (or at least tried to be removed). -import logging # <- Must be imported *before* registering cleanup_src(). # noqa -atexit.register(cleanup_src) - # Make output diffs generated by `unittest`'s stuff more readable. import unittest # noqa import unittest.util @@ -52,3 +45,19 @@ if getattr(unittest.util, '_MAX_LENGTH', None) == 80: # (<- let's be conservative here) # Let's get rid of annoying `...[123 chars]...`-like shortening: unittest.util._MAX_LENGTH = 1_000_000 + +# XXX: This is a temporary workaround for `Python >= 3.11` -- +# to make certain old dependencies happy. But, ultimately, +# we need to update (or get rid of) those dependencies! +import sys +if sys.version_info[:2] >= (3, 11): + import collections, collections.abc # noqa + for _name in ['Container', 'Hashable', 'Iterable', 'Iterator', + 'Reversible', 'Generator', 'Sized', 'Callable', + 'Collection', 'Sequence', 'MutableSequence', 'ByteString', + 'Set', 'MutableSet', 'Mapping', 'MutableMapping', + 'MappingView', 'ItemsView', 'KeysView', 'ValuesView', + 'Awaitable', 'Coroutine', 'AsyncIterable', + 'AsyncIterator', 'AsyncGenerator']: + if not hasattr(collections, _name): + setattr(collections, _name, getattr(collections.abc, _name)) diff --git a/N6Lib/n6lib/amqp_helpers.py b/N6Lib/n6lib/amqp_helpers.py index 41eb775..1ae0fae 100644 --- a/N6Lib/n6lib/amqp_helpers.py +++ b/N6Lib/n6lib/amqp_helpers.py @@ -30,7 +30,7 @@ ... 
''' -# components (subclasses of n6.base.queue.QueuedBase) +# components (subclasses of `n6datapipeline.base.LegacyQueuedBase`) # which have the `input_queue` attribute set, # but they do not need to have the list of `binding_keys`, # the warning will not be logged for them diff --git a/N6Lib/n6lib/auth_api.py b/N6Lib/n6lib/auth_api.py index 714d388..92cc54c 100644 --- a/N6Lib/n6lib/auth_api.py +++ b/N6Lib/n6lib/auth_api.py @@ -1488,7 +1488,10 @@ def _get_inside_criteria(self, root_node): asn_seq = list(map(int, get_attr_value_list(org, 'n6asn'))) cc_seq = list(get_attr_value_list(org, 'n6cc')) fqdn_seq = list(get_attr_value_list(org, 'n6fqdn')) - ip_min_max_seq = list(map(ip_network_tuple_to_min_max_ip, + convert_to_min_max_ip = functools.partial( + ip_network_tuple_to_min_max_ip, + force_min_ip_greater_than_zero=True) + ip_min_max_seq = list(map(convert_to_min_max_ip, map(ip_network_as_tuple, get_attr_value_list(org, 'n6ip-network')))) url_seq = list(get_attr_value_list(org, 'n6url')) @@ -1840,7 +1843,9 @@ def _iter_crit_conditions(self, criteria_container_items, cond_builder): if name == 'ip-network': for ip_network_str in value_list: ip_network_tuple = ip_network_as_tuple(ip_network_str) - min_ip, max_ip = ip_network_tuple_to_min_max_ip(ip_network_tuple) + min_ip, max_ip = ip_network_tuple_to_min_max_ip( + ip_network_tuple, + force_min_ip_greater_than_zero=True) column = cond_builder['ip'] yield column.between(min_ip, max_ip) else: diff --git a/N6Lib/n6lib/auth_db/scripts.py b/N6Lib/n6lib/auth_db/scripts.py index 89df560..80314e7 100644 --- a/N6Lib/n6lib/auth_db/scripts.py +++ b/N6Lib/n6lib/auth_db/scripts.py @@ -1,4 +1,4 @@ -# Copyright (c) 2018-2021 NASK. All rights reserved. +# Copyright (c) 2018-2023 NASK. All rights reserved. import argparse import ast @@ -14,9 +14,13 @@ import sqlalchemy.orm from alembic import command from alembic.config import Config as AlembicConfig -from pkg_resources import ( - Requirement, - resource_filename, +from importlib_resources import ( + # Note: `importlib_resources-5.12` provides backport of Python-3.12's + # version of `importlib.resources` whose implementation of `as_file()` + # supports directory trees (which is a feature needed by us in this + # module). 
+ as_file, + files, ) from n6lib.auth_db import ( @@ -382,15 +386,13 @@ def stamp_as_alembic_head(self): self.msg( 'Invoking appropriate Alembic tools to stamp the auth database ' 'as being at the `{}` Alembic revision...'.format(revision)) - alembic_ini_path = resource_filename( - Requirement.parse('n6lib'), - 'n6lib/auth_db/alembic.ini') - with self.patched_os_environ_var( + with as_file(files('n6lib.auth_db')) as alembic_conf_dir_path, \ + self.patched_os_environ_var( ALEMBIC_DB_CONFIGURATOR_SETTINGS_DICT_ENVIRON_VAR_NAME, self._prepare_alembic_db_configurator_settings_dict_raw()), \ - self.changed_working_dir(osp.dirname(alembic_ini_path)), \ + self.changed_working_dir(alembic_conf_dir_path), \ self.suppressed_stderr(suppress_only_if_quiet=True): - alembic_cfg = AlembicConfig(alembic_ini_path) + alembic_cfg = AlembicConfig(alembic_conf_dir_path / 'alembic.ini') command.stamp(alembic_cfg, revision) def _prepare_alembic_db_configurator_settings_dict_raw(self): @@ -399,7 +401,7 @@ def _prepare_alembic_db_configurator_settings_dict_raw(self): alembic_db_configurator_settings_dict_raw = repr(alembic_db_configurator_settings_dict) try: ast.literal_eval(alembic_db_configurator_settings_dict_raw) - except Exception: + except Exception as exc: self.msg_error( 'when none of the -A or -o options are used, the ' 'auth database configurator settings dict, if ' @@ -407,7 +409,7 @@ def _prepare_alembic_db_configurator_settings_dict_raw(self): 'only keys and values representable as pure Python ' 'literals (got: {!a})'.format( alembic_db_configurator_settings_dict)) - raise ValueError('settings dict not representable as pure literal') + raise ValueError('settings dict not representable as pure literal') from exc else: alembic_db_configurator_settings_dict_raw = None return alembic_db_configurator_settings_dict_raw @@ -445,28 +447,91 @@ class PopulateAuthDB(BaseAuthDBScript): ANONYMIZED_SOURCE_PREFIX = 'hidden.' 
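# Illustrative sketch (editorial, not part of the changeset; the
# `use_config` callback is hypothetical, the package name is the one
# used above): the `importlib_resources` pattern that
# `stamp_as_alembic_head()` now uses instead of
# `pkg_resources.resource_filename()`. `files()` returns a
# `Traversable` for a package's data, and `as_file()` exposes it as a
# real filesystem path for the duration of the `with` block -- also
# for whole directory trees, which is why the note above points at
# `importlib_resources`-5.12 (backport of the Python-3.12
# implementation).

from importlib_resources import as_file, files

def run_with_packaged_alembic_ini(use_config):
    # Materialize the packaged `n6lib.auth_db` directory as a real
    # path (a temporary extraction is performed only if necessary).
    with as_file(files('n6lib.auth_db')) as conf_dir:
        use_config(conf_dir / 'alembic.ini')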
DEFAULT_SOURCES = [ - 'abuse-ch.spyeye-doms', - 'abuse-ch.spyeye-ips', - 'abuse-ch.zeus-doms', - 'abuse-ch.zeus-ips', - 'abuse-ch.zeustracker', - 'abuse-ch.palevo-doms', - 'abuse-ch.palevo-ips', 'abuse-ch.feodotracker', - 'abuse-ch.ransomware', 'abuse-ch.ssl-blacklist', - 'abuse-ch.ssl-blacklist-dyre', 'abuse-ch.urlhaus-urls', 'abuse-ch.urlhaus-payloads-urls', - 'abuse-ch.urlhaus-payloads', - 'badips-com.server-exploit-list', + 'abuse-ch.urlhaus-payload-samples', + 'cesnet-cz.warden', + 'cert-pl.shield', 'circl-lu.misp', - 'dns-bh.malwaredomainscom', + 'dan-tv.tor', + 'dataplane.dnsrd', + 'dataplane.dnsrdany', + 'dataplane.dnsversion', + 'dataplane.sipinvitation', + 'dataplane.sipquery', + 'dataplane.sipregistration', + 'dataplane.smtpdata', + 'dataplane.smtpgreet', + 'dataplane.sshclient', + 'dataplane.sshpwauth', + 'dataplane.telnetlogin', + 'dataplane.vncrfb', 'greensnow-co.list-txt', - 'packetmail-net.list', - 'packetmail-net.ratware-list', - 'packetmail-net.others-list', + 'malwarepatrol.malurl', + 'openphish.web-bl', + 'sblam.spam', + 'shadowserver.adb', + 'shadowserver.afp', + 'shadowserver.amqp', + 'shadowserver.ard', + 'shadowserver.chargen', + 'shadowserver.ciscosmartinstall', + 'shadowserver.coap', + 'shadowserver.compromisedwebsite', + 'shadowserver.cwmp', + 'shadowserver.darknet', + 'shadowserver.db2', + 'shadowserver.dvrdhcpdiscover', + 'shadowserver.elasticsearch', + 'shadowserver.exchange', + 'shadowserver.ftp', + 'shadowserver.hadoop', + 'shadowserver.http', + 'shadowserver.ics', + 'shadowserver.ipmi', + 'shadowserver.ipp', + 'shadowserver.isakmp', + 'shadowserver.ldap', + 'shadowserver.ldaptcp', + 'shadowserver.mdns', + 'shadowserver.memcached', + 'shadowserver.mongodb', + 'shadowserver.mqtt', + 'shadowserver.mssql', + 'shadowserver.natpmp', + 'shadowserver.netbios', + 'shadowserver.netis', + 'shadowserver.ntpmonitor', + 'shadowserver.ntpversion', + 'shadowserver.openresolver', + 'shadowserver.portmapper', + 'shadowserver.qotd', + 'shadowserver.radmin', + 'shadowserver.rdp', + 'shadowserver.rdpeudp', + 'shadowserver.redis', + 'shadowserver.rsync', + 'shadowserver.sandboxurl', + 'shadowserver.sinkhole', + 'shadowserver.sinkholehttp', + 'shadowserver.smb', + 'shadowserver.smtp', + 'shadowserver.snmp', + 'shadowserver.ssdp', + 'shadowserver.sslfreak', + 'shadowserver.sslpoodle', + 'shadowserver.telnet', + 'shadowserver.tftp', + 'shadowserver.ubiquiti', + 'shadowserver.vnc', + 'shadowserver.xdmcp', + 'spamhaus.drop', + 'spamhaus.edrop', + 'spamhaus.spam', 'spam404-com.scam-list', + 'stopforum.spam', 'zoneh.rss', ] diff --git a/N6Lib/n6lib/auth_related_test_helpers.py b/N6Lib/n6lib/auth_related_test_helpers.py index 8d304b8..9db81ae 100644 --- a/N6Lib/n6lib/auth_related_test_helpers.py +++ b/N6Lib/n6lib/auth_related_test_helpers.py @@ -100,7 +100,7 @@ def fa_false_cond(condition): # * 'gp8' -- which includes subsource 'p9' # # * six criteria containers: -# * 'c1' -- specifying criteria: asn=1|2|3 or ip-network=10.0.0.0/8|192.168.0.0/24 +# * 'c1' -- specifying criteria: asn=1|2|3 or ip-network=0.0.0.0/30|10.0.0.0/8|192.168.0.0/24 # * 'c2' -- specifying criteria: asn=3|4|5 # * 'c3' -- specifying criteria: cc=PL # * 'c4' -- specifying criteria: category=bot|cnc @@ -763,7 +763,16 @@ def fa_false_cond(condition): EXAMPLE_SEARCH_RAW_RETURN_VALUE += [ _Cri(1, { 'n6asn': ['1', '2', '3'], - 'n6ip-network': ['10.0.0.0/8', '192.168.0.0/24'], + 'n6ip-network': [ + # Note that -- everywhere below -- the '0.0.0.0/30' network + # (which includes the IP 0, i.e., `0.0.0.0`) is translated + # 
to IP ranges in such a way, that the minimum IP is 1, not + # 0 (because 0 is reserved as the "no IP" placeholder value; + # see: #8861). + '0.0.0.0/30', + '10.0.0.0/8', + '192.168.0.0/24', + ], }), _Cri(2, { 'n6asn': ['3', '4', '5'], @@ -792,6 +801,7 @@ def fa_false_cond(condition): Cond['source'] == 'source.one', Cond.or_( Cond['asn'].in_([1, 2, 3]), + Cond['ip'].between(1, 3), Cond['ip'].between(167772160, 184549375), Cond['ip'].between(3232235520, 3232235775)), Cond.and_()) @@ -804,6 +814,7 @@ def fa_false_cond(condition): Cond.and_( Cond.or_( Cond['asn'].in_([1, 2, 3]), + Cond['ip'].between(1, 3), Cond['ip'].between(167772160, 184549375), Cond['ip'].between(3232235520, 3232235775)), Cond['asn'].in_([3, 4, 5])), @@ -866,6 +877,7 @@ def fa_false_cond(condition): Cond.and_( Cond.not_(Cond.or_( Cond['asn'].in_([1, 2, 3]), + Cond['ip'].between(1, 3), Cond['ip'].between(167772160, 184549375), Cond['ip'].between(3232235520, 3232235775))), Cond.not_(Cond['category'].in_(['bots', 'cnc'])), @@ -1066,6 +1078,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1128,6 +1141,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1200,9 +1214,11 @@ def fa_false_cond(condition): 'search': [prep_sql_str( # P2: + """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1240,6 +1256,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1252,6 +1269,7 @@ def fa_false_cond(condition): AND (event.asn IN (3, 4, 5) AND event.name = 'foo' AND (event.asn IS NULL OR event.asn NOT IN (1, 2, 3)) + AND event.ip NOT BETWEEN 1 AND 3 AND event.ip NOT BETWEEN 167772160 AND 184549375 AND event.ip NOT BETWEEN 3232235520 AND 3232235775 AND event.category NOT IN ('bots', 'cnc') @@ -1267,6 +1285,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1294,6 +1313,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1327,6 +1347,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1349,6 +1370,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1372,6 +1394,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 
3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1407,6 +1430,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1427,6 +1451,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1470,6 +1495,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1511,6 +1537,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1518,6 +1545,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1548,6 +1576,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) """, @@ -1555,6 +1584,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1597,6 +1627,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1636,6 +1667,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1674,6 +1706,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1682,6 +1715,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1728,6 +1762,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1756,6 +1791,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1768,6 +1804,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 
184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1800,6 +1837,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1810,6 +1848,7 @@ def fa_false_cond(condition): AND event.asn IN (3, 4, 5) AND event.name = 'foo' AND (event.asn IS NULL OR event.asn NOT IN (1, 2, 3)) + AND event.ip NOT BETWEEN 1 AND 3 AND event.ip NOT BETWEEN 167772160 AND 184549375 AND event.ip NOT BETWEEN 3232235520 AND 3232235775 AND event.category NOT IN ('bots', 'cnc') @@ -1819,6 +1858,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1840,6 +1880,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1869,6 +1910,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1901,6 +1943,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1909,6 +1952,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1939,6 +1983,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -1947,6 +1992,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -1978,6 +2024,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -2015,6 +2062,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -2023,6 +2071,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -2044,6 +2093,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 
3232235775) AND event.restriction != 'internal' @@ -2072,6 +2122,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -2080,6 +2131,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -2105,6 +2157,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -2116,6 +2169,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.restriction != 'internal' @@ -2124,6 +2178,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip BETWEEN 1 AND 3 OR event.ip BETWEEN 167772160 AND 184549375 OR event.ip BETWEEN 3232235520 AND 3232235775) AND event.asn IN (3, 4, 5) @@ -2162,6 +2217,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) """, @@ -2169,6 +2225,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2210,6 +2267,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) """, @@ -2217,6 +2275,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2263,6 +2322,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2315,6 +2375,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2361,6 +2422,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2369,6 +2431,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 
3232235775) AND event.asn IN (3, 4, 5) @@ -2415,6 +2478,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2443,6 +2507,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2455,6 +2520,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2487,6 +2553,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2497,6 +2564,7 @@ def fa_false_cond(condition): AND event.asn IN (3, 4, 5) AND event.name IN ('foo') AND NOT (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.category NOT IN ('bots', 'cnc') @@ -2506,6 +2574,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2535,6 +2604,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2572,6 +2642,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2608,6 +2679,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2616,6 +2688,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2646,6 +2719,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2654,6 +2728,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2690,6 +2765,7 @@ def 
fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2727,6 +2803,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2735,6 +2812,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2764,6 +2842,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2800,6 +2879,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2808,6 +2888,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) @@ -2841,6 +2922,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2852,6 +2934,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.restriction != 'internal' @@ -2860,6 +2943,7 @@ def fa_false_cond(condition): """ event.source = 'source.one' AND (event.asn IN (1, 2, 3) + OR event.ip >= 1 AND event.ip <= 3 OR event.ip >= 167772160 AND event.ip <= 184549375 OR event.ip >= 3232235520 AND event.ip <= 3232235775) AND event.asn IN (3, 4, 5) diff --git a/N6Lib/n6lib/common_helpers.py b/N6Lib/n6lib/common_helpers.py index f9885c2..f3b0c5d 100644 --- a/N6Lib/n6lib/common_helpers.py +++ b/N6Lib/n6lib/common_helpers.py @@ -45,8 +45,6 @@ Union, ) -from pkg_resources import cleanup_resources - # for backward-compatibility and/or for convenience, the following # constants and functions importable from some of the n6sdk.* modules # are also accessible via this module: @@ -58,9 +56,10 @@ from n6sdk.encoding_helpers import ( ascii_str, ascii_py_identifier_str, + as_str_with_minimum_esc, as_unicode, str_to_bool, - try_to_normalize_surrogate_pairs_to_proper_codepoints, + replace_surrogate_pairs_with_proper_codepoints, ) from n6sdk.regexes import ( CC_SIMPLE_REGEX, @@ -428,7 +427,13 @@ class FilePagedSequence(MutableSequence): Temporary files are created lazily. No disk (filesystem) operations are performed at all if all data fit on one page. - The implementation is *not* thread-safe. 
+ Normally, when an instance of this class is garbage-collected or + when the program exits in an undisturbed way (*also* if it exits + due to an unhandled exception), the instance's temporary files + are automatically removed (thanks to a `weakref.finalize()`-based + mechanism used internally by the class). + + The implementation of `FilePagedSequence` is *not* thread-safe. >>> seq = FilePagedSequence([1, 'foo', {'a': None}, ['b']], page_size=3) >>> seq @@ -919,7 +924,7 @@ class FilePagedSequence(MutableSequence): >>> FilePagedSequence(b'abcdef', page_size=4) == bytearray(b'abcdef') False - >>> seq._filesystem_used() # (it's a *non-public method*, never use it in real code!) + >>> '_dir' in seq.__dict__ # (here we use a *non-public stuff*, never do that in real code!) True >>> _dir = seq._dir # (it's a *non-public descriptor*, never use it in real code!) >>> osp.exists(_dir) @@ -933,16 +938,16 @@ class FilePagedSequence(MutableSequence): FilePagedSequence(<0 items...>, page_size=3) >>> list(seq) [] - >>> seq._filesystem_used() + >>> '_dir' in seq.__dict__ # (filesystem no longer used) False >>> osp.exists(_dir) False >>> with seq as cm_target: # (note: reusing the same instance) ... seq is cm_target - ... not seq._filesystem_used() + ... '_dir' not in seq.__dict__ # (filesystem not used yet) ... seq.extend(map(int, '1234567890')) - ... seq._filesystem_used() + ... '_dir' in seq.__dict__ # (filesystem used) ... seq == [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] ... repr(seq) == 'FilePagedSequence(<10 items...>, page_size=3)' ... _dir2 = seq._dir @@ -962,7 +967,7 @@ class FilePagedSequence(MutableSequence): FilePagedSequence(<0 items...>, page_size=3) >>> list(seq) [] - >>> seq._filesystem_used() + >>> '_dir' in seq.__dict__ # (filesystem no longer used) False >>> osp.exists(_dir2) False @@ -1005,10 +1010,10 @@ class FilePagedSequence(MutableSequence): >>> seq2 = FilePagedSequence('abc', page_size=3) >>> list(seq2) ['a', 'b', 'c'] - >>> seq2._filesystem_used() # all items in current page -> no disk op. + >>> '_dir' in seq2.__dict__ # all items in current page -> no disk op. False >>> seq2.extend('d') # (now page 0 must be saved) - >>> seq2._filesystem_used() + >>> '_dir' in seq2.__dict__ # new page created -> filesystem used True >>> _dir = seq2._dir >>> osp.exists(_dir) @@ -1033,7 +1038,7 @@ class FilePagedSequence(MutableSequence): >>> sorted(os.listdir(_dir)) ['0', '1', '2'] >>> seq2.close() - >>> seq2._filesystem_used() + >>> '_dir' in seq2.__dict__ # (filesystem no longer used) False >>> osp.exists(_dir) False @@ -1041,14 +1046,14 @@ class FilePagedSequence(MutableSequence): [] >>> seq3 = FilePagedSequence(page_size=3) - >>> seq3._filesystem_used() + >>> '_dir' in seq3.__dict__ # (filesystem not used yet) False >>> seq3.close() - >>> seq3._filesystem_used() + >>> '_dir' in seq3.__dict__ # (filesystem still not used at all) False >>> with FilePagedSequence(page_size=3) as seq4: - ... not seq4._filesystem_used() + ... '_dir' not in seq4.__dict__ # (filesystem not used yet) ... seq4.append(('foo', 1)) ... list(seq4) == [('foo', 1)] ... seq4[0] = 'bar', 2 @@ -1057,9 +1062,9 @@ class FilePagedSequence(MutableSequence): ... seq4.append({'x'}) ... seq4.append({'z': 3}) ... list(seq4) == [('bar', 2), {'x'}, {'z': 3}] - ... not seq4._filesystem_used() + ... '_dir' not in seq4.__dict__ # (filesystem still not used, yet) ... seq4.append(['d']) - ... seq4._filesystem_used() + ... '_dir' in seq4.__dict__ # (filesystem used) ... _dir = seq4._dir ... osp.exists(_dir) ... 
sorted(os.listdir(_dir)) == ['0'] @@ -1080,7 +1085,7 @@ class FilePagedSequence(MutableSequence): True True True - >>> seq4._filesystem_used() + >>> '_dir' in seq4.__dict__ # (filesystem no longer used) False >>> osp.exists(_dir) False @@ -1091,6 +1096,8 @@ def __init__(self, iterable=(), page_size=1000): self._cur_len = 0 self._cur_page_no = None self._cur_page_data = [] + self._dir_lifecycle_op_rlock = threading.RLock() + self._dir_finalizer = lambda: None self.extend(iterable) def __repr__(self): @@ -1181,24 +1188,43 @@ def clear(self): def close(self): self.clear() - if self._filesystem_used(): - self._do_filesystem_cleanup() + self._dir_clear() # # Non-public stuff @functools.cached_property def _dir(self): + dir_rlock = self._dir_lifecycle_op_rlock + with dir_rlock: + temp_dir = self.__dict__.get('_dir') + if temp_dir is None: + temp_dir = self._make_temp_dir() + self._dir_finalizer = weakref.finalize( + self, + self._do_filesystem_cleanup, + temp_dir, + dir_rlock) + # Note: the machinery of `weakref.finalize()` automatically + # ensures that the finalizer will be called at program exit + # if it is not called earlier. + assert self._dir_finalizer.atexit + return temp_dir + + def _dir_clear(self): + with self._dir_lifecycle_op_rlock: + self.__dict__.pop('_dir', None) + self._dir_finalizer() + + def _make_temp_dir(self): return tempfile.mkdtemp(prefix='n6-FilePagedSequence-tmp') - def _filesystem_used(self): - return '_dir' in self.__dict__ - - def _do_filesystem_cleanup(self): - for filename in os.listdir(self._dir): - os.remove(osp.join(self._dir, filename)) - os.rmdir(self._dir) - del self._dir # noqa + @staticmethod + def _do_filesystem_cleanup(temp_dir, dir_rlock): + with dir_rlock: + for filename in os.listdir(temp_dir): + os.remove(osp.join(temp_dir, filename)) + os.rmdir(temp_dir) def _local_index(self, index): if isinstance(index, slice): @@ -1305,17 +1331,12 @@ def _instance_mock(iterable=(), *, page_size=1000): class _FakeFilePagedSequence(FilePagedSequence): # noqa - def __init__(self, *args, **kwargs): - self.__fake_filesystem = {} - super().__init__(*args, **kwargs) + def _make_temp_dir(self): + return f'' - @functools.cached_property - def _dir(self, __counter=itertools.count()): - return f'' - - def _do_filesystem_cleanup(self): - self.__fake_filesystem.clear() - del self._dir # noqa + @staticmethod + def _do_filesystem_cleanup(*_): + fake_filesystem.clear() @contextlib.contextmanager def _writable_page_file(self, filename): @@ -1324,16 +1345,18 @@ def _writable_page_file(self, filename): yield f finally: pickled_page = f.getvalue() - self.__fake_filesystem[filename] = pickled_page + fake_filesystem[filename] = pickled_page @contextlib.contextmanager def _readable_page_file(self, filename): - pickled_page = self.__fake_filesystem.get(filename) + pickled_page = fake_filesystem.get(filename) if pickled_page is None: raise FileNotFoundError(errno.ENOENT, 'No such file or directory', filename) with io.BytesIO(pickled_page) as f: yield f + path_placeholder_counter = itertools.count() + fake_filesystem = {} obj = _FakeFilePagedSequence(iterable, page_size) # @@ -4878,30 +4901,13 @@ def limit_str(s, char_limit, cut_indicator='[...]', middle_cut=False): return s -# TODO: docs + tests -def is_pure_ascii(s): - if isinstance(s, str): - return s == s.encode('ascii', 'ignore').decode('ascii', 'ignore') - elif isinstance(s, (bytes, bytearray)): - return s == s.decode('ascii', 'ignore').encode('ascii', 'ignore') - else: - raise TypeError('{!a} is neither a `str` nor a 
`bytes`/`bytearray`'.format(s)) - - -# TODO: docs + tests -def lower_if_pure_ascii(s): - if is_pure_ascii(s): - return s.lower() - return s - - def as_bytes(obj, encode_error_handling='surrogatepass'): r""" Convert the given object to `bytes`. If the given object is a `str` -- encode it using `utf-8` with the error handler specified as the second argument, `encode_error_handling` - (whose default value is `'surrogatepass'`). # TODO: change the default to 'strict' (adjusting client code where needed...) + (whose default value is `'surrogatepass'`). If the given object is a `bytes`, `bytearray` or `memoryview`, or an object whose type provides the `__bytes__()` special method @@ -4943,31 +4949,32 @@ def as_bytes(obj, encode_error_handling='surrogatepass'): raise TypeError('{!a} cannot be converted to bytes'.format(obj)) -# TODO: doc, tests ### CR: db_event (and maybe some other stuff) uses different implementation ### -- fix it?? (unification needed??) # TODO: support ipaddress.* stuff... def ipv4_to_int(ipv4, accept_no_dot=False): - """ - Return, as int, an IPv4 address specified as a string or integer. + r""" + Return, as `int`, an IPv4 address specified as a `str` or `int`. - Args: + Args/kwargs: `ipv4`: IPv4 as a `str` (formatted as 4 dot-separated decimal numbers - or, if `accept_no_dot` is true, possible also as one decimal + or, if `accept_no_dot` is true, possibly also as one decimal number) or as an `int` number. `accept_no_dot` (bool, default: False): - If true -- accept `ipv4` as a string formatted as one decimal - number. + If true -- accept `ipv4` *also* as a `str` formatted as one + decimal number. Returns: - The IPv4 address as an int number. + The IPv4 address as an `int` number. Raises: - ValueError or TypeError. + `ValueError` or `TypeError`. >>> ipv4_to_int('193.59.204.91') 3241921627 + >>> ipv4_to_int('193.059.0204.91') # (for good or for bad, extra leading `0`s are ignored) + 3241921627 >>> ipv4_to_int('193.59.204.91 ') 3241921627 >>> ipv4_to_int(' 193 . 59 . 204.91') @@ -4976,12 +4983,45 @@ def ipv4_to_int(ipv4, accept_no_dot=False): 3241921627 >>> ipv4_to_int(3241921627) 3241921627 + >>> ipv4_to_int('3241921627', accept_no_dot=True) + 3241921627 + >>> ipv4_to_int(' 000003241921627 ', accept_no_dot=True) + 3241921627 + >>> ipv4_to_int('4294967295 ', accept_no_dot=True) + 4294967295 + >>> ipv4_to_int(4294967295) + 4294967295 + >>> ipv4_to_int('255.255.255.255') + 4294967295 + >>> from n6lib.const import ( + ... LACK_OF_IPv4_PLACEHOLDER_AS_INT, # 0 + ... LACK_OF_IPv4_PLACEHOLDER_AS_STR, # '0.0.0.0' + ... ) + >>> ipv4_to_int(LACK_OF_IPv4_PLACEHOLDER_AS_INT) == LACK_OF_IPv4_PLACEHOLDER_AS_INT + True + >>> ipv4_to_int(LACK_OF_IPv4_PLACEHOLDER_AS_STR) == LACK_OF_IPv4_PLACEHOLDER_AS_INT + True + >>> ipv4_to_int(' 0.\t000000. 0000.0000000000 ') == LACK_OF_IPv4_PLACEHOLDER_AS_INT + True + >>> ipv4_to_int(str(LACK_OF_IPv4_PLACEHOLDER_AS_INT), + ... accept_no_dot=True) == LACK_OF_IPv4_PLACEHOLDER_AS_INT + True + + >>> ipv4_to_int(str(LACK_OF_IPv4_PLACEHOLDER_AS_INT)) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... >>> ipv4_to_int('3241921627') # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... ValueError: ... + >>> ipv4_to_int('3241921627', accept_no_dot=False) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... + >>> ipv4_to_int('193.59.204.91.123') # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... 
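# Illustrative sketch (editorial, not the real n6 implementation): a
# rough reconstruction of the accept/reject rules that the
# `ipv4_to_int()` doctests above and below demonstrate -- dotted-quad
# strings (surrounding whitespace and extra leading zeros tolerated),
# a single decimal number only when `accept_no_dot=True`, plain
# `int`s, and a final 0..2**32-1 range check (0 being the "lack of
# IP" placeholder value).

def ipv4_to_int_sketch(ipv4, accept_no_dot=False):
    if isinstance(ipv4, int):
        value = ipv4
    elif isinstance(ipv4, str):
        fields = ipv4.split('.')
        if len(fields) == 4:
            octets = [int(field) for field in fields]   # int() strips whitespace and
            if not all(0 <= o <= 255 for o in octets):  # ignores extra leading zeros
                raise ValueError(f'invalid IPv4 address: {ipv4!r}')
            value = (octets[0] << 24 | octets[1] << 16
                     | octets[2] << 8 | octets[3])
        elif len(fields) == 1 and accept_no_dot:
            value = int(ipv4)
        else:
            raise ValueError(f'invalid IPv4 address: {ipv4!r}')
    else:
        raise TypeError(f'{ipv4!r} cannot be converted to an IPv4 int')
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError(f'IPv4 value out of range: {value!r}')
    return value

assert ipv4_to_int_sketch(' 193 . 59 . 204.91') == 3241921627
assert ipv4_to_int_sketch('3241921627', accept_no_dot=True) == 3241921627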
@@ -4992,17 +5032,32 @@ def ipv4_to_int(ipv4, accept_no_dot=False): ... ValueError: ... + >>> ipv4_to_int(-1) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... + + >>> ipv4_to_int(4294967296) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... + >>> ipv4_to_int(32419216270000000) # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... ValueError: ... - >>> ipv4_to_int('3241921627', accept_no_dot=True) - 3241921627 - >>> ipv4_to_int(' 3241921627 ', accept_no_dot=True) - 3241921627 - >>> ipv4_to_int('3241921627 ', accept_no_dot=True) - 3241921627 + >>> ipv4_to_int('-1', # doctest: +IGNORE_EXCEPTION_DETAIL + ... accept_no_dot=True) + Traceback (most recent call last): + ... + ValueError: ... + + >>> ipv4_to_int('4294967296', # doctest: +IGNORE_EXCEPTION_DETAIL + ... accept_no_dot=True) + Traceback (most recent call last): + ... + ValueError: ... >>> ipv4_to_int('32419216270000000', # doctest: +IGNORE_EXCEPTION_DETAIL ... accept_no_dot=True) @@ -5015,8 +5070,7 @@ def ipv4_to_int(ipv4, accept_no_dot=False): ... TypeError: ... - >>> ipv4_to_int(bytearray(b'3241921627'), - ... accept_no_dot=True) # doctest: +IGNORE_EXCEPTION_DETAIL + >>> ipv4_to_int(b'3241921627', accept_no_dot=True) # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... TypeError: ... @@ -5053,26 +5107,28 @@ def ipv4_to_int(ipv4, accept_no_dot=False): ### -- fix it?? (unification needed??) # TODO: support stuff from the `ipaddress` std lib module... def ipv4_to_str(ipv4, accept_no_dot=False): - """ + r""" Return, as a `str`, the IPv4 address specified as a `str` or `int`. - Args: + Args/kwargs: `ipv4`: IPv4 as a `str` (formatted as 4 dot-separated decimal numbers - or, if `accept_no_dot` is true, possible also as one decimal + or, if `accept_no_dot` is true, possibly also as one decimal number) or as an `int` number. `accept_no_dot` (bool, default: False): - If true -- accept `ipv4` as a string formatted as one decimal - number. + If true -- accept `ipv4` *also* as a `str` formatted as one + decimal number. Returns: - The IPv4 address as a `str`. + The IPv4 address, in its normalized form, as a `str`. Raises: - ValueError or TypeError. + `ValueError` or `TypeError`. >>> ipv4_to_str('193.59.204.91') '193.59.204.91' + >>> ipv4_to_str('193.059.0204.91') # (for good or for bad, extra leading `0`s are ignored) + '193.59.204.91' >>> ipv4_to_str('193.59.204.91 ') '193.59.204.91' >>> ipv4_to_str(' 193 . 59 . 204.91') @@ -5081,12 +5137,45 @@ def ipv4_to_str(ipv4, accept_no_dot=False): '193.59.204.91' >>> ipv4_to_str(3241921627) '193.59.204.91' + >>> ipv4_to_str('3241921627', accept_no_dot=True) + '193.59.204.91' + >>> ipv4_to_str(' 000003241921627 ', accept_no_dot=True) + '193.59.204.91' + >>> ipv4_to_str('4294967295 ', accept_no_dot=True) + '255.255.255.255' + >>> ipv4_to_str(4294967295) + '255.255.255.255' + >>> ipv4_to_str('255.255.255.255') + '255.255.255.255' + >>> from n6lib.const import ( + ... LACK_OF_IPv4_PLACEHOLDER_AS_INT, # 0 + ... LACK_OF_IPv4_PLACEHOLDER_AS_STR, # '0.0.0.0' + ... ) + >>> ipv4_to_str(LACK_OF_IPv4_PLACEHOLDER_AS_STR) == LACK_OF_IPv4_PLACEHOLDER_AS_STR + True + >>> ipv4_to_str('\t0000 .\r\n0. 00\t.000000\t') == LACK_OF_IPv4_PLACEHOLDER_AS_STR + True + >>> ipv4_to_str(LACK_OF_IPv4_PLACEHOLDER_AS_INT) == LACK_OF_IPv4_PLACEHOLDER_AS_STR + True + >>> ipv4_to_str(str(LACK_OF_IPv4_PLACEHOLDER_AS_INT), + ... 
accept_no_dot=True) == LACK_OF_IPv4_PLACEHOLDER_AS_STR + True + + >>> ipv4_to_str(str(LACK_OF_IPv4_PLACEHOLDER_AS_INT)) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... >>> ipv4_to_str('3241921627') # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... ValueError: ... + >>> ipv4_to_str('3241921627', accept_no_dot=False) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... + >>> ipv4_to_str('193.59.204.91.123') # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... @@ -5097,17 +5186,32 @@ def ipv4_to_str(ipv4, accept_no_dot=False): ... ValueError: ... + >>> ipv4_to_str(-1) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... + + >>> ipv4_to_str(4294967296) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + ValueError: ... + >>> ipv4_to_str(32419216270000000) # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... ValueError: ... - >>> ipv4_to_str('3241921627', accept_no_dot=True) - '193.59.204.91' - >>> ipv4_to_str(' 3241921627 ', accept_no_dot=True) - '193.59.204.91' - >>> ipv4_to_str('3241921627 ', accept_no_dot=True) - '193.59.204.91' + >>> ipv4_to_str('-1', # doctest: +IGNORE_EXCEPTION_DETAIL + ... accept_no_dot=True) + Traceback (most recent call last): + ... + ValueError: ... + + >>> ipv4_to_str('4294967296', # doctest: +IGNORE_EXCEPTION_DETAIL + ... accept_no_dot=True) + Traceback (most recent call last): + ... + ValueError: ... >>> ipv4_to_str('32419216270000000', # doctest: +IGNORE_EXCEPTION_DETAIL ... accept_no_dot=True) @@ -5115,7 +5219,7 @@ def ipv4_to_str(ipv4, accept_no_dot=False): ... ValueError: ... - >>> ipv4_to_str(bytearray(b'193.59.204.91')) # doctest: +IGNORE_EXCEPTION_DETAIL + >>> ipv4_to_str(b'193.59.204.91') # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... TypeError: ... @@ -5137,53 +5241,6 @@ def ipv4_to_str(ipv4, accept_no_dot=False): return '{0}.{1}.{2}.{3}'.format(*numbers) -# maybe TODO later: more tests -# maybe TODO: support ipaddress.* stuff?... -def is_ipv4(value): - r""" - Check if the given `str` value is a properly formatted IPv4 address. - - Attrs: - `value` (str): the value to be tested. - - Returns: - Whether the value is properly formatted IPv4 address: True or False. - - >>> is_ipv4('255.127.34.124') - True - >>> is_ipv4('192.168.0.1') - True - >>> is_ipv4(' 192.168.0.1 ') - False - >>> is_ipv4('192. 168.0.1') - False - >>> is_ipv4('192.168.0.0.1') - False - >>> is_ipv4('333.127.34.124') - False - >>> is_ipv4('3241921627') - False - >>> is_ipv4('www.nask.pl') - False - >>> is_ipv4('www.jaźń\udcdd.pl') - False - """ - fields = value.split(".") - if len(fields) != 4: - return False - for value in fields: - if not (value == value.strip() and ( - value == '0' or value.strip().lstrip('0'))): ## FIXME: 04.05.06.0333 etc. are accepted, should they??? - return False - try: - intvalue = int(value) - except ValueError: - return False - if intvalue > 255 or intvalue < 0: - return False - return True - - def import_by_dotted_name(dotted_name): """ Import an object specified by the given `dotted_name`. @@ -5779,20 +5836,6 @@ def int_id_to_hex(int_id, min_digit_num=0): return hex_id -def cleanup_src(): - """ - Delete all extracted resource files and directories, - logs a list of the file and directory names that could not be successfully removed. 
- [see: https://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=cleanup_resources#resource-extraction] - """ - from n6lib.log_helpers import get_logger - _LOGGER = get_logger(__name__) - - fail_cleanup = cleanup_resources() - if fail_cleanup: - _LOGGER.warning('Fail cleanup resources: %a', fail_cleanup) - - def make_exc_ascii_str(exc=None): r""" Generate an ASCII-only string representing the (given) exception. diff --git a/N6Lib/n6lib/const.py b/N6Lib/n6lib/const.py index 138aa8b..b6958e3 100644 --- a/N6Lib/n6lib/const.py +++ b/N6Lib/n6lib/const.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. import os.path as osp import re @@ -115,6 +115,11 @@ } +# the value used in Event DB to denote lack of +# IP address where NULL is not an option +LACK_OF_IPv4_PLACEHOLDER_AS_INT = 0 +LACK_OF_IPv4_PLACEHOLDER_AS_STR = '0.0.0.0' + # maximum length of a client organization identifier (related both to # items of the list being the `client` value in a RecordDict / REST API # query params dict / REST API result dict, and to the `org_id` value diff --git a/N6Lib/n6lib/data_backend_api.py b/N6Lib/n6lib/data_backend_api.py index f3fcd27..de91909 100644 --- a/N6Lib/n6lib/data_backend_api.py +++ b/N6Lib/n6lib/data_backend_api.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2022 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. import base64 import collections @@ -38,6 +38,7 @@ from n6lib.class_helpers import singleton from n6lib.common_helpers import ( ascii_str, + as_str_with_minimum_esc, iter_grouped_by_attr, make_exc_ascii_str, memoized, @@ -70,6 +71,7 @@ from n6lib.url_helpers import ( PROVISIONAL_URL_SEARCH_KEY_PREFIX, normalize_url, + prepare_norm_brief, ) from n6lib.context_helpers import ( NoContextToExitFrom, @@ -1348,7 +1350,7 @@ def _preprocess_result_dict(self, result): url = result.get('url') if url_data is None: if url is not None and url.startswith(PROVISIONAL_URL_SEARCH_KEY_PREFIX): - LOGGER.warning( + LOGGER.error( '`url` (%a) starts with %a but no `url_data`! ' '(skipping this result dict)\n%s', url, @@ -1366,36 +1368,87 @@ def _preprocess_result_dict(self, result): PROVISIONAL_URL_SEARCH_KEY_PREFIX, event_tag) return None - if (not isinstance(url_data, dict)) or url_data.keys() != {'url_orig', 'url_norm_opts'}: + if (not isinstance(url_data, dict) + # specific set of keys is required: + or (url_data.keys() != {'orig_b64', 'norm_brief'} + and url_data.keys() != {'url_orig', 'url_norm_opts'}) # <- legacy format + # original URL should not be empty: + or (not url_data.get('orig_b64') + and not url_data.get('url_orig'))): LOGGER.error( '`url_data` (%a) is not valid! 
' '(skipping this result dict)\n%s', url_data, event_tag) return None + # case of `url_data`-based matching - url_orig = base64.urlsafe_b64decode(url_data['url_orig']) - url_norm_opts = url_data['url_norm_opts'] + + url_orig_b64 = url_data.get('orig_b64') + if url_orig_b64 is not None: + url_norm_brief = url_data['norm_brief'] + else: + # dealing with legacy format (concerning older data stored in db) + url_orig_b64 = url_data['url_orig'] + _legacy_url_norm_opts = url_data['url_norm_opts'] + if _legacy_url_norm_opts != {'transcode1st': True, 'epslash': True, 'rmzone': True}: + raise ValueError(f'unexpected {_legacy_url_norm_opts=!a}') + url_norm_brief = prepare_norm_brief( + unicode_str=True, + merge_surrogate_pairs=True, + empty_path_slash=True, + remove_ipv6_zone=True) + + assert isinstance(url_orig_b64, str) + assert isinstance(url_norm_brief, str) + url_norm_cache = self._url_normalization_data_cache - url_norm_cache_key = tuple(sorted(url_norm_opts.items())) - url_norm_cache_item = url_norm_cache.get(url_norm_cache_key) + url_norm_cache_item = url_norm_cache.get(url_norm_brief) if url_norm_cache_item is not None: normalizer, param_urls_norm = url_norm_cache_item else: - normalizer = functools.partial(normalize_url, **url_norm_opts) + normalizer = functools.partial(normalize_url, norm_brief=url_norm_brief) param_urls = self._filtering_params.get('url.b64') - param_urls_norm = (frozenset(map(normalizer, param_urls)) - if param_urls is not None - else None) - url_norm_cache[url_norm_cache_key] = normalizer, param_urls_norm - result_url_norm = normalizer(url_orig) - if (param_urls_norm is not None and - result_url_norm not in param_urls_norm): + if param_urls is not None: + call_silencing_decode_err = self._call_silencing_decode_err + maybe_urls = (call_silencing_decode_err(normalizer, url) for url in param_urls) + param_urls_norm = frozenset(url for url in maybe_urls + if url is not None) + else: + param_urls_norm = None + url_norm_cache[url_norm_brief] = normalizer, param_urls_norm + + url_orig_bin = base64.urlsafe_b64decode(url_orig_b64) + url_normalized = normalizer(url_orig_bin) + + if param_urls_norm is not None and url_normalized not in param_urls_norm: # application-level filtering return None - result['url'] = result_url_norm + + result['url'] = ( + url_normalized if isinstance(url_normalized, str) + else as_str_with_minimum_esc(url_normalized)) + ## TODO later? + # orig_was_unicode = 'u' in url_norm_brief # ('u' corresponds to `unicode_str=True`) + # if orig_was_unicode: + # url_orig = url_orig_bin.decode('utf-8', 'surrogatepass') + # assert isinstance(url_orig, str) + # else: + # url_orig = url_orig_bin + # assert isinstance(url_orig, bytes) + # + # result['url'] = as_str_with_minimum_esc(url_normalized) + # result['url_orig_ascii'] = ascii(url_orig) + # result['url_orig_b64'] = url_orig_b64 return result + @staticmethod + def _call_silencing_decode_err(normalizer, url): + try: + return normalizer(url) + except UnicodeDecodeError: + return None + @staticmethod def _get_event_tag_for_logging(result): # type: (ResultDict) -> Str diff --git a/N6Lib/n6lib/data_selection_tools.py b/N6Lib/n6lib/data_selection_tools.py index 8d998a2..3386385 100644 --- a/N6Lib/n6lib/data_selection_tools.py +++ b/N6Lib/n6lib/data_selection_tools.py @@ -1,6 +1,12 @@ -# Copyright (c) 2015-2022 NASK. All rights reserved. +# Copyright (c) 2015-2023 NASK. All rights reserved. 
""" +This module provides tools to express *data selection conditions*; by +formulating a *condition* we specify which event data records shall be +*selected*, that is -- depending on the context -- which ones shall be +*chosen* (when filtering a data stream) or *searched* (when querying a +database). See the docs of the `Cond` base class for more details. + This module contains the following public stuff: * `CondBuilder` @@ -69,7 +75,13 @@ * `CondEqualityMergingTransformer` (base: `CondTransformer`) * `CondDeMorganTransformer` (base: `CondTransformer`) -(For more information -- see the docs of the classes.) +(For more information -- see the docs of those classes.) + +Note: some of the classes provided by this module make heavy use of +the `n6lib.common_helpers.OPSet` class -- which is an order-preserving +implementation of `collections.abc.Set`, i.e., such an implementation +that remembers the element insertion order (to learn more about how it +behaves, see the docs of `OPSet` itself). """ import collections @@ -103,7 +115,6 @@ OPSet, ascii_str, ip_str_to_int, - is_pure_ascii, iter_altered, ) @@ -283,12 +294,12 @@ class -- it offers a mini-DSL to create such instances in a ... TypeError: the `!=` operation is not supported - The rationale is that the behavior of the SQL `!=` operator when + The rationale is that the behavior of the SQL `!=` operator, when *NULL* values are involved (with the *three-valued logic*-based - *NULL*-propagating behavior) is confusingly different from the + *NULL*-propagating behavior), is confusingly different from the behavior of the Python `!=` operator when `None` values are involved (note that even if we banned `None` as the right hand side operand - -- as we do for the rest of the operators -- the problem could still + -- as we do for all supported operators -- the problem could still occur for the left hand side operand, that is, when values of the concerned field in some records being searched through were *NULL*). By leaving the `!=` operator unsupported, we avoid a lot of potential @@ -315,30 +326,30 @@ def __init__(self, rec_key: str): __hash__ = None - def __eq__(self, op_param: Hashable) -> 'EqualCond': + def __eq__(self, op_param: Hashable, /) -> 'EqualCond': return EqualCond._make(self._rec_key, op_param) # noqa # (unsupported; see the rationale in the docs of `CondBuilder`...) 
def __ne__(self, _) -> NoReturn: raise TypeError('the `!=` operation is not supported') - def __gt__(self, op_param: Hashable) -> 'GreaterCond': + def __gt__(self, op_param: Hashable, /) -> 'GreaterCond': return GreaterCond._make(self._rec_key, op_param) # noqa - def __ge__(self, op_param: Hashable) -> 'GreaterOrEqualCond': + def __ge__(self, op_param: Hashable, /) -> 'GreaterOrEqualCond': return GreaterOrEqualCond._make(self._rec_key, op_param) # noqa - def __lt__(self, op_param: Hashable) -> 'LessCond': + def __lt__(self, op_param: Hashable, /) -> 'LessCond': return LessCond._make(self._rec_key, op_param) # noqa - def __le__(self, op_param: Hashable) -> 'LessOrEqualCond': + def __le__(self, op_param: Hashable, /) -> 'LessOrEqualCond': return LessOrEqualCond._make(self._rec_key, op_param) # noqa - def in_(self, op_param: Iterable[Hashable]) -> 'Union[InCond, EqualCond, FixedCond]': + def in_(self, op_param: Iterable[Hashable], /) -> 'Union[InCond, EqualCond, FixedCond]': return InCond._make(self._rec_key, op_param) # noqa @overload - def between(self, op_param: Iterable[Hashable]) -> 'BetweenCond': + def between(self, op_param: Iterable[Hashable], /) -> 'BetweenCond': # The basic `between()`'s call variant: `op_param` # specified as one positional argument -- expected # to be a `(, )` tuple, or an @@ -346,7 +357,7 @@ def between(self, op_param: Iterable[Hashable]) -> 'BetweenCond': ... @overload - def between(self, min_value: Hashable, max_value: Hashable) -> 'BetweenCond': + def between(self, min_value: Hashable, max_value: Hashable, /) -> 'BetweenCond': # Another call variant, added for convenience: the # `op_param` items, `` and ``, # specified as two separate positional arguments. @@ -364,7 +375,7 @@ def between(self, *args): f'arguments ({arg_count} given)') return BetweenCond._make(self._rec_key, op_param) # noqa - def contains_substring(self, op_param: str) -> 'ContainsSubstringCond': + def contains_substring(self, op_param: str, /) -> 'ContainsSubstringCond': return ContainsSubstringCond._make(self._rec_key, op_param) # noqa def is_null(self) -> 'IsNullCond': @@ -1352,6 +1363,37 @@ class Cond(Hashable): >>> not5 in another_set or not6 in another_set or not7 in another_set False + >>> a_set & another_set == set() + True + >>> yet_another_set = { + ... not3, + ... cond_builder.true(), + ... } + >>> a_set & yet_another_set == { + ... cond_builder.true(), + ... } + True + >>> yet_another_set & another_set == { + ... not3, + ... } + True + >>> a_set - yet_another_set == { + ... cond_builder.or_(and1, not1, simple2), + ... } + True + >>> another_set - yet_another_set == { + ... not2, not4, + ... cond_builder['count'].between(1, 100), + ... } + True + >>> a_set ^ another_set == a_set | another_set == a_set | another_set | yet_another_set == { + ... cond_builder.or_(and1, not1, simple2), + ... cond_builder.true(), + ... not2, not3, not4, + ... cond_builder['count'].between(1, 100), + ... } + True + Specific features of constructors ================================= @@ -1589,13 +1631,13 @@ class Cond(Hashable): GreaterCond._make(key, val), EqualCond._make(key, val)) - (Note: whereas the second and third conditions are logically - equivalent to each other, *none* of them is equivalent to the - first one!) + (Note: the second and third conditions are logically equivalent + to each other, but -- in the *general case* -- *none* of them is + equivalent to the first one!) 
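# Illustrative sketch (editorial, not part of the changeset): the
# `CondBuilder` mini-DSL discussed above, assembled only from usage
# that appears elsewhere in this changeset; the field keys and values
# below are just examples.

from n6lib.data_selection_tools import CondBuilder

cond_builder = CondBuilder()
cond = cond_builder.and_(
    cond_builder['source'] == 'source.one',
    cond_builder.or_(
        cond_builder['asn'].in_([1, 2, 3]),
        # `between()` takes a (min, max) pair -- below, the range that
        # the `0.0.0.0/30` network is translated to once the zero IP
        # address is excluded:
        cond_builder['ip'].between(1, 3),
    ),
)
# Note: `cond_builder['asn'] != 42` would raise TypeError -- the `!=`
# operation is deliberately unsupported (see the rationale above).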
-    TODO: add info about the possibiliy of multi-value items in
-    data records and about consequences:
+    TODO: add info about the possibility of multi-value items in data
+    records and about the consequences of that:
       * why e.g.
         * `x == 1 AND x == 3 AND x < 1 AND x > 3` may be *TRUE*
         * `NOT (x == 1) OR NOT (x == 3)` may be *FALSE*
@@ -1642,7 +1684,7 @@ def _get_initialized_instance(cls, *init_args):

     _adapt_init_args: Callable[..., tuple[Hashable, ...]]

-    __init__: Callable
+    __init__: Callable[..., None]

     #
@@ -1873,7 +1915,8 @@ def _adapt_init_args(cls, *given_init_args):
         assert (isinstance(subconditions, OPSet)
                 and all(isinstance(subcond, Cond)
                         for subcond in subconditions))

-        # A few obvious reductions:
+        # A few obvious reductions (see the docs of `AndCond` and
+        # `OrCond`):

         # * skipping the *neutral* element, e.g.:
         #   * `TRUE AND x` -> `x`
@@ -1890,7 +1933,8 @@ def _adapt_init_args(cls, *given_init_args):
             subcond.subconditions if isinstance(subcond, cls)
             else (subcond,)))
-        # (note: *deduplication* is guaranteed thanks to using `OPSet`)
+        # (note: *deduplication* of subconditions is guaranteed thanks
+        # to using `OPSet`)
         return (subconditions,)

     @classmethod
@@ -1898,26 +1942,27 @@ def _get_initialized_instance(cls, subconditions):
         assert (isinstance(subconditions, OPSet)
                 and all(isinstance(subcond, Cond)
                         for subcond in subconditions))

-        # A few other obvious reductions:
+        # A few other obvious reductions (see the docs of `AndCond` and
+        # `OrCond`):

-        # * unwrapping the subcondition (if only one)
+        # * unwrapping the subcondition, if there is only one
         if len(subconditions) == 1:
             (subcond,) = subconditions
             return subcond

-        # * getting the *neutral* element (if no subconditions), e.g.:
+        # * getting the *neutral* element, if there are no subconditions:
         #   * `ALL of <nothing>` -> `TRUE`
         #   * `ANY of <nothing>` -> `FALSE`
         if not subconditions:
             return FixedCond._make(cls._neutral_truthness)

-        # * reducing to the *absorbing* element (if present), e.g.:
+        # * reducing to the *absorbing* element, if it is present:
         #   * `FALSE AND <anything>` -> `FALSE`
         #   * `TRUE OR <anything>` -> `TRUE`
         if FixedCond._make(cls._absorbing_truthness) in subconditions:
             return FixedCond._make(cls._absorbing_truthness)

-        # * making use of the *complement* law (if applicable), e.g.:
+        # * making use of the *complement* law, if applicable:
         #   * `x AND (NOT x) [AND <anything else>]` -> `FALSE`
         #   * `x OR (NOT x) [OR <anything else>]` -> `TRUE`
         negated_subconditions = OPSet(map(NotCond._make, subconditions))
@@ -1997,7 +2042,7 @@ def _adapt_rec_key(cls, rec_key):
                 f"{cls.__qualname__}'s constructor requires `rec_key` "
                 f"being a str (got: {rec_key!r} which is an instance "
                 f"of {rec_key.__class__.__qualname__})"))
-        if not is_pure_ascii(rec_key):
+        if not rec_key.isascii():
             raise ValueError(ascii_str(
                 f"{cls.__qualname__}'s constructor requires `rec_key` "
                 f"being an ASCII-only str (got: {rec_key!r})"))
diff --git a/N6Lib/n6lib/data_spec/_data_spec.py b/N6Lib/n6lib/data_spec/_data_spec.py
index a33fc18..a442437 100644
--- a/N6Lib/n6lib/data_spec/_data_spec.py
+++ b/N6Lib/n6lib/data_spec/_data_spec.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2013-2022 NASK. All rights reserved.
+# Copyright (c) 2013-2023 NASK. All rights reserved.

 # Terminology: some definitions and synonyms
@@ -146,6 +146,7 @@
 from n6lib.const import (
     CATEGORY_ENUMS,
     CONFIDENCE_ENUMS,
+    LACK_OF_IPv4_PLACEHOLDER_AS_STR,
     ORIGIN_ENUMS,
     PROTO_ENUMS,
     RESTRICTION_ENUMS,
@@ -596,6 +597,11 @@ class N6DataSpec(DataSpec):
     # special field -- final cleaned results do *not* include it
     enriched = EnrichedFieldForN6()

+    ### TODO later?
+ # # fields added by the *data backend API* (not stored in Event DB) + # url_orig_ascii = UnicodeFieldForN6(in_result='optional') + # url_orig_b64 = URLBase64FieldForN6(in_result='optional') + # fields related to some particular parsers # * of various specialized field types: @@ -726,6 +732,8 @@ class N6DataSpec(DataSpec): 'subject_common_name', 'sysdesc', 'tags', + #'url_orig_ascii', # TODO later? + #'url_orig_b64', # TODO later? 'url_pattern', 'urls_matched', 'user_agent', @@ -1071,37 +1079,56 @@ def _result_with_unpacked_custom(self, result): ### probably must be adjusted when switching to the new DB schema def _preclean_address_related_items(self, result): + LACK_OF_IP = LACK_OF_IPv4_PLACEHOLDER_AS_STR # noqa event_tag = self._get_event_tag_for_logging(result) + address = result.pop('address', None) - address_item = { - key: value for key, value in [ - ('ip', result.pop('ip', None)), - ('asn', result.pop('asn', None)), - ('cc', result.pop('cc', None))] - if value is not None} + lone_ip = result.pop('ip', None) + lone_asn = result.pop('asn', None) + lone_cc = result.pop('cc', None) + if address is not None: if address: - # DEBUGGING #3141 try: new_address = [ - {key: value for key, value in addr.items() - if value is not None} - for addr in address] - if new_address != address: + addr for addr in address + if addr.get('ip') not in (None, LACK_OF_IP)] + if len(new_address) != len(address): LOGGER.warning( - 'values being None in the address: %a\n%s', + f'skipping address items whose `ip` values are ' + f'missing, None or equal to {LACK_OF_IP!a} ' + f'(whole original `address` was: %a)\n%s', address, event_tag) - address = new_address + for i, addr in enumerate(new_address): + new_address[i] = new_addr = { + key: value for key, value in addr.items() + if value is not None} + if len(new_addr) != len(addr): + LOGGER.warning( + 'skipping `asn`/`cc` value(s) being None ' + 'in the address item %a (whole original ' + '`address` was: %a)\n%s', + addr, address, event_tag) + address = new_address or None + # DEBUGGING #3141: except AttributeError as exc: - exc_str = str(exc) - if "no attribute 'items'" in exc_str: - raise AttributeError('{0} [`address`: {1!a}]'.format(exc_str, address)) + from n6sdk.encoding_helpers import ascii_str + exc_str = ascii_str(exc) + if ("no attribute 'get'" in exc_str) or ("no attribute 'items'" in exc_str): + raise AttributeError(f'{exc_str} [{event_tag=!a}; {address=!a}]') from exc else: raise else: LOGGER.warning('empty address: %a\n%s', address, event_tag) address = None - if address_item: + + if lone_ip is not None: + assert lone_ip != LACK_OF_IP # (<- already guaranteed by data backend API stuff) + address_item = {'ip': lone_ip} + if lone_asn is not None: + address_item['asn'] = lone_asn + if lone_cc is not None: + address_item['cc'] = lone_cc if address is None: LOGGER.warning( 'address does not exist but it should and it should ' @@ -1113,6 +1140,7 @@ def _preclean_address_related_items(self, result): LOGGER.error( 'data inconsistency detected: item %a is not in the ' 'address %a\n%s', address_item, address, event_tag) + if address is not None: result['address'] = address diff --git a/N6Lib/n6lib/data_spec/fields.py b/N6Lib/n6lib/data_spec/fields.py index bac3c88..51a948f 100644 --- a/N6Lib/n6lib/data_spec/fields.py +++ b/N6Lib/n6lib/data_spec/fields.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. 
 # Terminology: some definitions and synonyms
@@ -13,7 +13,10 @@
 from n6lib.common_helpers import ascii_str
 from n6lib.class_helpers import is_seq_or_set
-from n6lib.const import CLIENT_ORGANIZATION_MAX_LENGTH
+from n6lib.const import (
+    CLIENT_ORGANIZATION_MAX_LENGTH,
+    LACK_OF_IPv4_PLACEHOLDER_AS_STR,
+)
 from n6sdk.data_spec.fields import (
     Field,
     AddressField,
@@ -151,10 +154,27 @@ def _ensure_no_multiple_access_qualifiers(self, a_set):
                 ', '.join(sorted(map(ascii, found_access_qual)))))


+# internal helper field class
+# (TODO later: to be merged to `IPv4Field` when SDK is merged to `n6lib`
+# -- then its tests and tests of `AddressField` should also
+# include the behavior provided here, regarding `0.0.0.0`...)
+
+class _IPv4FieldExcludingLackOfPlaceholder(IPv4Field):
+
+    def _validate_value(self, value):
+        if value == LACK_OF_IPv4_PLACEHOLDER_AS_STR:
+            raise FieldValueError(public_message=(
+                f'IPv4 address "{LACK_OF_IPv4_PLACEHOLDER_AS_STR}" is disallowed'))
+        super()._validate_value(value)
+
+
 # n6lib versions of field classes defined in SDK:

 class AddressFieldForN6(AddressField, FieldForN6):
-    pass
+    key_to_subfield_factory = {
+        u'ip': _IPv4FieldExcludingLackOfPlaceholder,
+        u'cc': CCField,
+        u'asn': ASNField,
+    }

 class AnonymizedIPv4FieldForN6(AnonymizedIPv4Field, FieldForN6):
     pass
@@ -192,7 +212,7 @@ class IBANSimplifiedFieldForN6(IBANSimplifiedField, FieldForN6):
 class IntegerFieldForN6(IntegerField, FieldForN6):
     pass

-class IPv4FieldForN6(IPv4Field, FieldForN6):
+class IPv4FieldForN6(_IPv4FieldExcludingLackOfPlaceholder, FieldForN6):
     pass

 class IPv4NetFieldForN6(IPv4NetField, FieldForN6):
@@ -264,25 +284,35 @@ class URLBase64FieldForN6(UnicodeField, FieldForN6):
     # want the length limit; probably, in the future, the limit will
     # be removed also from URLFieldForN6)

-    # Note: Here the following two `bytes->str` decoding options apply
-    # *only* to an URL when it has already been *decoded* from Base64.
-    encoding = 'utf-8'
-    decode_error_handling = 'utf8_surrogatepass_and_surrogateescape'
+    encoding = 'ascii'
+    disallow_empty = True

     _URLSAFE_B64_VALID_CHARACTERS = frozenset(string.ascii_letters
                                               + '0123456789'
                                               + '-_=')
     assert len(_URLSAFE_B64_VALID_CHARACTERS) == 65  # 64 encoding chars and padding char '='

-    def _fix_value(self, value):
-        if isinstance(value, (bytes, bytearray)):
-            # (note: eventually, only a subset of ASCII will be accepted anyway...)
-            value = value.decode('utf-8', 'surrogatepass')
-        assert isinstance(value, str)  # (already guaranteed thanks to `UnicodeField`s stuff...)
+    def clean_param_value(self, value):
+        # (the input is URL-safe-Base64-encoded + possibly also %-encoded...)
+        value = super().clean_param_value(value)
+        assert isinstance(value, str)
         value = self._stubbornly_unquote(value)
+        value = value.rstrip('\r\n')  # some Base64 encoders like to append a newline...
         value = self._urlsafe_b64decode(value)
-        value = super(URLBase64FieldForN6, self)._fix_value(value)
-        assert isinstance(value, str)  # (already guaranteed thanks to `UnicodeField`s stuff...)
+        assert isinstance(value, bytes)
+        # (the output is raw/binary)
         return value

+    def clean_result_value(self, value):
+        raise TypeError("it's a param-only field")

+    ### TODO later?
+ # # (the input is either raw/binary or already URL-safe-Base64-encoded) + # if not isinstance(value, str): + # value = base64.urlsafe_b64encode(value) + # value = super().clean_result_value(value) + # assert isinstance(value, str) + # self._urlsafe_b64decode(value) # just validate, ignore decoding result + # # (the output is always URL-safe-Base64-encoded) + # return value + def _stubbornly_unquote(self, value): # Note: we can assume that the value has been unquoted (from # %-encoding) by the Pyramid stuff, but the following stubborn @@ -308,19 +338,18 @@ def _stubbornly_unquote(self, value): return value def _urlsafe_b64decode(self, value): - value = value.rstrip('\r\n') # some encoders like to append a newline... try: # `base64.urlsafe_b64decode()` just ignores illegal # characters *but* we want to be *more strict* if not self._URLSAFE_B64_VALID_CHARACTERS.issuperset(value): raise ValueError value = base64.urlsafe_b64decode(value) - except ValueError: - # (^ also `binascii.Error` may be raised but + except ValueError as exc: + # (^ also `binascii.Error` may be raised, but # it is a subclass of `ValueError` anyway) raise FieldValueError(public_message=( - '"{}" is not a valid URL-safe-Base64-encoded string ' - '[see: RFC 4648, section 5]'.format(ascii_str(value)))) + f'"{ascii_str(value)}" is not a valid URL-safe-Base64' + f'-encoded string [see: RFC 4648, section 5]')) from exc return value @@ -367,7 +396,7 @@ def clean_result_value(self, value): # for RecordDict['enriched'] -# (see the comment in the code of n6.utils.enrich.Enricher.enrich()) +# (see the comment in the code of n6datapipeline.enrich.Enricher.enrich()) class EnrichedFieldForN6(FieldForN6): enrich_toplevel_keys = ('fqdn',) diff --git a/N6Lib/n6lib/db_events.py b/N6Lib/n6lib/db_events.py index 828d25f..1640bc8 100644 --- a/N6Lib/n6lib/db_events.py +++ b/N6Lib/n6lib/db_events.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. # # For some portions of the code (marked in the comments as copied from # SQLAlchemy -- which is a library licensed under the MIT license): @@ -35,6 +35,10 @@ ip_network_tuple_to_min_max_ip, ip_str_to_int, ) +from n6lib.const import ( + LACK_OF_IPv4_PLACEHOLDER_AS_INT, + LACK_OF_IPv4_PLACEHOLDER_AS_STR, +) from n6lib.data_spec import N6DataSpec from n6lib.data_spec.typing_helpers import ResultDict from n6lib.datetime_helpers import parse_iso_datetime_to_utc @@ -60,17 +64,15 @@ class IPAddress(sqlalchemy.types.TypeDecorator): # `ip` cannot be NULL as it is part of the primary key ### XXX: but whis field is used also for `dip` -- should it??? (see: #3490) - NONE = 0 - NONE_STR = '0.0.0.0' def process_bind_param(self, value, dialect): if value == -1: ## CR: remove or raise a loud error? (anything uses -1???) - return self.NONE + return LACK_OF_IPv4_PLACEHOLDER_AS_INT if value is None: ## XXX: ensure that process_bind_param() is (not?) 
called ## by the SQLAlchemy machinery when `ip` value is None - return self.NONE + return LACK_OF_IPv4_PLACEHOLDER_AS_INT if isinstance(value, int): return value try: @@ -79,7 +81,7 @@ def process_bind_param(self, value, dialect): raise ValueError def process_result_value(self, value, dialect): - if value is None or value == self.NONE: + if value is None or value == LACK_OF_IPv4_PLACEHOLDER_AS_INT: return None return socket.inet_ntoa(value.to_bytes(4, 'big')) @@ -221,7 +223,7 @@ def __init__(self, **kwargs): # NULL in our SQL db; and apparently, for unknown reason, # XXX: <- check whether that's true... # IPAddress.process_bind_param() is not called by the # SQLAlchemy machinery if the value of `ip` is just None) - kwargs['ip'] = IPAddress.NONE_STR + kwargs['ip'] = LACK_OF_IPv4_PLACEHOLDER_AS_STR kwargs['time'] = parse_iso_datetime_to_utc(kwargs["time"]) kwargs['expires'] = ( parse_iso_datetime_to_utc(kwargs.get("expires")) @@ -261,12 +263,12 @@ def like_query(cls, key, value): def url_b64_experimental_query(cls, key, value): # *EXPERIMENTAL* (likely to be changed or removed in the future # without any warning/deprecation/etc.) - if key != 'url.b64': - raise AssertionError("key != 'url.b64' (but == {!a})".format(key)) - db_key = 'url' - url_search_keys = list(map(make_provisional_url_search_key, value)) - return or_(getattr(cls, db_key).in_(value), - getattr(cls, db_key).in_(url_search_keys)) + expected_key = 'url.b64' + if key != expected_key: + raise AssertionError(f'key != {expected_key!a} (got: {key = !a})') + assert all(isinstance(url, bytes) for url in value) + url_search_keys = value + list(map(make_provisional_url_search_key, value)) + return cls.url.in_(url_search_keys) @classmethod def ip_net_query(cls, key, value): @@ -275,7 +277,9 @@ def ip_net_query(cls, key, value): raise AssertionError queries = [] for val in value: - min_ip, max_ip = ip_network_tuple_to_min_max_ip(val) + min_ip, max_ip = ip_network_tuple_to_min_max_ip( + val, + force_min_ip_greater_than_zero=True) queries.append(and_(cls.ip >= min_ip, cls.ip <= max_ip)) return or_(*queries) @@ -324,7 +328,11 @@ def to_raw_result_dict(self): # possible "no IP" placeholder values (such that they # cause recording `ip` in db as 0) -- excluding None -_NO_IP_PLACEHOLDERS = frozenset([IPAddress.NONE_STR, IPAddress.NONE, -1]) +_NO_IP_PLACEHOLDERS = frozenset({ + LACK_OF_IPv4_PLACEHOLDER_AS_STR, + LACK_OF_IPv4_PLACEHOLDER_AS_INT, + -1, # <- legacy placeholder +}) def make_raw_result_dict(column_values_source_object, # getattr() will be used on it to get values diff --git a/N6Lib/n6lib/generate_test_events.py b/N6Lib/n6lib/generate_test_events.py index 02f41a8..3449ae5 100644 --- a/N6Lib/n6lib/generate_test_events.py +++ b/N6Lib/n6lib/generate_test_events.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. 
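A minimal sketch of the "no IP" placeholder convention used by the `db_events.py` code above (assumptions: `LACK_OF_IPv4_PLACEHOLDER_AS_INT == 0` and `LACK_OF_IPv4_PLACEHOLDER_AS_STR == '0.0.0.0'`, as the replaced `IPAddress.NONE`/`NONE_STR` constants suggest; the string-to-int conversion shown here is illustrative, since that part of `process_bind_param()` lies outside the hunk):

    import socket

    LACK_OF_IPv4_PLACEHOLDER_AS_INT = 0  # assumption (mirrors the old `IPAddress.NONE`)

    def to_db(ip_value):
        # mirrors `IPAddress.process_bind_param()`: None and the legacy
        # -1 placeholder are both stored as the zero placeholder
        if ip_value is None or ip_value == -1:
            return LACK_OF_IPv4_PLACEHOLDER_AS_INT
        if isinstance(ip_value, int):
            return ip_value
        return int.from_bytes(socket.inet_aton(ip_value), 'big')

    def from_db(stored_value):
        # mirrors `IPAddress.process_result_value()`: the zero placeholder
        # comes back as None (never as '0.0.0.0')
        if stored_value is None or stored_value == LACK_OF_IPv4_PLACEHOLDER_AS_INT:
            return None
        return socket.inet_ntoa(stored_value.to_bytes(4, 'big'))

    assert from_db(to_db(None)) is None
    assert from_db(to_db('10.20.30.40')) == '10.20.30.40'

The same zero-exclusion theme shows up in `ip_net_query()` above: with `force_min_ip_greater_than_zero=True`, a network such as `10.0.0.0/8` still yields the range `(167772160, 184549375)`, but any network containing the zero address gets its lower bound raised to `1`.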
import copy import datetime @@ -8,19 +8,36 @@ import socket import string import urllib.parse +from collections.abc import ( + Iterator, + Mapping, +) +from typing import Optional import radar +from n6lib.class_helpers import attr_required from n6lib.common_helpers import as_bytes -from n6lib.config import ConfigMixin +from n6lib.config import ( + ConfigError, + ConfigMixin, + ConfigSection, + combined_config_spec, +) from n6lib.const import ( CATEGORY_ENUMS, CONFIDENCE_ENUMS, + LACK_OF_IPv4_PLACEHOLDER_AS_INT, ORIGIN_ENUMS, PROTO_ENUMS, STATUS_ENUMS, ) +from n6lib.data_spec.typing_helpers import ( + ParamsDict, + ResultDict, +) from n6lib.log_helpers import get_logger +from n6lib.typing_helpers import AccessZone LOGGER = get_logger(__name__) @@ -33,7 +50,46 @@ class AttributeCreationError(Exception): """ -class RandomEvent(ConfigMixin): +class RandomEventGeneratorConfigMixin(ConfigMixin): + + # This attribute *must* be set in concrete + # subclasses to a config section name. + generator_config_section: str = None + + # This attribute *may* (but does not need to) + # be extended in subclasses by overriding it + # with another `combined_config_spec(...)`. + config_spec_pattern = combined_config_spec(''' + [{generator_config_section}] + + possible_event_attributes :: list_of_str + required_event_attributes :: list_of_str + dip_categories :: list_of_str + port_attributes :: list_of_str + md5_attributes :: list_of_str + + possible_cc_in_address :: list_of_str + possible_client :: list_of_str + possible_fqdn :: list_of_str + possible_url :: list_of_str + possible_name :: list_of_str + possible_source :: list_of_str + possible_restriction :: list_of_str + possible_target :: list_of_str + + seconds_max = 180000 :: int + expires_days_max = 8 :: int + random_ips_max = 5 :: int + ''') + + @attr_required('generator_config_section', 'config_spec_pattern') + def obtain_config(self, settings: Optional[Mapping] = None) -> ConfigSection: + return self.get_config_section( + settings, + generator_config_section=self.generator_config_section) + + +class RandomEvent(RandomEventGeneratorConfigMixin): """ A class used to generate random events. @@ -55,9 +111,9 @@ class RandomEvent(ConfigMixin): `_POSSIBLE_VALS_PREFIX`, containing a list of possible values. Event will include randomly chosen value from the list. - * By adding the attribute's name to the `port_values` list in - the config, if it is a port number, or the `md5_values` list, if - it is an MD5 hash value. A proper value will be returned + * By adding the attribute's name to the `port_attributes` list in + the config, if it is a port number, or the `md5_attributes` list, + if it is an MD5 hash value. A proper value will be returned for this kind of attribute. In last two cases, values in `_params`, if available, have @@ -70,48 +126,36 @@ class RandomEvent(ConfigMixin): By default, it is randomly chosen, based on the `_RANDOM_CHOICE_CRITERION`, if an attribute will be included in an event. To force it to be always included, - add it to the `required_attributes` list in the config. + add it to the `required_event_attributes` list in the config. 
""" - config_spec = ''' - [generator_rest_api] - possible_event_attributes :: json - required_attributes :: json - dip_categories :: json - port_values :: json - md5_values :: json - possible_cc_codes :: json - possible_client :: json - possible_domains :: json - possible_url :: json - possible_restriction :: json - possible_source :: json - possible_target :: json - event_name=test event - seconds_max :: int - expires_days_max :: int - random_ips_max :: int - ''' + generator_config_section = 'generator_rest_api' + _GETTER_PREFIX = '_get_' _POSSIBLE_VALS_PREFIX = '_possible_' _RANDOM_CHOICE_CRITERION = (True, True, False) # regexes used for a simple validation of an input data from # special parameters (fqdn.sub and url.sub) - _LEGAL_FQDN_REGEX = re.compile(r'[a-zA-Z0-9\.-]*', re.ASCII) - _LEGAL_URL_REGEX = re.compile(r'[a-zA-Z0-9\.-\/:]*', re.ASCII) - - @staticmethod - def generate_multiple_event_data(num_of_events, - settings=None, - access_zone=None, - client_id=None, - params=None): + _LEGAL_FQDN_REGEX = re.compile(r'[a-zA-Z0-9.-]*', re.ASCII) + _LEGAL_URL_REGEX = re.compile(r'[a-zA-Z0-9.-/:]*', re.ASCII) + + @classmethod + def generate_multiple_event_data(cls, + num_of_events: int, + *, + settings: Optional[Mapping] = None, + access_zone: Optional[AccessZone] = None, + client_id: Optional[str] = None, + params: Optional[ParamsDict] = None, + **kwargs) -> Iterator[ResultDict]: """ Generate a given number of random events. Args/kwargs: `num_of_events`: - A number of random events to generate. + A number of random events to generate. + + Kwargs (keyword-only): `settings` (optional; default: None): A dict containing Pyramid-like settings, that will override a configuration from config files, @@ -122,46 +166,104 @@ def generate_multiple_event_data(num_of_events, Name of a client making the request. `params` (optional; default: None): Parameters from the request. + Any extra keyword arguments: + To be passed (together with most of the above arguments) + to the main constructor. Note that the `RandomEvent`'s + one does *not* accept any extra keyword arguments (yet + hypothetical subclasses may add support for some...). Yields: - Generated random events. + Generated random events (as dicts). """ + ready_config = cls(settings=settings).config for _ in range(num_of_events): - yield RandomEvent( - params=params, - settings=settings, + yield cls( + ready_config=ready_config, access_zone=access_zone, - client_id=client_id).event + client_id=client_id, + params=params, + **kwargs, + ).event + + def __init__(self, *, + ready_config: Optional[ConfigSection] = None, + settings: Optional[Mapping] = None, + access_zone: Optional[AccessZone] = None, + client_id: Optional[str] = None, + params: Optional[ParamsDict] = None, + **kwargs): + """ + Kwargs (keyword-only): + `ready_config` (optional; default: None): + If not `None`, it should be ready `ConfigSection` + mapping; then `settings` must be omitted or `None`. + Other keyword arguments: + See: all keyword-only arguments accepted by the + `generate_multiple_event_data()` class method. + """ - def __init__(self, settings=None, params=None, access_zone=None, client_id=None): - self._config_init(settings) - self._params = {} - if params is not None: - self._params = copy.deepcopy(params) + # (note: in the case of the `RandomEvent` class itself, the + # `kwargs` dict needs to be empty, yet hypothetical subclasses + # may support additional keyword arguments...) 
+ super().__init__(**kwargs) + + self._config_init(ready_config, settings) self._access_zone = access_zone self._client_id = client_id + self._params = copy.deepcopy(params) if params is not None else {} + self._min_ip = 1 # 0.0.0.1 + assert self._min_ip > LACK_OF_IPv4_PLACEHOLDER_AS_INT self._max_ip = 0xfffffffe # 255.255.255.254 self._current_datetime = datetime.datetime.utcnow() self._attributes_init() - def _config_init(self, settings): - self.config = self.get_config_section(settings) - self._possible_attrs = self.config.get('possible_event_attributes') - self._required_attrs = self.config.get('required_attributes') - self._event_name = self.config.get('event_name') - self._dip_categories = self.config.get('dip_categories') - self._possible_cc_codes = self.config.get('possible_cc_codes') - self._possible_client = self.config.get('possible_client') - self._possible_domains = self.config.get('possible_domains') - self._possible_url = self.config.get('possible_url') - self._possible_source = self.config.get('possible_source') - self._possible_restriction = self.config.get('possible_restriction') - self._possible_target = self.config.get('possible_target') - self._port_values = self.config.get('port_values') - self._md5_values = self.config.get('md5_values') - self._seconds_max = self.config.get('seconds_max') - self._expires_days_max = self.config.get('expires_days_max') - self._random_ips_max = self.config.get('random_ips_max') + def _config_init(self, + ready_config: Optional[ConfigSection], + settings: Optional[Mapping]) -> None: + + if ready_config is not None: + if settings is not None: + raise TypeError('specifying both `ready_config` and `settings` is not supported') + self.config = ready_config + else: + self.config = self.obtain_config(settings) + + self._possible_attrs = list(self.config['possible_event_attributes']) + for needed_attr in ['url', 'time', 'category']: # <- needed when creating other attrs + self._move_attr_to_beginning_if_present(needed_attr) + + self._required_attrs = frozenset(self.config['required_event_attributes']) + if illegal := self._required_attrs.difference(self._possible_attrs): + listing = ', '.join(sorted(map(ascii, illegal))) + raise ConfigError( + f'`required_event_attributes` should be a subset of ' + f'`possible_event_attributes` (the items present in ' + f'the former and not in the latter are: {listing})') + + self._dip_categories = frozenset(self.config['dip_categories']) + self._port_attributes = frozenset(self.config['port_attributes']) + self._md5_attributes = frozenset(self.config['md5_attributes']) + + self._possible_cc_in_address = list(self.config['possible_cc_in_address']) + self._possible_client = list(self.config['possible_client']) + self._possible_fqdn = list(self.config['possible_fqdn']) + self._possible_url = list(self.config['possible_url']) + self._possible_name = list(self.config['possible_name']) + self._possible_source = list(self.config['possible_source']) + self._possible_restriction = list(self.config['possible_restriction']) + self._possible_target = list(self.config['possible_target']) + + self._seconds_max = self.config['seconds_max'] + self._expires_days_max = self.config['expires_days_max'] + self._random_ips_max = self.config['random_ips_max'] + + def _move_attr_to_beginning_if_present(self, attr): + try: + self._possible_attrs.remove(attr) + except ValueError: + pass + else: + self._possible_attrs.insert(0, attr) def _attributes_init(self): self._possible_category = CATEGORY_ENUMS @@ -175,8 +277,10 @@ def 
_attributes_init(self):
         try:
             output_attribute = self._create_attribute(attr)
         except AttributeCreationError:
-            LOGGER.warning("No method could be assigned for attribute: '%s' and no values "
-                           "were provided in request params.", attr)
+            LOGGER.warning(
+                'No method of value generation could be found '
+                'for attribute %a and no values were provided '
+                'for it in request params (if any).', attr)
         if output_attribute is not None:
             self.event[attr] = output_attribute
@@ -210,9 +314,9 @@ def _create_attribute(self, attr):
         # should be included
         if not self._include_in_event(attr):
             return None
-        if attr in self._port_values:
+        if attr in self._port_attributes:
             return self._get_value_for_port_attr()
-        if attr in self._md5_values:
+        if attr in self._md5_attributes:
             return self._get_value_for_md5_attr()
         possible_vals = getattr(self, self._POSSIBLE_VALS_PREFIX + attr, None)
         # if there is no specific method for a current attribute
@@ -270,7 +374,7 @@ def _get_address(self):
         if param_ip_list:
             ip = random.choice(param_ip_list)
         else:
-            ip = self._int_to_ip(random.randint(1, self._max_ip))
+            ip = self._int_to_ip(random.randint(self._min_ip, self._max_ip))
         asn = None
         cc = None
         # do not include asn or cc if opt.primary param is True
@@ -290,7 +394,7 @@ def _get_address(self):
         if param_cc_list:
             cc = random.choice(param_cc_list)
         else:
-            cc = random.choice(self._possible_cc_codes)
+            cc = random.choice(self._possible_cc_in_address)
         address_item = {
             key: value for key, value in [('ip', ip), ('asn', asn), ('cc', cc)]
             if value is not None}
@@ -304,6 +408,7 @@ def _get_client(self):
         # assigned only to him
         attr_name = 'client'
         if self._access_zone == 'inside':
+            # XXX: shouldn't it be checked if `self._client_id` is not None?
             return [self._client_id]
         if self._attr_in_params(attr_name):
             return self._params[attr_name]
@@ -317,16 +422,25 @@ def _get_dip(self):
         if self._attr_in_params(attr_name):
             return random.choice(self._params[attr_name])
         if self._include_in_event(attr_name):
-            return self._int_to_ip(random.randint(1, self._max_ip))
+            return self._int_to_ip(random.randint(self._min_ip, self._max_ip))
         return None

+    @staticmethod
+    def _get_enriched():
+        # maybe TODO later: implement it in a more interesting way?
+        return [[], {}]

     def _get_expires(self):
+        # XXX: shouldn't it be done in such a way that: it is set only
+        # for bl events (obligatorily!) and then (and only then!)
+        # also `status` + optionally `replaces` would be set?
         max_expires = self._current_datetime + datetime.timedelta(days=self._expires_days_max)
         if self._include_in_event('expires'):
             return radar.random_datetime(self._current_datetime, max_expires)
         return None

     def _get_fqdn(self):
+        # XXX: is it justified that `fqdn` is not included when `opt.primary` is set?
         if self.event['category'] in self._dip_categories or self._is_opt_primary():
             return None
         if self._attr_in_params('fqdn'):
             return random.choice(self._params['fqdn'])
         if self._attr_in_params('fqdn.sub'):
             sub = random.choice(self._params['fqdn.sub'])
             cleaned_sub = self._clean_input_value(sub, self._LEGAL_FQDN_REGEX)
-            return self._get_matching_values(cleaned_sub, self._possible_domains)
+            return self._get_matching_values(cleaned_sub, self._possible_fqdn)
+        # XXX: is it justified that `fqdn` is not included when `opt.primary` is set?
if self._is_opt_primary() or not self._include_in_event('fqdn'): return None if self._attr_in_params('url'): return self._url_to_domain(self.event['url']) - return random.choice(self._possible_domains) + return random.choice(self._possible_fqdn) def _get_modified(self): if self._include_in_event('modified'): @@ -351,7 +466,7 @@ def _get_name(self): if self._attr_in_params(attr_name): return random.choice(self._params[attr_name]) if self._include_in_event(attr_name): - return self._event_name + return random.choice(self._possible_name) return None def _get_proto(self): @@ -372,6 +487,7 @@ def _get_time(self): time_max = self._current_datetime if self._attr_in_params('time.min'): time_min = self._params['time.min'][0] + # XXX: is this logic related to `time.until` valid? elif self._attr_in_params('time.until'): time_min = self._params['time.min'][0] + hour else: @@ -381,6 +497,7 @@ def _get_time(self): return None def _get_url(self): + # XXX: is it justified that `url` is not included when `opt.primary` is set? if self.event["category"] in self._dip_categories or self._is_opt_primary(): return None if self._attr_in_params('url'): diff --git a/N6Lib/n6lib/http_helpers.py b/N6Lib/n6lib/http_helpers.py index 0db8ca4..c3af506 100644 --- a/N6Lib/n6lib/http_helpers.py +++ b/N6Lib/n6lib/http_helpers.py @@ -215,7 +215,7 @@ class RequestPerformer: # method tries -- as the last attempt -- ISO 8601 parsing) ) - def __init__(self, + def __init__(self, /, method, url, data=None, @@ -248,7 +248,7 @@ def __init__(self, self._chunk_size = chunk_size if stream else None @classmethod - def fetch(cls, *args, **kwargs): + def fetch(cls, /, *args, **kwargs): """ Download all content at once. @@ -384,4 +384,4 @@ def send(self, request, *args, **kwargs): 'specify `data` whose length is discoverable, ' 'or specify `retries=0`)') - return super(_HTTPAdapterForRetries, self).send(request, *args, **kwargs) + return super().send(request, *args, **kwargs) diff --git a/N6Lib/n6lib/mail_parsing_helpers.py b/N6Lib/n6lib/mail_parsing_helpers.py index 194e2a0..94bea70 100644 --- a/N6Lib/n6lib/mail_parsing_helpers.py +++ b/N6Lib/n6lib/mail_parsing_helpers.py @@ -178,16 +178,16 @@ class ParsedEmailMessage(email.message.EmailMessage): * `find_content()` -- get the content of exactly one component of the e-mail message -- such one that matches the given filtering criteria (each optional): - content type(s) and/or content/filename regexes; there is also an - option to get stuff extracted from an attachment in the ZIP, *gzip* - or *bzip2* format; + content type(s) and/or content/filename regexes; there is also a + possibility to get stuff extracted from an attachment in the ZIP, + *gzip* or *bzip2* format; * `find_filename_content_pairs()` -- iterate over `(filename, content)` pairs from those "leaf" components of the e-mail message that match the given filtering criteria (each optional): content type(s) and/or content/filename - regexes; there is also an option to include stuff extracted from - attachments in the ZIP, *gzip* and/or *bzip2* format(s); + regexes; there is also a possibility to include stuff extracted + from attachments in the ZIP, *gzip* and/or *bzip2* format(s); For more information, see the signatures and docs of these methods. 
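A rough usage sketch for the two lookup methods described above (how a `ParsedEmailMessage` instance is obtained is an assumption here -- the standard `email` parsing factory with a custom message class -- as is the hypothetical input file name):

    import email
    import email.policy

    from n6lib.mail_parsing_helpers import ParsedEmailMessage

    raw_message_bytes = open('some-mail.eml', 'rb').read()  # hypothetical input

    # Assumption: the stdlib factory accepts a custom message class;
    # the class may as well provide its own factory helper.
    msg = email.message_from_bytes(
        raw_message_bytes,
        _class=ParsedEmailMessage,
        policy=email.policy.default)

    # Exactly one matching component is expected; a `ValueError` is raised
    # if there are more (unless `ignore_extra_matches=True` is given):
    csv_payload = msg.find_content(
        content_type='text/csv',
        filename_regex=r'\.csv$')

    # Iterating over all matching "leaf" components:
    for filename, content in msg.find_filename_content_pairs(
            content_type=['text/plain', 'text/csv']):
        print(filename, len(content))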
@@ -240,6 +240,10 @@ class `email.message.EmailMessage` (note that `ParsedEmailMessage`
     * https://docs.python.org/3/library/email.policy.html#email.policy.EmailPolicy.content_manager
     * https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.get_content
     * https://docs.python.org/3/library/email.contentmanager.html
+
+    However, note also that the values returned by `.get_content()`
+    are expected (by the `ParsedEmailMessage`'s machinery) to be `str`
+    or `bytes`; otherwise a `NotImplementedError` is raised.
     """
@@ -363,8 +367,9 @@ def get_subject(self, *, normalize_whitespace: bool = True) -> Optional[str]:
         otherwise `None` will be returned).

         If the `normalize_whitespace` argument is true (it is by default)
-        then any trailing whitespace characters are removed and any other
-        series of whitespace characters are normalized to single spaces.
+        then any leading/trailing whitespace characters are removed and
+        any other series of whitespace characters are normalized to single
+        spaces.
         """
         header = self['Subject']
         if header is None:
@@ -405,7 +410,7 @@ def find_content(
             multiple alternatives can be given as a list; if not given
             at all, there will be no filename-regex-based filtering;

-            note that a message component can have no filename associated
+            note that a message component may have no filename associated
             with it -- then the filename will be considered empty (so,
             for example, the `^$` regex will match it);
@@ -430,7 +435,8 @@ def find_content(
         ***

         This method returns the content (*aka* payload) of the matching
-        message component.
+        message component. The returned value is a `bytes` or `str`
+        object, or `None`.

         More precisely: the `find_filename_content_pairs()` method is
         called, and the first item yielded by the resultant iterator is
@@ -445,7 +451,8 @@ def find_content(
         If there are more matching components than one, a `ValueError`
         is raised -- unless the `ignore_extra_matches` argument has been
-        explicitly set to `True`.
+        explicitly set to `True` (then the content of the first matching
+        component is returned).
         """
         items = self.find_filename_content_pairs(
@@ -458,7 +465,7 @@ def find_content(
         _, content = next(items, (None, None))
         if (not ignore_extra_matches) and next(items, None) is not None:
-            raise ValueError(
+            raise ValueError(  # TODO: introduce a specific subclass of `ValueError`...
                 f'multiple components of the message match '
                 f'the following criteria: {content_type=!a}, '
                 f'{filename_regex=!a} {content_regex=!a}, '
@@ -497,7 +504,7 @@ def find_filename_content_pairs(
             multiple alternatives can be given as a list; if not given
             at all, there will be no filename-regex-based filtering;

-            note that a message component can have no filename associated
+            note that a message component may have no filename associated
             with it -- then the filename will be considered empty (so,
             for example, the `^$` regex will match it);
@@ -742,7 +749,7 @@ def __generate_from_msg(
             force_content_as,
     ):
         if force_content_as in (str, 'str'):
-            content = as_unicode(content, decode_error_handling='replace')
+            content = as_unicode(content, decode_error_handling='replace')  # maybe TODO: change to `surrogateescape`?
assert isinstance(filename, str) assert (force_content_as is None and isinstance(content, (bytes, str)) diff --git a/N6Lib/n6lib/pyramid_commons/_tween_factories.py b/N6Lib/n6lib/pyramid_commons/_tween_factories.py index ee25ed2..0abe573 100644 --- a/N6Lib/n6lib/pyramid_commons/_tween_factories.py +++ b/N6Lib/n6lib/pyramid_commons/_tween_factories.py @@ -1,12 +1,12 @@ -# Copyright (c) 2021 NASK. All rights reserved. +# Copyright (c) 2021-2023 NASK. All rights reserved. """ -This modules provides custom Pyramid-compliant *tweens* (see: +This module provides custom Pyramid-compliant *tweens* (see: https://docs.pylonsproject.org/projects/pyramid/en/stable/narr/hooks.html#registering-tweens). """ -import collections import sys +from collections.abc import Iterator from pyramid.response import Response @@ -95,7 +95,7 @@ def auth_api_context_tween(request): try: response = handler(request) unwrapped_app_iter = getattr(response, 'app_iter', None) - if isinstance(unwrapped_app_iter, collections.Iterator): + if isinstance(unwrapped_app_iter, Iterator): response.app_iter = _auth_api_exiting_app_iter(unwrapped_app_iter) return response finally: diff --git a/N6Lib/n6lib/record_dict.py b/N6Lib/n6lib/record_dict.py index fc2d0cc..b9aeef6 100644 --- a/N6Lib/n6lib/record_dict.py +++ b/N6Lib/n6lib/record_dict.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2022 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. # # TODO: more comments + docs @@ -24,11 +24,10 @@ ) from n6lib.common_helpers import ( LimitedDict, - ascii_str, as_bytes, + ascii_str, ipv4_to_str, make_exc_ascii_str, - try_to_normalize_surrogate_pairs_to_proper_codepoints, ) from n6lib.const import ( CATEGORY_TO_NORMALIZED_NAME, @@ -43,6 +42,7 @@ from n6lib.url_helpers import ( URL_SCHEME_AND_REST_LEGACY_REGEX, make_provisional_url_search_key, + prepare_norm_brief, ) @@ -396,7 +396,7 @@ class RecordDict(collections_abc.MutableMapping): # *EXPERIMENTAL* (likely to be changed or removed in the future # without any warning/deprecation/etc.) '_url_data', - '_url_data_ready', + '_url_data_ready', # <- legacy key, to be removed... # internal keys of aggregated items '_group', @@ -412,12 +412,13 @@ class RecordDict(collections_abc.MutableMapping): '_bl-current-time', } - # *EXPERIMENTAL* (likely to be changed or removed in the future - # without any warning/deprecation/etc.) - setitem_key_to_target_key = { - # (trick for non-idempotent adjusters...) - '_url_data': '_url_data_ready', - } + ### TODO later? + # }) - { + # # (not stored in Event DB, can be added to + # # query results by the *data backend API*) + # 'url_orig_ascii', + # 'url_orig_b64', + # } # for the following keys, if the given value is invalid, # AdjusterError is not propagated; instead the value is just @@ -527,13 +528,33 @@ def iter_db_items(self): # *EXPERIMENTAL* (likely to be changed or removed in the future # without any warning/deprecation/etc.) 
def _prepare_url_data_items(self, item_prototype, custom_items): - url_data = self.get('_url_data_ready') - if url_data is not None: + _url_data = self.get('_url_data') + if _url_data is not None: + assert isinstance(_url_data, dict) + assert isinstance(_url_data.get('orig_b64'), str) + assert isinstance(_url_data.get('norm_options'), dict) + # Set event's `url` to an EventDB-query-searchable key: + url_orig_bin = base64.urlsafe_b64decode(_url_data['orig_b64']) + item_prototype['url'] = make_provisional_url_search_key(url_orig_bin) # [sic] assert 'url_data' not in custom_items - assert isinstance(url_data.get('url_orig'), str) - url_orig = base64.urlsafe_b64decode(url_data['url_orig']) - item_prototype['url'] = make_provisional_url_search_key(url_orig) # [sic] + # Set event's `url_data` to a dict digestible by the relevant code in + # `n6lib.data_backend_api._EventsQueryProcessor._preprocess_result_dict()`: + url_data = _url_data.copy() + url_data['norm_brief'] = prepare_norm_brief(**url_data.pop('norm_options')) custom_items['url_data'] = url_data + ## ------------------------------------------------------------------------ + ## The following code fragment handles the `_url_data_ready` *LEGACY* key: + assert self.get('_url_data_ready') is None + else: + url_data = self.get('_url_data_ready') + if url_data is not None: + assert 'url_data' not in custom_items + assert isinstance(url_data.get('url_orig'), str) + url_orig_bin = base64.urlsafe_b64decode(url_data['url_orig']) + item_prototype['url'] = make_provisional_url_search_key(url_orig_bin) # [sic] + custom_items['url_data'] = url_data + ## -- to be removed... (TODO later) + ## ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ __repr__ = attr_repr('_dict') @@ -556,9 +577,8 @@ def __setitem__(self, key, value): ######## silently ignore the legacy item if key == '__preserved_custom_keys__': return ######## ^^^ (to be removed later) - target_key = self.setitem_key_to_target_key.get(key, key) try: - self._dict[target_key] = self._get_adjusted_value(key, value) + self._dict[key] = self._get_adjusted_value(key, value) except AdjusterError as exc: if key in self.without_adjuster_error: LOGGER.warning('Invalid value not stored (%s)', exc) @@ -704,26 +724,119 @@ def __exit__(self, exc_type, exc, tb): adjust__do_not_resolve_fqdn_to_ip = ensure_isinstance(bool) adjust__parsed_old = rd_adjuster - # *EXPERIMENTAL* internal field adjusters + # ---------------------------------------------------------------- + # *EXPERIMENTAL*: the `_url_data` field's adjuster # (likely to be changed or removed in the future # without any warning/deprecation/etc.) - adjust__url_data = make_dict_adjuster( - url_orig=chained( - unicode_surrogate_pass_and_esc_adjuster, - make_adjuster_applying_callable(try_to_normalize_surrogate_pairs_to_proper_codepoints), - make_adjuster_applying_callable(as_bytes), - make_adjuster_applying_callable(base64.urlsafe_b64encode), - unicode_adjuster, - ensure_validates_by_regexp(r'\A[0-9a-zA-Z\-_=]+\Z'), - ensure_not_longer_than(2 ** 17)), - url_norm_opts=make_dict_adjuster()) + + _MAX_URL_ORIG_B64_LENGTH = 2 ** 17 + + @preceded_by(make_dict_adjuster( + # Note: when a parser sets the `_url_data` mapping, it + # is required to contain only these items: `orig` and + # `norm_options`. + orig=ensure_isinstance(str, bytes, bytearray), # (<- note: not present in adjusted dict) + norm_options=make_dict_adjuster( # (see: `n6lib.url_helpers.normalize_url()`...) 
+ merge_surrogate_pairs=ensure_isinstance(bool), + empty_path_slash=ensure_isinstance(bool), + remove_ipv6_zone=ensure_isinstance(bool), + + # The following flag is, typically, provided automatically + # by the adjuster's machinery (no need to set it explicitly). + unicode_str=ensure_isinstance(bool), + ), + + # The following item is, typically, provided + # automatically by the adjuster's machinery. + orig_b64=chained( + ensure_isinstance(str), + ensure_validates_by_regexp(r'\A[0-9a-zA-Z\-_=]+\Z'), # <- URL-safe Base64 variant + ensure_not_longer_than(_MAX_URL_ORIG_B64_LENGTH), + ), + )) + def adjust__url_data(self, value): + assert isinstance(value, dict) + return self._get_adjusted_url_data(**value) + + def _get_adjusted_url_data(self, *, + orig=None, + orig_b64=None, + norm_options, + **extra_items): + if orig is None and orig_b64 is None: + raise TypeError( + 'either `orig` or `orig_b64` ' + 'needs to be present') + if orig is not None and orig_b64 is not None: + raise TypeError( + 'either `orig` or `orig_b64` ' + 'needs to be present, ' + 'but not both') + assert isinstance(norm_options, dict) + + if orig_b64 is None: + # Here we are at the *parser* pipeline processing stage. + assert orig is not None + + if extra_items: + raise TypeError(f'illegal **extra_items present ({extra_items=!a})') + + if isinstance(orig, (bytes, bytearray)): + if 'unicode_str' not in norm_options: + norm_options['unicode_str'] = False + if norm_options['unicode_str']: + raise TypeError( + "`orig` is a bytes/bytearray, so " + "`norm_options['unicode_str']`, " + "if specified, should be False") + else: + assert isinstance(orig, str) + if 'unicode_str' not in norm_options: + norm_options['unicode_str'] = True + if not norm_options['unicode_str']: + raise TypeError( + "`orig` is a str, so " + "`norm_options['unicode_str']`, " + "if specified, should be True") + + if not orig: + raise ValueError('`orig` is empty') + + url_orig_bin = as_bytes(orig, 'surrogatepass') + orig_b64 = base64.urlsafe_b64encode(url_orig_bin).decode('ascii') + else: + # Here, typically, we are at some later pipeline processing + # stage than the *parser* stage (what means that the items + # of `_url_data` have already been prepared at the *parser* + # stage -- see the `if...` branch above). Note that here we + # accept any `extra_items` without complaining -- to ease + # transition if new keys are supported in the future... + assert orig_b64 is not None + assert orig is None + + assert isinstance(orig_b64, str) + if len(orig_b64) > self._MAX_URL_ORIG_B64_LENGTH: + raise ValueError( + f'length of `orig_b64` ({len(orig_b64)}) is greater ' + f'than the maximum ({self._MAX_URL_ORIG_B64_LENGTH})') + + return dict( + orig_b64=orig_b64, + norm_options=norm_options, + **extra_items, + ) + + ## *LEGACY*: the `_url_data_ready` internal field's adjuster + ## -- to be removed... (TODO later) adjust__url_data_ready = make_dict_adjuster( url_orig=chained( ensure_isinstance(str), ensure_validates_by_regexp(r'\A[0-9a-zA-Z\-_=]+\Z'), - ensure_not_longer_than(2 ** 17)), + ensure_not_longer_than(_MAX_URL_ORIG_B64_LENGTH)), url_norm_opts=make_dict_adjuster()) + # ---------------------------------------------------------------- + # hi-freq-only internal field adjusters adjust__group = unicode_adjuster adjust__first_time = chained( @@ -1020,3 +1133,4 @@ class BLRecordDict(RecordDict): assert _data_spec.all_result_keys == { key for key in _all_keys if key not in ('type', 'enriched') and not key.startswith('_')} +# ^ TODO later? 
assert _data_spec.all_result_keys - {'url_orig_ascii', 'url_orig_b64'} == { diff --git a/N6Lib/n6lib/ripe_api_client.py b/N6Lib/n6lib/ripe_api_client.py index 1b3a9de..0dfba36 100644 --- a/N6Lib/n6lib/ripe_api_client.py +++ b/N6Lib/n6lib/ripe_api_client.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022 NASK. All rights reserved. +# Copyright (c) 2022-2023 NASK. All rights reserved. import json from typing import Optional @@ -80,19 +80,19 @@ class RIPEApiClient: * Creates unique URLs using each ASN/IP network, * Requests data from every - previously created - URL. - * Search for `admin-c` and `tech-c` keys/values. - * Creates unique URLs using values from `admin-c` and `tech-c` keys. - Usually there is more than one created URL, but not all of them - are valid. + * Search for `admin-c`, `tech-c` and `org` keys/values. + * Creates unique URLs using values from `admin-c` and `tech-c` + and `org` keys. Usually there is more than one created URL, + but not all of them are valid. Phase III (obtaining *attrs_data* from *unique details URLs* and abuse contact finder): * Downloads content - as we use to call it in code, the `attrs_data` - from any URL created based on the data - contained in the `admin-c` and `tech-c` keys. Note that - not every URL is valid, in case of a 404 error the URL - is skipped. + contained in the `admin-c`, `tech-c` and `org` keys. + Note that not every URL is valid, in case of a 404 error + the URL is skipped. *** @@ -123,7 +123,9 @@ class RIPEApiClient: DETAILS_ROLE_URL_PATTERN = 'https://rest.db.ripe.net/ripe/role/' DETAILS_PERSON_URL_PATTERN = 'https://rest.db.ripe.net/ripe/person/' + DETAILS_ORGANISATION_URL_PATTERN = 'https://rest.db.ripe.net/ripe/organisation/' DETAILS_EXTENSION = '.json' + DETAILS_UNFILTERED_EXTENSION = '?unfiltered' _UNIQUE_ASN_MARKER = 'ASN' _UNIQUE_IP_NETWORK_MARKER = 'IP Network' @@ -139,7 +141,7 @@ def __init__(self, ) -> None: self.asn_seq = self._get_validated_as_numbers(asn_seq) self.ip_network_seq = self._get_validated_ip_networks(ip_network_seq) - self.asn_ip_network_to_details_urls = dict() + self.marker_to_details_urls = dict() if not (self.asn_seq or self.ip_network_seq): raise ValueError('ASN or IP Network should be provided.') if self.asn_seq and self.ip_network_seq: @@ -188,7 +190,7 @@ def _get_validated_ip_networks(ip_networks_seq: Optional[list] def _set_asn_and_ip_network_to_unique_details_urls_structure(self) -> None: for marker in (self._UNIQUE_ASN_MARKER, self._UNIQUE_IP_NETWORK_MARKER): - self.asn_ip_network_to_details_urls[marker] = dict() + self.marker_to_details_urls[marker] = dict() # @@ -246,41 +248,76 @@ def _obtain_partial_url_from_response(self, if response["data"]["records"]: for record in response["data"]["records"][0]: if record['key'] in ('admin-c', 'tech-c'): - if asn: - self._provide_asn_or_ip_network_to_unique_details_urls( - value=record['value'], - marker=self._UNIQUE_ASN_MARKER, - asn_ip_network=asn, - ) - if ip_network: - self._provide_asn_or_ip_network_to_unique_details_urls( - value=record['value'], - marker=self._UNIQUE_IP_NETWORK_MARKER, - asn_ip_network=ip_network, - ) - - def _provide_asn_or_ip_network_to_unique_details_urls(self, - value: str, - marker: str, - asn_ip_network: str, - ) -> None: - assert marker is not None - if not self.asn_ip_network_to_details_urls[marker].get(asn_ip_network): - self.asn_ip_network_to_details_urls[marker][asn_ip_network] = set() - self.asn_ip_network_to_details_urls[marker][asn_ip_network].update(( - f'{self.DETAILS_PERSON_URL_PATTERN}{value}{self.DETAILS_EXTENSION}', - 
f'{self.DETAILS_ROLE_URL_PATTERN}{value}{self.DETAILS_EXTENSION}', - )) + self._provide_marker_to_unique_details_urls( + value=record['value'], + asn=asn, + ip_network=ip_network, + ) + if record['key'] == 'org': + self._provide_marker_to_unique_details_urls( + value=record['value'], + asn=asn, + ip_network=ip_network, + org=True, + ) + + def _provide_marker_to_unique_details_urls(self, + value: str, + asn: str = None, + ip_network: str = None, + org: bool = False, + ) -> None: + if asn is not None: + assert ip_network is None + self._provide_asn_to_unique_details_urls(asn=asn, + value=value, + org=org) + if ip_network is not None: + assert asn is None + self._provide_ip_network_to_unique_details_urls(ip_network=ip_network, + value=value, + org=org) + + def _provide_asn_to_unique_details_urls(self, + asn: str, + value: str, + org: bool = False, + ) -> None: + if not self.marker_to_details_urls[self._UNIQUE_ASN_MARKER].get(asn): + self.marker_to_details_urls[self._UNIQUE_ASN_MARKER][asn] = set() + if org: + self.marker_to_details_urls[self._UNIQUE_ASN_MARKER][asn].add( + f'{self.DETAILS_ORGANISATION_URL_PATTERN}{value}{self.DETAILS_EXTENSION}{self.DETAILS_UNFILTERED_EXTENSION}' + ) + else: + self.marker_to_details_urls[self._UNIQUE_ASN_MARKER][asn].update(( + f'{self.DETAILS_PERSON_URL_PATTERN}{value}{self.DETAILS_EXTENSION}{self.DETAILS_UNFILTERED_EXTENSION}', + f'{self.DETAILS_ROLE_URL_PATTERN}{value}{self.DETAILS_EXTENSION}{self.DETAILS_UNFILTERED_EXTENSION}', + )) + + def _provide_ip_network_to_unique_details_urls(self, + ip_network: str, + value: str, + org: bool = False, + ): + if not self.marker_to_details_urls[self._UNIQUE_IP_NETWORK_MARKER].get(ip_network): + self.marker_to_details_urls[self._UNIQUE_IP_NETWORK_MARKER][ip_network] = set() + if org: + self.marker_to_details_urls[self._UNIQUE_IP_NETWORK_MARKER][ip_network].add( + f'{self.DETAILS_ORGANISATION_URL_PATTERN}{value}{self.DETAILS_EXTENSION}{self.DETAILS_UNFILTERED_EXTENSION}' + ) + else: + self.marker_to_details_urls[self._UNIQUE_IP_NETWORK_MARKER][ip_network].update(( + f'{self.DETAILS_PERSON_URL_PATTERN}{value}{self.DETAILS_EXTENSION}{self.DETAILS_UNFILTERED_EXTENSION}', + f'{self.DETAILS_ROLE_URL_PATTERN}{value}{self.DETAILS_EXTENSION}{self.DETAILS_UNFILTERED_EXTENSION}', + )) # * Phase III - Obtaining data from unique *details* URLs + abuse contact finder: def _get_attrs_data_from_unique_details_urls(self) -> list: attrs_data_from_details_urls = [] - for ( - marker, - asn_or_ip_network_to_urls - ) in self.asn_ip_network_to_details_urls.items(): + for (marker, asn_or_ip_network_to_urls) in self.marker_to_details_urls.items(): if asn_or_ip_network_to_urls: self._provide_attrs_data(attrs_data_from_details_urls, marker, @@ -293,7 +330,7 @@ def _provide_attrs_data(self, asn_or_ip_network_to_urls: dict, ) -> None: for asn_or_ip_network, unique_urls in asn_or_ip_network_to_urls.items(): - adjusted_attributes = [('Data for', str(asn_or_ip_network))] + adjusted_attributes = [('Unfiltered data for', str(asn_or_ip_network))] contact_url = self._obtain_abuse_url(asn_or_ip_network, marker) response = self._perform_single_request(contact_url) abuse_contact_emails = response['data']['abuse_contacts'] diff --git a/N6Lib/n6lib/search_engine_api.py b/N6Lib/n6lib/search_engine_api.py index fa03165..6b90eab 100644 --- a/N6Lib/n6lib/search_engine_api.py +++ b/N6Lib/n6lib/search_engine_api.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022 NASK. All rights reserved. +# Copyright (c) 2022-2023 NASK. All rights reserved. 
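For clarity, this is how the newly added *organisation* details URLs of `RIPEApiClient` get composed (a sketch based only on the constants visible in this patch; the `ORG-EXAMPLE1-RIPE` value is made up):

    DETAILS_ORGANISATION_URL_PATTERN = 'https://rest.db.ripe.net/ripe/organisation/'
    DETAILS_EXTENSION = '.json'
    DETAILS_UNFILTERED_EXTENSION = '?unfiltered'

    value = 'ORG-EXAMPLE1-RIPE'  # hypothetical `org` attribute value
    url = (f'{DETAILS_ORGANISATION_URL_PATTERN}{value}'
           f'{DETAILS_EXTENSION}{DETAILS_UNFILTERED_EXTENSION}')
    assert url == ('https://rest.db.ripe.net/ripe/organisation/'
                   'ORG-EXAMPLE1-RIPE.json?unfiltered')

Note the `?unfiltered` suffix: it asks the RIPE DB for unfiltered objects (with contact attributes included), which is also why the attrs label above changed to `'Unfiltered data for'`.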
""" The module provides functionality of searching any text through @@ -23,9 +23,10 @@ language, class `Analyzer` raises exception `AnalyzerError`. Example of use: ->>> se = SearchEngine("pl") ->>> se.index_document(SearchedDocument(1, "tekst dokumentu")) ->>> result = se.search("dokument") + + se = SearchEngine("pl") + se.index_document(SearchedDocument(1, "tekst dokumentu")) + result = se.search("dokument") Module is inspired by the article https://bart.degoe.de/building-a-full-text-search-engine-150-lines-of-code/ diff --git a/N6Lib/n6lib/tests/auth_related_quicktest.py b/N6Lib/n6lib/tests/auth_related_quicktest.py index 2e7e8ff..293a740 100644 --- a/N6Lib/n6lib/tests/auth_related_quicktest.py +++ b/N6Lib/n6lib/tests/auth_related_quicktest.py @@ -580,17 +580,18 @@ def data_maker_for____TestAuthAPI___get_inside_criteria(session): models.InsideFilterASN(asn=34)], inside_filter_ccs=[ models.InsideFilterCC(cc='PL'), - models.InsideFilterCC(cc=u'US')], + models.InsideFilterCC(cc='US')], inside_filter_fqdns=[ models.InsideFilterFQDN(fqdn='example.com'), - models.InsideFilterFQDN(fqdn=u'xyz.example.net')], + models.InsideFilterFQDN(fqdn='xyz.example.net')], inside_filter_ip_networks=[ + models.InsideFilterIPNetwork(ip_network='0.10.20.30/8'), models.InsideFilterIPNetwork(ip_network='1.2.3.4/16'), - models.InsideFilterIPNetwork(ip_network=u'101.102.103.104/32')], + models.InsideFilterIPNetwork(ip_network='101.102.103.104/32')], inside_filter_urls=[ - models.InsideFilterURL(url='exp.pl'), - models.InsideFilterURL(url=u'bank.pl/auth.php'), - models.InsideFilterURL(url=u'Łódź')]) + models.InsideFilterURL(url='example.info'), + models.InsideFilterURL(url='institution.example.pl/auth.php'), + models.InsideFilterURL(url='Łódź')]) yield models.Org(org_id='o2', inside_filter_asns=[models.InsideFilterASN(asn=1234567)]) yield models.Org(org_id='o3', @@ -613,6 +614,7 @@ def _data_matching_those_from_auth_related_test_helpers(session): models.CriteriaASN(asn=2), models.CriteriaASN(asn=3)], criteria_ip_networks=[ + models.CriteriaIPNetwork(ip_network='0.0.0.0/30'), models.CriteriaIPNetwork(ip_network='10.0.0.0/8'), models.CriteriaIPNetwork(ip_network='192.168.0.0/24')]) cri2 = models.CriteriaContainer( @@ -739,7 +741,7 @@ def _data_matching_those_from_auth_related_test_helpers(session): # orgs o1 = models.Org( org_id='o1', - actual_name=u'Actual Name Zażółć', + actual_name='Actual Name Zażółć', org_groups=[go1], full_access=True, stream_api_enabled=True, email_notification_enabled=True, diff --git a/N6Lib/n6lib/tests/test_auth_api.py b/N6Lib/n6lib/tests/test_auth_api.py index a9238c3..4726e45 100644 --- a/N6Lib/n6lib/tests/test_auth_api.py +++ b/N6Lib/n6lib/tests/test_auth_api.py @@ -1,4 +1,4 @@ -# Copyright (c) 2014-2022 NASK. All rights reserved. +# Copyright (c) 2014-2023 NASK. All rights reserved. 
import collections import contextlib @@ -850,8 +850,8 @@ class TestAuthAPI___get_inside_criteria(_AuthAPILdapDataBasedMethodTestMixIn, 'n6asn': ['12', '34'], 'n6cc': ['PL', 'US'], 'n6fqdn': ['example.com', 'xyz.example.net'], - 'n6ip-network': ['1.2.3.4/16', '101.102.103.104/32'], - 'n6url': ['exp.pl', 'bank.pl/auth.php', u'Łódź'], + 'n6ip-network': ['0.10.20.30/8', '1.2.3.4/16', '101.102.103.104/32'], + 'n6url': ['example.info', 'institution.example.pl/auth.php', 'Łódź'], }), ('o=o2,ou=orgs,dc=n6,dc=cert,dc=pl', { 'n6asn': ['1234567'], @@ -870,8 +870,12 @@ class TestAuthAPI___get_inside_criteria(_AuthAPILdapDataBasedMethodTestMixIn, 'asn_seq': [12, 34], 'cc_seq': ['PL', 'US'], 'fqdn_seq': ['example.com', 'xyz.example.net'], - 'ip_min_max_seq': [(16908288, 16973823), (1701209960, 1701209960)], - 'url_seq': ['exp.pl', 'bank.pl/auth.php', u'Łódź'], + 'ip_min_max_seq': [ + (1, 16777215), # <- Note: here the minimum IP is 1, not 0 (see: #8861). + (16908288, 16973823), + (1701209960, 1701209960), + ], + 'url_seq': ['example.info', 'institution.example.pl/auth.php', 'Łódź'], }, { 'org_id': 'o2', @@ -2194,7 +2198,7 @@ def test(self): { 'org_id': 'o7', 'ip_min_max_seq': [ - (0, 1), + (1, 2), (102, MAX_IP), ], }, @@ -2202,8 +2206,8 @@ def test(self): 'org_id': 'o6', 'ip_min_max_seq': [ (MAX_IP, MAX_IP), - (2, 100), - (0, 0), + (3, 100), + (1, 1), ], }, ] @@ -2218,39 +2222,40 @@ def test(self): ]), ({'o6'}, [ - {2}, + {4}, + {3}, {50}, {100}, - {2, 30, 70, 100}, + {3, 30, 70, 100}, {30, 40, 50, 60, 70, 101}, ]), ({'o7'}, [ - {1}, + {2}, {102}, {12345}, {MAX_IP - 1}, {150, 101, 12345}, - {1, 102, 12345, MAX_IP - 1}, + {2, 102, 12345, MAX_IP - 1}, ]), ({'o6', 'o7'}, [ - {0}, + {1}, {MAX_IP}, - {0, MAX_IP}, - {0, 1}, + {1, MAX_IP}, {1, 2}, - {1, 50}, - {1, 100}, - {2, 102}, + {2, 3}, + {2, 50}, + {2, 100}, + {3, 102}, {50, 102}, {100, 102}, {100, 101, 102}, - {2, 12345}, + {3, 12345}, {50, 12345}, {100, 12345}, {100, 12345, MAX_IP}, - {0, 50, 150, MAX_IP}, + {1, 50, 150, MAX_IP}, ]), ] @@ -2305,25 +2310,25 @@ def test(self): # ) (set(), [ - {0}, + {1}, {5}, {9}, - {0, 5, 9}, + {1, 5, 9}, {131}, {135}, {138}, {5, 135}, - {0, 9, 131, 135, 138}, + {1, 9, 131, 135, 138}, {171}, {174, 176}, {179}, {191}, {12345}, {MAX_IP}, - {0, MAX_IP}, + {1, MAX_IP}, {5, 134, 175, 192}, {5, 131, 135, 138, 171, 175, 179, 191, 12345}, - {0, 5, 9, 131, 135, 138, 171, 175, 179, 191, 12345, MAX_IP}, + {1, 5, 9, 131, 135, 138, 171, 175, 179, 191, 12345, MAX_IP}, ]), ({'o8'}, [ @@ -2331,7 +2336,7 @@ def test(self): {130}, {124, 125}, {5, 124, 125, 12345}, - {0, 5, 9, 130, 131, 135, 138, 171, 175, 179, 191, 12345, MAX_IP}, + {1, 5, 9, 130, 131, 135, 138, 171, 175, 179, 191, 12345, MAX_IP}, ]), ({'o9'}, [ @@ -2347,7 +2352,7 @@ def test(self): {168}, {170}, {91, 94, 99, 145, 165, 170, MAX_IP}, - {0, 94, 145, 165, 12345}, + {1, 94, 145, 165, 12345}, ]), ({'o10'}, [ @@ -2385,7 +2390,7 @@ def test(self): {120}, {190}, {9, 25, 29, 120, 190, 12345}, - {0, 121, 139, MAX_IP}, + {1, 121, 139, MAX_IP}, ]), ({'o9', 'o10'}, [ @@ -2402,7 +2407,7 @@ def test(self): {115}, {169}, {71, 80, 115}, - {1, 71, 81, 169, 12345, MAX_IP}, + {2, 71, 81, 169, 12345, MAX_IP}, {139, 140}, ]), @@ -2435,13 +2440,15 @@ def test(self): {130, 138, 139, 140}, {130, 169}, {150, 180}, - {3, 30, 42, 70, 100, 110}, + {4, 30, 42, 70, 100, 110}, {5, 10, 120, 125, 155, 170, 185, MAX_IP} ]), ] def _ip_min_max(ip_network): - return ip_network_tuple_to_min_max_ip(ip_network_as_tuple(ip_network)) + return ip_network_tuple_to_min_max_ip( + ip_network_as_tuple(ip_network), + 
force_min_ip_greater_than_zero=True) _ip = ip_str_to_int SPECIFIC_IP_CRITERIA = [ @@ -2468,13 +2475,13 @@ def _ip_min_max(ip_network): # ) (set(), [ - {0}, + {1}, {_ip('10.10.9.255')}, {_ip('10.10.10.152')}, {_ip('10.10.11.0')}, {MAX_IP}, { - 0, + 1, _ip('10.10.9.255'), _ip('10.10.10.152'), _ip('10.10.11.0'), @@ -2497,7 +2504,7 @@ def _ip_min_max(ip_network): for i in range(256) # (here we do not skip `...152`) } | { - 0, + 1, _ip('10.10.9.255'), _ip('10.10.11.0'), MAX_IP, @@ -2979,9 +2986,9 @@ def test___ids_and_urls(self, inside_criteria, expected_content): expected_content=( [ -1, # guard item - 0, 1, 2, + 3, 101, 102, MAX_IP, diff --git a/N6Lib/n6lib/tests/test_data_backend_api.py b/N6Lib/n6lib/tests/test_data_backend_api.py index a822c4e..bf6920c 100644 --- a/N6Lib/n6lib/tests/test_data_backend_api.py +++ b/N6Lib/n6lib/tests/test_data_backend_api.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. import copy import unittest @@ -17,6 +17,7 @@ ) from n6lib.data_backend_api import ( + LOGGER as module_logger, N6DataBackendAPI, _EventsQueryProcessor, ) @@ -54,7 +55,7 @@ def test(self): @expand -class Test_EventsQueryProcessor___get_key_to_query_func(unittest.TestCase): +class Test_EventsQueryProcessor__get_key_to_query_func(unittest.TestCase): @foreach( param(data_spec_class=N6DataSpec), @@ -105,7 +106,7 @@ def test(self, data_spec_class): @expand -class Test_EventsQueryProcessor__generate_query_results(unittest.TestCase): +class Test_EventsQueryProcessor_generate_query_results(unittest.TestCase): _UTCNOW = dt(2015, 1, 3, 17, 18, 19) @@ -456,176 +457,1089 @@ class Test_EventsQueryProcessor__preprocess_result_dict(TestCaseMixin, unittest. @paramseq def cases(cls): yield param( + # 'SY:'-prefixed `url`, no `custom` + # -> result: nothing raw_result_dict={ - # 'SY:'-prefixed `url`, no `custom`/`url_data`, some data - # -> nothing - 'url': u'SY:cośtam', - 'foo': u'bar', + 'url': 'SY:cośtam', }, + expected_log_regexes=[ + r"^ERROR:.*`url` \('SY:co\\u015btam'\) starts with 'SY:' but no `url_data`!", + ], expected_result=None, - ) + ).label( + "(01) 'SY:'-prefixed `url`, no `custom`") + + yield param( + # 'SY:'-prefixed `url`, no `custom`, unrelated data + # -> result: nothing + # + # [general remark: the *unrelated data* stuff is generally + # irrelevant for the core logic the tests provided by this + # class concern; many of them include *unrelated data*, but + # this is done just to show that those data do not interfere + # with that logic, or -- when applicable -- that they are + # passed through without problems...] 
+ raw_result_dict={ + 'url': 'SY:cośtam', + 'foo': 'bar', + }, + expected_log_regexes=[ + r"^ERROR:.*`url` \('SY:co\\u015btam'\) starts with 'SY:' but no `url_data`!", + ], + expected_result=None, + ).label( + "(02) 'SY:'-prefixed `url`, no `custom`, unrelated data") + yield param( + # 'SY:'-prefixed `url`, `custom` without `url_data` + # -> result: nothing raw_result_dict={ - # 'SY:'-prefixed `url`, `custom` without `url_data`, some data - # -> nothing 'custom': {'spam': 'ham'}, - 'url': u'SY:cośtam', + 'url': 'SY:cośtam', + }, + expected_log_regexes=[ + r"^ERROR:.*`url` \('SY:co\\u015btam'\) starts with 'SY:' but no `url_data`!", + ], + expected_result=None, + ).label( + "(03) 'SY:'-prefixed `url`, `custom` without `url_data`") + + yield param( + # 'SY:'-prefixed `url`, `custom` without `url_data`, unrelated data + # -> result: nothing + raw_result_dict={ + 'custom': {'spam': 'ham'}, + 'url': 'SY:cośtam', 'foo': 'bar', }, + expected_log_regexes=[ + r"^ERROR:.*`url` \('SY:co\\u015btam'\) starts with 'SY:' but no `url_data`!", + ], expected_result=None, - ) + ).label( + "(04) 'SY:'-prefixed `url`, `custom` without `url_data`, unrelated data") + yield param( - # `custom`+`url_data`, no 'url' - # -> nothing + # `custom` with `url_data`, no 'url' + # -> result: nothing raw_result_dict={ - 'custom': { - 'url_data': { - 'url_orig': 'x', - 'url_norm_opts': {'x': 'y'}, - }, - }, + 'custom': { + 'url_data': { + 'orig_b64': 'eA==', + 'norm_brief': 'emru', + }, + }, }, + expected_log_regexes=[ + r"^ERROR:.*`url_data` present.*but `url` \(None\) does not start with 'SY:'!", + ], expected_result=None, - ) + ).label( + "(05) `custom` with `url_data`, no 'url'") + yield param( - # `custom`+`url_data`, some data, no 'url' - # -> nothing + # [analogous to previous case, but with `url_data` in legacy format] + # `custom` with `url_data`, no 'url' + # -> result: nothing raw_result_dict={ - 'custom': { - 'url_data': { - 'url_orig': u'x', - 'url_norm_opts': {'x': u'y'}, - }, - }, - 'foo': u'bar', + 'custom': { + 'url_data': { + 'url_orig': 'eA==', + 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, + }, + }, }, + expected_log_regexes=[ + r"^ERROR:.*`url_data` present.*but `url` \(None\) does not start with 'SY:'!", + ], expected_result=None, - ) + ).label( + "(06) `custom` with `url_data`, no 'url' [@legacy]") + yield param( - # `url` without 'SY:' prefix, `custom`+`url_data` - # -> nothing + # `custom` with `url_data`, unrelated data, no 'url' + # -> result: nothing raw_result_dict={ - 'custom': { - 'url_data': { - 'url_orig': u'x', - 'url_norm_opts': {'x': u'y'}, - }, - }, - 'url': u'foo:bar', + 'custom': { + 'url_data': { + 'orig_b64': 'eA==', + 'norm_brief': 'emru', + }, + }, + 'foo': 'bar', }, + expected_log_regexes=[ + r"^ERROR:.*`url_data` present.*but `url` \(None\) does not start with 'SY:'!", + ], expected_result=None, - ) + ).label( + "(07) `custom` with `url_data`, unrelated data, no 'url'") + yield param( - # `url` without 'SY:' prefix, `custom`+`url_data`, some data - # -> nothing + # [analogous to previous case, but with `url_data` in legacy format] + # `custom` with `url_data`, unrelated data, no 'url' + # -> result: nothing raw_result_dict={ - 'custom': { - 'url_data': { - 'url_orig': 'x', - 'url_norm_opts': {'x': 'y'}, - }, - }, - 'foo': 'bar', + 'custom': { + 'url_data': { + 'url_orig': 'eA==', + 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, + }, + }, + 'foo': 'bar', }, + expected_log_regexes=[ + r"^ERROR:.*`url_data` present.*but `url` 
\(None\) does not start with 'SY:'!", + ], expected_result=None, - ) + ).label( + "(08) `custom` with `url_data`, unrelated data, no 'url' [@legacy]") + yield param( - # `custom`+`url_data` but the latter is not valid (not a dict) - # -> nothing + # `url` without 'SY:' prefix, `custom` with `url_data` + # -> result: nothing raw_result_dict={ - 'custom': { - 'url_data': [u'something'], - }, - 'url': u'SY:foo:bar', + 'custom': { + 'url_data': { + 'orig_b64': 'eA==', + 'norm_brief': 'emru', + }, + }, + 'url': 'foo:cośtam', }, + expected_log_regexes=[ + r"^ERROR:.*`url_data` present.*but `url` \('foo:.*'\) does not start with 'SY:'!", + ], expected_result=None, - ) + ).label( + "(09) `url` without 'SY:' prefix, `custom` with `url_data`") + yield param( - # `custom`+`url_data` but the latter is not valid (missing keys) - # -> nothing + # [analogous to previous case, but with `url_data` in legacy format] + # `url` without 'SY:' prefix, `custom` with `url_data` + # -> result: nothing raw_result_dict={ - 'custom': { - 'url_data': { - 'url_norm_opts': {'x': 'y'}, - }, - }, - 'foo': 'bar', + 'custom': { + 'url_data': { + 'url_orig': 'eA==', + 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, + }, + }, + 'url': 'foo:cośtam', }, + expected_log_regexes=[ + r"^ERROR:.*`url_data` present.*but `url` \('foo:.*'\) does not start with 'SY:'!", + ], expected_result=None, - ) + ).label( + "(10) `url` without 'SY:' prefix, `custom` with `url_data` [@legacy]") + yield param( - # `custom`+`url_data` but the latter is not valid (illegal keys) - # -> nothing + # `url` without 'SY:' prefix, `custom` with `url_data`, unrelated data + # -> result: nothing raw_result_dict={ - 'custom': { - 'url_data': { - 'url_orig': u'x', - 'url_norm_opts': {'x': 'y'}, - 'spam': 'ham', - }, - }, - 'foo': 'bar', + 'custom': { + 'url_data': { + 'orig_b64': 'eA==', + 'norm_brief': 'emru', + }, + }, + 'url': 'foo:cośtam', + 'foo': 'bar', }, + expected_log_regexes=[ + r"^ERROR:.*`url_data` present.*but `url` \('foo:.*'\) does not start with 'SY:'!", + ], expected_result=None, - ) + ).label( + "(11) `url` without 'SY:' prefix, `custom` with `url_data`, unrelated data") + yield param( - # some data. 
no `url`, no `url_data`
-            # -> some data
+            # [analogous to previous case, but with `url_data` in legacy format]
+            # `url` without 'SY:' prefix, `custom` with `url_data`, unrelated data
+            # -> result: nothing
             raw_result_dict={
-                'foo': 'bar',
+                'custom': {
+                    'url_data': {
+                        'url_orig': 'eA==',
+                        'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True},
+                    },
+                },
+                'url': 'foo:cośtam',
+                'foo': 'bar',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` present.*but `url` \('foo:.*'\) does not start with 'SY:'!",
+            ],
             expected_result=None,
-        )
+        ).label(
+            "(12) `url` without 'SY:' prefix, `custom` with `url_data`, unrelated data [@legacy]")
+
+        yield param(
+            # 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid (not a dict)
+            # -> result: nothing
+            raw_result_dict={
+                'custom': {
+                    'url_data': ['something'],
+                },
+                'url': 'SY:cośtam',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` \(\['something'\]\) is not valid!",
+            ],
+            expected_result=None,
+        ).label(
+            "(13) 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid "
+            "(not a dict)")
+
+        yield param(
+            # 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid (missing keys)
+            # -> result: nothing
+            raw_result_dict={
+                'custom': {
+                    'url_data': {
+                        'norm_brief': 'emru',
+                    },
+                },
+                'url': 'SY:cośtam',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` \(\{'norm_brief': 'emru'\}\) is not valid!",
+            ],
+            expected_result=None,
+        ).label(
+            "(14) 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid "
+            "(missing keys)")
+
+        yield param(
+            # [analogous to previous case, but with `url_data` in legacy format]
+            # 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid (missing keys)
+            # -> result: nothing
+            raw_result_dict={
+                'custom': {
+                    'url_data': {
+                        'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True},
+                    },
+                },
+                'url': 'SY:cośtam',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` \(\{'url_norm_opts': \{.*\}\}\) is not valid!",
+            ],
+            expected_result=None,
+        ).label(
+            "(15) 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid "
+            "(missing keys) [@legacy]")
+
+        yield param(
+            # 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid (illegal keys)
+            # -> result: nothing
+            raw_result_dict={
+                'custom': {
+                    'url_data': {
+                        'orig_b64': 'eA==',
+                        'norm_brief': 'emru',
+                        'spam': 'ham',
+                    },
+                },
+                'url': 'SY:cośtam',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` \(\{.*\}\) is not valid!",
+            ],
+            expected_result=None,
+        ).label(
+            "(16) 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid "
+            "(illegal keys)")
+
+        yield param(
+            # [analogous to previous case, but with `url_data` in legacy format]
+            # 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid (illegal keys)
+            # -> result: nothing
+            raw_result_dict={
+                'custom': {
+                    'url_data': {
+                        'url_orig': 'eA==',
+                        'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True},
+                        'spam': 'ham',
+                    },
+                },
+                'url': 'SY:cośtam',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` \(\{.*\}\) is not valid!",
+            ],
+            expected_result=None,
+        ).label(
+            "(17) 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid "
+            "(illegal keys) [@legacy]")
+
+        yield param(
+            # 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid (empty `orig_b64`)
+            # -> result: nothing
+            raw_result_dict={
+                'custom': {
+                    'url_data': {
+                        'orig_b64': '',
+                        'norm_brief': 'emru',
+                    },
+                },
+                'url': 'SY:cośtam',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` \(\{.*\}\) is not valid!",
+            ],
+            expected_result=None,
+        ).label(
+            "(18) 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid "
+            "(empty `orig_b64`)")
+
+        yield param(
+            # [analogous to previous case, but with `url_data` in legacy format]
+            # 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid (empty `url_orig`)
+            # -> result: nothing
+            raw_result_dict={
+                'custom': {
+                    'url_data': {
+                        'url_orig': '',
+                        'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True},
+                    },
+                },
+                'url': 'SY:cośtam',
+            },
+            expected_log_regexes=[
+                r"^ERROR:.*`url_data` \(\{.*\}\) is not valid!",
+            ],
+            expected_result=None,
+        ).label(
+            "(19) 'SY:'-prefixed `url`, `custom` with `url_data` which is not valid "
+            "(empty `url_orig`) [@legacy]")
+
+        yield param(
+            # unrelated data, no `url`, no `custom`
+            # -> result: unrelated data
+            raw_result_dict={
+                'foo': 'bar',
+            },
+            expected_result={
+                'foo': 'bar',
+            },
+        ).label(
+            "(20) unrelated data, no `url`, no `custom`")
+
+        yield param(
+            # unrelated data, no `url`, `custom` without `url_data`
+            # -> result: unrelated data, `custom`
+            raw_result_dict={
+                'custom': {'spam': 'ham'},
+                'foo': 'bar',
+            },
+            expected_result={
+                'custom': {'spam': 'ham'},
+                'foo': 'bar',
+            },
+        ).label(
+            "(21) unrelated data, no `url`, `custom` without `url_data`")
+
+        yield param(
+            # `url` without 'SY:' prefix, unrelated data, no `custom`
+            # -> result: `url`, unrelated data
+            raw_result_dict={
+                'url': 'something-else',
+                'foo': 'bar',
+            },
+            expected_result={
+                'url': 'something-else',
+                'foo': 'bar',
+            },
+        ).label(
+            "(22) `url` without 'SY:' prefix, unrelated data, no `custom`")
+
+        yield param(
+            # `url` without 'SY:' prefix, unrelated data, `custom` without `url_data`
+            # -> result: `url`, unrelated data, `custom`
+            raw_result_dict={
+                'custom': {'spam': 'ham'},
+                'url': 'something-else',
+                'foo': 'bar',
+            },
+            expected_result={
+                'custom': {'spam': 'ham'},
+                'url': 'something-else',
+                'foo': 'bar',
+            },
+        ).label(
+            "(23) `url` without 'SY:' prefix, unrelated data, `custom` without `url_data`")
+
+        yield param(
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data`,
+            # *and* `url.b64` in params (matching!)
+ # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # matches normalized `url.b64` + # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + normalized, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` + filtering_params={ + 'url.b64': [ + (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80/?q=\xed\xb3\x9d\xed\xa0\x80' + b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'), + ], + }, + raw_result_dict={ + 'url': 'SY:foo:cośtam/not-important', + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'htTP://\xc4\x86ma.eXample.COM:?q=\xed\xb3\x9d\xed\xa0\x80' + # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006P3E97bOd' + '7aCAJTNELSU0RC0lNUQtTmkhPyPtr7_tv7_ts4w='), + + # (`empty_path_slash`, `merge_surrogate_pairs`, + # `remove_ipv6_zone`, `unicode_str`) + 'norm_brief': 'emru', + }, + }, }, expected_result={ - 'foo': 'bar', + 'url': 'http://Ćma.example.com/?q=\udcdd\ud800%3D-%4D-%5D-Ni!?#\U0010FFFF\udccc', + 'custom': {}, # (empty `custom` is harmless, as data spec removes it later anyway) }, - ) + ).label( + "(24) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params " + "(matching, 'emru')") + yield param( - # some data, no `url`, `custom` without `url_data` - # -> some data, `custom` + # 'SY:'-prefixed `url`, + # `custom` with `url_data`, + # *and* `url.b64` in params (matching!) + # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # matches normalized `url.b64` + # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + normalized, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs`, `unicode_str` + # * here lack of `remove_ipv6_zone` changes nothing, as this + # URL does not contain IPv6 address + filtering_params={ + 'url.b64': [ + (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80' + b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'), + ], + }, raw_result_dict={ - 'custom': {'spam': u'ham'}, - 'foo': u'bar', + 'url': 'SY:foo:cośtam/not-important', + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80' + # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z' + 'ne2ggCUzRC0lNEQtJTVELU5pIT8j7a-_7b-_7bOM'), + + # (`empty_path_slash`, `merge_surrogate_pairs`, `unicode_str`) + 'norm_brief': 'emu', + }, + }, }, expected_result={ - 'custom': {'spam': u'ham'}, - 'foo': u'bar', + 'url': 'http://Ćma.example.com/?q=\udcdd\ud800%3D-%4D-%5D-Ni!?#\U0010FFFF\udccc', + 'custom': {}, # (empty `custom` is harmless, as data spec removes it later anyway) }, - ) + ).label( + "(25) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params " + "(matching, 'emu')") + yield param( - # `url` without 'SY:' prefix, some data, no `custom`/`url_data` - # -> `url`, some data + # 'SY:'-prefixed `url`, + # `custom` with `url_data`, + # *and* `url.b64` in params (matching!) 
+ # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # matches normalized `url.b64` + # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + normalized + coerced to `str`, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs` + # * here lack of `unicode_str` does not prevent matching + filtering_params={ + 'url.b64': [ + (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80' + b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'), + ], + }, raw_result_dict={ - 'url': u'something-else', - 'foo': u'bar', + 'url': 'SY:foo:cośtam/not-important', + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80' + # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z' + 'ne2ggCUzRC0lNEQtJTVELU5pIT8j7a-_7b-_7bOM'), + + # (`empty_path_slash`, `merge_surrogate_pairs`) + 'norm_brief': 'em', + }, + }, }, expected_result={ - 'url': u'something-else', - 'foo': u'bar', + 'url': ( + # note: lack of the `unicode_str` option means that + # the result of normalization is a `bytes` object + # (which is then coerced to `str`, just for the + # "url" result item, by applying the helper function + # `as_str_with_minimum_esc()`; the bytes which encode + # unpaired surrogates are escaped by it using the + # `\x...` notation) + 'http://Ćma.example.com/?q=\\xed\\xb3\\x9d\\xed\\xa0\\x80' + '%3D-%4D-%5D-Ni!?#\U0010FFFF\\xed\\xb3\\x8c'), + 'custom': {}, # (empty `custom` is harmless, as data spec removes it later anyway) }, - ) + ).label( + "(26) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params " + "(matching, 'em')") + yield param( - # `url` without 'SY:' prefix, some data, `custom` without `url_data` - # -> `url`, some data, `custom` + # 'SY:'-prefixed `url`, + # `custom` with `url_data`, + # *and* `url.b64` in params (but not matching!) + # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # does *not* match normalized `url.b64` + # + # -> result: nothing + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs` + # * here `merge_surrogate_pairs` is ineffective, as + # both `url.b64` and `orig_b64` contain a non-UTF-8 and + # non-surrogate garbage -- namely, the `\xdd` byte just + # before the 'Ni!?' fragment -- which makes the binary + # contents undecodable with the `utf-8` codec, even with + # the `surrogatepass` error handler... + filtering_params={ + 'url.b64': [ + (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80' + b'%3D-%4D-%5D-\xddNi!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'), + ], + }, raw_result_dict={ - 'custom': {'spam': 'ham'}, - 'url': 'something-else', - 'foo': 'bar', + 'url': 'SY:foo:cośtam/not-important', + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80' + # b'%3D-%4D-%5D-\xddNi!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z' + 'ne2ggCUzRC0lNEQtJTVELd1OaSE_I-2vv-2_v-2zjA=='), + + # (`empty_path_slash`, `merge_surrogate_pairs`) + 'norm_brief': 'em', + }, + }, + }, + expected_result=None, + ).label( + "(27) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params " + "(NOT matching, 'em', ...)") + + yield param( + # 'SY:'-prefixed `url`, + # `custom` with `url_data`, + # *and* `url.b64` in params (matching!) 
+ # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # matches normalized `url.b64` + # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + normalized + coerced to `str`, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs` + # * here `merge_surrogate_pairs` is ineffective (as above), + # but that does not prevent matching because here `url.b64` + # and `orig_b64` contain same non-strict-UTF-8 bytes (i.e., + # surrogates and non-surrogate garbage) + filtering_params={ + 'url.b64': [ + (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80' + b'%3D-%4D-%5D-\xddNi!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'), + ], + }, + raw_result_dict={ + 'url': 'SY:foo:cośtam/not-important', + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80' + # b'%3D-%4D-%5D-\xddNi!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z' + 'ne2ggCUzRC0lNEQtJTVELd1OaSE_I-2vv-2_v-2zjA=='), + + # (`empty_path_slash`, `merge_surrogate_pairs`) + 'norm_brief': 'em', + }, + }, }, expected_result={ - 'custom': {'spam': 'ham'}, - 'url': 'something-else', - 'foo': 'bar', + 'url': ( + # note: lack of the `unicode_str` option means that + # the result of normalization is a `bytes` object + # (which is then coerced to `str`, just for the + # "url" result item, by applying the helper function + # `as_str_with_minimum_esc()`; the bytes which encode + # surrogates, paired or unpaired, as well as any + # other non-strict-UTF-8 ones are escaped using + # the `\x...` notation) + 'http://Ćma.example.com/?q=\\xed\\xb3\\x9d\\xed\\xa0\\x80' + '%3D-%4D-%5D-\\xddNi!?#\\xed\\xaf\\xbf\\xed\\xbf\\xbf\\xed\\xb3\\x8c'), + 'custom': {}, # (empty `custom` is harmless, as data spec removes it later anyway) }, - ) + ).label( + "(28) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params " + "(matching, 'em', ...)") + yield param( - # `url`, `custom`+`url_data`+other, some data + # 'SY:'-prefixed `url`, + # `custom` with `url_data`, + # *and* `url.b64` in params (but not matching!) 
+            # -- so: `orig_b64` URL-safe-base64-decoded + normalized
+            # does *not* match normalized `url.b64`
+            #
+            # -> result: nothing
+            #
+            # remarks:
+            # * active normalization options: none
+            # * here lack of `merge_surrogate_pairs` is irrelevant, as
+            #   `url.b64` and `orig_b64` contain same non-strict-UTF-8
+            #   bytes (i.e., surrogates and non-surrogate garbage)
+            # * here lack of `empty_path_slash` means that there is
+            #   no match, as only `url.b64` has the URL's `path` empty
+            filtering_params={
+                'url.b64': [
+                    (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80'
+                     b'%3D-%4D-%5D-\xddNi!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'),
+                ],
+            },
+            raw_result_dict={
+                'url': 'SY:foo:cośtam/not-important',
+                'custom': {
+                    'url_data': {
+                        # `orig_b64` is URL-safe-base64-encoded:
+                        # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80'
+                        # b'%3D-%4D-%5D-\xddNi!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'
+                        'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z'
+                                     'ne2ggCUzRC0lNEQtJTVELd1OaSE_I-2vv-2_v-2zjA=='),
+
+                        # (no active normalization options)
+                        'norm_brief': '',
+                    },
+                },
+            },
+            expected_result=None,
+        ).label(
+            "(29) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params "
+            "(NOT matching, '', ...)")
+
+        yield param(
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data`,
+            # *and* `url.b64` in params (matching!)
+            # -- so: `orig_b64` URL-safe-base64-decoded + normalized
+            # matches normalized `url.b64`
+            #
+            # -> result:
+            # `url` being `orig_b64` URL-safe-base64-decoded + normalized + coerced to `str`,
+            # `custom` without `url_data`
+            #
+            # remarks:
+            # * active normalization options: none
+            # * here lack of `merge_surrogate_pairs` is irrelevant, as
+            #   `url.b64` and `orig_b64` contain same non-strict-UTF-8
+            #   bytes (i.e., surrogates and non-surrogate garbage)
+            # * here lack of `empty_path_slash` is irrelevant, as
+            #   `orig_b64` and `url.b64` have the same URL path
+            filtering_params={
+                'url.b64': [
+                    (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80/?q=\xed\xb3\x9d\xed\xa0\x80'
+                     b'%3D-%4D-%5D-\xddNi!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'),
+                ],
+            },
+            raw_result_dict={
+                'url': 'SY:foo:cośtam/not-important',
+                'custom': {
+                    'url_data': {
+                        # `orig_b64` is URL-safe-base64-encoded:
+                        # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80'
+                        # b'%3D-%4D-%5D-\xddNi!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'
+                        'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z'
+                                     'ne2ggCUzRC0lNEQtJTVELd1OaSE_I-2vv-2_v-2zjA=='),
+
+                        # (no active normalization options)
+                        'norm_brief': '',
+                    },
+                },
+            },
+            expected_result={
+                'url': (
+                    # note: lack of the `unicode_str` option means that
+                    # the result of normalization is a `bytes` object
+                    # (which is then coerced to `str`, just for the
+                    # "url" result item, by applying the helper function
+                    # `as_str_with_minimum_esc()`; the bytes which encode
+                    # surrogates, paired or unpaired, as well as any
+                    # other non-strict-UTF-8 ones are escaped using
+                    # the `\x...` notation)
+                    'http://Ćma.example.com/?q=\\xed\\xb3\\x9d\\xed\\xa0\\x80'
+                    '%3D-%4D-%5D-\\xddNi!?#\\xed\\xaf\\xbf\\xed\\xbf\\xbf\\xed\\xb3\\x8c'),
+                'custom': {},  # (empty `custom` is harmless, as data spec removes it later anyway)
+            },
+        ).label(
+            "(30) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params "
+            "(matching, '', ...)")
+
+        yield param(
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data`,
+            # *and* `url.b64` in params (but not matching!)
+            # -- so: `orig_b64` URL-safe-base64-decoded + normalized
+            # does *not* match normalized `url.b64`
+            #
+            # -> result: nothing
+            #
+            # remarks:
+            # * active normalization options:
+            #   `empty_path_slash`, `unicode_str`
+            # * here lack of `merge_surrogate_pairs` means that there
+            #   is no match
+            filtering_params={
+                'url.b64': [
+                    (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80'
+                     b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'),
+                ],
+            },
+            raw_result_dict={
+                'url': 'SY:foo:cośtam/not-important',
+                'custom': {
+                    'url_data': {
+                        # `orig_b64` is URL-safe-base64-encoded:
+                        # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80'
+                        # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'
+                        'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z'
+                                     'ne2ggCUzRC0lNEQtJTVELU5pIT8j7a-_7b-_7bOM'),
+
+                        # (`empty_path_slash`, `unicode_str`)
+                        'norm_brief': 'eu',
+                    },
+                },
+            },
+            expected_result=None,
+        ).label(
+            "(31) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params "
+            "(NOT matching, 'eu', ...)")
+
+        yield param(
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data`,
+            # *and* `url.b64` in params (but not matching!)
+            # -- so: `orig_b64` URL-safe-base64-decoded + normalized
+            # does *not* match normalized `url.b64`
+            #
+            # -> result: nothing
+            #
+            # remarks:
+            # * active normalization options: `empty_path_slash`
+            # * here lack of `merge_surrogate_pairs` means that there
+            #   is no match
+            filtering_params={
+                'url.b64': [
+                    (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80'
+                     b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'),
+                ],
+            },
+            raw_result_dict={
+                'url': 'SY:foo:cośtam/not-important',
+                'custom': {
+                    'url_data': {
+                        # `orig_b64` is URL-safe-base64-encoded:
+                        # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80'
+                        # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'
+                        'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z'
+                                     'ne2ggCUzRC0lNEQtJTVELU5pIT8j7a-_7b-_7bOM'),
+
+                        # (`empty_path_slash`)
+                        'norm_brief': 'e',
+                    },
+                },
+            },
+            expected_result=None,
+        ).label(
+            "(32) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params "
+            "(NOT matching, 'e', ...)")
+
+        yield param(
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data`,
+            # *and* `url.b64` in params (but not matching!)
+            # -- so: `orig_b64` URL-safe-base64-decoded + normalized
+            # does *not* match normalized `url.b64`
+            #
+            # -> result: nothing
+            #
+            # remarks:
+            # * active normalization options:
+            #   `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str`
+            # * here lack of `empty_path_slash` means that there is
+            #   no match, as only `url.b64` has the URL's `path` empty
+            filtering_params={
+                'url.b64': [
+                    (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80'
+                     b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'),
+                ],
+            },
+            raw_result_dict={
+                'url': 'SY:foo:cośtam/not-important',
+                'custom': {
+                    'url_data': {
+                        # `orig_b64` is URL-safe-base64-encoded:
+                        # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80'
+                        # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'
+                        'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z'
+                                     'ne2ggCUzRC0lNEQtJTVELU5pIT8j7a-_7b-_7bOM'),
+
+                        # (`merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str`)
+                        'norm_brief': 'mru',
+                    },
+                },
+            },
+            expected_result=None,
+        ).label(
+            "(33) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params "
+            "(NOT matching, 'mru', ...)")
+
+        yield param(
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data`,
+            # *and* `url.b64` in params (but not matching!)
+            # -- so: `orig_b64` URL-safe-base64-decoded + normalized
+            # does *not* match normalized `url.b64`
+            #
+            # -> result: nothing
+            #
+            # remarks:
+            # * active normalization options:
+            #   `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str`
+            # * here lack of `empty_path_slash` means that there is
+            #   no match, as only `orig_b64` has the URL's `path` empty
+            filtering_params={
+                'url.b64': [
+                    (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80/?q=\xed\xb3\x9d\xed\xa0\x80'
+                     b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'),
+                ],
+            },
+            raw_result_dict={
+                'url': 'SY:foo:cośtam/not-important',
+                'custom': {
+                    'url_data': {
+                        # `orig_b64` is URL-safe-base64-encoded:
+                        # b'htTP://\xc4\x86ma.eXample.COM:?q=\xed\xb3\x9d\xed\xa0\x80'
+                        # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'
+                        'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006P3E97bOd'
+                                     '7aCAJTNELSU0RC0lNUQtTmkhPyPtr7_tv7_ts4w='),
+
+                        # (`merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str`)
+                        'norm_brief': 'mru',
+                    },
+                },
+            },
+            expected_result=None,
+        ).label(
+            "(34) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params "
+            "(NOT matching, 'mru', ...)")
+
+        yield param(
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data`,
+            # *and* `url.b64` in params (matching!)
+ # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # matches normalized `url.b64` + # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + normalized, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` + # * here lack of `empty_path_slash` does not prevent matching, + # as both `orig_b64` and `url.b64` have the URL's `path` empty + filtering_params={ + 'url.b64': [ + (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80?q=\xed\xb3\x9d\xed\xa0\x80' + b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'), + ], + }, + raw_result_dict={ + 'url': 'SY:foo:cośtam/not-important', + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'htTP://\xc4\x86ma.eXample.COM:?q=\xed\xb3\x9d\xed\xa0\x80' + # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006P3E97bOd' + '7aCAJTNELSU0RC0lNUQtTmkhPyPtr7_tv7_ts4w='), + + # (`merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str`) + 'norm_brief': 'mru', + }, + }, + }, + expected_result={ + 'url': 'http://Ćma.example.com?q=\udcdd\ud800%3D-%4D-%5D-Ni!?#\U0010FFFF\udccc', + 'custom': {}, # (empty `custom` is harmless, as data spec removes it later anyway) + }, + ).label( + "(35) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params " + "(matching, 'mru', ...)") + + yield param( + # 'SY:'-prefixed `url`, + # `custom` with `url_data`, + # *and* `url.b64` in params (matching!) + # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # matches normalized `url.b64` + # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + normalized, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` + # * here lack of `empty_path_slash` is irrelevant, as + # `orig_b64` and `url.b64` have the same URL path + filtering_params={ + 'url.b64': [ + (b'HtTp://\xc4\x86ma.ExAmPlE.cOm:80/?q=\xed\xb3\x9d\xed\xa0\x80' + b'%3D-%4D-%5D-Ni!?#\xf4\x8f\xbf\xbf\xed\xb3\x8c'), + ], + }, + raw_result_dict={ + 'url': 'SY:foo:cośtam/not-important', + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'htTP://\xc4\x86ma.eXample.COM:/?q=\xed\xb3\x9d\xed\xa0\x80' + # b'%3D-%4D-%5D-Ni!?#\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHRUUDovL8SGbWEuZVhhbXBsZS5DT006Lz9xPe2z' + 'ne2ggCUzRC0lNEQtJTVELU5pIT8j7a-_7b-_7bOM'), + + # (`merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str`) + 'norm_brief': 'mru', + }, + }, + }, + expected_result={ + 'url': 'http://Ćma.example.com/?q=\udcdd\ud800%3D-%4D-%5D-Ni!?#\U0010FFFF\udccc', + 'custom': {}, # (empty `custom` is harmless, as data spec removes it later anyway) + }, + ).label( + "(36) 'SY:'-prefixed `url`, `custom` with `url_data`, `url.b64` in params " + "(matching, 'mru', ...)") + + yield param( + # 'SY:'-prefixed `url`, + # `custom` with `url_data` + something else, + # unrelated data, + # *and* `url.b64` in params (some matching!) 
+ # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # matches some of normalized `url.b64` + # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + normalized, + # unrelated data, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` + filtering_params={ + 'url.b64': [ + b'https://example.com/', + b'ftp://\xdd-non-UTF-8-garbage', + (b'HTTP://\xc4\x86ma.EXAMPLE.cOM:80/\xed\xb3\x9d\xed\xa0\x80' + b'Ala-ma-kota\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'), + b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99', + ], + }, + raw_result_dict={ + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'http://\xc4\x86ma.eXample.COM:80/\xed\xb3\x9d\xed\xa0\x80' + # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + + # (`empty_path_slash`, `merge_surrogate_pairs`, + # `remove_ipv6_zone`, `unicode_str`) + 'norm_brief': 'emru', + }, + 'something_else': 123, + }, + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': 'FOO BAR !@#$%^&*()', + }, + expected_result={ + 'custom': { + 'something_else': 123, + }, + 'url': 'http://Ćma.example.com/\udcdd\ud800Ala-ma-kota\U0010FFFF\udccc', + 'unrelated-data': 'FOO BAR !@#$%^&*()', + }, + ).label( + "(37) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, `url.b64` in params " + "(matching, 'emru')") + + yield param( + # [analogous to previous case, but with `url_data` in legacy format] + # + # 'SY:'-prefixed `url`, + # `custom` with `url_data` + something else, + # unrelated data, # *and* `url.b64` in params (some matching!) - # -- so: app-level matching: normalized `url_orig` matched some of `url.b64` - # -> `url` being normalized `url_orig`, some data, custom without `url_data` + # -- so: `url_orig` URL-safe-base64-decoded + normalized + # matches some of normalized `url.b64` + # + # -> result: + # `url` being `url_orig` URL-safe-base64-decoded + normalized, + # unrelated data, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` filtering_params={ 'url.b64': [ - u'https://example.com/', - u'HTTP://Ćma.EXAMPLE.cOM:80/\udcdd\ud800Ala-ma-kota\udbff\udfff\udccc', - u'http://example.ORG:8080/?x=y&ą=ę', + b'https://example.com/', + b'ftp://\xdd-non-UTF-8-garbage', + (b'HTTP://\xc4\x86ma.EXAMPLE.cOM:80/\xed\xb3\x9d\xed\xa0\x80' + b'Ala-ma-kota\xed\xaf\xbf\xed\xbf\xbf\xed\xb3\x8c'), + b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99', ], }, raw_result_dict={ @@ -634,32 +1548,100 @@ def cases(cls): # `url_orig` is URL-safe-base64-encoded: # b'http://\xc4\x86ma.eXample.COM:80/\xed\xb3\x9d\xed\xa0\x80' # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c' - 'url_orig': (u'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' - u'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + 'url_orig': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + + # (legacy flags, translated to: `unicode_str`, `merge_surrogate_pairs`, + # `empty_path_slash`, `remove_ipv6_zone`) 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, }, - 'spam': 123, + 'something_else': 123, }, - 'url': u'SY:foo:bar/not-important', - 'some-data': u'FOO BAR !@#$%^&*()', + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': 'FOO BAR !@#$%^&*()', }, expected_result={ 'custom': { - 'spam': 123, + 'something_else': 123, }, - 'url': 
u'http://Ćma.example.com/\udcdd\ud800Ala-ma-kota\U0010FFFF\udccc',
-                'some-data': u'FOO BAR !@#$%^&*()',
+                'url': 'http://Ćma.example.com/\udcdd\ud800Ala-ma-kota\U0010FFFF\udccc',
+                'unrelated-data': 'FOO BAR !@#$%^&*()',
             },
-        )
+        ).label(
+            "(38) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, `url.b64` in params "
+            "(matching, 'emru') [@legacy]")
+
         yield param(
-            # `url`, `custom`+`url_data`+other, some data
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data` + something else,
+            # unrelated data,
             # *and* *no* `url.b64` in params (so it does not constrain us...)
-            # -- so: *no* app-level matching
-            # -> `url` being normalized `url_orig`, some data, custom without `url_data`
+            # -- so: there is *no* application-level matching/filtering
+            #
+            # -> result:
+            # `url` being `orig_b64` URL-safe-base64-decoded + normalized,
+            # unrelated data,
+            # `custom` without `url_data`
+            #
+            # remarks:
+            # * active normalization options:
+            #   `empty_path_slash`, `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str`
             filtering_params={
                 'foobar': [
-                    u'https://example.com/',
-                    u'http://example.ORG:8080/?x=y&ą=ę',
+                    b'https://example.com/',
+                    b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99',
+                ],
+            },
+            raw_result_dict={
+                'custom': {
+                    'url_data': {
+                        # `orig_b64` is URL-safe-base64-encoded:
+                        # b'http://\xc4\x86ma.eXample.COM:80/\xed\xb3\x9d\xed\xa0\x80'
+                        # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c'
+                        'orig_b64': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD'
+                                     'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'),
+
+                        # (`empty_path_slash`, `merge_surrogate_pairs`,
+                        # `remove_ipv6_zone`, `unicode_str`)
+                        'norm_brief': 'emru',
+                    },
+                    'something_else': 123,
+                },
+                'url': 'SY:foo:cośtam/not-important',
+                'unrelated-data': 'FOO BAR !@#$%^&*()',
+            },
+            expected_result={
+                'custom': {
+                    'something_else': 123,
+                },
+                'url': 'http://Ćma.example.com/\udcdd\ud800Ala-ma-kota\U0010FFFF\udccc',
+                'unrelated-data': 'FOO BAR !@#$%^&*()',
+            },
+        ).label(
+            "(39) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data "
+            "(matching, 'emru')")
+
+        yield param(
+            # [analogous to previous case, but with `url_data` in legacy format]
+            #
+            # 'SY:'-prefixed `url`,
+            # `custom` with `url_data` + something else,
+            # unrelated data,
+            # *and* *no* `url.b64` in params (so it does not constrain us...)
+ # -- so: there is *no* application-level matching/filtering + # + # -> result: + # `url` being `url_orig` URL-safe-base64-decoded + normalized, + # unrelated data, + # `custom` without `url_data` + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` + filtering_params={ + 'foobar': [ + b'https://example.com/', + b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99', ], }, raw_result_dict={ @@ -670,30 +1652,92 @@ def cases(cls): # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c' 'url_orig': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + + # (legacy flags, translated to: `unicode_str`, `merge_surrogate_pairs`, + # `empty_path_slash`, `remove_ipv6_zone`) 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, }, - 'spam': 123, + 'something_else': 123, }, - 'url': 'SY:foo:bar/not-important', - 'some-data': 'FOO BAR !@#$%^&*()', + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': 'FOO BAR !@#$%^&*()', }, expected_result={ 'custom': { - 'spam': 123, + 'something_else': 123, + }, + 'url': 'http://Ćma.example.com/\udcdd\ud800Ala-ma-kota\U0010FFFF\udccc', + 'unrelated-data': 'FOO BAR !@#$%^&*()', + }, + ).label( + "(40) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data " + "(matching, 'emru') [@legacy]") + + yield param( + # 'SY:'-prefixed `url`, + # `custom` with `url_data` + something else, + # unrelated data, + # *and* `url.b64` in params (but none matching!) + # -- so: `orig_b64` URL-safe-base64-decoded + normalized + # does *not* match any of normalized `url.b64` + # + # -> result: nothing + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` + filtering_params={ + 'url.b64': [ + b'https://example.com/', + b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99', + (b'http://\xc4\x86ma.eXample.COM:80/\xdd\xed\xa0\x80' + b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c'), + ], + }, + raw_result_dict={ + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'http://\xc4\x86ma.eXample.COM:80/\xed\xb3\x9d\xed\xa0\x80' + # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + + # (`empty_path_slash`, `merge_surrogate_pairs`, + # `remove_ipv6_zone`, `unicode_str`) + 'norm_brief': 'emru', + }, + 'something_else': 123, }, - 'url': u'http://Ćma.example.com/\udcdd\ud800Ala-ma-kota\U0010FFFF\udccc', - 'some-data': 'FOO BAR !@#$%^&*()', + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': 'FOO BAR !@#$%^&*()', }, - ) + expected_result=None, + ).label( + "(41) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, `url.b64` in params " + "(NOT matching, 'emru')") + yield param( - # `url`, `custom`+`url_data`+other, some data + # [analogous to previous case, but with `url_data` in legacy format] + # + # 'SY:'-prefixed `url`, + # `custom` with `url_data` + something else, + # unrelated data, # *and* `url.b64` in params (but none matching!) 
- # -- so: app-level matching: normalized `url_orig` did *not* matched any of `url.b64` - # -> nothing + # -- so: `url_orig` URL-safe-base64-decoded + normalized + # does *not* match any of normalized `url.b64` + # + # -> result: nothing + # + # remarks: + # * active normalization options: + # `empty_path_slash`, `merge_surrogate_pairs`, `remove_ipv6_zone`, `unicode_str` filtering_params={ 'url.b64': [ - u'https://example.com/', - u'http://example.ORG:8080/?x=y&ą=ę', + b'https://example.com/', + b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99', + (b'http://\xc4\x86ma.eXample.COM:80/\xdd\xed\xa0\x80' + b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c'), ], }, raw_result_dict={ @@ -702,38 +1746,120 @@ def cases(cls): # `url_orig` is URL-safe-base64-encoded: # b'http://\xc4\x86ma.eXample.COM:80/\xed\xb3\x9d\xed\xa0\x80' # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c' - 'url_orig': (u'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' - u'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + 'url_orig': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + + # (legacy flags, translated to: `unicode_str`, `merge_surrogate_pairs`, + # `empty_path_slash`, `remove_ipv6_zone`) 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, }, - 'spam': 123, + 'something_else': 123, }, - 'url': u'SY:foo:bar/not-important', - 'some-data': u'FOO BAR !@#$%^&*()', + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': 'FOO BAR !@#$%^&*()', }, expected_result=None, - ) + ).label( + "(42) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, `url.b64` in params " + "(NOT matching, 'emru') [@legacy]") + yield param( - # `url`, `custom`+`url_data`+other, some data - # *and* (although none of `url.b64` matches in params) *url_normalization_data_cache* - # containing some matching (fake) stuff... - # -- so: app-level matching: fake-normalizer-processed `url_orig` matched something... - # -> `url` being fake-normalizer-processed `url_orig`, etc. ... + # [this example is not realistic -- yet it helps to test + # URL-normalization-data-cache-related machinery...] + # + # 'SY:'-prefixed `url`, + # `custom` with `url_data` + something else, + # unrelated data, + # *and* *url_normalization_data_cache* containing some matching + # (fake) stuff (even though none of `url.b64` matches) + # -- so: `orig_b64` URL-safe-base64-decoded + fake-normalizer-processed + # matches something... 
+ # + # -> result: + # `url` being `orig_b64` URL-safe-base64-decoded + fake-normalizer-processed, + # unrelated data, + # `custom` without `url_data` filtering_params={ 'url.b64': [ - u'https://example.com/', - u'http://example.ORG:8080/?x=y&ą=ę', + b'https://example.com/', + b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99', ], }, url_normalization_data_cache={ - (('epslash', True), ('rmzone', True), ('transcode1st', True)): ( + 'emru': ( # "cached" normalizer (here it is fake, of course): - bytes.upper, + lambda b: b.upper().decode('utf-8', 'replace'), # "cached" normalized `url.b64` param values (here fake, of course): [ - b'HTTP://\xc4\x86MA.EXAMPLE.COM:80/\xed\xb3\x9d\xed\xa0\x80' - b'ALA-MA-KOTA\xf4\x8f\xbf\xbf\xed\xb3\x8c', + ('HTTP://ĆMA.EXAMPLE.COM:80/\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd' + 'ALA-MA-KOTA\U0010ffff\ufffd\ufffd\ufffd'), + 'foo-bar-irrelevant-val', + ] + ), + }, + raw_result_dict={ + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: + # b'http://\xc4\x86ma.eXample.COM:80/\xed\xb3\x9d\xed\xa0\x80' + # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c' + 'orig_b64': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + 'norm_brief': 'emru', + }, + 'something_else': 123, + }, + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': 'FOO BAR !@#$%^&*()', + }, + expected_result={ + 'custom': { + 'something_else': 123, + }, + 'url': ( + 'HTTP://ĆMA.EXAMPLE.COM:80/\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd' + 'ALA-MA-KOTA\U0010ffff\ufffd\ufffd\ufffd'), + 'unrelated-data': 'FOO BAR !@#$%^&*()', + }, + ).label( + "(43) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, " + "faked normalization cache (matching, 'emru')") + + yield param( + # [this example is not realistic -- yet it helps to test + # URL-normalization-data-cache-related machinery...] + # + # [analogous to previous case, but with `url_data` in legacy format] + # + # 'SY:'-prefixed `url`, + # `custom` with `url_data` + something else, + # unrelated data, + # *and* *url_normalization_data_cache* containing some matching + # (fake) stuff (even though none of `url.b64` matches) + # -- so: `url_orig` URL-safe-base64-decoded + fake-normalizer-processed + # matches something... 
+ # + # -> result: + # `url` being `url_orig` URL-safe-base64-decoded + fake-normalizer-processed, + # unrelated data, + # `custom` without `url_data` + filtering_params={ + 'url.b64': [ + b'https://example.com/', + b'http://example.ORG:8080/?x=y&\xc4\x85=\xc4\x99', + ], + }, + url_normalization_data_cache={ + 'emru': ( + # "cached" normalizer (here it is fake, of course): + lambda b: b.upper().decode('utf-8', 'replace'), + + # "cached" normalized `url.b64` param values (here fake, of course): + [ + ('HTTP://ĆMA.EXAMPLE.COM:80/\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd' + 'ALA-MA-KOTA\U0010ffff\ufffd\ufffd\ufffd'), + 'foo-bar-irrelevant-val', ] ), }, @@ -743,35 +1869,79 @@ def cases(cls): # `url_orig` is URL-safe-base64-encoded: # b'http://\xc4\x86ma.eXample.COM:80/\xed\xb3\x9d\xed\xa0\x80' # b'Ala-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c' - 'url_orig': (u'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' - u'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + 'url_orig': ('aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, }, - 'spam': 123, + 'something_else': 123, }, - 'url': u'SY:foo:bar/not-important', - 'some-data': u'FOO BAR !@#$%^&*()', + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': 'FOO BAR !@#$%^&*()', }, expected_result={ 'custom': { - 'spam': 123, + 'something_else': 123, }, - # note: still bytes because of the above fake normalizer (`bytes.upper`)... - 'url': (b'HTTP://\xc4\x86MA.EXAMPLE.COM:80/\xed\xb3\x9d\xed\xa0\x80' - b'ALA-MA-KOTA\xf4\x8f\xbf\xbf\xed\xb3\x8c'), - 'some-data': u'FOO BAR !@#$%^&*()', + 'url': ( + 'HTTP://ĆMA.EXAMPLE.COM:80/\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd' + 'ALA-MA-KOTA\U0010ffff\ufffd\ufffd\ufffd'), + 'unrelated-data': 'FOO BAR !@#$%^&*()', }, - ) + ).label( + "(44) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, " + "faked normalization cache (matching, 'emru') [@legacy]") + yield param( + # [this example is not realistic -- yet it helps to test + # URL-normalization-data-cache-related machinery...] + # # similar situation but even *without* the `url.b64` params (but # that does not matter, as what is important is the cache!) url_normalization_data_cache={ - (('epslash', True), ('rmzone', True), ('transcode1st', True)): ( + 'emru': ( + # "cached" normalizer (here it is fake, of course): + lambda b: b.title().decode('utf-8'), + # "cached" normalized `url.b64` param values (here fake, of course): + [ + 'Https://Example.Com:', + ] + ), + }, + raw_result_dict={ + 'custom': { + 'url_data': { + # `orig_b64` is URL-safe-base64-encoded: b`https://example.com:` + 'orig_b64': ('aHR0cHM6Ly9leGFtcGxlLmNvbTo='), + 'norm_brief': 'emru', + }, + 'something_else': 123, + }, + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': b'FOO BAR !@#$%^&*()', + }, + expected_result={ + 'custom': { + 'something_else': 123, + }, + 'url': 'Https://Example.Com:', + 'unrelated-data': b'FOO BAR !@#$%^&*()', + }, + ).label( + "(45) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, " + "faked normalization cache (matching, 'emru')") + + yield param( + # [this example is not realistic -- yet it helps to test + # URL-normalization-data-cache-related machinery...] 
+ # + # [analogous to previous case, but with `url_data` in legacy format] + url_normalization_data_cache={ + 'emru': ( # "cached" normalizer (here it is fake, of course): - lambda b: b.upper().decode('utf-8'), + lambda b: b.title().decode('utf-8'), # "cached" normalized `url.b64` param values (here fake, of course): [ - u'HTTPS://EXAMPLE.COM:', + 'Https://Example.Com:', ] ), }, @@ -782,26 +1952,29 @@ def cases(cls): 'url_orig': ('aHR0cHM6Ly9leGFtcGxlLmNvbTo='), 'url_norm_opts': {'transcode1st': True, 'epslash': True, 'rmzone': True}, }, - 'spam': 123, + 'something_else': 123, }, - 'url': 'SY:foo:bar/not-important', - 'some-data': b'FOO BAR !@#$%^&*()', + 'url': 'SY:foo:cośtam/not-important', + 'unrelated-data': b'FOO BAR !@#$%^&*()', }, expected_result={ 'custom': { - 'spam': 123, + 'something_else': 123, }, - 'url': u'HTTPS://EXAMPLE.COM:', - 'some-data': b'FOO BAR !@#$%^&*()', + 'url': 'Https://Example.Com:', + 'unrelated-data': b'FOO BAR !@#$%^&*()', }, - ) + ).label( + "(46) 'SY:'-prefixed `url`, `custom` with `url_data`, unrelated data, " + "faked normalization cache (matching, 'emru') [@legacy]") @foreach(cases) - def test(self, raw_result_dict, expected_result, + def test(self, raw_result_dict, expected_result, expected_log_regexes=(), filtering_params=None, url_normalization_data_cache=None): + raw_result_dict = copy.deepcopy(raw_result_dict) mock = MagicMock() - meth = MethodProxy(_EventsQueryProcessor, mock) + meth = MethodProxy(_EventsQueryProcessor, mock, class_attrs='_call_silencing_decode_err') mock._filtering_params = ( copy.deepcopy(filtering_params) if filtering_params is not None @@ -810,6 +1983,8 @@ def test(self, raw_result_dict, expected_result, url_normalization_data_cache if url_normalization_data_cache is not None else {}) - raw_result_dict = copy.deepcopy(raw_result_dict) - actual_result = meth._preprocess_result_dict(raw_result_dict) + with self.assertLogRegexes(module_logger, expected_log_regexes): + + actual_result = meth._preprocess_result_dict(raw_result_dict) + self.assertEqualIncludingTypes(actual_result, expected_result) diff --git a/N6Lib/n6lib/tests/test_data_spec.py b/N6Lib/n6lib/tests/test_data_spec.py index 844c8bf..9e962c0 100644 --- a/N6Lib/n6lib/tests/test_data_spec.py +++ b/N6Lib/n6lib/tests/test_data_spec.py @@ -220,7 +220,8 @@ class TestN6DataSpec(TestCaseMixin, unittest.TestCase): 'active.min': [datetime.datetime(2015, 5, 3)], 'url': [u'http://www.ołówek.EXAMPLĘ.com/\udcddπœ\udcffę\udcff³¢ą.py'], 'url.sub': [u'xx' + 682 * u'\udccc'], - 'url.b64': [u'http://www.ołówek.EXAMPLĘ.com/\udcddπœ\udcffę\udcff³¢ą.py'], + 'url.b64': [b'http://www.o\xc5\x82\xc3\xb3wek.EXAMPL\xc4\x98.com/' + b'\xdd\xcf\x80\xc5\x93\xff\xc4\x99\xed\xb3\xbf\xc2\xb3\xc2\xa2\xc4\x85.py'], 'fqdn': ['www.test.org', 'www.xn--owek-qqa78b.xn--exampl-14a.com'], 'fqdn.sub': ['xn--owek-qqa78b'], 'opt.primary': [True], @@ -266,11 +267,34 @@ class TestN6DataSpec(TestCaseMixin, unittest.TestCase): 'cc': 'PL', 'asn': 80000, }, + { + # not a fully valid item -- to be skipped because: + # * `ip` is equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` (see #8861) + 'ip': '0.0.0.0', + 'cc': 'AB', + }, + { + # not a fully valid item -- to be skipped because: + # * `ip` is missing + 'asn': 123456789, + }, + { + # not a fully valid item -- to be skipped because: + # * `ip` is None + 'ip': None, + 'cc': 'AB', + 'asn': 123456789, + }, { 'ip': '10.0.255.128', 'cc': 'US', 'asn': '65535.65535', }, + { + 'ip': '123.123.123.123', + 'cc': None, # <- to be skipped + 'asn': None, # <- to be skipped + }, 
], 'dport': 1234, 'time': datetime.datetime( @@ -301,6 +325,9 @@ class TestN6DataSpec(TestCaseMixin, unittest.TestCase): 'cc': 'US', 'asn': 4294967295, }, + { + 'ip': '123.123.123.123', + }, ], 'dport': 1234, 'time': datetime.datetime(2014, 3, 31, 23, 7, 42), @@ -452,6 +479,47 @@ def test__restricted_result_keys(self): # 'anonymized.source' deanonymized, 'some.other' skipped source=['some.source']), ), + param( + raw=dict( + raw_param_dict_base, + ip=['0.0.0.1']), + full_access=True, + res_limits={'request_parameters': None}, + expected_cleaned=dict( + cleaned_param_dict_base, + ip=['0.0.0.1']), + ), + param( + raw=dict( + raw_param_dict_base, + ip=['0.0.0.1']), + full_access=False, + res_limits={'request_parameters': None}, + expected_cleaned=dict( + cleaned_param_dict_base, + ip=['0.0.0.1'], + source=[]), + ), + param( + raw=dict( + raw_param_dict_base, + # 'ip' equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` is illegal + # (see #8861) + ip=['0.0.0.0']), + full_access=True, + res_limits={'request_parameters': None}, + expected_error=ParamValueCleaningError, + ), + param( + raw=dict( + raw_param_dict_base, + # 'ip' equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` is illegal + # (see #8861) + ip=['0.0.0.0']), + full_access=False, + res_limits={'request_parameters': None}, + expected_error=ParamValueCleaningError, + ), param( raw=dict( raw_param_dict_base, @@ -462,6 +530,16 @@ def test__restricted_result_keys(self): cleaned_param_dict_base, dip=['0.10.20.30']), ), + param( + raw=dict( + raw_param_dict_base, + # 'dip' equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` is illegal + # (see #8861) + dip=['0.0.0.0']), + full_access=True, + res_limits={'request_parameters': None}, + expected_error=ParamValueCleaningError, + ), param( raw=dict( raw_param_dict_base, @@ -627,6 +705,47 @@ def test__restricted_result_keys(self): # 'anonymized.source' deanonymized, 'some.other' skipped source=['some.source']), ), + param( + raw=dict( + raw_param_dict_base, + ip=['0.0.0.1']), + full_access=True, + res_limits={'request_parameters': request_parameters}, + expected_cleaned=dict( + cleaned_param_dict_base, + ip=['0.0.0.1']), + ), + param( + raw=dict( + raw_param_dict_base, + ip=['0.0.0.1']), + full_access=False, + res_limits={'request_parameters': request_parameters}, + expected_cleaned=dict( + cleaned_param_dict_base, + ip=['0.0.0.1'], + source=[]), + ), + param( + raw=dict( + raw_param_dict_base, + # 'ip' equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` is illegal + # (see #8861) + ip=['0.0.0.0']), + full_access=True, + res_limits={'request_parameters': request_parameters}, + expected_error=ParamValueCleaningError, + ), + param( + raw=dict( + raw_param_dict_base, + # 'ip' equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` is illegal + # (see #8861) + ip=['0.0.0.0']), + full_access=False, + res_limits={'request_parameters': request_parameters}, + expected_error=ParamValueCleaningError, + ), param( raw=dict( raw_param_dict_base, @@ -637,6 +756,16 @@ def test__restricted_result_keys(self): cleaned_param_dict_base, dip=['0.10.20.30']), ), + param( + raw=dict( + raw_param_dict_base, + # 'dip' equal to `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR` is illegal + # (see #8861) + dip=['0.0.0.0']), + full_access=True, + res_limits={'request_parameters': request_parameters}, + expected_error=ParamValueCleaningError, + ), param( raw=dict( raw_param_dict_base, @@ -1108,6 +1237,9 @@ def test__clean_param_dict(self, raw, full_access, res_limits, 'cc': 'US', 'asn': 4294967295, }, + { + 'ip': 
'123.123.123.123', + }, ]), ), param( @@ -1130,6 +1262,9 @@ def test__clean_param_dict(self, raw, full_access, res_limits, 'cc': 'US', 'asn': 4294967295, }, + { + 'ip': '123.123.123.123', + }, ]), ), param( @@ -1143,7 +1278,10 @@ def test__clean_param_dict(self, raw, full_access, res_limits, opt_primary=True, expected_cleaned=dict( cleaned_result_dict_base, - address=[{'ip': '100.101.102.103'}]), + address=[ + {'ip': '100.101.102.103'}, + {'ip': '123.123.123.123'}, + ]), ), param( raw=dict( @@ -1156,7 +1294,10 @@ def test__clean_param_dict(self, raw, full_access, res_limits, opt_primary=True, expected_cleaned=dict( restricted_access_cleaned_result_dict_base, - address=[{'ip': '100.101.102.103'}]), + address=[ + {'ip': '100.101.102.103'}, + {'ip': '123.123.123.123'}, + ]), ), param( raw=dict( @@ -1212,6 +1353,7 @@ def test__clean_param_dict(self, raw, full_access, res_limits, '100.101.102.103': ['ip', 'cc', 'asn'], '10.0.255.128': ['ip', 'cc', 'asn'], '1.2.8.9': ['ip'], # '1.2.8.9' non-existent + '123.123.123.123': ['ip'], })), full_access=True, opt_primary=True, @@ -1225,6 +1367,7 @@ def test__clean_param_dict(self, raw, full_access, res_limits, enriched=([], { '100.101.102.103': ['ip', 'cc', 'asn'], '10.0.255.128': ['ip', 'cc', 'asn'], + '123.123.123.123': ['ip'], })), full_access=False, opt_primary=True, @@ -1238,6 +1381,7 @@ def test__clean_param_dict(self, raw, full_access, res_limits, enriched=(['fqdn'], { '100.101.102.103': ['ip', 'cc', 'asn'], '10.0.255.128': ['ip', 'cc', 'asn'], + '123.123.123.123': ['ip'], })), full_access=True, opt_primary=True, @@ -1251,6 +1395,7 @@ def test__clean_param_dict(self, raw, full_access, res_limits, enriched=(['fqdn'], { '100.101.102.103': ['ip', 'cc', 'asn'], '10.0.255.128': ['ip', 'cc', 'asn'], + '123.123.123.123': ['ip'], })), full_access=False, opt_primary=True, @@ -1265,7 +1410,7 @@ def test__clean_param_dict(self, raw, full_access, res_limits, raw_result_dict_base, enriched=(['fqdn'], { '100.101.102.103': ['ip', 'cc', 'asn'], - '10.0.255.128': ['ip', 'cc', 'asn'], + '123.123.123.123': ['ip'], })), full_access=True, opt_primary=False, @@ -1283,6 +1428,85 @@ def test__clean_param_dict(self, raw, full_access, res_limits, expected_cleaned=dict(restricted_access_cleaned_result_dict_base), ), + # not a fully valid raw result dict: no 'address', but 'ip' or 'asn' or 'cc' present... 
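+        # (A hedged sketch of the fallback these params assume -- hypothetical
+        # code, not the actual implementation:
+        #
+        #     if raw.get('ip') is not None:
+        #         address = [{key: raw[key]
+        #                     for key in ('ip', 'asn', 'cc')
+        #                     if raw.get(key) is not None}]
+        #
+        # and with 'ip' missing or None, no `address` key is synthesized.)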
+ param( + raw=dict( + {k: v for k, v in raw_result_dict_base.items() + if k != 'address'}, + ip='1.2.3.4', + asn=123, + cc='PL'), + full_access=True, + expected_cleaned=dict( + cleaned_result_dict_base, + address=[{ + 'ip': '1.2.3.4', + 'asn': 123, + 'cc': 'PL', + }]), + ), + param( + raw=dict( + {k: v for k, v in raw_result_dict_base.items() + if k != 'address'}, + ip='1.2.3.4', + asn=None, + cc='PL'), + full_access=False, + expected_cleaned=dict( + restricted_access_cleaned_result_dict_base, + address=[{ + 'ip': '1.2.3.4', + 'cc': 'PL', + }]), + ), + param( + raw=dict( + {k: v for k, v in raw_result_dict_base.items() + if k != 'address'}, + ip=None), + full_access=True, + expected_cleaned={ + # `ip` is missing or None => no `address` + k: v for k, v in cleaned_result_dict_base.items() + if k != 'address'}, + ), + param( + raw=dict( + {k: v for k, v in raw_result_dict_base.items() + if k != 'address'}, + ip=None, + cc='PL'), + full_access=False, + expected_cleaned={ + # `ip` is missing or None => no `address` + k: v for k, v in restricted_access_cleaned_result_dict_base.items() + if k != 'address'}, + ), + param( + raw=dict( + {k: v for k, v in raw_result_dict_base.items() + if k != 'address'}, + asn=123), + full_access=True, + expected_cleaned={ + # `ip` is missing or None => no `address` + k: v for k, v in cleaned_result_dict_base.items() + if k != 'address'}, + ), + param( + raw=dict( + {k: v for k, v in raw_result_dict_base.items() + if k != 'address'}, + asn=123, + cc='PL'), + full_access=False, + expected_cleaned={ + # `ip` is missing or None => no `address` + k: v for k, v in restricted_access_cleaned_result_dict_base.items() + if k != 'address'}, + ), + # 'urls_matched' param( raw=dict( diff --git a/N6Lib/n6lib/tests/test_data_spec_fields.py b/N6Lib/n6lib/tests/test_data_spec_fields.py index 21d85ed..1bd59e6 100644 --- a/N6Lib/n6lib/tests/test_data_spec_fields.py +++ b/N6Lib/n6lib/tests/test_data_spec_fields.py @@ -10,6 +10,7 @@ import n6lib.data_spec.fields as n6_fields import n6sdk.data_spec.fields as sdk_fields import n6sdk.tests.test_data_spec_fields as sdk_tests +from n6lib.common_helpers import as_bytes from n6sdk.exceptions import FieldValueError from n6sdk.tests.test_data_spec_fields import ( FieldTestMixin, @@ -190,197 +191,76 @@ def cases__clean_param_value(self): ) yield case( given='aHRUUDovL3d3dy50ZXN0LnBs', - expected=u'htTP://www.test.pl', + expected=b'htTP://www.test.pl', ) yield case( - given=u'SFR0cDovL3d3dy50ZXN0LnBsL2NnaS1iaW4vZm9vLnBsPw==', - expected=u'HTtp://www.test.pl/cgi-bin/foo.pl?', - ) - yield case( - given=u'aHR0cDovL3d3dy50ZXN0LcSHLnBsL2NnaS9iaW4vZm9vLnBsP2RlYnVnPTEmaWQ9MTIz', - expected=u'http://www.test-ć.pl/cgi/bin/foo.pl?debug=1&id=123', + given='aHRUUDovL3d3dy50ZXN0LnBs\r\n', # with trailing `\r\n` + expected=b'htTP://www.test.pl', ) yield case( - given=(u'aHR0cDovL3d3dy5URVNULcSGLnBsL2NnaS1iaW4vYmFyLnBsP21vZGU9YnJvd3NlJm' - u'FtcDtkZWJ1Zz0lMjAxMjMmYW1wO2lkPWstJTVE'), - expected=(u'http://www.TEST-Ć.pl/cgi-bin/bar.pl?mode=browse&' - u'debug=%20123&id=k-%5D'), + given='aHRUUDovL3d3dy50ZXN0LnBs%0D%0A', # with trailing `\r\n`, %-encoded + expected=b'htTP://www.test.pl', ) yield case( - given=u'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci_dP3E9z4DFk8SZwqnDn-KGkDMjdHJhbGFsYQk=', - expected=u'http://tęst.pl/fóó/Bar/\udcdd?q=πœę©ß←3#tralala\t', + given='aHRUUDovL3d3dy50ZXN0LnBs%250D%250A', # with trailing `\r\n`, 2 x %-encoded + expected=b'htTP://www.test.pl', ) - # the same but encoded with standard Base64 (not the required URL-safe-Base64) yield case( - 
given=u'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci/dP3E9z4DFk8SZwqnDn+KGkDMjdHJhbGFsYQk=', - expected=FieldValueError, - ) - yield case( - given=u'aHR0cDovL3Rlc3QucGw=', - expected=u'http://test.pl', - ) - # the same with redundant padding - yield case( - given='aHR0cDovL3Rlc3QucGw==', - expected=u'http://test.pl', - ) - yield case( - given=u'aHR0cDovL3Rlc3QucGw===', - expected=u'http://test.pl', - ) - # the same with redundant padding and ignored characters after it - yield case( - given='aHR0cDovL3Rlc3QucGw===abcdef', - expected=u'http://test.pl', - ) - yield case( - given=u'aHR0cDovL3Rlc3QucGw=========abcdef', - expected=u'http://test.pl', - ) - # the same with redundant padding and illegal characters after it - yield case( - given='aHR0cDovL3Rlc3QucGw===ąć/', - expected=FieldValueError, - ) - yield case( - given=u'aHR0cDovL3Rlc3QucGw=========ąć/', - expected=FieldValueError, - ) - # the same with missing padding - yield case( - given=u'aHR0cDovL3Rlc3QucGw', - expected=FieldValueError, - ) - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' - u'_cT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI-KJoMKywrMNCg=='), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - # the same with additional %-encoding: - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' - u'_cT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI-KJoMKywrMNCg%3D%3D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - # the same with 2 x additional %-encoding (2nd is overzealous and lowercase-based): - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%48%41' - u'%5fcT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI%2dKJoMKywrMNCg%253D%253D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - # the same with 3 x additional %-encoding (2nd is overzealous and lowercase-based): - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%2548%2541' - u'%255fcT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI%252dKJoMKywrMNCg%25253D%25253D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - yield case( - given=u'', - expected=u'', - ) - # containing non-UTF-8 characters (-> to low surrogates) - yield case( - given=u'aHR0cHM6Ly9kZN3u', - expected=u'https://dd\udcdd\udcee', - ) - # as UTF-8 with low surrogates already encoded - yield case( - given=u'aHR0cHM6Ly9kZO2zne2zrg==', - expected=u'https://dd\udcdd\udcee', - ) - # the `%` character not being part of %-encoded stuff - yield case( - given='%AZ', - expected=FieldValueError, - ) - yield case( - given=u'aHR0cDovL3Rlc3QucGw=%a', - expected=FieldValueError, - ) - - def cases__clean_result_value(self): - yield case( - given=b'http://www.test.pl', - expected=FieldValueError, - ) - yield case( - given=u'HTtp://www.test.pl/cgi-bin/foo.pl?', - expected=FieldValueError, - ) - yield case( - given=b'aHRUUDovL3d3dy50ZXN0LnBs', - expected=u'htTP://www.test.pl', + given='aHRUUDovL3d3dy50ZXN0LnBs%25250D%25250A', # with trailing `\r\n`, 3 x %-encoded + expected=b'htTP://www.test.pl', ) yield case( given=u'SFR0cDovL3d3dy50ZXN0LnBsL2NnaS1iaW4vZm9vLnBsPw==', - expected=u'HTtp://www.test.pl/cgi-bin/foo.pl?', - ) - yield case( - given=b'aHR0cDovL3d3dy50ZXN0LcSHLnBsL2NnaS9iaW4vZm9vLnBsP2RlYnVnPTEmaWQ9MTIz', - expected=u'http://www.test-ć.pl/cgi/bin/foo.pl?debug=1&id=123', + expected=b'HTtp://www.test.pl/cgi-bin/foo.pl?', ) yield case( given=u'aHR0cDovL3d3dy50ZXN0LcSHLnBsL2NnaS9iaW4vZm9vLnBsP2RlYnVnPTEmaWQ9MTIz', - expected=u'http://www.test-ć.pl/cgi/bin/foo.pl?debug=1&id=123', - ) - yield case( - 
given=(b'aHR0cDovL3d3dy5URVNULcSGLnBsL2NnaS1iaW4vYmFyLnBsP21vZGU9YnJvd3NlJm' - b'FtcDtkZWJ1Zz0lMjAxMjMmYW1wO2lkPWstJTVE'), - expected=(u'http://www.TEST-Ć.pl/cgi-bin/bar.pl?mode=browse&' - u'debug=%20123&id=k-%5D'), - ) - yield case( - given=(u'aHR0cDovL3d3dy5URVNULcSGLnBsL2NnaS1iaW4vYmFyLnBsP21vZGU9YnJvd3NlJm' - u'FtcDtkZWJ1Zz0lMjAxMjMmYW1wO2lkPWstJTVE'), - expected=(u'http://www.TEST-Ć.pl/cgi-bin/bar.pl?mode=browse&' - u'debug=%20123&id=k-%5D'), + expected=as_bytes('http://www.test-ć.pl/cgi/bin/foo.pl?debug=1&id=123'), ) yield case( - given=b'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci_dP3E9z4DFk8SZwqnDn-KGkDMjdHJhbGFsYQk=', - expected=u'http://tęst.pl/fóó/Bar/\udcdd?q=πœę©ß←3#tralala\t', + given=( + 'aHR0cDovL3d3dy5URVNULcSGLnBsL2NnaS1iaW4vYmFyLnBsP21vZGU9YnJvd3NlJm' + 'FtcDtkZWJ1Zz0lMjAxMjMmYW1wO2lkPWstJTVE'), + expected=as_bytes( + 'http://www.TEST-Ć.pl/cgi-bin/bar.pl?mode=browse&' + 'debug=%20123&id=k-%5D'), ) yield case( - given=u'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci_dP3E9z4DFk8SZwqnDn-KGkDMjdHJhbGFsYQk=', - expected=u'http://tęst.pl/fóó/Bar/\udcdd?q=πœę©ß←3#tralala\t', + given='aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci_dP3E9z4DFk8SZwqnDn-KGkDMjdHJhbGFsYQk=', + expected=( + b'http://t\xc4\x99st.pl/f\xc3\xb3\xc3\xb3/Bar/\xdd' + b'?q=\xcf\x80\xc5\x93\xc4\x99\xc2\xa9\xc3\x9f\xe2\x86\x903#tralala\t'), ) # the same but encoded with standard Base64 (not the required URL-safe-Base64) - yield case( - given=b'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci/dP3E9z4DFk8SZwqnDn+KGkDMjdHJhbGFsYQk=', - expected=FieldValueError, - ) yield case( given=u'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci/dP3E9z4DFk8SZwqnDn+KGkDMjdHJhbGFsYQk=', expected=FieldValueError, ) - yield case( - given=b'aHR0cDovL3Rlc3QucGw=', - expected=u'http://test.pl', - ) yield case( given=u'aHR0cDovL3Rlc3QucGw=', - expected=u'http://test.pl', + expected=b'http://test.pl', ) # the same with redundant padding yield case( - given=b'aHR0cDovL3Rlc3QucGw==', - expected=u'http://test.pl', + given='aHR0cDovL3Rlc3QucGw==', + expected=b'http://test.pl', ) yield case( given=u'aHR0cDovL3Rlc3QucGw===', - expected=u'http://test.pl', + expected=b'http://test.pl', ) # the same with redundant padding and ignored characters after it yield case( - given=b'aHR0cDovL3Rlc3QucGw===abcdef', - expected=u'http://test.pl', + given='aHR0cDovL3Rlc3QucGw===abcdef', + expected=b'http://test.pl', ) yield case( given=u'aHR0cDovL3Rlc3QucGw=========abcdef', - expected=u'http://test.pl', + expected=b'http://test.pl', ) # the same with redundant padding and illegal characters after it yield case( - given=u'aHR0cDovL3Rlc3QucGw===ąć/'.encode('utf-8'), + given='aHR0cDovL3Rlc3QucGw===ąć/', expected=FieldValueError, ) yield case( @@ -388,101 +268,255 @@ def cases__clean_result_value(self): expected=FieldValueError, ) # the same with missing padding - yield case( - given=b'aHR0cDovL3Rlc3QucGw', - expected=FieldValueError, - ) yield case( given=u'aHR0cDovL3Rlc3QucGw', expected=FieldValueError, ) yield case( - given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' - b'_cT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI-KJoMKywrMNCg=='), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' - u'_cT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI-KJoMKywrMNCg=='), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), + given=( + 'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' + '_cT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI-KJoMKywrMNCg=='), + expected=as_bytes( + 'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), ) # the same with additional 
%-encoding: yield case( - given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' - b'_cT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI-KJoMKywrMNCg%3D%3D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' - u'_cT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI-KJoMKywrMNCg%3D%3D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), + given=( + 'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' + '_cT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI-KJoMKywrMNCg%3D%3D'), + expected=as_bytes( + 'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), ) # the same with 2 x additional %-encoding (2nd is overzealous and lowercase-based): yield case( - given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%48%41' - b'%5fcT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI%2dKJoMKywrMNCg%253D%253D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%48%41' - u'%5fcT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI%2dKJoMKywrMNCg%253D%253D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), + given=( + 'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%48%41' + '%5fcT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI%2dKJoMKywrMNCg%253D%253D'), + expected=as_bytes( + 'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), ) # the same with 3 x additional %-encoding (2nd is overzealous and lowercase-based): yield case( - given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%2548%2541' - b'%255fcT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI%252dKJoMKywrMNCg%25253D%25253D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - yield case( - given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%2548%2541' - u'%255fcT3RgNCw0LfQvdGL0LUr0LDQstGC0L7RgNGLI%252dKJoMKywrMNCg%25253D%25253D'), - expected=(u'http://example.net/search.php?q=разные+авторы#≠²³\r\n'), - ) - yield case( - given=b'', - expected=u'', + given=( + 'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%2548%2541' + '%255fcT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI%252dKJoMKywrMNCg%25253D%25253D'), + expected=as_bytes( + 'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), ) yield case( - given=u'', - expected=u'', - ) - # containing non-UTF-8 characters (-> to low surrogates) - yield case( - given=b'aHR0cHM6Ly9kZN3u', - expected=u'https://dd\udcdd\udcee', + given='', + expected=FieldValueError, ) + # containing non-UTF-8 bytes yield case( - given=u'aHR0cHM6Ly9kZN3u', - expected=u'https://dd\udcdd\udcee', + given='aHR0cHM6Ly9kZN3u', + expected=b'https://dd\xdd\xee', ) # as UTF-8 with low surrogates already encoded yield case( - given=b'aHR0cHM6Ly9kZO2zne2zrg==', - expected=u'https://dd\udcdd\udcee', - ) - yield case( - given=u'aHR0cHM6Ly9kZO2zne2zrg==', - expected=u'https://dd\udcdd\udcee', + given='aHR0cHM6Ly9kZO2zne2zrg==', + expected=as_bytes('https://dd\udcdd\udcee'), ) # the `%` character not being part of %-encoded stuff yield case( - given=b'%AZ', + given='%AZ', expected=FieldValueError, ) yield case( - given=u'aHR0cDovL3Rlc3QucGw=%a', + given='aHR0cDovL3Rlc3QucGw=%a', expected=FieldValueError, ) - # incorrect type - yield case( - given=123, - expected=TypeError, - ) + + def cases__clean_result_value(self): yield case( - given=None, + given='whatever', expected=TypeError, ) + ### TODO later? uncomment and adjust these test cases to the new implementation... 
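+        # (Note: only the TypeError case above is active for now -- under the
+        # new implementation a plain `str` is no longer a valid result value.
+        # The commented-out cases below capture the previous, `str`-returning
+        # behavior and still await adjustment, as the TODO above says.)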
+ # yield case( + # given=b'http://www.test.pl', + # expected=FieldValueError, + # ) + # yield case( + # given=u'HTtp://www.test.pl/cgi-bin/foo.pl?', + # expected=FieldValueError, + # ) + # yield case( + # given=b'aHRUUDovL3d3dy50ZXN0LnBs', + # expected=u'htTP://www.test.pl', + # ) + # yield case( + # given=u'SFR0cDovL3d3dy50ZXN0LnBsL2NnaS1iaW4vZm9vLnBsPw==', + # expected=u'HTtp://www.test.pl/cgi-bin/foo.pl?', + # ) + # yield case( + # given=b'aHR0cDovL3d3dy50ZXN0LcSHLnBsL2NnaS9iaW4vZm9vLnBsP2RlYnVnPTEmaWQ9MTIz', + # expected=u'http://www.test-ć.pl/cgi/bin/foo.pl?debug=1&id=123', + # ) + # yield case( + # given=u'aHR0cDovL3d3dy50ZXN0LcSHLnBsL2NnaS9iaW4vZm9vLnBsP2RlYnVnPTEmaWQ9MTIz', + # expected=u'http://www.test-ć.pl/cgi/bin/foo.pl?debug=1&id=123', + # ) + # yield case( + # given=(b'aHR0cDovL3d3dy5URVNULcSGLnBsL2NnaS1iaW4vYmFyLnBsP21vZGU9YnJvd3NlJm' + # b'FtcDtkZWJ1Zz0lMjAxMjMmYW1wO2lkPWstJTVE'), + # expected=(u'http://www.TEST-Ć.pl/cgi-bin/bar.pl?mode=browse&' + # u'debug=%20123&id=k-%5D'), + # ) + # yield case( + # given=(u'aHR0cDovL3d3dy5URVNULcSGLnBsL2NnaS1iaW4vYmFyLnBsP21vZGU9YnJvd3NlJm' + # u'FtcDtkZWJ1Zz0lMjAxMjMmYW1wO2lkPWstJTVE'), + # expected=(u'http://www.TEST-Ć.pl/cgi-bin/bar.pl?mode=browse&' + # u'debug=%20123&id=k-%5D'), + # ) + # yield case( + # given=b'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci_dP3E9z4DFk8SZwqnDn-KGkDMjdHJhbGFsYQk=', + # expected=u'http://tęst.pl/fóó/Bar/\udcdd?q=πœę©ß←3#tralala\t', + # ) + # yield case( + # given=u'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci_dP3E9z4DFk8SZwqnDn-KGkDMjdHJhbGFsYQk=', + # expected=u'http://tęst.pl/fóó/Bar/\udcdd?q=πœę©ß←3#tralala\t', + # ) + # # the same but encoded with standard Base64 (not the required URL-safe-Base64) + # yield case( + # given=b'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci/dP3E9z4DFk8SZwqnDn+KGkDMjdHJhbGFsYQk=', + # expected=FieldValueError, + # ) + # yield case( + # given=u'aHR0cDovL3TEmXN0LnBsL2bDs8OzL0Jhci/dP3E9z4DFk8SZwqnDn+KGkDMjdHJhbGFsYQk=', + # expected=FieldValueError, + # ) + # yield case( + # given=b'aHR0cDovL3Rlc3QucGw=', + # expected=u'http://test.pl', + # ) + # yield case( + # given=u'aHR0cDovL3Rlc3QucGw=', + # expected=u'http://test.pl', + # ) + # # the same with redundant padding + # yield case( + # given=b'aHR0cDovL3Rlc3QucGw==', + # expected=u'http://test.pl', + # ) + # yield case( + # given=u'aHR0cDovL3Rlc3QucGw===', + # expected=u'http://test.pl', + # ) + # # the same with redundant padding and ignored characters after it + # yield case( + # given=b'aHR0cDovL3Rlc3QucGw===abcdef', + # expected=u'http://test.pl', + # ) + # yield case( + # given=u'aHR0cDovL3Rlc3QucGw=========abcdef', + # expected=u'http://test.pl', + # ) + # # the same with redundant padding and illegal characters after it + # yield case( + # given=u'aHR0cDovL3Rlc3QucGw===ąć/'.encode('utf-8'), + # expected=FieldValueError, + # ) + # yield case( + # given=u'aHR0cDovL3Rlc3QucGw=========ąć/', + # expected=FieldValueError, + # ) + # # the same with missing padding + # yield case( + # given=b'aHR0cDovL3Rlc3QucGw', + # expected=FieldValueError, + # ) + # yield case( + # given=u'aHR0cDovL3Rlc3QucGw', + # expected=FieldValueError, + # ) + # yield case( + # given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' + # b'_cT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI-KJoMKywrMNCg=='), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # yield case( + # given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' + # u'_cT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI-KJoMKywrMNCg=='), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # 
# the same with additional %-encoding: + # yield case( + # given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' + # b'_cT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI-KJoMKywrMNCg%3D%3D'), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # yield case( + # given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5waHA' + # u'_cT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI-KJoMKywrMNCg%3D%3D'), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # # the same with 2 x additional %-encoding (2nd is overzealous and lowercase-based): + # yield case( + # given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%48%41' + # b'%5fcT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI%2dKJoMKywrMNCg%253D%253D'), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # yield case( + # given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%48%41' + # u'%5fcT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI%2dKJoMKywrMNCg%253D%253D'), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # # the same with 3 x additional %-encoding (2nd is overzealous and lowercase-based): + # yield case( + # given=(b'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%2548%2541' + # b'%255fcT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI%252dKJoMKywrMNCg%25253D%25253D'), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # yield case( + # given=(u'aHR0cDovL2V4YW1wbGUubmV0L3NlYXJjaC5wa%2548%2541' + # u'%255fcT3OtM65zrHPhs6_z4HOtc-EzrnOus-Mz4IhI%252dKJoMKywrMNCg%25253D%25253D'), + # expected=(u'http://example.net/search.php?q=διαφορετικός!#≠²³\r\n'), + # ) + # yield case( + # given=b'', + # expected=u'', + # ) + # yield case( + # given=u'', + # expected=u'', + # ) + # # containing non-UTF-8 characters (-> to low surrogates) + # yield case( + # given=b'aHR0cHM6Ly9kZN3u', + # expected=u'https://dd\udcdd\udcee', + # ) + # yield case( + # given=u'aHR0cHM6Ly9kZN3u', + # expected=u'https://dd\udcdd\udcee', + # ) + # # as UTF-8 with low surrogates already encoded + # yield case( + # given=b'aHR0cHM6Ly9kZO2zne2zrg==', + # expected=u'https://dd\udcdd\udcee', + # ) + # yield case( + # given=u'aHR0cHM6Ly9kZO2zne2zrg==', + # expected=u'https://dd\udcdd\udcee', + # ) + # # the `%` character not being part of %-encoded stuff + # yield case( + # given=b'%AZ', + # expected=FieldValueError, + # ) + # yield case( + # given=u'aHR0cDovL3Rlc3QucGw=%a', + # expected=FieldValueError, + # ) + # # incorrect type + # yield case( + # given=123, + # expected=TypeError, + # ) + # yield case( + # given=None, + # expected=TypeError, + # ) # class TestURLsMatchedFieldForN6(FieldTestMixin, unittest.TestCase): diff --git a/N6Lib/n6lib/tests/test_db_events.py b/N6Lib/n6lib/tests/test_db_events.py index 746eb6e..a674164 100644 --- a/N6Lib/n6lib/tests/test_db_events.py +++ b/N6Lib/n6lib/tests/test_db_events.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. import datetime import socket @@ -279,6 +279,10 @@ def test__like_query(self, or_mock, key, mapped_to=None, value=[('10.20.30.41', 24)], min_max_ips=[(169090560, 169090815)], result=sen.or_result), + param( + value=[('0.0.0.123', 24)], + min_max_ips=[(1, 255)], # <- Note: here the minimum IP is 1, not 0 (see: #8861). 
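+            # (A hedged sketch of the assumed conversion -- hypothetical code,
+            # not the tested implementation:
+            #
+            #     import ipaddress
+            #     net = ipaddress.ip_network('0.0.0.123/24', strict=False)
+            #     min_ip = max(int(net.network_address), 1)  # 0 excluded, see #8861
+            #     max_ip = int(net.broadcast_address)        # -> (1, 255)
+            # )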
+ result=sen.or_result), param( value=[('10.20.30.441', 24), ('10.20.30.41', 32)], exc_type=socket.error), @@ -383,10 +387,12 @@ def test__modified_query(self, key, cmp_meth_name=None, exc_type=None): def test__to_raw_result_dict__1(self): self.test_init_and_attrs_1() + self.obj.dip = sen.some_other_ip_addr d = self.obj.to_raw_result_dict() self.assertEqual(d, { 'id': sen.event_id, 'ip': sen.some_ip_addr, + 'dip': sen.some_other_ip_addr, 'dport': sen.some_port_number, 'time': datetime.datetime(2014, 3, 31, 23, 7, 42), 'client': ['c1', 'c2'], @@ -394,9 +400,11 @@ def test__to_raw_result_dict__1(self): def test__to_raw_result_dict__2(self): self.test_init_and_attrs_2() + self.obj.dip = '0.0.0.0' # "no IP" placeholder d = self.obj.to_raw_result_dict() self.assertEqual(d, { - # note that ip='0.0.0.0' has been removed + # note that ip='0.0.0.0' and dip='0.0.0.0' have been removed + # (see: `n6lib.const.LACK_OF_IPv4_PLACEHOLDER_AS_STR`...) 'time': datetime.datetime(2014, 3, 31, 23, 7, 42), 'expires': datetime.datetime(2015, 3, 31, 23, 7, 43), ### THIS IS A PROBLEM -- TO BE SOLVED IN #3113: diff --git a/N6Lib/n6lib/tests/test_generate_test_events.py b/N6Lib/n6lib/tests/test_generate_test_events.py index 501c41b..d521ad3 100644 --- a/N6Lib/n6lib/tests/test_generate_test_events.py +++ b/N6Lib/n6lib/tests/test_generate_test_events.py @@ -20,18 +20,19 @@ "time", "url", "fqdn", "address", "proto", "sport", "dport", "dip", "id", "rid", "client", "replaces", "status", "md5", "origin", "sha1", "sha256", "target", "modified", "expires"], - "required_attributes": ["id", "rid", "source", "restriction", "confidence", "category", - "time"], + "required_event_attributes": ["id", "rid", "source", "restriction", "confidence", "category", + "time"], "dip_categories": ["bots", "cnc", "dos-attacker", "scanning", "other"], - "port_values": ["sport", "dport"], - "md5_values": ["id", "rid", "replaces", "md5"], - "possible_cc_codes": ["PL", "US", "DE", "CA", "FR", "UK"], - "possible_domains": ["www.example.com", "example.net"], + "port_attributes": ["sport", "dport"], + "md5_attributes": ["id", "rid", "replaces", "md5"], + "possible_cc_in_address": ["PL", "US", "DE", "CA", "FR", "UK"], + "possible_fqdn": ["www.example.com", "example.net"], "possible_url": ["http://example.com/index.html", "http://www.example.com/home.html"], + "possible_name": ["test event"], "possible_source": ["source.one", "another.source", "yet.another-source"], "possible_restriction": ["public", "need-to-know"], "possible_target": ["Example Ltd.", "Random Co"], - "possible_client": ["Test Client 1", "Test Client 2"], + "possible_client": ["test.client1", "test.client2"], "seconds_max": 180000, "expires_days_max": 8, "random_ips_max": 5, @@ -49,7 +50,7 @@ def setUp(self, mocked_config): event_instance = RandomEvent() self.event = event_instance.event config = event_instance.config - self.required_attrs = config['required_attributes'] + self.required_attrs = config['required_event_attributes'] def test_required_attrs(self): """ @@ -281,7 +282,7 @@ class TestExtraParams(unittest.TestCase): def _get_possible_vals(): with standard_config_patch: random_event_config = RandomEvent().config - config_fqdn_vals = 'possible_domains' + config_fqdn_vals = 'possible_fqdn' config_url_vals = 'possible_url' possible_fqdns = random_event_config.get(config_fqdn_vals) possible_urls = random_event_config.get(config_url_vals) diff --git a/N6Lib/n6lib/tests/test_knowledge_base_helpers.py b/N6Lib/n6lib/tests/test_knowledge_base_helpers.py index 35f173b..7951815 100644 --- 
a/N6Lib/n6lib/tests/test_knowledge_base_helpers.py +++ b/N6Lib/n6lib/tests/test_knowledge_base_helpers.py @@ -1,4 +1,4 @@ -# Copyright (c) 2021 NASK. All rights reserved. +# Copyright (c) 2021-2023 NASK. All rights reserved. import contextlib import os @@ -7,6 +7,7 @@ import unittest from unittest.mock import patch +import pytest from unittest_expander import ( expand, foreach, @@ -304,6 +305,7 @@ def test_errors_removed_article(self): class TestBuildKnowledgeBaseData(unittest.TestCase): + @pytest.mark.slow def test_build_kb_data_correct_structure(self): with patch( "n6lib.pyramid_commons.knowledge_base_helpers.read_dir_with_subdirs", diff --git a/N6Lib/n6lib/tests/test_record_dict.py b/N6Lib/n6lib/tests/test_record_dict.py index 8ae1eae..a512007 100644 --- a/N6Lib/n6lib/tests/test_record_dict.py +++ b/N6Lib/n6lib/tests/test_record_dict.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. import collections import collections.abc as collections_abc @@ -432,7 +432,7 @@ def test__unicode_surrogate_pass_and_esc_adjuster(self): u'\udced\udca0' # mess converted to surrogates u'\x7f' # proper code point (ascii DEL) u'\ud800' # surrogate '\ud800' (smallest one) - u'\udfff' # surrogate '\udfff' (biggest one) + u'\udfff' # surrogate '\udfff' (biggest one) [note: *not* merged with one above] u'\udcee\udcbf\udcc0' # mess converted to surrogates u'\ue000' # proper code point '\ue000' (bigger than biggest surrogate) u'\udce6' # mess converted to surrogate @@ -628,25 +628,56 @@ def setUp(self): asn=1), ], ) - self.with_url_data1 = dict( + self.with_url_data_1 = dict( self.only_required, - url=u'foo:bar', + url='foo:bar', + # (as set by the code of a *parser*) _url_data=dict( - # in fact it will be transformed to `_url_data_ready`... - # (FIXME: should'n it be tested somewhere else?) - url_orig=u'http://\u0106ma.eXample.COM:80/\udcdd\ud800Ala-ma-kota\U0010ffff\udccc', - url_norm_opts=dict( - transcode1st=True, - epslash=True, - rmzone=True, + orig=( + 'http://\u0106ma.eXample.COM:80/' + '\udcdd\ud800Ala-ma-kota\U0010ffff\udccc'), + norm_options=dict( + merge_surrogate_pairs=True, + empty_path_slash=True, + remove_ipv6_zone=True, + ), + ), + ) + self.with_url_data_2 = dict( + self.only_required, + url='foo:bar', + # (as set by the code of a *parser*) + _url_data=dict( + orig=(bytearray( + b'http://\xc4\x86ma.eXample.COM:80/' + b'\xed\xb3\x9d\xed\xa0\x80Ala-ma-kota\xf4\x8f\xbf\xbf\xcc')), + norm_options=dict( + merge_surrogate_pairs=True, + remove_ipv6_zone=True, + ), + ), + ) + self.with_url_data_3 = dict( + self.only_required, + url='foo:bar', + # (as processed at later stages than the *parser* stage) + _url_data=dict( + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + norm_options=dict( + unicode_str=True, + merge_surrogate_pairs=False, ), ), ) - self.with_url_data2 = dict( + self.with_url_data_ready = dict( # *LEGACY*, to be removed... 
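+            # (Legacy shape kept only for the transition period: `url_orig` +
+            # `url_norm_opts` with the old option names (`transcode1st`,
+            # `epslash`, `rmzone`), in contrast to the new `_url_data` mapping
+            # with `orig`/`orig_b64` + `norm_options`.)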
self.only_required, - url=u'foo:bar', + url='foo:bar', _url_data_ready=dict( - url_orig='aHR0cDovL8SGbWEuZVhhbXBsZS5DT006ODAv7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM', + url_orig=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), url_norm_opts=dict( transcode1st=True, epslash=True, @@ -848,33 +879,57 @@ def test__iter_db_items(self): cc='PL', asn=1), ], - with_url_data1=[ + with_url_data_1=[ dict( self.only_required, - url=u'SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd', + url='SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd', custom=dict( url_data=dict( - url_orig=( - 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006ODAv7bOd7aCAQWxhLW1hLWtvdGH0j7' - '-_7bOM'), - url_norm_opts=dict( - transcode1st=True, - epslash=True, - rmzone=True, - ), + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + norm_brief='emru', # ('u' added automatically because orig was `str`) + ), + ), + ), + ], + with_url_data_2=[ + dict( + self.only_required, + url='SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd', + custom=dict( + url_data=dict( + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_zA=='), + norm_brief='mr', # ('u' not added because orig was binary) ), ), ), ], - with_url_data2=[ + with_url_data_3=[ dict( self.only_required, - url=u'SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd', + url='SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd', + custom=dict( + url_data=dict( + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + norm_brief='u', + ), + ), + ), + ], + with_url_data_ready=[ # *LEGACY*, to be removed... + dict( + self.only_required, + url='SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd', custom=dict( url_data=dict( url_orig=( - 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006ODAv7bOd7aCAQWxhLW1hLWtvdGH0j7' - '-_7bOM'), + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), url_norm_opts=dict( transcode1st=True, epslash=True, @@ -1226,6 +1281,10 @@ def test__setitem__address(self): [{'cc': 'PL', 'asn': 123}], {'ip': '100.101.102.103', 'cc': 'PL', 'asn': 123, 'xxx': 'spam'}, [{'ip': '100.101.102.1031', 'cc': 'PL', 'asn': 123}], + {'ip': '0.0.0.0', 'cc': 'PL', 'asn': 123}, # (disallowed: "no IP" placeholder) + [{'ip': '0.00.000.0', 'cc': 'PL', 'asn': 123}], # (disallowed: "no IP" placeholder) + {'ip': '00.00.00.00', 'cc': 'PL', 'asn': 123}, # (disallowed: "no IP" placeholder) + [{'ip': 0, 'cc': 'PL', 'asn': 123}], # (disallowed: "no IP" placeholder) {'ip': '1684366951', 'cc': 'PL', 'asn': 123}, [{'ip': None, 'cc': 'PL', 'asn': 123}], [ @@ -1263,7 +1322,7 @@ def test__setitem__address(self): def test__setitem__dip(self): self._test_setitem_valid('dip', ( - S(u'0.0.0.0', (0, u'0.0.0.0')), + S(u'0.0.0.1', (1, u'0.0.0.1')), S(u'0.0.0.10', (10, u'0.0.0.10')), S(u'100.101.102.103', ( u'100.101.102.103', @@ -1277,6 +1336,9 @@ def test__setitem__dip(self): '1684366951', '100.101.102.103.100', '100.101.102.1030', + '0.0.0.0', # (disallowed: "no IP" placeholder) + '00.0.0.00', # (disallowed: "no IP" placeholder) + 0, # (disallowed: "no IP" placeholder) u'100.101.102', 168436695123456789, ['100.101.102.103'], @@ -1349,6 +1411,261 @@ def test__setitem__url__skipping_invalid(self): rd['url'] = invalid self.assertNotIn('url', rd) + def test__setitem___url_data(self): + self._test_setitem_valid('_url_data', ( + S( + dict( + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + norm_options=dict( + 
unicode_str=True, # (<- always `True` if given `orig` was a `str`) + merge_surrogate_pairs=False, + empty_path_slash=True, + ), + ), + ( + # (as set by the code of a *parser*) + dict( + orig=( + 'http://\u0106ma.eXample.COM:80/' + '\udcdd\ud800Ala-ma-kota\U0010ffff\udccc'), + norm_options=dict( + # (no need to provide `unicode_str` -- inferred from type of `orig`) + merge_surrogate_pairs=False, + empty_path_slash=True, + ), + ), + dict( + orig=( + 'http://\u0106ma.eXample.COM:80/' + '\udcdd\ud800Ala-ma-kota\U0010ffff\udccc'), + norm_options=dict( + unicode_str=True, + merge_surrogate_pairs=False, + empty_path_slash=True, + ), + ), + # (as processed at later stages than the *parser* stage) + dict( + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_7bOM'), + norm_options=CIDict( # (<- non-`dict` mapping is OK) + unicode_str=True, # (<- here needed because of no `orig`) + merge_surrogate_pairs=False, + empty_path_slash=True, + ), + ), + ), + ), + S( + dict( + orig_b64='aHR0cDovL2Zvby5iYXIv', + norm_options=dict( + unicode_str=False, # (<- always `False` if given `orig` was binary) + ), + ), + ( + # (as set by the code of a *parser*) + dict( + orig=b'http://foo.bar/', + norm_options=dict(), # (<- empty `norm_options` is perfectly OK) + # (^ no need to provide `unicode_str` -- inferred from type of `orig`) + ), + dict( + orig=bytearray(b'http://foo.bar/'), + norm_options=dict( + unicode_str=False, + ), + ), + # (as processed at later stages than the *parser* stage) + dict( + orig_b64='aHR0cDovL2Zvby5iYXIv', + norm_options=dict( + unicode_str=False, # (<- here needed because of no `orig`) + ), + ), + ), + ), + S( + dict( + orig_b64=( + 'aHR0cMSGbWEuZVhhbXBsZS5DT006ODAv' + '7bOd7aCAQWxhLW1hLWtvdGH0j7-_zA=='), + norm_options=dict( + unicode_str=False, # (<- always `False` if given `orig` was binary) + empty_path_slash=True, + ), + ), + ( + # (as set by the code of a *parser*) + CIDict( # (<- non-`dict` mapping is OK) + orig=(bytearray( + b'http\xc4\x86ma.eXample.COM:80/' + b'\xed\xb3\x9d\xed\xa0\x80Ala-ma-kota\xf4\x8f\xbf\xbf\xcc')), + norm_options=dict( + # (no need to provide `unicode_str` -- inferred from type of `orig`) + empty_path_slash=True, + ), + ), + dict( + orig=( + b'http\xc4\x86ma.eXample.COM:80/' + b'\xed\xb3\x9d\xed\xa0\x80Ala-ma-kota\xf4\x8f\xbf\xbf\xcc'), + norm_options=dict( + unicode_str=False, + empty_path_slash=True, + ), + ), + # (as processed at later stages than the *parser* stage) + dict( + orig_b64=( + 'aHR0cMSGbWEuZVhhbXBsZS5DT006ODAv' + '7bOd7aCAQWxhLW1hLWtvdGH0j7-_zA=='), + norm_options=dict( + unicode_str=False, # (<- here needed because of no `orig`) + empty_path_slash=True, + ), + ), + ), + ), + S( + dict( + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_zA=='), + norm_options=dict( + unicode_str=False, + remove_ipv6_zone=False, + ), + + # Note: if `orig_b64` (rather than `orig`) is given + # (that typically happens at later processing stages + # than the *parser* stage) then any extra items are + # passed through without error -- to ease transition + # if new keys are supported in the future... 
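+                    # (A hedged sketch of that rule -- hypothetical code, not
+                    # the actual adjuster:
+                    #
+                    #     known_keys = {'orig', 'orig_b64', 'norm_options'}
+                    #     if 'orig_b64' not in mapping and mapping.keys() - known_keys:
+                    #         raise error  # with `orig`, unknown keys are rejected
+                    #
+                    # whereas with `orig_b64` any extra items are kept as-is.)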
+ extra_item='blah-blah-blah...', + and_another_one=[{b'!'}], + ), + dict( + orig_b64=( + 'aHR0cDovL8SGbWEuZVhhbXBsZS5DT006OD' + 'Av7bOd7aCAQWxhLW1hLWtvdGH0j7-_zA=='), + norm_options=dict( + unicode_str=False, # (<- here needed because of no `orig`) + remove_ipv6_zone=False, + ), + extra_item='blah-blah-blah...', + and_another_one=[{b'!'}], + ), + ), + )) + self._test_setitem_adjuster_error('_url_data', ( + dict(), # missing keys... + dict(orig=b'http://foo.bar/'), # missing key: `norm_options` + dict(orig_b64='aHR0cDovL2Zvby5iYXIv'), # missing key: `norm_options` + dict(norm_options=dict()), # missing key: `orig` or `orig_b64` + dict( + orig=b'http://foo.bar/', # *either* `orig` *or* `orig_b64` + orig_b64='aHR0cDovL2Zvby5iYXIv', # should be present, but *not both* + norm_options=dict(), + ), + dict( + orig=b'http://foo.bar/', + norm_options=dict(), + extra_item=42, # <- extra (unknown) key when `orig` given + ), + dict( + orig='http://foo.bar/', + norm_options=dict( + unicode_str=False, # <- `False` is wrong when `orig` is a `str` + ), + ), + dict( + orig=b'http://foo.bar/', + norm_options=dict( + unicode_str=True, # <- `True` is wrong when `orig` is binary + ), + ), + dict( + orig=bytearray(b'http://foo.bar/'), + norm_options=dict( + unicode_str=True, # <- `True` is wrong when `orig` is binary + ), + ), + dict( + orig_b64='aHR0+DovL2Zvby/iYXIv', # <- non-URL-safe-Base64-variant character(s) + norm_options=dict(), + ), + dict( + orig='', # <- empty + norm_options=dict(), + ), + dict( + orig=b'', # <- empty + norm_options=dict(), + ), + dict( + orig=bytearray(b''), # <- empty + norm_options=dict(), + ), + dict( + orig_b64='', # <- empty + norm_options=dict(), + ), + dict( + orig_b64=('0' * (2**17 + 4)), # <- too long + norm_options=dict(), + ), + dict( + orig=(b'x' * (2**19)), # <- too long + norm_options=dict(), + ), + dict( + orig=bytearray(b'x' * (2**19)), # <- too long + norm_options=dict(), + ), + dict( + orig=('x' * (2**19)), # <- too long + norm_options=dict(), + ), + dict( + orig={b'http://foo.bar/'}, # <- wrong type (`set` instead of: `str`, + norm_options=dict(), # `bytes` or `bytearray`) + ), + dict( + orig_b64=['aHR0cDovL2Zvby5iYXIv'], # <- wrong type (`list` instead of `str`) + norm_options=dict(), + ), + dict( + orig_b64=b'aHR0cDovL2Zvby5iYXIv', # <- wrong type (`bytes` instead of `str`) + norm_options=dict(), + ), + dict( + orig=b'http://foo.bar/', + norm_options=[], # <- wrong type (`list` instead of `dict`) + ), + dict( + orig_b64='aHR0cDovL2Zvby5iYXIv', + norm_options='d', # <- wrong type (`str` instead of `dict`) + ), + dict( + orig=b'http://foo.bar/', + norm_options=dict( + remove_ipv6_zone=1, # <- wrong type (`int` instead of `bool`) + ), + ), + # wrong types of the whole value (should be `dict`): + b'http://foo.bar/', + 'aHR0cDovL2Zvby5iYXIv', + [('orig', b'http://foo.bar/'), ('norm_options', dict())], + {'orig_b64', 'norm_options'}, + 1684366951, + datetime.datetime.now(), + None, + )) + def test__setitem__fqdn__valid_or_too_long(self): self._test_setitem_valid('fqdn', ( S(u'www.example.com', ( @@ -1589,6 +1906,8 @@ def test__setitem__enriched(self): # wrong keys/values (['url'], {'1.2.3.4': ['asn', 'cc']}), (['fqdn'], {'1.2.3.444': ['asn', 'cc']}), + (['fqdn'], {'0.0.0.0': ['asn', 'cc']}), # (disallowed: "no IP" placeholder) + (['fqdn'], {'0.0.0.000': ['asn', 'cc']}), # (disallowed: "no IP" placeholder) (['fqdn'], {'1.2.3.4': ['url', 'cc']}), )) diff --git a/N6Lib/n6lib/tests/test_ripe_api_client.py b/N6Lib/n6lib/tests/test_ripe_api_client.py index 4684f1e..d34a531 
100644 --- a/N6Lib/n6lib/tests/test_ripe_api_client.py +++ b/N6Lib/n6lib/tests/test_ripe_api_client.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022 NASK. All rights reserved. +# Copyright (c) 2022-2023 NASK. All rights reserved. import unittest from unittest.mock import ( @@ -45,11 +45,6 @@ 'key': 'remarks', 'value': 'Example-Cloud_Network_1', }, - { - 'details_link': None, - 'key': 'org', - 'value': 'ORG-EXAMPLE-RIPE', - }, { 'details_link': None, 'key': 'remarks', @@ -140,11 +135,6 @@ 'key': 'as-name', 'value': 'MAGICRETAIL', }, - { - 'details_link': None, - 'key': 'org', - 'value': 'ORG-MRS11-RIPE', - }, { 'details_link': None, 'key': 'import', @@ -241,6 +231,108 @@ 'version': '4.1', } +# The same as `DEFAULT_ASN__ADMINC_TECHC_ROLE_EXAMPLE` +# but this time we have key 'org' +DEFAULT_ASN__ROLE__ORG_EXAMPLE = { + 'build_version': 'live.2022.2.1.69', + 'cached': False, + 'data': { + 'authorities': ['ripe'], + 'irr_records': [], + 'query_time': '2000-01-01T00:00:00', + 'records': [[ + { + 'details_link': 'https://stat.ripe.net/AS11111', + 'key': 'aut-num', + 'value': '11111', + }, + { + 'details_link': None, + 'key': 'as-name', + 'value': 'Example-Cloud_1', + }, + { + 'details_link': None, + 'key': 'remarks', + 'value': 'Example-Cloud_Network_1', + }, + { + 'details_link': None, + 'key': 'org', + 'value': 'ORG-EXAMPLE-RIPE', + }, + { + 'details_link': None, + 'key': 'remarks', + 'value': 'Example Company details: http://as11111.example_domain.com', + }, + { + 'details_link': None, + 'key': 'import', + 'value': 'from AS2222222222 accept ANY', + }, + { + 'details_link': None, + 'key': 'export', + 'value': 'to AS62222222222 action community .= { 6777:6777 }; ' + 'announce AS-ASSA-EUUE', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/person-role/ZXCV-EXMPL-RIPE', + 'key': 'admin-c', + 'value': 'ZXCV-EXMPL-RIPE', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/person-role/ASDF-EXMPL-RIPE', + 'key': 'tech-c', + 'value': 'ASDF-EXMPL-RIPE', + }, + { + 'details_link': None, + 'key': 'status', + 'value': 'ASSIGNED', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/mntner/RIPE-BEG-END-MNT', + 'key': 'mnt-by', + 'value': 'RIPE-BEG-END-MNT', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/mntner/EXMP-CODE', + 'key': 'mnt-by', + 'value': 'EXMP-CODE', + }, + { + 'details_link': None, + 'key': 'created', + 'value': '2000-01-01T00:00:00Z', + }, + { + 'details_link': None, + 'key': 'last-modified', + 'value': '2000-01-01T00:00:00Z', + }, + { + 'details_link': None, + 'key': 'source', + 'value': 'RIPE', + }, + ]], + 'resource': '11111', + }, + 'data_call_name': 'whois', + 'data_call_status': 'supported - connecting to ursa', + 'messages': [], + 'process_time': 50, + 'query_id': '20220202082341-2a1a5670-d197-4530-bd32-491d312deffb', + 'see_also': [], + 'server_id': 'app134', + 'status': 'ok', + 'status_code': 200, + 'time': '2000-01-01T00:00:00Z', + 'version': '4.1', +} + DEFAULT_ASN__ABUSE_CONTACT_REQUEST = { 'build_version': 'live.2022.2.1.69', 'cached': True, @@ -286,6 +378,104 @@ 'key': 'as-name', 'value': 'Example-Cloud_1', }, + { + 'details_link': None, + 'key': 'remarks', + 'value': 'Example-Cloud_Network_1', + }, + { + 'details_link': None, + 'key': 'remarks', + 'value': 'Example Company details: http://1.1.1.1/24.example_domain.com', + }, + { + 'details_link': None, + 'key': 'import', + 'value': 'from AS2222222222 accept ANY', + }, + { + 'details_link': None, + 'key': 'export', + 'value': 'to AS62222222222 action community .= { 6777:6777 }; ' + 'announce 
AS-ASSA-EUUE', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/person-role/AAAA-EXMPL-RIPE', + 'key': 'admin-c', + 'value': 'AAAA-EXMPL-RIPE', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/person-role/BBBB-EXMPL-RIPE', + 'key': 'tech-c', + 'value': 'BBBB-EXMPL-RIPE', + }, + { + 'details_link': None, + 'key': 'status', + 'value': 'ASSIGNED', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/mntner/RIPE-BEG-END-MNT', + 'key': 'mnt-by', + 'value': 'RIPE-BEG-END-MNT', + }, + { + 'details_link': 'https://rest.db.ripe.net/ripe/mntner/EXMP-CODE', + 'key': 'mnt-by', + 'value': 'EXMP-CODE', + }, + { + 'details_link': None, + 'key': 'created', + 'value': '2000-01-01T00:00:00Z', + }, + { + 'details_link': None, + 'key': 'last-modified', + 'value': '2000-01-01T00:00:00Z', + }, + { + 'details_link': None, + 'key': 'source', + 'value': 'RIPE', + }, + ]], + 'resource': '11111', + }, + 'data_call_name': 'whois', + 'data_call_status': 'supported - connecting to ursa', + 'messages': [], + 'process_time': 50, + 'query_id': '20220202082341-2a1a5670-d197-4530-bd32-491d312deffb', + 'see_also': [], + 'server_id': 'app134', + 'status': 'ok', + 'status_code': 200, + 'time': '2000-01-01T00:00:00Z', + 'version': '4.1', +} + +# The same as `DEFAULT_IP_NETWORK__ADMINC_TECHC_ROLE_EXAMPLE` +# but this time we have key 'org' +DEFAULT_IP_NETWORK__ROLE__ORG_EXAMPLE = { + 'build_version': 'live.2022.2.1.69', + 'cached': False, + 'data': { + 'authorities': ['ripe'], + 'irr_records': [], + 'query_time': '2000-01-01T00:00:00', + 'records': [[ + { + 'details_link': 'https://stat.ripe.net/1.1.1.1/24', + 'key': 'aut-num', + 'value': '11111', + }, + { + 'details_link': None, + 'key': 'as-name', + 'value': 'Example-Cloud_1', + }, + { 'details_link': None, 'key': 'remarks', @@ -386,11 +576,6 @@ 'key': 'as-name', 'value': 'MAGICRETAIL', }, - { - 'details_link': None, - 'key': 'org', - 'value': 'ORG-MRS11-RIPE', - }, { 'details_link': None, 'key': 'import', @@ -512,12 +697,132 @@ # -# ASN - case #1 +# ASN - case #1, #2 + +DEFAULT__ORG_URL_1 = 'https://rest.db.ripe.net/ripe/organisation/ORG-EXAMPLE-RIPE.json?unfiltered' +DEFAULT_ASN__PERSON_URL_1 = 'https://rest.db.ripe.net/ripe/role/ASDF-EXMPL-RIPE.json?unfiltered' +DEFAULT_ASN__ROLE_URL_1 = 'https://rest.db.ripe.net/ripe/role/ZXCV-EXMPL-RIPE.json?unfiltered' +DEFAULT_ASN__PERSON_URL_2 = 'https://rest.db.ripe.net/ripe/person/ASDF-EXMPL-RIPE.json?unfiltered' +DEFAULT_ASN__ROLE_URL_2 = 'https://rest.db.ripe.net/ripe/person/ZXCV-EXMPL-RIPE.json?unfiltered' +DEFAULT_ASN__ORG_DETAILS_REQUEST_1 = { + 'objects': { + 'object': [ + {'attributes': { + 'attribute': [ + { + 'name': 'organisation', + 'value': 'ORG-EXAMPLE-RIPE', + }, + { + 'name': 'org-name', + 'value': 'Example-Org-Name', + }, + { + 'name': 'country', + 'value': 'EX', + }, + { + 'name': 'org-type', + 'value': 'EXMPL', + }, + { + 'name': 'address', + 'value': 'Some Imagined Street', + }, + { + 'name': 'address', + 'value': 'Milano', + }, + { + 'name': 'address', + 'value': 'ITALY', + }, + { + 'name': 'phone', + 'value': '+11 11 11111111', + }, + { + 'name': 'fax-no', + 'value': '+11 11 11111112', + }, + { + 'name': 'e-mail', + 'value': 'example@example-org-domain.ex', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/mntner/AS11111-MNT', + 'type': 'locator', + }, + 'name': 'mnt-ref', + 'referenced-type': 'mntner', + 'value': 'AS12345-MNT', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/person/EX11111-RIPE', + 'type': 'locator', + }, + 'name': 'admin-c', + 'referenced-type': 
'person', + 'value': 'EX11111-RIPE', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/mntner/RIPE-EMP-EX-MNT', + 'type': 'locator', + }, + 'name': 'mnt-by', + 'referenced-type': 'mntner', + 'value': 'RIPE-EMP-EX-MNT', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/mntner/AS11111-MNT', + 'type': 'locator', + }, + 'name': 'mnt-by', + 'referenced-type': 'mntner', + 'value': 'AS11111-MNT-MNT', + }, + { + 'name': 'created', + 'value': '2000-01-01T00:00:00Z', + }, + { + 'name': 'last-modified', + 'value': '2022-01-01T00:00:00Z', + }, + { + 'name': 'source', + 'value': 'RIPE', + } + ] + }, + 'primary-key': { + 'attribute': [ + { + 'name': 'nic-hdl', + 'value': 'AAAA-EXMPL-RIPE', + } + ], + }, + 'source': {'id': 'ripe'}, + 'type': 'role', + } + ], + }, + 'terms-and-conditions': { + 'href': 'http://www.ripe.net/db/support/db-terms-conditions.pdf', + 'type': 'locator', + }, + 'version': { + 'commit-id': '111a11a', + 'timestamp': '2000-01-01T00:00:00Z', + 'version': '1.102.2', + }, +} -DEFAULT_ASN__PERSON_URL_1 = 'https://rest.db.ripe.net/ripe/role/ASDF-EXMPL-RIPE.json' -DEFAULT_ASN__ROLE_URL_1 = 'https://rest.db.ripe.net/ripe/role/ZXCV-EXMPL-RIPE.json' -DEFAULT_ASN__PERSON_URL_2 = 'https://rest.db.ripe.net/ripe/person/ASDF-EXMPL-RIPE.json' -DEFAULT_ASN__ROLE_URL_2 = 'https://rest.db.ripe.net/ripe/person/ZXCV-EXMPL-RIPE.json' DEFAULT_ASN__ROLE_DETAILS_REQUEST_1 = { 'objects': { 'object': [{ @@ -726,12 +1031,12 @@ # -# ASN - case #2 +# ASN - case #3 DEFAULT_ASN__ABUSE_CONTACT_URL_1 = ( 'https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=as22222') -DEFAULT_ASN__PERSON_URL_3 = 'https://rest.db.ripe.net/ripe/person/XXXX_PERSON-RIPE.json' -DEFAULT_ASN__ROLE_URL_3 = 'https://rest.db.ripe.net/ripe/role/XXXX_PERSON-RIPE.json' +DEFAULT_ASN__PERSON_URL_3 = 'https://rest.db.ripe.net/ripe/person/XXXX_PERSON-RIPE.json?unfiltered' +DEFAULT_ASN__ROLE_URL_3 = 'https://rest.db.ripe.net/ripe/role/XXXX_PERSON-RIPE.json?unfiltered' DEFAULT_ASN__PERSON_DETAILS_REQUEST_1 = { 'objects': { 'object': [{ @@ -811,12 +1116,134 @@ # -# IP networks input - case #1 +# IP networks input - case #1, #2 + +# Note, that we use `DEFAULT__ORG_URL_1` which is the same for ASN/IP Networks, but just to show +# that it is being used here... 
+# DEFAULT__ORG_URL_1 = 'https://rest.db.ripe.net/ripe/organisation/ORG-EXAMPLE-RIPE.json?unfiltered' +DEFAULT_IP_NETWORK__PERSON_URL_1 = 'https://rest.db.ripe.net/ripe/role/AAAA-EXMPL-RIPE.json?unfiltered' +DEFAULT_IP_NETWORK__ROLE_URL_1 = 'https://rest.db.ripe.net/ripe/role/BBBB-EXMPL-RIPE.json?unfiltered' +DEFAULT_IP_NETWORK__PERSON_URL_2 = 'https://rest.db.ripe.net/ripe/person/AAAA-EXMPL-RIPE.json?unfiltered' +DEFAULT_IP_NETWORK__ROLE_URL_2 = 'https://rest.db.ripe.net/ripe/person/BBBB-EXMPL-RIPE.json?unfiltered' +DEFAULT_IP_NETWORK__ORG_DETAILS_REQUEST_1 = { + 'objects': { + 'object': [ + {'attributes': { + 'attribute': [ + { + 'name': 'organisation', + 'value': 'ORG-EXAMPLE-RIPE', + }, + { + 'name': 'org-name', + 'value': 'Example-Org-Name', + }, + { + 'name': 'country', + 'value': 'EX', + }, + { + 'name': 'org-type', + 'value': 'EXMPL', + }, + { + 'name': 'address', + 'value': 'Some Imagined Street', + }, + { + 'name': 'address', + 'value': 'Milano', + }, + { + 'name': 'address', + 'value': 'ITALY', + }, + { + 'name': 'phone', + 'value': '+11 11 11111111', + }, + { + 'name': 'fax-no', + 'value': '+11 11 11111112', + }, + { + 'name': 'e-mail', + 'value': 'example@example-org-domain.ex', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/mntner/1.1.1.1/24-MNT', + 'type': 'locator', + }, + 'name': 'mnt-ref', + 'referenced-type': 'mntner', + 'value': 'AS12345-MNT', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/person/EX11111-RIPE', + 'type': 'locator', + }, + 'name': 'admin-c', + 'referenced-type': 'person', + 'value': 'EX11111-RIPE', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/mntner/RIPE-EMP-EX-MNT', + 'type': 'locator', + }, + 'name': 'mnt-by', + 'referenced-type': 'mntner', + 'value': 'RIPE-EMP-EX-MNT', + }, + { + 'link': { + 'href': 'https://rest.db.ripe.net/ripe/mntner/1.1.1.1/24-MNT', + 'type': 'locator', + }, + 'name': 'mnt-by', + 'referenced-type': 'mntner', + 'value': '1.1.1.1/24-MNT-MNT', + }, + { + 'name': 'created', + 'value': '2000-01-01T00:00:00Z', + }, + { + 'name': 'last-modified', + 'value': '2022-01-01T00:00:00Z', + }, + { + 'name': 'source', + 'value': 'RIPE', + } + ] + }, + 'primary-key': { + 'attribute': [ + { + 'name': 'nic-hdl', + 'value': 'AAAA-EXMPL-RIPE', + } + ], + }, + 'source': {'id': 'ripe'}, + 'type': 'role', + } + ], + }, + 'terms-and-conditions': { + 'href': 'http://www.ripe.net/db/support/db-terms-conditions.pdf', + 'type': 'locator', + }, + 'version': { + 'commit-id': '111a11a', + 'timestamp': '2000-01-01T00:00:00Z', + 'version': '1.102.2', + }, +} -DEFAULT_IP_NETWORK__PERSON_URL_1 = 'https://rest.db.ripe.net/ripe/role/AAAA-EXMPL-RIPE.json' -DEFAULT_IP_NETWORK__ROLE_URL_1 = 'https://rest.db.ripe.net/ripe/role/BBBB-EXMPL-RIPE.json' -DEFAULT_IP_NETWORK__PERSON_URL_2 = 'https://rest.db.ripe.net/ripe/person/AAAA-EXMPL-RIPE.json' -DEFAULT_IP_NETWORK__ROLE_URL_2 = 'https://rest.db.ripe.net/ripe/person/BBBB-EXMPL-RIPE.json' DEFAULT_IP_NETWORK__ROLE_DETAILS_REQUEST_1 = { 'objects': { 'object': [{ @@ -1025,12 +1452,12 @@ # -# IP networks input - case #2 +# IP networks input - case #3 DEFAULT_IP_NETWORK__ABUSE_CONTACT_URL_1 = ( 'https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=1.1.1.1/24') -DEFAULT_IP_NETWORK__PERSON_URL_3 = 'https://rest.db.ripe.net/ripe/person/XXXX_PERSON-RIPE.json' -DEFAULT_IP_NETWORK__ROLE_URL_3 = 'https://rest.db.ripe.net/ripe/role/XXXX_PERSON-RIPE.json' +DEFAULT_IP_NETWORK__PERSON_URL_3 = 'https://rest.db.ripe.net/ripe/person/XXXX_PERSON-RIPE.json?unfiltered' 
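+# (A hedged sketch of the URL shape these constants assume -- a hypothetical
+# helper, not the client's actual code:
+#
+#     def _details_url(object_type, handle):
+#         return ('https://rest.db.ripe.net/ripe/'
+#                 f'{object_type}/{handle}.json?unfiltered')
+#
+# The new `?unfiltered` suffix corresponds to the 'Unfiltered data for' labels
+# in the expected attrs below.)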
+DEFAULT_IP_NETWORK__ROLE_URL_3 = 'https://rest.db.ripe.net/ripe/role/XXXX_PERSON-RIPE.json?unfiltered' DEFAULT_IP_NETWORK__PERSON_DETAILS_REQUEST_1 = { 'objects': { 'object': [{ @@ -1129,7 +1556,7 @@ class TestRipeApiClient(unittest.TestCase): DEFAULT_ASN__ROLE_DETAILS_REQUEST_3, DEFAULT_ASN__ROLE_DETAILS_REQUEST_4, ], - asn_and_ip_network_to_unique_details_urls={ + marker_to_details_urls={ 'ASN': { '11111': { DEFAULT_ASN__ROLE_URL_1, @@ -1142,7 +1569,7 @@ class TestRipeApiClient(unittest.TestCase): }, expected_attrs=[ [ - ('Data for', '11111'), + ('Unfiltered data for', '11111'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('role', 'XXXX_ROLE'), @@ -1177,6 +1604,86 @@ class TestRipeApiClient(unittest.TestCase): ], ), + # ASN Admin-C/Tech-C Role Example -- with 'org' (organisation) key + param( + asn_seq=['11111'], + perform_request_mocked_responses=[ + DEFAULT_ASN__ROLE__ORG_EXAMPLE, + DEFAULT_ASN__ABUSE_CONTACT_REQUEST, + DEFAULT_ASN__ORG_DETAILS_REQUEST_1, + DEFAULT_ASN__ROLE_DETAILS_REQUEST_1, + DEFAULT_ASN__ROLE_DETAILS_REQUEST_2, + DEFAULT_ASN__ROLE_DETAILS_REQUEST_3, + DEFAULT_ASN__ROLE_DETAILS_REQUEST_4, + ], + marker_to_details_urls={ + 'ASN': { + '11111': { + DEFAULT__ORG_URL_1, + DEFAULT_ASN__ROLE_URL_1, + DEFAULT_ASN__ROLE_URL_2, + DEFAULT_ASN__PERSON_URL_1, + DEFAULT_ASN__PERSON_URL_2, + }, + }, + 'IP Network': {}, + }, + expected_attrs=[ + [ + ('Unfiltered data for', '11111'), + ('Abuse Contact Emails', + ['example_contact_email@example_domain.com']), + ('organisation', 'ORG-EXAMPLE-RIPE'), + ('org-name', 'Example-Org-Name'), + ('country', 'EX'), + ('org-type', 'EXMPL'), + ('address', 'Some Imagined Street'), + ('address', 'Milano'), + ('address', 'ITALY'), + ('phone', '+11 11 11111111'), + ('fax-no', '+11 11 11111112'), + ('e-mail', 'example@example-org-domain.ex'), + ('mnt-ref', 'AS12345-MNT'), + ('admin-c', 'EX11111-RIPE'), + ('mnt-by', 'RIPE-EMP-EX-MNT'), + ('mnt-by', 'AS11111-MNT-MNT'), + ('created', '2000-01-01T00:00:00Z'), + ('last-modified', '2022-01-01T00:00:00Z'), + ('source', 'RIPE'), + ('', ''), + + ('role', 'XXXX_ROLE'), + ('address', 'Example Company'), + ('address', '0001 Example Street_1'), + ('phone', '+11 111 1111 1111'), + ('admin-c', 'ZXCV-EXMPL-RIPE'), + ('tech-c', 'ZXCV-EXMPL-RIPE'), + ('nic-hdl', 'ASDF-EXMPL-RIPE'), + ('remarks', '************* PLEASE NOTE **************'), + ('remarks', '********* EXAMPLE REMARKS HERE *********'), + ('abuse-mailbox', 'example_email@example_domain.com'), + ('mnt-by', 'EXMP-CODE'), + ('created', '2000-01-01T00:00:00Z'), + ('last-modified', '2000-01-01T00:00:00Z'), + ('source', 'RIPE'), + ('', ''), + ('role', 'Example Cloud - Example Administration'), + ('address', 'Example Company Name'), + ('address', 'Example Street_2'), + ('address', 'Example City'), + ('address', 'GB'), + ('admin-c', 'SOME-PRSN-RIPE'), + ('tech-c', 'ASDF-EXMPL-RIPE'), + ('nic-hdl', 'ZXCV-EXMPL-RIPE'), + ('created', '2000-01-01T00:00:00Z'), + ('last-modified', '2000-01-01T00:00:00Z'), + ('source', 'RIPE'), + ('mnt-by', 'EXMP-CODE'), + ('', ''), + ], + ], + ), + # ASN Admin-C/Tech-C Person Example param( asn_seq=['22222'], @@ -1186,7 +1693,7 @@ class TestRipeApiClient(unittest.TestCase): DEFAULT_ASN__PERSON_DETAILS_REQUEST_1, DEFAULT_ASN__PERSON_DETAILS_REQUEST_2, ], - asn_and_ip_network_to_unique_details_urls={ + marker_to_details_urls={ 'ASN': { '22222': { DEFAULT_ASN__PERSON_URL_3, @@ -1197,7 +1704,7 @@ class TestRipeApiClient(unittest.TestCase): }, expected_attrs=[ [ - ('Data for', '22222'), + ('Unfiltered data for', 
'22222'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('person', 'Example Person_1'), @@ -1229,7 +1736,7 @@ class TestRipeApiClient(unittest.TestCase): DEFAULT_ASN__PERSON_DETAILS_REQUEST_1, DEFAULT_ASN__PERSON_DETAILS_REQUEST_2, ], - asn_and_ip_network_to_unique_details_urls={ + marker_to_details_urls={ 'ASN': { '11111': { DEFAULT_ASN__ROLE_URL_1, @@ -1246,7 +1753,7 @@ class TestRipeApiClient(unittest.TestCase): }, expected_attrs=[ [ - ('Data for', '11111'), + ('Unfiltered data for', '11111'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('role', 'XXXX_ROLE'), @@ -1279,7 +1786,7 @@ class TestRipeApiClient(unittest.TestCase): ('', ''), ], [ - ('Data for', '22222'), + ('Unfiltered data for', '22222'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('person', 'Example Person_1'), @@ -1299,7 +1806,7 @@ class TestRipeApiClient(unittest.TestCase): def test_run_asn_input(self, asn_seq, perform_request_mocked_responses, - asn_and_ip_network_to_unique_details_urls, + marker_to_details_urls, expected_attrs): with patch( "n6lib.ripe_api_client.RIPEApiClient._perform_single_request", @@ -1310,8 +1817,8 @@ def test_run_asn_input(self, attrs = ripe_api_client._get_attrs_data_from_unique_details_urls() self.assertEqual(attrs, expected_attrs) self.assertEqual( - ripe_api_client.asn_ip_network_to_details_urls, - asn_and_ip_network_to_unique_details_urls, + ripe_api_client.marker_to_details_urls, + marker_to_details_urls, ) @foreach( @@ -1326,7 +1833,7 @@ def test_run_asn_input(self, DEFAULT_IP_NETWORK__ROLE_DETAILS_REQUEST_3, DEFAULT_IP_NETWORK__ROLE_DETAILS_REQUEST_4, ], - asn_and_ip_network_to_unique_details_urls={ + marker_to_details_urls={ 'ASN': {}, 'IP Network': { '1.1.1.1/24': { @@ -1339,7 +1846,7 @@ def test_run_asn_input(self, }, expected_attrs=[ [ - ('Data for', '1.1.1.1/24'), + ('Unfiltered data for', '1.1.1.1/24'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('role', 'XXXX_ROLE'), @@ -1374,6 +1881,85 @@ def test_run_asn_input(self, ], ), + # IP Network Admin-C/Tech-C Role Example -- with 'org' (organisation) key + param( + ip_network_seq=['1.1.1.1/24'], + perform_request_mocked_responses=[ + DEFAULT_IP_NETWORK__ROLE__ORG_EXAMPLE, + DEFAULT_IP_NETWORK__ABUSE_CONTACT_REQUEST, + DEFAULT_IP_NETWORK__ORG_DETAILS_REQUEST_1, + DEFAULT_IP_NETWORK__ROLE_DETAILS_REQUEST_1, + DEFAULT_IP_NETWORK__ROLE_DETAILS_REQUEST_2, + DEFAULT_IP_NETWORK__ROLE_DETAILS_REQUEST_3, + DEFAULT_IP_NETWORK__ROLE_DETAILS_REQUEST_4, + ], + marker_to_details_urls={ + 'ASN': {}, + 'IP Network': { + '1.1.1.1/24': { + DEFAULT__ORG_URL_1, + DEFAULT_IP_NETWORK__ROLE_URL_1, + DEFAULT_IP_NETWORK__ROLE_URL_2, + DEFAULT_IP_NETWORK__PERSON_URL_1, + DEFAULT_IP_NETWORK__PERSON_URL_2, + }, + }, + }, + expected_attrs=[ + [ + ('Unfiltered data for', '1.1.1.1/24'), + ('Abuse Contact Emails', + ['example_contact_email@example_domain.com']), + ('organisation', 'ORG-EXAMPLE-RIPE'), + ('org-name', 'Example-Org-Name'), + ('country', 'EX'), + ('org-type', 'EXMPL'), + ('address', 'Some Imagined Street'), + ('address', 'Milano'), + ('address', 'ITALY'), + ('phone', '+11 11 11111111'), + ('fax-no', '+11 11 11111112'), + ('e-mail', 'example@example-org-domain.ex'), + ('mnt-ref', 'AS12345-MNT'), + ('admin-c', 'EX11111-RIPE'), + ('mnt-by', 'RIPE-EMP-EX-MNT'), + ('mnt-by', '1.1.1.1/24-MNT-MNT'), + ('created', '2000-01-01T00:00:00Z'), + ('last-modified', '2022-01-01T00:00:00Z'), + ('source', 'RIPE'), + ('', ''), + ('role', 'XXXX_ROLE'), + ('address', 
'Example Company'), + ('address', '0001 Example Street_1'), + ('phone', '+11 111 1111 1111'), + ('admin-c', 'BBBB-EXMPL-RIPE'), + ('tech-c', 'BBBB-EXMPL-RIPE'), + ('nic-hdl', 'AAAA-EXMPL-RIPE'), + ('remarks', '************* PLEASE NOTE **************'), + ('remarks', '********* EXAMPLE REMARKS HERE *********'), + ('abuse-mailbox', 'example_email@example_domain.com'), + ('mnt-by', 'EXMP-CODE'), + ('created', '2000-01-01T00:00:00Z'), + ('last-modified', '2000-01-01T00:00:00Z'), + ('source', 'RIPE'), + ('', ''), + ('role', 'Example Cloud - Example Administration'), + ('address', 'Example Company Name'), + ('address', 'Example Street_2'), + ('address', 'Example City'), + ('address', 'GB'), + ('admin-c', 'SOME-PRSN-RIPE'), + ('tech-c', 'AAAA-EXMPL-RIPE'), + ('nic-hdl', 'BBBB-EXMPL-RIPE'), + ('created', '2000-01-01T00:00:00Z'), + ('last-modified', '2000-01-01T00:00:00Z'), + ('source', 'RIPE'), + ('mnt-by', 'EXMP-CODE'), + ('', ''), + ], + ], + ), + # IP Network Admin-C/Tech-C Person Example param( ip_network_seq=['2.2.2.2/24'], @@ -1383,7 +1969,7 @@ def test_run_asn_input(self, DEFAULT_ASN__PERSON_DETAILS_REQUEST_1, DEFAULT_ASN__PERSON_DETAILS_REQUEST_2, ], - asn_and_ip_network_to_unique_details_urls={ + marker_to_details_urls={ 'ASN': {}, 'IP Network': { '2.2.2.2/24': { @@ -1394,7 +1980,7 @@ def test_run_asn_input(self, }, expected_attrs=[ [ - ('Data for', '2.2.2.2/24'), + ('Unfiltered data for', '2.2.2.2/24'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('person', 'Example Person_1'), @@ -1426,7 +2012,7 @@ def test_run_asn_input(self, DEFAULT_IP_NETWORK__PERSON_DETAILS_REQUEST_1, DEFAULT_IP_NETWORK__PERSON_DETAILS_REQUEST_2, ], - asn_and_ip_network_to_unique_details_urls={ + marker_to_details_urls={ 'ASN': {}, 'IP Network': { '1.1.1.1/24': { @@ -1443,7 +2029,7 @@ def test_run_asn_input(self, }, expected_attrs=[ [ - ('Data for', '1.1.1.1/24'), + ('Unfiltered data for', '1.1.1.1/24'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('role', 'XXXX_ROLE'), @@ -1476,7 +2062,7 @@ def test_run_asn_input(self, ('', ''), ], [ - ('Data for', '2.2.2.2/24'), + ('Unfiltered data for', '2.2.2.2/24'), ('Abuse Contact Emails', ['example_contact_email@example_domain.com']), ('person', 'Example Person_1'), @@ -1496,7 +2082,7 @@ def test_run_asn_input(self, def test_run_ip_network_input(self, ip_network_seq, perform_request_mocked_responses, - asn_and_ip_network_to_unique_details_urls, + marker_to_details_urls, expected_attrs): with patch( "n6lib.ripe_api_client.RIPEApiClient._perform_single_request", @@ -1507,8 +2093,8 @@ def test_run_ip_network_input(self, attrs = ripe_api_client._get_attrs_data_from_unique_details_urls() self.assertEqual(attrs, expected_attrs) self.assertEqual( - ripe_api_client.asn_ip_network_to_details_urls, - asn_and_ip_network_to_unique_details_urls, + ripe_api_client.marker_to_details_urls, + marker_to_details_urls, ) @foreach( diff --git a/N6Lib/n6lib/tests/test_search_engine_api.py b/N6Lib/n6lib/tests/test_search_engine_api.py index 9e1cc4b..9562a69 100644 --- a/N6Lib/n6lib/tests/test_search_engine_api.py +++ b/N6Lib/n6lib/tests/test_search_engine_api.py @@ -1,7 +1,8 @@ -# Copyright (c) 2022 NASK. All rights reserved. +# Copyright (c) 2022-2023 NASK. All rights reserved. 
import unittest +import pytest from unittest_expander import ( expand, foreach, @@ -95,6 +96,7 @@ def test_set_index_error_not_list_in_dict(self, wrong_index): str(context.exception), ) + @pytest.mark.slow def test_index_document_correct(self): self.assertEqual(self.search_engine.index, {}) document_1 = SearchedDocument(1, "zawartość dokumentu") @@ -117,6 +119,7 @@ def test_index_document_correct(self): }, ) + @pytest.mark.slow def test_search_correct(self): for document in [SearchedDocument(1, "zawartość dokumentu"), SearchedDocument(2, "zawartość artykułu")]: @@ -131,6 +134,7 @@ def test_search_error_index_not_set(self): self.search_engine.search("dokument") self.assertEqual("index not set", str(context.exception)) + @pytest.mark.slow def test_search_error_wrong_search_type(self): with self.assertRaises(SearchEngineError) as context: self.search_engine.index_document(SearchedDocument(1, "dokument")) diff --git a/N6Lib/n6lib/unit_test_helpers.py b/N6Lib/n6lib/unit_test_helpers.py index d2cf2b7..4b3a1b6 100644 --- a/N6Lib/n6lib/unit_test_helpers.py +++ b/N6Lib/n6lib/unit_test_helpers.py @@ -206,10 +206,10 @@ def _patching_method(method_name, patcher_maker, target_autocompletion=True): True >>> m.reset_mock() - >>> t.do_patch() + >>> t.do_patch() # doctest: +ELLIPSIS Traceback (most recent call last): ... - TypeError: do_patch() missing 1 required positional argument: 'target' + TypeError: ...do_patch() missing 1 required positional argument: 'target' >>> t.do_patch('spam', sentinel.arg, kwarg=sentinel.kwarg) sentinel.mock_thing @@ -588,6 +588,10 @@ def patch_stdin(self, # # Other helper methods + @staticmethod + def raise_exc(exc): + raise exc + @staticmethod def regex_search(regex, text): if isinstance(regex, (str, bytes)): @@ -977,20 +981,20 @@ class AnyDictIncluding(_ExpectedObjectPlaceholder): >>> any_dict_including_foobar == any_dict_including_foobar True - >>> any_dict_including_foobar == AnyDictIncluding(foo=u'bar') + >>> any_dict_including_foobar == AnyDictIncluding(foo='bar') True >>> any_dict_including_foobar != any_dict_including_foobar False - >>> any_dict_including_foobar != AnyDictIncluding(foo=u'bar') + >>> any_dict_including_foobar != AnyDictIncluding(foo='bar') False - >>> any_dict_including_foobar == AnyDictIncluding(foo=u'barrrrr') + >>> any_dict_including_foobar == AnyDictIncluding(foo='barrrrr') False - >>> AnyDictIncluding(foo=u'barrrrr') == any_dict_including_foobar + >>> AnyDictIncluding(foo='barrrrr') == any_dict_including_foobar False - >>> any_dict_including_foobar != AnyDictIncluding(foo=u'barrrrr') + >>> any_dict_including_foobar != AnyDictIncluding(foo='barrrrr') True - >>> AnyDictIncluding(foo=u'barrrrr') != any_dict_including_foobar + >>> AnyDictIncluding(foo='barrrrr') != any_dict_including_foobar True """ @@ -1059,46 +1063,46 @@ class JSONWhoseContentIsEqualTo(_ExpectedObjectPlaceholder): >>> json1 == b'{"key": 42}' True - >>> json1 == u'{"key": 42}' + >>> json1 == '{"key": 42}' True >>> b'{"key": 42}' == json1 True - >>> u'{"key": 42}' == json1 + >>> '{"key": 42}' == json1 True >>> json1 != b'{"key": 42}' False - >>> json1 != u'{"key": 42}' + >>> json1 != '{"key": 42}' False >>> b'{"key": 42}' != json1 False - >>> u'{"key": 42}' != json1 + >>> '{"key": 42}' != json1 False >>> json2 = JSONWhoseContentIsEqualTo([42, 'spam', {'key': 42}]) >>> json2 == b'[42, "spam", {"key": 42}]' True - >>> json2 == u'[42, "spam", {"key": 42}]' + >>> json2 == '[42, "spam", {"key": 42}]' True >>> b'[42, "spam", {"key": 42}]' == json2 True - >>> u'[42, "spam", {"key": 42}]' == 
json2 + >>> '[42, "spam", {"key": 42}]' == json2 True >>> json2 != b'[42, "spam", {"key": 42}]' False - >>> json2 != u'[42, "spam", {"key": 42}]' + >>> json2 != '[42, "spam", {"key": 42}]' False >>> b'[42, "spam", {"key": 42}]' != json2 False - >>> u'[42, "spam", {"key": 42}]' != json2 + >>> '[42, "spam", {"key": 42}]' != json2 False >>> json1 == b'{"another-key": 42}' False - >>> json1 == u'{"key": 444442}' + >>> json1 == '{"key": 444442}' False >>> json1 == b'[{"key": 42}]' False - >>> json1 == u'"key"' + >>> json1 == '"key"' False >>> json1 == b'foo' False @@ -1110,11 +1114,11 @@ class JSONWhoseContentIsEqualTo(_ExpectedObjectPlaceholder): False >>> b'{"another-key": 42}' == json1 False - >>> u'{"key": 444442}' == json1 + >>> '{"key": 444442}' == json1 False >>> b'[{"key": 42}]' == json1 False - >>> u'"key"' == json1 + >>> '"key"' == json1 False >>> b'foo' == json1 False @@ -1127,11 +1131,11 @@ class JSONWhoseContentIsEqualTo(_ExpectedObjectPlaceholder): >>> json1 != b'{"another-key": 42}' True - >>> json1 != u'{"key": 444442}' + >>> json1 != '{"key": 444442}' True >>> json1 != b'[{"key": 42}]' True - >>> json1 != u'"key"' + >>> json1 != '"key"' True >>> json1 != b'foo' True @@ -1141,11 +1145,11 @@ class JSONWhoseContentIsEqualTo(_ExpectedObjectPlaceholder): True >>> b'{"another-key": 42}' != json1 True - >>> u'{"key": 444442}' != json1 + >>> '{"key": 444442}' != json1 True >>> b'[{"key": 42}]' != json1 True - >>> u'"key"' != json1 + >>> '"key"' != json1 True >>> b'foo' != json1 True @@ -1156,24 +1160,24 @@ class JSONWhoseContentIsEqualTo(_ExpectedObjectPlaceholder): >>> json1 == json1 True - >>> json1 == JSONWhoseContentIsEqualTo(data={u'key': 42}) + >>> json1 == JSONWhoseContentIsEqualTo(data={'key': 42}) True - >>> JSONWhoseContentIsEqualTo(data={u'key': 42}) == json1 + >>> JSONWhoseContentIsEqualTo(data={'key': 42}) == json1 True >>> json1 != json1 False - >>> json1 != JSONWhoseContentIsEqualTo(data={u'key': 42}) + >>> json1 != JSONWhoseContentIsEqualTo(data={'key': 42}) False - >>> JSONWhoseContentIsEqualTo(data={u'key': 42}) != json1 + >>> JSONWhoseContentIsEqualTo(data={'key': 42}) != json1 False - >>> json1 == JSONWhoseContentIsEqualTo(data={u'key': 444442}) + >>> json1 == JSONWhoseContentIsEqualTo(data={'key': 444442}) False - >>> JSONWhoseContentIsEqualTo(data={u'key': 444442}) == json1 + >>> JSONWhoseContentIsEqualTo(data={'key': 444442}) == json1 False - >>> json1 != JSONWhoseContentIsEqualTo(data={u'key': 444442}) + >>> json1 != JSONWhoseContentIsEqualTo(data={'key': 444442}) True - >>> JSONWhoseContentIsEqualTo(data={u'key': 444442}) != json1 + >>> JSONWhoseContentIsEqualTo(data={'key': 444442}) != json1 True >>> json1 == json2 diff --git a/N6Lib/n6lib/unpacking_helpers.py b/N6Lib/n6lib/unpacking_helpers.py index f61fc81..244f5eb 100644 --- a/N6Lib/n6lib/unpacking_helpers.py +++ b/N6Lib/n6lib/unpacking_helpers.py @@ -54,8 +54,8 @@ def iter_unzip_from_bytes(zipped, (without dir parts) we are interested in. If given (and not `None`) then only the specified files will be extracted, ignoring non-existent ones. Each filename will be, firstly, - coerced to `str` using the `os.fspath()` helper and then - the `as_unicode()` helper from `n6lib.common_helpers`. + coerced to `str` (using the `os.fspath()` helper and then + the `as_unicode()` helper from `n6lib.common_helpers`). `yielding_with_dirs` (default: False): If False -- dir names will be stripped off from yielded file names. 
If True -- file names will be yielded as found in the archive diff --git a/N6Lib/n6lib/url_helpers.py b/N6Lib/n6lib/url_helpers.py index 534fc6b..8a1fc52 100644 --- a/N6Lib/n6lib/url_helpers.py +++ b/N6Lib/n6lib/url_helpers.py @@ -1,15 +1,14 @@ -# Copyright (c) 2019-2021 NASK. All rights reserved. +# Copyright (c) 2019-2023 NASK. All rights reserved. +import collections import re import ipaddress from n6lib.common_helpers import ( as_bytes, as_unicode, - is_pure_ascii, limit_str, - lower_if_pure_ascii, - try_to_normalize_surrogate_pairs_to_proper_codepoints, + replace_surrogate_pairs_with_proper_codepoints, ) @@ -81,7 +80,8 @@ def does_look_like_url(s): It only checks whether the given string starts with some letter, optionally followed by letter|digit|dot|plus|minus characters, - separated with a colon from the rest of the string. + separated with a colon from the rest of the string which can + contain anything. >>> does_look_like_url('http://www.example.com') True @@ -116,9 +116,12 @@ def does_look_like_http_url_without_prefix(s): # TODO: more tests... def normalize_url(url, - transcode1st=False, - epslash=False, - rmzone=False): + *, + unicode_str=False, + merge_surrogate_pairs=False, + empty_path_slash=False, + remove_ipv6_zone=False, + norm_brief=None): r""" Apply to the given string (or binary data blob) as much of the basic URL/IRI normalization as possible, provided that no semantic changes @@ -130,46 +133,100 @@ def normalize_url(url, The URL (or URI, or IRI) to be normalized. Kwargs (optional): - `transcode1st` (bool; default: False): - Whether, before the actual URL normalization (see the - description in the steps 1-18 below...), the given `url` + `unicode_str` (bool; default: False): + Whether, *before* the actual URL normalization, the `url`, + if given as a `bytes`/`bytearray`, should be coerced to + `str` using the `utf-8` codec with the `surrogatepass` + error handler. + + This flag is supposed to be used only in the case of URLs + which were originally obtained as `str` instances (which + later, for some reasons, might be encoded to `bytes` or + `bytearray` using the `surrogatepass` error handler); if + garbage bytes are encountered then a `UnicodeDecodeError` + is raised. + + `merge_surrogate_pairs` (bool; default: False): + Whether, *before* the actual URL normalization but *after* + `unicode_str`-flag-related processing (if any), the `url` should be: - * if given as a bytes/bytearray instance: decoded using - the 'utf-8' codec with our custom error handler: - 'utf8_surrogatepass_and_surrogateescape'; - * otherwise (assuming a str instance): "transcoded" using - `try_to_normalize_surrogate_pairs_to_proper_codepoints()` + + * if given as a `bytes`/`bytearray` and `unicode_str` is + false -- processed in the following way: first try to + decode it using the `utf-8` codec with the `surrogatepass` + error handler; it that fails then the original `url` + argument, intact, becomes the result (only coerced to + `bytes` if it was given as a `bytearray`); otherwise, + apply `replace_surrogate_pairs_with_proper_codepoints()` + to the decoded content (to ensure that representation of + non-BMP characters is consistent...) 
and encode the result + using the `utf-8` codec with the `surrogatepass` error + handler; the resultant value is a `bytes` object; + + * otherwise (`url` given as a `str`, or `unicode_str` is + true => so, effectively, `url` is a `str`) -- processed by + applying `replace_surrogate_pairs_with_proper_codepoints()` (to ensure that representation of non-BMP characters is - consistent...). - `epslash` (bool; default: False): + consistent...); the resultant value is a `str` object. + + `empty_path_slash` (bool; default: False): Whether the *path* component of the given URL should be replaced with `/` if the `url`'s *scheme* is `http`, `https` or `ftp` *and* the *path* is empty (note that, generally, this normalization step does not change the URL semantics, with the exception of an URL being the request target of an `OPTIONS` HTTP request; see RFC 7230, section 2.7.3). - `rmzone` (bool; default: False): + + `remove_ipv6_zone` (bool; default: False): Whether the IPv6 zone identifier being a part of an IPv6 address in the `url`'s *host* component should be removed (note that, generally, IPv6 zone identifier has no meaning outside the local system it is related to; see RFC 6874, section 1). + `norm_brief` (iterable or None; default: None): + If not `None`, it should be a string (or another iterable + yielding strings of length 1) whose items are first letters + of any (zero or more) of the other keyword-only argument + names -- equivalent to setting the corresponding arguments + to `True` (useful in contexts where brevity is important). + If given, no other keyword arguments can be set to `True`. + Returns: A `str` object (`if a `str` was given) or a `bytes` object (if a - `bytes` or `bytearray` object was given *and* `transcode1st` was + `bytes` or `bytearray` object was given *and* `unicode_str` was false) representing the URL after a *best effort* but *keeping semantic equivalence* normalization (see below: the description of the algorithm). Raises: - `TypeError` if `url` is not a str or bytes/bytearray instance. + * `TypeError` -- if: + + * `url` is not a `str`/`bytes`/`bytearray`, + * `norm_brief` is given when (an)other keyword-only argument(s) + is/are also given; + + * `UnicodeDecodeError` -- if `unicode_str` is true and `url` is + such a `bytes`/`bytearray` that is not decodable to `str` using + the `utf-8` codec with the `surrogatepass` error handler; - The algorithm of normalization consists of the following steps [the - `+` operator in this description means *string concatenation*]: + * `ValueError` (other than `UnicodeDecodeError`) -- if: - 0. Optional `url` transcoding (see the above description of the - `transcode1st` argument). + * `norm_brief` contains any value not being the first letter + of another keyword-only argument; + * `norm_brief` contains duplicate values. + + *** + + The algorithm of normalization consists of the following steps: + + [Note #1: the `+` operator in this description means *string + concatenation*. Note #2: if a `bytes` object is processed, it is + treated as if it was a string; the UTF-8 encoding is then assumed + for character recognition and regular expression matching.] + + 0. Optional `url` decoding/recoding (see the above description of + the `unicode_str` and `merge_surrogate_pairs` arguments). 1. 
Try to split the `url` into two parts: the `scheme` component (matching the `scheme` group of the regular expression @@ -218,9 +275,9 @@ def normalize_url(url, incorrectness (i.e., `ipv6` could not be parsed as a valid IPv6 address) then leave `ipv6` intact. - 9. If `ipv6 zone` is *not* present, or the `rmzone` argument is - true, then set `ipv6 zone` to an empty string and skip to step - 11; otherwise proceed to step 10. + 9. If `ipv6 zone` is *not* present, or the `remove_ipv6_zone` + argument is true, then set `ipv6 zone` to an empty string and + skip to step 11; otherwise proceed to step 10. 10. If `ipv6 zone` consists only of ASCII characters then convert it to *lowercase*; otherwise leave it intact. @@ -250,8 +307,8 @@ def normalize_url(url, 15. If `path` is present then leave it intact and skip to step 17; otherwise proceed to step 16. - 16. If the `epslash` argument is true and `scheme` is one of: - "http", "https", "ftp" -- then set `path` to "/"; otherwise + 16. If the `empty_path_slash` argument is true and `scheme` is one + of: "http", "https", "ftp" -- then set `path` to "/"; otherwise set `path` to an empty string. 17. If `after path` is *not* present then set it to an empty @@ -263,25 +320,106 @@ def normalize_url(url, Ad 0: + >>> normalize_url('\U0010ffff') + '\U0010ffff' + >>> normalize_url('\U0010ffff', merge_surrogate_pairs=True) + '\U0010ffff' + >>> normalize_url('\U0010ffff', unicode_str=True) + '\U0010ffff' + >>> normalize_url('\U0010ffff', unicode_str=True, merge_surrogate_pairs=True) + '\U0010ffff' >>> normalize_url(b'\xf4\x8f\xbf\xbf') b'\xf4\x8f\xbf\xbf' - >>> normalize_url(b'\xf4\x8f\xbf\xbf', transcode1st=True) + >>> normalize_url(b'\xf4\x8f\xbf\xbf', merge_surrogate_pairs=True) + b'\xf4\x8f\xbf\xbf' + >>> normalize_url(b'\xf4\x8f\xbf\xbf', unicode_str=True) + '\U0010ffff' + >>> normalize_url(b'\xf4\x8f\xbf\xbf', unicode_str=True, merge_surrogate_pairs=True) '\U0010ffff' - >>> normalize_url('\udbff\udfff') # look at this! + >>> normalize_url('\udbff\udfff') '\udbff\udfff' - >>> normalize_url('\udbff\udfff', transcode1st=True) + >>> normalize_url('\udbff\udfff', merge_surrogate_pairs=True) '\U0010ffff' - >>> normalize_url('\U0010ffff') + >>> normalize_url('\udbff\udfff', unicode_str=True) + '\udbff\udfff' + >>> normalize_url('\udbff\udfff', unicode_str=True, merge_surrogate_pairs=True) '\U0010ffff' - >>> normalize_url('\U0010ffff', transcode1st=True) + >>> normalize_url(b'\xed\xaf\xbf\xed\xbf\xbf') + b'\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'\xed\xaf\xbf\xed\xbf\xbf', merge_surrogate_pairs=True) + b'\xf4\x8f\xbf\xbf' + >>> normalize_url(b'\xed\xaf\xbf\xed\xbf\xbf', unicode_str=True) + '\udbff\udfff' + >>> normalize_url(b'\xed\xaf\xbf\xed\xbf\xbf', unicode_str=True, merge_surrogate_pairs=True) '\U0010ffff' + >>> normalize_url('\udfff\udbff\udfff\udbff') + '\udfff\udbff\udfff\udbff' + >>> normalize_url('\udfff\udbff\udfff\udbff', merge_surrogate_pairs=True) + '\udfff\U0010ffff\udbff' + >>> normalize_url('\udfff\udbff\udfff\udbff', unicode_str=True) + '\udfff\udbff\udfff\udbff' + >>> normalize_url('\udfff\udbff\udfff\udbff', unicode_str=True, merge_surrogate_pairs=True) + '\udfff\U0010ffff\udbff' + >>> normalize_url(b'\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf') + b'\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf' + >>> normalize_url(b'\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf', + ... merge_surrogate_pairs=True) + b'\xed\xbf\xbf\xf4\x8f\xbf\xbf\xed\xaf\xbf' + >>> normalize_url(b'\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf', + ... 
unicode_str=True) + '\udfff\udbff\udfff\udbff' + >>> normalize_url(b'\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf', + ... unicode_str=True, + ... merge_surrogate_pairs=True) + '\udfff\U0010ffff\udbff' + >>> normalize_url(b'\xed') # (non-UTF-8 garbage) + b'\xed' + >>> normalize_url(b'\xed', merge_surrogate_pairs=True) + b'\xed' + >>> normalize_url(b'\xed', unicode_str=True) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + UnicodeDecodeError: ... + >>> normalize_url(b'\xed', + ... unicode_str=True, + ... merge_surrogate_pairs=True) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + UnicodeDecodeError: ... + >>> normalize_url(b'\xed\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf') + b'\xed\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf' + >>> normalize_url(b'\xed\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf', + ... merge_surrogate_pairs=True) + b'\xed\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf' + >>> normalize_url(b'\xed\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf', + ... unicode_str=True) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + UnicodeDecodeError: ... + >>> normalize_url(b'\xed\xed\xbf\xbf\xed\xaf\xbf\xed\xbf\xbf\xed\xaf\xbf', + ... unicode_str=True, + ... merge_surrogate_pairs=True) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + UnicodeDecodeError: ... Ad 0-2: >>> normalize_url(b'Blabla-bla!@#$ %^&\xc4\x85\xcc') b'Blabla-bla!@#$ %^&\xc4\x85\xcc' - >>> normalize_url(b'Blabla-bla!@#$ %^&\xc4\x85\xcc', transcode1st=True) + >>> normalize_url(b'Blabla-bla!@#$ %^&\xc4\x85\xcc', + ... merge_surrogate_pairs=True) + b'Blabla-bla!@#$ %^&\xc4\x85\xcc' + >>> normalize_url(b'Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c', + ... merge_surrogate_pairs=True) + b'Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c' + >>> normalize_url(b'Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c', + ... unicode_str=True) + 'Blabla-bla!@#$ %^&\u0105\udccc' + >>> normalize_url(b'Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c', + ... unicode_str=True, + ... merge_surrogate_pairs=True) 'Blabla-bla!@#$ %^&\u0105\udccc' >>> normalize_url('Blabla-bla!@#$ %^&\u0105\udccc') 'Blabla-bla!@#$ %^&\u0105\udccc' @@ -289,9 +427,20 @@ def normalize_url(url, Ad 0-1 + 3 + 5: - >>> normalize_url(b'SOME-scheme:Blabla-bla!@#$ %^&\xc4\x85\xcc') + >>> normalize_url(b'Some-Scheme:Blabla-bla!@#$ %^&\xc4\x85\xcc') + b'some-scheme:Blabla-bla!@#$ %^&\xc4\x85\xcc' + >>> normalize_url(b'Some-Scheme:Blabla-bla!@#$ %^&\xc4\x85\xcc', + ... merge_surrogate_pairs=True) b'some-scheme:Blabla-bla!@#$ %^&\xc4\x85\xcc' - >>> normalize_url(b'SOME-scheme:Blabla-bla!@#$ %^&\xc4\x85\xcc', transcode1st=True) + >>> normalize_url(b'Some-Scheme:Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c', + ... merge_surrogate_pairs=True) + b'some-scheme:Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c' + >>> normalize_url(b'SOME-scheme:Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c', + ... unicode_str=True) + 'some-scheme:Blabla-bla!@#$ %^&\u0105\udccc' + >>> normalize_url(b'SOME-scheme:Blabla-bla!@#$ %^&\xc4\x85\xed\xb3\x8c', + ... unicode_str=True, + ... merge_surrogate_pairs=True) 'some-scheme:Blabla-bla!@#$ %^&\u0105\udccc' >>> normalize_url('somE-sCHEmE:Blabla-bla!@#$ %^&\u0105\udccc') 'some-scheme:Blabla-bla!@#$ %^&\u0105\udccc' @@ -306,124 +455,270 @@ def normalize_url(url, >>> normalize_url(b'HtTP://[2001:0DB8:85A3:0000:0000:8A2E:3.112.115.52%25en1]') b'http://[2001:db8:85a3::8a2e:370:7334%25en1]' >>> normalize_url(b'HtTP://[2001:0DB8:85A3::8A2E:0370:7334]/fooBAR', - ... 
epslash=True) + ... empty_path_slash=True) b'http://[2001:db8:85a3::8a2e:370:7334]/fooBAR' >>> normalize_url(b'HtTP://[2001:0DB8:85A3:0000:0000:8A2E:3.112.115.52]:80') b'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url(b'HtTP://[2001:0DB8:85A3:0000:0000:8A2E:0370:7334%25en1]:80', - ... epslash=True) + ... empty_path_slash=True) b'http://[2001:db8:85a3::8a2e:370:7334%25en1]/' >>> normalize_url(b'HtTP://[2001:DB8:85A3::8A2E:3.112.115.52]', - ... rmzone=True) + ... remove_ipv6_zone=True) b'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url(b'HtTP://[2001:0db8:85a3:0000:0000:8a2e:0370:7334%25EN1]', - ... rmzone=True) + ... remove_ipv6_zone=True) b'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url(b'HtTP://[2001:0DB8:85A3:0000:0000:8A2E:3.112.115.52%25en1]', - ... rmzone=True, epslash=True) + ... remove_ipv6_zone=True, + ... empty_path_slash=True) b'http://[2001:db8:85a3::8a2e:370:7334]/' >>> normalize_url(b'HtTP://[2001:0DB8:85A3::8A2E:0370:7334%25en1]:80', - ... rmzone=True) + ... remove_ipv6_zone=True) b'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url(b'HtTP://[2001:DB8:85A3:0000:0000:8A2E:3.112.115.52%25en1]:80', - ... rmzone=True, epslash=True) + ... remove_ipv6_zone=True, + ... empty_path_slash=True) b'http://[2001:db8:85a3::8a2e:370:7334]/' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf') + b'http://[2001:db8:85a3::123%25en1]#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... remove_ipv6_zone=True) + b'http://[2001:db8:85a3::123]#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... empty_path_slash=True) + b'http://[2001:db8:85a3::123%25en1]/#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + b'http://[2001:db8:85a3::123]/#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... merge_surrogate_pairs=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + b'http://[2001:db8:85a3::123]/#\xf4\x8f\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... unicode_str=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + 'http://[2001:db8:85a3::123]/#\udbff\udfff' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... unicode_str=True, + ... merge_surrogate_pairs=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + 'http://[2001:db8:85a3::123]/#\U0010ffff' >>> normalize_url('HtTP://[2001:0DB8:85A3:0000:0000:8A2E:3.112.115.52]') 'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url('HtTP://[2001:0db8:85a3::8a2e:370:7334%25EN1]') 'http://[2001:db8:85a3::8a2e:370:7334%25en1]' >>> normalize_url('HtTP://[2001:0DB8:85A3:0000:0000:8A2E:0370:7334FAB%25eN1]', - ... epslash=True) + ... empty_path_slash=True) 'http://[2001:0DB8:85A3:0000:0000:8A2E:0370:7334FAB%25en1]/' >>> normalize_url('HtTP://[2001:0DB8:85A3:0000:0000:8a2e:3.112.115.52]', - ... epslash=True) + ... empty_path_slash=True) 'http://[2001:db8:85a3::8a2e:370:7334]/' >>> normalize_url('HtTP://[2001:0DB8:85A3:0000:0000:8A2E:0370:7334]:80') 'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url('HtTP://[2001:0DB8:85A3::8A2E:3.112.115.52%25en1]:80', - ... epslash=True) + ... 
empty_path_slash=True) 'http://[2001:db8:85a3::8a2e:370:7334%25en1]/' >>> normalize_url('HtTP://[2001:db8:85a3:0000:0000:8A2E:0370:7334]', - ... rmzone=True) + ... remove_ipv6_zone=True) 'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url('HtTP://[2001:0DB8:85A3:0000:0000:8A2E:3.112.115.52%25en1]/fooBAR', - ... rmzone=True) + ... remove_ipv6_zone=True) 'http://[2001:db8:85a3::8a2e:370:7334]/fooBAR' >>> normalize_url('HtTP://[2001:0DB8:85A3::8A2E:0370:7334%25en1]', - ... rmzone=True, epslash=True) + ... remove_ipv6_zone=True, + ... empty_path_slash=True) 'http://[2001:db8:85a3::8a2e:370:7334]/' >>> normalize_url('HtTP://[2001:0DB8:85A3:0000:0000:8A2E:3.112.115.52%25en1]:80', - ... rmzone=True) + ... remove_ipv6_zone=True) 'http://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url('HtTP://[2001:0DB8:85A3:0000:0000:8A2E:0370:7334%25en1]:80', - ... rmzone=True, epslash=True) + ... remove_ipv6_zone=True, + ... empty_path_slash=True) 'http://[2001:db8:85a3::8a2e:370:7334]/' + >>> normalize_url('HtTP://[2001:DB8:85A3::0123%25En1]:80#\udbff\udfff', + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + 'http://[2001:db8:85a3::123]/#\udbff\udfff' + >>> normalize_url('HtTP://[2001:DB8:85A3::0123%25En1]:80#\udbff\udfff', + ... merge_surrogate_pairs=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + 'http://[2001:db8:85a3::123]/#\U0010ffff' + >>> normalize_url('HtTP://[2001:DB8:85A3::0123%25En1]:80#\udbff\udfff', + ... unicode_str=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + 'http://[2001:db8:85a3::123]/#\udbff\udfff' + >>> normalize_url('HtTP://[2001:DB8:85A3::0123%25En1]:80#\udbff\udfff', + ... unicode_str=True, + ... merge_surrogate_pairs=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + 'http://[2001:db8:85a3::123]/#\U0010ffff' >>> normalize_url(b'HtTPS://[2001:DB8:85A3:0000:0000:8A2E:3.112.115.52%25En1]:80') b'https://[2001:db8:85a3::8a2e:370:7334%25en1]:80' >>> normalize_url(b'HtTPS://[2001:DB8:85A3:0000:0000:8A2E:3.112.115.52%25en1]:80', - ... rmzone=True) + ... remove_ipv6_zone=True) b'https://[2001:db8:85a3::8a2e:370:7334]:80' >>> normalize_url(b'HtTPS://[2001:0db8:85a3::8a2E:3.112.115.52%25en1]:443', - ... rmzone=True) + ... remove_ipv6_zone=True) b'https://[2001:db8:85a3::8a2e:370:7334]' >>> normalize_url(b'HtTPS://[2001:DB8:85A3:0000:0000:8A2E:0370:7334%25eN\xc4\x851]:80', - ... epslash=True) + ... empty_path_slash=True) b'https://[2001:db8:85a3::8a2e:370:7334%25eN\xc4\x851]:80/' >>> normalize_url('HtTPS://[2001:0db8:85a3::8a2E:3.112.115.52%25En1]:443') 'https://[2001:db8:85a3::8a2e:370:7334%25en1]' >>> normalize_url('HtTPS://[2001:0DB8:85A3:0000:0000:8A2E:3.112.115.52%25eN\xc4\x851]:443', - ... epslash=True) + ... empty_path_slash=True) 'https://[2001:db8:85a3::8a2e:370:7334%25eN\xc4\x851]/' >>> normalize_url('HtTPS://[2001:0DB8:85A3::8A2E:0370:7334%25eN1]:80', - ... rmzone=True, epslash=True) + ... remove_ipv6_zone=True, + ... empty_path_slash=True) 'https://[2001:db8:85a3::8a2e:370:7334]:80/' >>> normalize_url('HtTPS://[2001:0DB8:85A3::8A2E:370:7334%25eN1]:443', - ... rmzone=True, epslash=True) + ... remove_ipv6_zone=True, + ... empty_path_slash=True) 'https://[2001:db8:85a3::8a2e:370:7334]/' Ad 0-1 + 3-4 + 12-18: - >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', epslash=True) + >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', + ... 
empty_path_slash=True) b'http://www.XyZ-\xc4\x85\xcc.example.com/' - >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', transcode1st=True) - 'http://www.XyZ-\u0105\udccc.example.com' + >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', + ... empty_path_slash=True, + ... merge_surrogate_pairs=True) + b'http://www.XyZ-\xc4\x85\xcc.example.com/' + >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', + ... merge_surrogate_pairs=True) + b'http://www.XyZ-\xc4\x85\xcc.example.com' + >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', + ... unicode_str=True) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + UnicodeDecodeError: ... >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85.eXamplE.com:80/fooBAR') b'http://www.XyZ-\xc4\x85.example.com/fooBAR' - >>> normalize_url(b'HtTP://WWW.XyZ-\xc4\x85.eXamplE.com:80', epslash=True) + >>> normalize_url(b'HtTP://WWW.XyZ-\xc4\x85.eXamplE.com:80', + ... empty_path_slash=True) b'http://www.XyZ-\xc4\x85.example.com/' - >>> normalize_url(b'HtTP://WWW.XyZ-\xc4\x85.eXamplE.com:80/fooBAR', epslash=True) + >>> normalize_url(b'HtTP://WWW.XyZ-\xc4\x85.eXamplE.com:80/fooBAR', + ... empty_path_slash=True) b'http://www.XyZ-\xc4\x85.example.com/fooBAR' - >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', transcode1st=True) - 'http://www.XyZ-\u0105\udccc.example.com' + >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', + ... merge_surrogate_pairs=True) + b'http://www.XyZ-\xc4\x85\xcc.example.com' + >>> normalize_url(b'HTTP://WWW.XyZ-\xc4\x85\xcc.eXamplE.com', + ... unicode_str=True) # doctest: +IGNORE_EXCEPTION_DETAIL + Traceback (most recent call last): + ... + UnicodeDecodeError: ... >>> normalize_url('HTtp://WWW.XyZ-\u0105\udccc.eXamplE.com:80') 'http://www.XyZ-\u0105\udccc.example.com' >>> normalize_url('HTtp://WWW.XyZ-\u0105.eXamplE.com:80/') 'http://www.XyZ-\u0105.example.com/' - >>> normalize_url('hTTP://WWW.XyZ-\u0105.eXamplE.com:80', epslash=True) + >>> normalize_url('hTTP://WWW.XyZ-\u0105.eXamplE.com:80', + ... empty_path_slash=True) 'http://www.XyZ-\u0105.example.com/' >>> normalize_url(b'HTTPS://WWW.XyZ-\xc4\x85.eXamplE.com:80') b'https://www.XyZ-\xc4\x85.example.com:80' >>> normalize_url(b'HTTPS://WWW.XyZ-\xc4\x85.eXamplE.com:80/fooBAR') b'https://www.XyZ-\xc4\x85.example.com:80/fooBAR' - >>> normalize_url(b'HTTPs://WWW.XyZ-\xc4\x85.eXamplE.com:443', epslash=True) + >>> normalize_url(b'HTTPs://WWW.XyZ-\xc4\x85.eXamplE.com:443', + ... empty_path_slash=True) + b'https://www.XyZ-\xc4\x85.example.com/' + >>> normalize_url(b'HTTPs://WWW.XyZ-\xc4\x85.eXamplE.com:443', + ... empty_path_slash=True, + ... merge_surrogate_pairs=True) b'https://www.XyZ-\xc4\x85.example.com/' - >>> normalize_url(b'HTTPs://WWW.XyZ-\xc4\x85.eXamplE.com:443', epslash=True, transcode1st=True) + >>> normalize_url(b'HTTPs://WWW.XyZ-\xc4\x85.eXamplE.com:443', + ... empty_path_slash=True, + ... unicode_str=True) 'https://www.XyZ-\u0105.example.com/' - >>> normalize_url('httpS://WWW.XyZ-\u0105.eXamplE.com:80', epslash=True) + >>> normalize_url('httpS://WWW.XyZ-\u0105.eXamplE.com:80', + ... empty_path_slash=True) 'https://www.XyZ-\u0105.example.com:80/' - >>> normalize_url('httpS://WWW.XyZ-\u0105.eXamplE.com:80/fooBAR', epslash=True) + >>> normalize_url('httpS://WWW.XyZ-\u0105.eXamplE.com:80/fooBAR', + ... 
empty_path_slash=True) 'https://www.XyZ-\u0105.example.com:80/fooBAR' >>> normalize_url('hTtpS://WWW.XyZ-\u0105.eXamplE.com:443') 'https://www.XyZ-\u0105.example.com' - >>> normalize_url('httpS://WWW.XyZ-\u0105.eXamplE.com:80/fooBAR', epslash=True, - ... transcode1st=True) + >>> normalize_url('httpS://WWW.XyZ-\u0105.eXamplE.com:80/fooBAR', + ... empty_path_slash=True, + ... merge_surrogate_pairs=True) + 'https://www.XyZ-\u0105.example.com:80/fooBAR' + >>> normalize_url('httpS://WWW.XyZ-\u0105.eXamplE.com:80/fooBAR', + ... empty_path_slash=True, + ... unicode_str=True) 'https://www.XyZ-\u0105.example.com:80/fooBAR' + + Ad use of the `norm_brief` keyword argument: + + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='') + b'http://[2001:db8:85a3::123%25en1]#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='r') + b'http://[2001:db8:85a3::123]#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='e') + b'http://[2001:db8:85a3::123%25en1]/#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='er') + b'http://[2001:db8:85a3::123]/#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief=['r', 'e']) + b'http://[2001:db8:85a3::123]/#\xed\xaf\xbf\xed\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief=iter(['r', 'e', 'm'])) + b'http://[2001:db8:85a3::123]/#\xf4\x8f\xbf\xbf' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='uer') + 'http://[2001:db8:85a3::123]/#\udbff\udfff' + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='emru') + 'http://[2001:db8:85a3::123]/#\U0010ffff' + + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... unicode_str=True, + ... norm_brief='emru') + Traceback (most recent call last): + ... + TypeError: when `norm_brief` is given, no other keyword arguments can be set to true + + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='ueMqrb') + Traceback (most recent call last): + ... + ValueError: unknown flags in `norm_brief`: 'M', 'q', 'b' + + >>> normalize_url(b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf', + ... norm_brief='rueummur') + Traceback (most recent call last): + ... 
+ ValueError: duplicate flags in `norm_brief`: 'r', 'u', 'm' """ + if norm_brief is not None: + if unicode_str or merge_surrogate_pairs or empty_path_slash or remove_ipv6_zone: + raise TypeError( + 'when `norm_brief` is given, no other ' + 'keyword arguments can be set to true') + (unicode_str, + merge_surrogate_pairs, + empty_path_slash, + remove_ipv6_zone) = _parse_norm_brief(norm_brief) + if isinstance(url, bytearray): - url = as_bytes(url) - if transcode1st: - url = _transcode(url) + url = bytes(url) + if unicode_str and isinstance(url, bytes): + url = as_unicode(url, 'surrogatepass') + if merge_surrogate_pairs: + url = _merge_surrogate_pairs(url) scheme = _get_scheme(url) if scheme is None: # does not look like a URL at all @@ -439,13 +734,69 @@ def normalize_url(url, # -> the only normalized component is *scheme* return scheme + rest before_host = _get_before_host(match) - host = _get_host(match, rmzone) + host = _get_host(match, remove_ipv6_zone) port = _get_port(match, scheme) - path = _get_path(match, scheme, epslash) + path = _get_path(match, scheme, empty_path_slash) after_path = _get_after_path(match) return scheme + before_host + host + port + path + after_path +# *EXPERIMENTAL* (likely to be changed or removed in the future +# without any warning/deprecation/etc.) +def prepare_norm_brief(*, + unicode_str=False, + merge_surrogate_pairs=False, + empty_path_slash=False, + remove_ipv6_zone=False): + r""" + A convenience helper: prepare the `normalize_url()`'s `norm_brief` + keyword argument value based on any other `normalize_url()`'s + keyword-only arguments (see the docs of `normalize_url()`...). + + It is guaranteed that characters in the returned string are sorted + and unique. + + >>> prepare_norm_brief() + '' + >>> prepare_norm_brief(unicode_str=True) + 'u' + >>> prepare_norm_brief(remove_ipv6_zone=True, empty_path_slash=True) + 'er' + >>> prepare_norm_brief(unicode_str=True, merge_surrogate_pairs=True, empty_path_slash=True) + 'emu' + + >>> raw_url = b'HtTP://[2001:DB8:85A3::0123%25En1]:80#\xed\xaf\xbf\xed\xbf\xbf' + >>> a = normalize_url( + ... raw_url, + ... unicode_str=True, + ... merge_surrogate_pairs=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + >>> my_norm_brief = prepare_norm_brief( + ... unicode_str=True, + ... merge_surrogate_pairs=True, + ... remove_ipv6_zone=True, + ... empty_path_slash=True) + >>> b = normalize_url(raw_url, norm_brief=my_norm_brief) + >>> a == b == 'http://[2001:db8:85a3::123]/#\U0010ffff' + True + >>> my_norm_brief + 'emru' + """ + def gen(): + if unicode_str: + yield 'u' + if merge_surrogate_pairs: + yield 'm' + if empty_path_slash: + yield 'e' + if remove_ipv6_zone: + yield 'r' + norm_brief = ''.join(sorted(gen())) + assert norm_brief == ''.join(sorted(frozenset(norm_brief))) # sorted and unique + return norm_brief + + # *EXPERIMENTAL* (likely to be changed or removed in the future # without any warning/deprecation/etc.) 
def make_provisional_url_search_key(url_orig): @@ -453,26 +804,73 @@ def make_provisional_url_search_key(url_orig): >>> mk = make_provisional_url_search_key >>> mk('http://\u0106ma.eXample.COM:80/\udcdd\ud800Ala-ma-kota\U0010FFFF\udccc') 'SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd' + >>> mk('http://\u0106ma.eXample.COM:80/\ud800\udcddAla-ma-kota\U0010FFFF\udccc') + 'SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd' >>> mk(b'HTTP://\xc4\x86ma.eXample.COM:/\xdd\xffAla-ma-kota\xf4\x8f\xbf\xbf\xcc') 'SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd' >>> mk(b'HTTP://\xc4\x86ma.eXample.COM/\xddAla-ma-kota\xf4\x8f\xbf\xbf\xed\xb3\x8c') 'SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd' + >>> mk(b'HTTP://\xc4\x86ma.eXample.COM:/\xed\xa0\x80\xed\xb3\x9dAla-ma-kota\xef\xbf\xbd\xcc') + 'SY:http://\u0106ma.example.com/\ufffdAla-ma-kota\ufffd\ufffd' + + >>> mk('') + Traceback (most recent call last): + ... + ValueError: given value is empty + + >>> mk(b'') + Traceback (most recent call last): + ... + ValueError: given value is empty """ if not isinstance(url_orig, (str, bytes, bytearray)): - raise TypeError('{!a} is neither `str` nor `bytes`/`bytearray`'.format(url_orig)) + raise TypeError(f'{url_orig!a} is neither `str` nor `bytes`/`bytearray`') if not url_orig: raise ValueError('given value is empty') - url_proc = url_orig - url_proc = normalize_url(url_proc, transcode1st=True, epslash=True, rmzone=True) - assert isinstance(url_proc, str) + + common_norm_options = dict( + empty_path_slash=True, + remove_ipv6_zone=True, + ) + try: + url_proc = normalize_url( + url_orig, + unicode_str=True, + merge_surrogate_pairs=True, + **common_norm_options) + except UnicodeDecodeError: + # here we have *neither* the strict UTF-8 encoding *nor* + # a more "liberal" variant of it that allows surrogates + # -> let's replace all non-compliant bytes with lone + # surrogates (considering that below they will be + # replaced with `REPLACEMENT CHARACTER` anyway...) + url_proc = normalize_url( + as_unicode(url_orig, 'surrogateescape'), + **common_norm_options) + # Let's get rid of surrogate and non-BMP code points -- because of: + # # * the mess with the MariaDB's "utf8" 3-bytes encoding, + # + # * the mess with surrogates (including those produced by the + # `surrogateescape` error handler to "smuggle" non-compliant + # bytes, also those which could themselves represent a part + # of an already encoded surrogate...). + # + # Historically, we used to want to avoid also: + # # * the mess with differences in handling of surrogates between - # Python versions (especially, 2.x vs. 3.x), - # * the mess with UCS-2 vs. UCS-4 builds of Python 2.x. + # Python versions (especially, 2.7 vs. modern 3.x), + # + # * the mess with UCS-2 vs. UCS-4 builds of Python 2.7. + # + # Every series of surrogate and/or non-BMP character code points is + # replaced with exactly one `REPLACEMENT CHARACTER` (Unicode U+FFFD). 
url_proc = _SURROGATE_OR_NON_BMP_CHARACTERS_SEQ_REGEX.sub('\ufffd', url_proc) url_proc = limit_str(url_proc, char_limit=500) url_proc = PROVISIONAL_URL_SEARCH_KEY_PREFIX + url_proc + + assert isinstance(url_proc, str) return url_proc @@ -652,14 +1050,45 @@ def make_provisional_url_search_key(url_orig): # Non-public local helpers # -def _transcode(url): +def _parse_norm_brief(norm_brief): + opt_seq = tuple(norm_brief) + opts = dict.fromkeys(opt_seq, True) + if len(opts) < len(opt_seq): + duplicates = [opt for opt, n in collections.Counter(opt_seq).items() if n > 1] + raise ValueError( + f"duplicate flags in `norm_brief`: " + f"{', '.join(map(ascii, duplicates))}") + unicode_str = opts.pop('u', False) + merge_surrogate_pairs = opts.pop('m', False) + empty_path_slash = opts.pop('e', False) + remove_ipv6_zone = opts.pop('r', False) + if opts: + raise ValueError( + f"unknown flags in `norm_brief`: " + f"{', '.join(map(ascii, opts))}") + return unicode_str, merge_surrogate_pairs, empty_path_slash, remove_ipv6_zone + + +def _merge_surrogate_pairs(url): if isinstance(url, bytes): - ### FIXME: for byte strings we do not ensure that representation - ### of non-BMP characters is consistent! (probably we should...) - url = url.decode('utf-8', 'utf8_surrogatepass_and_surrogateescape') + try: + decoded = as_unicode(url, 'surrogatepass') + except UnicodeDecodeError: + # here we have *neither* the strict UTF-8 encoding *nor* + # a more "liberal" variant of it that allows surrogates + # -> let's return the given `url` intact + pass + else: + # let's ensure that representation of non-BMP characters is + # consistent (note: any unpaired surrogates are left intact) + with_surrogate_pairs_merged = replace_surrogate_pairs_with_proper_codepoints(decoded) + url = as_bytes(with_surrogate_pairs_merged, 'surrogatepass') + assert isinstance(url, bytes) else: - # to ensure that representation of non-BMP characters is consistent - url = try_to_normalize_surrogate_pairs_to_proper_codepoints(url) + # let's ensure that representation of non-BMP characters is + # consistent (note: any unpaired surrogates are left intact) + url = replace_surrogate_pairs_with_proper_codepoints(url) + assert isinstance(url, str) return url @@ -669,7 +1098,7 @@ def _get_scheme(url): if simple_match is None: return None scheme = simple_match.group('scheme') - assert scheme and is_pure_ascii(scheme) + assert scheme and scheme.isascii() scheme = scheme.lower() return scheme @@ -680,12 +1109,12 @@ def _get_before_host(match): return before_host -def _get_host(match, rmzone): +def _get_host(match, remove_ipv6_zone): assert match.group('host') if match.group('ipv6_addr'): before_ipv6_addr = _get_before_ipv6_addr(match) ipv6_addr = _get_ipv6_addr(match) - after_ipv6_addr = _get_after_ipv6_addr(match, rmzone) + after_ipv6_addr = _get_after_ipv6_addr(match, remove_ipv6_zone) host = before_ipv6_addr + ipv6_addr + after_ipv6_addr else: host = _get_hostname_or_ip(match) @@ -715,7 +1144,7 @@ def _get_ipv6_addr(match): ipv6_addr = conv(ipv6_addr) except ipaddress.AddressValueError: ipv6_addr = match.group('ipv6_addr') - assert is_pure_ascii(ipv6_addr) + assert ipv6_addr.isascii() return ipv6_addr @@ -735,13 +1164,15 @@ def _convert_ipv4_to_ipv6_suffix(ipv6_suffix_in_ipv4_format): return ipv6_suffix -def _get_after_ipv6_addr(match, rmzone): +def _get_after_ipv6_addr(match, remove_ipv6_zone): after_ipv6_addr = match.group('after_ipv6_addr') closing_bracket = _proper_conv(match)(']') assert after_ipv6_addr and after_ipv6_addr.endswith(closing_bracket) - if 
rmzone: + if remove_ipv6_zone: return closing_bracket - return lower_if_pure_ascii(after_ipv6_addr) + if after_ipv6_addr.isascii(): + return after_ipv6_addr.lower() + return after_ipv6_addr def _get_hostname_or_ip(match): @@ -750,8 +1181,10 @@ def _get_hostname_or_ip(match): sep_regex = (DOMAIN_LABEL_SEPARATOR_UTF8_BYTES_REGEX if isinstance(hostname_or_ip, bytes) else DOMAIN_LABEL_SEPARATOR_REGEX) dot = _proper_conv(match)('.') - return dot.join(lower_if_pure_ascii(label) # <- we do not want to touch non-pure-ASCII labels - for label in sep_regex.split(hostname_or_ip)) + return dot.join( + (label.lower() if label.isascii() # we do not want to touch non-pure-ASCII labels + else label) + for label in sep_regex.split(hostname_or_ip)) def _get_port(match, scheme): @@ -765,10 +1198,10 @@ def _get_port(match, scheme): return port -def _get_path(match, scheme, epslash): +def _get_path(match, scheme, empty_path_slash): conv = _proper_conv(match) path = match.group('path') or conv('') - if (epslash + if (empty_path_slash and as_bytes(scheme) in (b'http', b'https', b'ftp') and not path): path = conv('/') diff --git a/N6Lib/requirements b/N6Lib/requirements index 3e0c1e3..611cd4e 100644 --- a/N6Lib/requirements +++ b/N6Lib/requirements @@ -11,7 +11,7 @@ requests==2.25.1 rt<2.0.0 bcrypt==3.1.7 passlib==1.7.4 -dnspython==1.16.0 +dnspython==2.4 M2Crypto==0.38 MarkupSafe<2.1.0 Jinja2<3.0.0 @@ -20,13 +20,14 @@ PyJWT==1.7.1 python-keycloak==2.5.0 # pip install section -ruamel.yaml<0.17.22 -pymisp==2.4.119.1 geoip2 -more_itertools +importlib_resources>=5.12,<5.13 lxml +more_itertools +pymisp==2.4.119.1 pyotp==2.3.0 PyStemmer==2.0.1 pystempel==1.2.0 pytest==7.1.2 +ruamel.yaml<0.17.22 unittest_expander==0.4.4 diff --git a/N6Portal/react_app/src/components/forms/validation/validators.ts b/N6Portal/react_app/src/components/forms/validation/validators.ts index e2d1f9c..44d376d 100644 --- a/N6Portal/react_app/src/components/forms/validation/validators.ts +++ b/N6Portal/react_app/src/components/forms/validation/validators.ts @@ -148,21 +148,30 @@ export const mustBePortNumber: Validate = (value) => ? true : 'validation_mustBePortNumber'; -const validateIpAddress = (value: string) => { - const parts = value.split('.'); - - const partValid = (part: string) => { - if (part.length > 3 || part.length === 0) return false; - - if (part[0] === '0' && part !== '0') return false; - - if (!part.match(/^\d+$/)) return false; - - const numeric = +part | 0; +const validateIpAddress = (value: string, isPartOfIpNetwork = false) => { + // Examples of IP addresses presumed invalid: + // - Single IP address: 0.0.0.0 (However, 0.0.0.0/0-32 is accepted as a valid IP network range) + // - IP addresses with leading zeros: 1.1.1.01 + // - IP addresses exceeding the standard value range: 1.1.1.256 + // - IP addresses with excessive digit count: 1.1.1.1111 + // - IP addresses with too many segments: 1.1.1.1.1 + + const octets = value.split('.'); + if (octets.length !== 4) return false; + + // The IP address 0.0.0.0 is accepted as a valid network address in CIDR notation + // (e.g., 0.0.0.0/24), but is not permissible as an individual, standalone IP address. 
+ if (octets.every((octet) => octet === '0') && !isPartOfIpNetwork) return false; + const octetValid = (octet: string) => { + if (octet.length > 3 || octet.length === 0) return false; + if (octet.startsWith('0') && octet.length > 1) return false; + if (!octet.match(/^\d+$/)) return false; + + const numeric = Number(octet); return numeric >= 0 && numeric <= 255; }; - return parts.length === 4 && parts.every(partValid); + return octets.every(octetValid); }; export const mustBeIpNetwork: Validate = (value) => { @@ -173,7 +182,7 @@ export const mustBeIpNetwork: Validate = (value) => { const checkIpNetwork = (value: string) => { const result = value.match(ipNetworkRegex); - return result !== null && validateIpAddress(result[1]) && validateMask(result[2]); + return result !== null && validateIpAddress(result[1], true) && validateMask(result[2]); }; return !value || (typeof value === 'string' && checkIpNetwork(value)) ? true : 'validation_mustBeIpNetwork'; diff --git a/N6Portal/react_app/src/components/pages/account/Account.tsx b/N6Portal/react_app/src/components/pages/account/Account.tsx index 72e94e8..f3aafc2 100644 --- a/N6Portal/react_app/src/components/pages/account/Account.tsx +++ b/N6Portal/react_app/src/components/pages/account/Account.tsx @@ -70,7 +70,7 @@ const Account: FC = () => {
-            {messages['account_email_notigications']}
+            {messages['account_email_notifications']}
             {data.email_notifications.email_notification_language && (
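The new `validateIpAddress()` logic above encodes a small policy: exactly four dot-separated octets, no leading zeros, each octet within the 0-255 range, and `0.0.0.0` rejected as a standalone address yet accepted as the base address of a CIDR network. A minimal Python sketch of the same policy (the `validate_ipv4` name and the asserts are illustrative only, not part of the patch):

    # Illustrative sketch (not from the patch): mirrors the octet rules
    # of validateIpAddress() in validators.ts.
    def validate_ipv4(value: str, is_part_of_ip_network: bool = False) -> bool:
        octets = value.split('.')
        if len(octets) != 4:                               # rejects e.g. '1.1.1.1.1'
            return False
        # '0.0.0.0' passes only as the base address of a CIDR network
        # (e.g. '0.0.0.0/24'), never as a standalone IP address.
        if not is_part_of_ip_network and all(octet == '0' for octet in octets):
            return False
        def octet_valid(octet: str) -> bool:
            if not (octet.isascii() and octet.isdigit()):  # rejects '', '1a', etc.
                return False
            if len(octet) > 3:                             # rejects e.g. '1111'
                return False
            if len(octet) > 1 and octet.startswith('0'):   # rejects e.g. '01'
                return False
            return int(octet) <= 255                       # rejects e.g. '256'
        return all(octet_valid(octet) for octet in octets)

    assert validate_ipv4('1.2.3.4')
    assert not validate_ipv4('0.0.0.0')
    assert validate_ipv4('0.0.0.0', is_part_of_ip_network=True)
    assert not validate_ipv4('1.1.1.01') and not validate_ipv4('1.1.1.256')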
diff --git a/N6Portal/react_app/src/dictionary/index.ts b/N6Portal/react_app/src/dictionary/index.ts index e14f0bf..5218cc9 100644 --- a/N6Portal/react_app/src/dictionary/index.ts +++ b/N6Portal/react_app/src/dictionary/index.ts @@ -94,7 +94,7 @@ export const dictionary = { account_user_id: 'Login', account_org_id: 'Domena organizacji', account_available_resources: 'Dostępne zasoby', - account_email_notigications: 'Powiadomienia e-mail', + account_email_notifications: 'Powiadomienia e-mail', account_email_notification_language: 'Język powiadomień', account_email_notification_addresses: 'Adresy e-mail dla powiadomień', account_email_notification_times: 'Godziny powiadomień', @@ -469,14 +469,14 @@ export const dictionary = { account_user_id: 'User login', account_org_id: 'User organization', account_available_resources: 'Available resources', - account_email_notigications: 'E-mail notification settings', + account_email_notifications: 'E-mail notification settings', account_email_notification_language: 'Notification language', account_email_notification_addresses: 'Notification addresses', account_email_notification_times: 'Notification times', account_email_notification_business_days_only: 'Notifications on business days only', account_yes: 'Yes', account_no: 'No', - account_inside_criteria: ",Inside' resource events criteria", + account_inside_criteria: "'Inside' resource events criteria", account_asn_seq: 'ASN filter', account_cc_seq: 'CC filter', account_fqdn_seq: 'FQDN filter', diff --git a/N6SDK/n6sdk/_api_test_tool/api_test_tool.py b/N6SDK/n6sdk/_api_test_tool/api_test_tool.py index f07d964..3849625 100644 --- a/N6SDK/n6sdk/_api_test_tool/api_test_tool.py +++ b/N6SDK/n6sdk/_api_test_tool/api_test_tool.py @@ -1,6 +1,6 @@ #!/usr/bin/env python -# Copyright (c) 2015-2021 NASK. All rights reserved. +# Copyright (c) 2015-2023 NASK. All rights reserved. """ This tool is a part of *n6sdk*. 
It can analyse and verify an @@ -14,11 +14,11 @@ import random import sys from collections import defaultdict +from importlib.resources import files from urllib.parse import urlencode, urlparse import requests import requests.packages.urllib3 -from pkg_resources import Requirement, resource_filename, cleanup_resources from n6sdk._api_test_tool.client import APIClient from n6sdk._api_test_tool.data_test import DataSpecTest @@ -30,14 +30,9 @@ def iter_config_base_lines(): - try: - filename = resource_filename(Requirement.parse('n6sdk'), - 'n6sdk/_api_test_tool/config_base.ini') - with open(filename, 'rb') as f: - for line in f.read().splitlines(): - yield line.decode('utf-8') - finally: - cleanup_resources() + with files('n6sdk').joinpath('_api_test_tool/config_base.ini').open('rb') as f: + for line in f.read().splitlines(): + yield line.decode('utf-8') def get_config(path): config = configparser.RawConfigParser() @@ -161,8 +156,8 @@ def main(): optional_params_keys = data_range.keys() - constant_params.keys() optional_params_keys = ds_test.all_param_keys.intersection(optional_params_keys) for _ in range(MAX_RETRY): - rand_keys = random.sample(optional_params_keys, 2) - rand_vals = (random.sample(data_range[val], 1)[0] for val in rand_keys) + rand_keys = random.sample(list(optional_params_keys), 2) + rand_vals = (random.sample(list(data_range[val]), 1)[0] for val in rand_keys) optional_params = dict(zip(rand_keys, rand_vals)) legal_query_url = make_url(base_url, constant_params, optional_params) test_legal_ok = True @@ -200,7 +195,7 @@ def main(): illegal_query_urls = [] illegal_keys = data_range.keys() - ds_test.all_param_keys - composed_keys illegal_keys = illegal_keys.difference(additional_attributes) - illegal_vals = (random.sample(data_range[val], 1)[0] for val in illegal_keys) + illegal_vals = (random.sample(list(data_range[val]), 1)[0] for val in illegal_keys) illegal_params = dict(zip(illegal_keys, illegal_vals)) for key, val in illegal_params.items(): @@ -243,7 +238,7 @@ def main(): for optional_key in optional_params_keys: if len(data_range[optional_key]) >= MINIMUM_VALUE_NUMBER: keys_list.append(optional_key) - rand_val = random.sample(data_range[optional_key], 1)[0] + rand_val = random.sample(list(data_range[optional_key]), 1)[0] opt_param = {optional_key: rand_val} legal_query_url = (make_url(base_url, constant_params, opt_param)) if args.verbose: @@ -270,7 +265,7 @@ def main(): report.section("Testing queries with a LEGAL param, using different values", 5) test_key = random.choice(keys_list) - random_val_list = random.sample(data_range[test_key], MINIMUM_VALUE_NUMBER) + random_val_list = random.sample(list(data_range[test_key]), MINIMUM_VALUE_NUMBER) test_list_legal_ok = True for test_value in random_val_list: opt_param = {test_key: test_value} diff --git a/N6SDK/n6sdk/addr_helpers.py b/N6SDK/n6sdk/addr_helpers.py index 5bc1e0b..2f6e07c 100644 --- a/N6SDK/n6sdk/addr_helpers.py +++ b/N6SDK/n6sdk/addr_helpers.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. 
import logging import socket @@ -17,8 +17,7 @@ LOGGER = logging.getLogger(__name__) -def ip_network_as_tuple(ip_network_str): - # type: (str) -> tuple[str, int] +def ip_network_as_tuple(ip_network_str: str) -> tuple[str, int]: """ >>> ip_network_as_tuple('10.20.30.40/24') ('10.20.30.40', 24) @@ -28,8 +27,10 @@ def ip_network_as_tuple(ip_network_str): return ip_str, prefixlen -def ip_network_tuple_to_min_max_ip(ip_network_tuple): - # type: (tuple[str, int]) -> tuple[int, int] +def ip_network_tuple_to_min_max_ip(ip_network_tuple: tuple[str, int], + *, + force_min_ip_greater_than_zero: bool = False, + ) -> tuple[int, int]: """ >>> ip_network_tuple_to_min_max_ip(('10.20.30.41', 24)) (169090560, 169090815) @@ -37,16 +38,67 @@ def ip_network_tuple_to_min_max_ip(ip_network_tuple): (169090601, 169090601) >>> ip_network_tuple_to_min_max_ip(('10.20.30.41', 0)) (0, 4294967295) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.0', 0)) + (0, 4294967295) + >>> ip_network_tuple_to_min_max_ip(('255.255.255.255', 0)) + (0, 4294967295) + >>> ip_network_tuple_to_min_max_ip(('0.0.2.0', 24)) + (512, 767) + >>> ip_network_tuple_to_min_max_ip(('0.0.1.0', 24)) + (256, 511) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.0', 24)) + (0, 255) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.2', 32)) + (2, 2) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.1', 32)) + (1, 1) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.0', 32)) + (0, 0) + + >>> ip_network_tuple_to_min_max_ip(('10.20.30.41', 24), + ... force_min_ip_greater_than_zero=True) + (169090560, 169090815) + >>> ip_network_tuple_to_min_max_ip(('10.20.30.41', 32), + ... force_min_ip_greater_than_zero=True) + (169090601, 169090601) + >>> ip_network_tuple_to_min_max_ip(('10.20.30.41', 0), + ... force_min_ip_greater_than_zero=True) # min IP forced to 1 + (1, 4294967295) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.0', 0), + ... force_min_ip_greater_than_zero=True) # min IP forced to 1 + (1, 4294967295) + >>> ip_network_tuple_to_min_max_ip(('255.255.255.255', 0), + ... force_min_ip_greater_than_zero=True) # min IP forced to 1 + (1, 4294967295) + >>> ip_network_tuple_to_min_max_ip(('0.0.2.0', 24), + ... force_min_ip_greater_than_zero=True) + (512, 767) + >>> ip_network_tuple_to_min_max_ip(('0.0.1.0', 24), + ... force_min_ip_greater_than_zero=True) + (256, 511) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.0', 24), + ... force_min_ip_greater_than_zero=True) # min IP forced to 1 + (1, 255) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.2', 32), + ... force_min_ip_greater_than_zero=True) + (2, 2) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.1', 32), + ... force_min_ip_greater_than_zero=True) + (1, 1) + >>> ip_network_tuple_to_min_max_ip(('0.0.0.0', 32), + ... force_min_ip_greater_than_zero=True) # min IP forced to 1 + (1, 0) """ ip_str, prefixlen = ip_network_tuple ip_int = ip_str_to_int(ip_str) min_ip = (((1 << prefixlen) - 1) << (32 - prefixlen)) & ip_int + if force_min_ip_greater_than_zero and min_ip <= 0: + min_ip = 1 max_ip = (((1 << (32 - prefixlen)) - 1)) | ip_int return min_ip, max_ip -def ip_str_to_int(ip_str): - # type: (str) -> int +def ip_str_to_int(ip_str: str) -> int: """ >>> ip_str_to_int('10.20.30.41') 169090601 diff --git a/N6SDK/n6sdk/encoding_helpers.py b/N6SDK/n6sdk/encoding_helpers.py index 4222551..6c642cc 100644 --- a/N6SDK/n6sdk/encoding_helpers.py +++ b/N6SDK/n6sdk/encoding_helpers.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. 
# # For some parts of the source code of the # `provide_custom_unicode_error_handlers()` function: @@ -233,8 +233,8 @@ def as_unicode(obj, decode_error_handling='strict'): # TODO: rename to `as_str` ... def __bytes__(self): return b'never used' ... def __repr__(self): return 'foo' ... - >>> as_unicode(Hard()) - 'foo' + >>> as_unicode(Hard()) == 'foo' + True """ if isinstance(obj, memoryview): @@ -249,6 +249,84 @@ def as_unicode(obj, decode_error_handling='strict'): # TODO: rename to `as_str` return s
+def as_str_with_minimum_esc(obj):
+ r"""
+ Similar to `as_unicode`, except that:
+
+ * if a :class:`bytes`/:class:`bytearray`/:class:`memoryview`
+ object is given then it is always decoded to `str` using the
+ `backslashreplace` error handler, i.e., any non-UTF-8 bytes
+ (including those belonging to encoded surrogate codepoints)
+ are escaped using the `\x...` notation;
+
+ * if a :class:`str` or any other object is given then the string
+ obtained by calling `as_unicode()` is additionally processed in
+ such a way that any surrogates it contains are escaped using the
+ `\u...` notation;
+
+ * additionally, in both cases, any backslashes present in the
+ original text are escaped -- by doubling them (i.e., each `\`
+ becomes `\\`).
+
+ >>> as_str_with_minimum_esc('')
+ ''
+ >>> as_str_with_minimum_esc(b'')
+ ''
+
+ >>> as_str_with_minimum_esc('\\Wy\u0142\xf3w \udcdd!') == '\\\\Wy\u0142\xf3w \\udcdd!'
+ True
+ >>> as_str_with_minimum_esc(ValueError('Wy\u0142\xf3w \udcdd!')) == 'Wy\u0142\xf3w \\udcdd!'
+ True
+
+ >>> as_str_with_minimum_esc(b'\\Wy\xc5\x82\xc3\xb3w \xdd!') == '\\\\Wy\u0142\xf3w \\xdd!'
+ True
+ >>> as_str_with_minimum_esc(bytearray(b'\xc5\x82\xc3\xb3w \xdd!')) == '\u0142\xf3w \\xdd!'
+ True
+ >>> as_str_with_minimum_esc(memoryview(b'\\Wy \xdd!')) == '\\\\Wy \\xdd!'
+ True
+
+ >>> as_str_with_minimum_esc(42) == '42'
+ True
+ >>> as_str_with_minimum_esc([{True: bytearray(b'abc')}]) == "[{True: bytearray(b'abc')}]"
+ True
+ >>> as_str_with_minimum_esc('\\') == r'\\'
+ True
+ >>> as_str_with_minimum_esc(b'\\') == r'\\'
+ True
+ >>> as_str_with_minimum_esc(['\\']) == r"['\\\\']"
+ True
+ >>> as_str_with_minimum_esc([b'\\']) == r"[b'\\\\']"
+ True
+ >>> str([''' '" ''']) == r'''[' \'" ']'''
+ True
+ >>> as_str_with_minimum_esc([''' '" ''']) == r'''[' \\'" ']'''
+ True
+ >>> str([b''' '" ''']) == r'''[b' \'" ']'''
+ True
+ >>> as_str_with_minimum_esc([b''' '" ''']) == r'''[b' \\'" ']'''
+ True
+
+ >>> class Hard(object):
+ ... def __str__(self): raise UnicodeError
+ ... def __bytes__(self): return b'never used'
+ ... def __repr__(self): return 'foo \udcdd bar \\ spam \udbff\udfff'
+ ...
+ >>> as_str_with_minimum_esc(Hard()) == 'foo \\udcdd bar \\\\ spam \\udbff\\udfff'
+ True
+ """
+ if isinstance(obj, memoryview):
+ obj = bytes(obj)
+ if isinstance(obj, (bytes, bytearray)):
+ bin = obj.replace(b'\\', b'\\\\')
+ s = bin.decode('utf-8', 'backslashreplace')
+ else:
+ s = as_unicode(obj)
+ s = s.replace('\\', '\\\\')
+ bin = s.encode('utf-8', 'backslashreplace')
+ s = bin.decode('utf-8')
+ return s
+
+
 _ASCII_PY_IDENTIFIER_INVALID_CHAR = re.compile(r'[^0-9a-zA-Z_]', re.ASCII) def ascii_py_identifier_str(obj): @@ -359,34 +437,65 @@ def ascii_py_identifier_str(obj): return s
-def try_to_normalize_surrogate_pairs_to_proper_codepoints(s):
+def replace_surrogate_pairs_with_proper_codepoints(s):
 r"""
- Do our best to ensure that representation of non-BMP characters
- is consistent. 
-
- >>> s = '\ud800' + '\udfff'
- >>> s
- '\ud800\udfff'
- >>> try_to_normalize_surrogate_pairs_to_proper_codepoints(s)
- '\U000103ff'
-
- >>> s = '\U000103ff'
- >>> s
- '\U000103ff'
- >>> try_to_normalize_surrogate_pairs_to_proper_codepoints(s)
- '\U000103ff'
-
- Lone surrogates are left intact:
-
- >>> s = '\ud800'
- >>> s
- '\ud800'
- >>> try_to_normalize_surrogate_pairs_to_proper_codepoints(s)
- '\ud800'
+ Make representation of non-BMP characters consistent, by replacing
+ surrogate pairs with the actual corresponding non-BMP codepoints.
+ Lone (unpaired) surrogates are left intact.
+
+ >>> s = '\ud83c' + '\udf40'
+ >>> print(ascii(s))
+ '\ud83c\udf40'
+ >>> res = replace_surrogate_pairs_with_proper_codepoints(s)
+ >>> print(ascii(res))
+ '\U0001f340'
+
+ >>> s = '\ud800' + '\udc00' + '\ud800' + '\udfff' + '\udbff' + '\udc00' + '\udbff' + '\udfff'
+ >>> print(ascii(s))
+ '\ud800\udc00\ud800\udfff\udbff\udc00\udbff\udfff'
+ >>> res = replace_surrogate_pairs_with_proper_codepoints(s)
+ >>> print(ascii(res))
+ '\U00010000\U000103ff\U0010fc00\U0010ffff'
+
+ Any already-non-BMP codepoints are, obviously, left intact:
+
+ >>> s = '\U0001f340'
+ >>> print(ascii(s))
+ '\U0001f340'
+ >>> res = replace_surrogate_pairs_with_proper_codepoints(s)
+ >>> print(ascii(res))
+ '\U0001f340'
+
+ Lone/not-properly-paired surrogates are left intact as well:
+
+ >>> s = '\ud83c'
+ >>> print(ascii(s))
+ '\ud83c'
+ >>> res = replace_surrogate_pairs_with_proper_codepoints(s)
+ >>> print(ascii(res))
+ '\ud83c'
+
+ >>> s = '\udf40' + '\ud83c' # not a proper surrogate pair (wrong order)
+ >>> print(ascii(s))
+ '\udf40\ud83c'
+ >>> res = replace_surrogate_pairs_with_proper_codepoints(s)
+ >>> print(ascii(res))
+ '\udf40\ud83c'
+
+ >>> s = 'asdfghj' + '\ud83c' + '\udf40' + '\udf40' + '\ud83c' + '\U0001f340' + 'qwertyu'
+ >>> print(ascii(s))
+ 'asdfghj\ud83c\udf40\udf40\ud83c\U0001f340qwertyu'
+ >>> res = replace_surrogate_pairs_with_proper_codepoints(s)
+ >>> print(ascii(res))
+ 'asdfghj\U0001f340\udf40\ud83c\U0001f340qwertyu'
 """ if not isinstance(s, str):
- raise TypeError('{!a} is not a `str`'.format(s))
- return s.encode('utf-16', 'surrogatepass').decode('utf-16', 'surrogatepass')
+ raise TypeError(f'{s!a} is not a `str`')
+ res = s.encode('utf-16', 'surrogatepass').decode('utf-16', 'surrogatepass')
+ assert not _PROPER_SURROGATE_PAIR.search(res)
+ return res
+
+_PROPER_SURROGATE_PAIR = re.compile('[\uD800-\uDBFF][\uDC00-\uDFFF]')


 def provide_custom_unicode_error_handlers(
@@ -503,7 +612,7 @@ def provide_custom_unicode_error_handlers(
 ... u'\udced\udca0' # mess converted to surrogates
 ... u'\x7f' # proper code point (ascii DEL)
 ... u'\ud800' # surrogate '\ud800' (smallest one)
- ... u'\udfff' # surrogate '\udfff' (biggest one)
+ ... u'\udfff' # surrogate '\udfff' (biggest one) [note: *not* merged with one above]
 ... u'\udcee\udcbf\udcc0' # mess converted to surrogates
 ... u'\ue000' # proper code point '\ue000' (bigger than biggest surrogate)
 ... u'\udce6' # mess converted to surrogate
diff --git a/N6SDK/n6sdk/exceptions.py b/N6SDK/n6sdk/exceptions.py
index 53084e3..5efbefb 100644
--- a/N6SDK/n6sdk/exceptions.py
+++ b/N6SDK/n6sdk/exceptions.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2013-2021 NASK. All rights reserved.
+# Copyright (c) 2013-2023 NASK. All rights reserved.

 import collections.abc as collections_abc

@@ -190,19 +190,19 @@ class FieldValueTooLongError(FieldValueError):
 ... checked_value=['foo'], max_length=42)
 Traceback (most recent call last):
 ... 
- TypeError: __init__() missing 1 required keyword-only argument: 'field' + TypeError: ...__init__() missing 1 required keyword-only argument: 'field' >>> FieldValueTooLongError( # doctest: +ELLIPSIS ... field='sth', max_length=42) Traceback (most recent call last): ... - TypeError: __init__() missing 1 required keyword-only argument: 'checked_value' + TypeError: ...__init__() missing 1 required keyword-only argument: 'checked_value' >>> FieldValueTooLongError( # doctest: +ELLIPSIS ... field='sth', checked_value=['foo']) Traceback (most recent call last): ... - TypeError: __init__() missing 1 required keyword-only argument: 'max_length' + TypeError: ...__init__() missing 1 required keyword-only argument: 'max_length' """ def __init__(self, *args, diff --git a/N6SDK/n6sdk/tests/test_data_spec_fields.py b/N6SDK/n6sdk/tests/test_data_spec_fields.py index b0638d6..00e38df 100644 --- a/N6SDK/n6sdk/tests/test_data_spec_fields.py +++ b/N6SDK/n6sdk/tests/test_data_spec_fields.py @@ -1,4 +1,4 @@ -# Copyright (c) 2013-2021 NASK. All rights reserved. +# Copyright (c) 2013-2023 NASK. All rights reserved. import collections import copy @@ -2321,8 +2321,8 @@ def cases__clean_param_value(self): expected='123.45.67.8', ) yield case( - given='0.0.0.0', - expected='0.0.0.0', + given='0.0.0.1', + expected='0.0.0.1', ) yield case( given='255.255.255.255', @@ -2411,8 +2411,8 @@ def cases__clean_result_value(self): expected='123.45.67.8', ) yield case( - given='0.0.0.0', - expected='0.0.0.0', + given='0.0.0.1', + expected='0.0.0.1', ) yield case( given=bytearray(b'255.255.255.255'), @@ -4399,7 +4399,7 @@ def cases__clean_result_value(self): '\udced\udca0' # mess converted to surrogates '\x7f' # proper code point (ascii DEL) '\ud800' # surrogate '\ud800' (smallest one) - '\udfff' # surrogate '\udfff' (biggest one) + '\udfff' # surrogate '\udfff' (biggest one) [note: *not* merged with one above] '\udcee\udcbf\udcc0' # mess converted to surrogates '\ue000' # proper code point '\ue000' (bigger than biggest surr.) '\udce6' # mess converted to surrogate diff --git a/README.md b/README.md index 7e0625f..55c999b 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # n6 -[*n6* (Network Security Incident eXchange)](https://www.cert.pl/en/posts/2018/06/n6-released-as-open-source/) +[*n6* (Network Security Incident eXchange)](https://cert.pl/en/n6/) is a system to collect, manage and distribute security information on a large scale. Distribution is realized through a simple REST API and a [web interface](https://n6portal.cert.pl/) that authorized users can use @@ -13,7 +13,7 @@ and incidents in their networks. - **Home page:** [github.com/CERT-Polska/n6](https://github.com/CERT-Polska/n6) - **Documentation:** [n6.readthedocs.io](https://n6.readthedocs.io) -The project is developed for [CERT Polska](https://www.cert.pl/en/). +The project is developed by [CERT Polska](https://www.cert.pl/en/). Contact us via e-mail: [n6@cert.pl](mailto:n6@cert.pl). diff --git a/docker/base/Dockerfile b/docker/base/Dockerfile index 4bbf7aa..a6f7777 100644 --- a/docker/base/Dockerfile +++ b/docker/base/Dockerfile @@ -10,6 +10,30 @@ ARG apt-proxy-nask # TODO: get rid of, no longer necessary, Python-2-related stuff... 
+RUN \
+ apt-get update && \
+ apt-get install -y \
+ build-essential \
+ ca-certificates \
+ gnupg \
+ curl;
+
+RUN \
+ # Create a directory for the new repository's keyring, if it doesn't exist
+ mkdir -p /etc/apt/keyrings;
+
+RUN \
+ # Download the new repository's GPG key and save it in the keyring directory
+ curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg;
+
+RUN \
+ echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_16.x nodistro main" \
+ | tee /etc/apt/sources.list.d/nodesource.list;
+
+RUN \
+ apt-get update && \
+ apt-get install -y nodejs;
+
 RUN \
 # install base dependencies
 echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/99AcquireRetries; \
@@ -20,6 +44,7 @@ RUN \
 apt-get update && \
 apt-get install -y \
 apache2 \
+ inotify-tools \
 build-essential \
 curl \
 default-libmysqlclient-dev \
@@ -55,8 +80,6 @@ RUN \
 libreadline-dev \
 libsqlite3-dev \
 libbz2-dev && \
- curl -fsSL https://deb.nodesource.com/setup_14.x | bash -; \
- apt-get install -y nodejs; \
 npm install -g npm@latest && \
 npm install node-sass && \
 bash -c "echo 'ServerName localhost' >> /etc/apache2/apache2.conf"; \
@@ -116,8 +139,7 @@ RUN set -ex; \
 virtualenv --python=/usr/bin/python2.7 env; \
 . env/bin/activate; \
 pip install --upgrade pip -i https://pypi.python.org/simple/; \
- pip install --upgrade 'setuptools<45.0.0'; \
- pip install --upgrade wheel; \
+ pip install --upgrade 'setuptools<45.0.0' wheel; \
 # workaround against crash during normal install of httplib2 (needed by some test tools...)
 wget https://files.pythonhosted.org/packages/92/92/478727070c62def583e645ceeba18e69df266bf78e11639bc787c2386421/httplib2-0.20.1.tar.gz; \
 tar xf httplib2-0.20.1.tar.gz; \
@@ -149,8 +171,8 @@ RUN \
 python3.9 -m venv env_py3k; \
 . env_py3k/bin/activate; \
 pip install --upgrade pip -i https://pypi.python.org/simple/; \
- pip install --upgrade setuptools; \
- pip install --upgrade wheel; \
+ # setuptools is locked as a part of a temporary workaround (TODO: fix it when #7792 is done + see: #8823)
+ pip install --upgrade 'setuptools<68.0.0' wheel; \
 # install tools for n6 tests
 pip install --no-cache-dir \
 unittest_expander==0.4.4 \
diff --git a/docker/mysql/Dockerfile b/docker/mysql/Dockerfile
index ed11134..6dcd815 100644
--- a/docker/mysql/Dockerfile
+++ b/docker/mysql/Dockerfile
@@ -14,8 +14,6 @@ RUN apt-get update && \
 mariadb-plugin-tokudb \
 libjemalloc1

-#Fixes db creation condition in line 181
-RUN sed -i.bak 's%"$DATADIR/mysql"%"$DATADIR/mysql/$@"%g' /usr/local/bin/docker-entrypoint.sh
 RUN ["/bin/bash", "-c", "rm -rf /etc/mysql/conf.d/*"]
 COPY ./etc/mysql/conf.d/mariadb.cnf /etc/mysql/conf.d/mariadb.cnf
diff --git a/docs/src/install_and_conf/docker.md b/docs/src/install_and_conf/docker.md
index 52d688a..a2e7c51 100644
--- a/docs/src/install_and_conf/docker.md
+++ b/docs/src/install_and_conf/docker.md
@@ -58,9 +58,7 @@ that it will work on other systems such as Windows or Mac OS.

 !!! note

- Make sure you are using _Docker-Compose_, not _Docker Compose_
- as n6 currently works on _Docker-Compose v1_ not v2.
- For more information check [this migration note](https://docs.docker.com/compose/migrate/)
+ _n6_ currently works with _Docker-Compose_ (V1). The project should build just fine with _Docker Compose V2_ as well, but at this moment we cannot guarantee that everything works there. 
## Building the environment diff --git a/docs/src/install_and_conf/step_by_step/pipeline_config.md b/docs/src/install_and_conf/step_by_step/pipeline_config.md index 4c92a65..30eb10c 100644 --- a/docs/src/install_and_conf/step_by_step/pipeline_config.md +++ b/docs/src/install_and_conf/step_by_step/pipeline_config.md @@ -20,11 +20,11 @@ The configuration files should have been created in `/home/dataman/.n6`. ```bash $ ls /home/dataman/.n6/ -00_global.conf 02_archiveraw.conf 07_aggregator.conf 09_auth_db.conf -10_generator_stream_api.conf 11_mailing.conf 23_filter.conf 25_splunk_emitter.conf -00_pipeline.conf 05_enrich.conf 07_comparator.conf 10_generator_rest_api.conf -11_jinja_rendering.conf 21_recorder.conf 60_abuse_ch.conf 60_misp.conf -60_spam404_com.conf [...] logging.conf +00_global.conf 00_pipeline.conf 02_archiveraw.conf 05_enrich.conf 07_aggregator.conf +[...] +60_abuse_ch.conf 60_amqp.conf 60_cert_pl.conf 60_cesnet_cz.conf 60_dan_tv.conf +[...] +logging.conf ``` ## Logging diff --git a/etc/mysql/initdb/1_create_tables.sql b/etc/mysql/initdb/1_create_tables.sql index aea7280..178e3a8 100644 --- a/etc/mysql/initdb/1_create_tables.sql +++ b/etc/mysql/initdb/1_create_tables.sql @@ -12,7 +12,7 @@ CREATE TABLE n6.event ( origin ENUM('c2','dropzone','proxy','p2p-crawler','p2p-drone','sinkhole','sandbox','honeypot','darknet','av','ids','waf'), restriction ENUM('public','need-to-know','internal') NOT NULL, confidence ENUM('low','medium','high') NOT NULL, - category ENUM('bots','cnc','dos-victim','malurl','phish','proxy','sandbox-url','scanning','server-exploit','spam','other','spam-url','amplifier','tor','dos-attacker','vulnerable','backdoor','dns-query','flow','flow-anomaly','fraud','leak','webinject','malware-action','deface', 'scam') NOT NULL, + category ENUM('bots','cnc','dos-victim','malurl','phish','proxy','sandbox-url','scanning','server-exploit','spam','other','spam-url','amplifier','tor','dos-attacker','vulnerable','backdoor','dns-query','flow','flow-anomaly','fraud','leak','webinject','malware-action','deface','scam') NOT NULL, time DATETIME NOT NULL, name VARCHAR(255), md5 BINARY(16), @@ -52,7 +52,6 @@ CREATE TABLE n6.event ( PARTITION p_max VALUES LESS THAN MAXVALUE ); - CREATE TABLE n6.client_to_event ( id Binary(16) NOT NULL, client VARCHAR(32), diff --git a/etc/n6/00_global.conf b/etc/n6/00_global.conf index f6ac8c5..bc8c7db 100644 --- a/etc/n6/00_global.conf +++ b/etc/n6/00_global.conf @@ -1,19 +1,19 @@ [rabbitmq] -host=rabbit +host = rabbit # `url` is a deprecated (and generally not used) legacy alias for `host` -url=%(host)s -port=5671 +url = %(host)s +port = 5671 # if you want to use SSL, the `ssl` option must be set to 1 and the # following options must be set to appropriate file paths: -ssl=1 -path_to_cert=~/certs -ssl_ca_certs=%(path_to_cert)s/n6-CA/cacert.pem -ssl_certfile=%(path_to_cert)s/cert.pem -ssl_keyfile=%(path_to_cert)s/key.pem +ssl = 1 +path_to_cert = ~/certs +ssl_ca_certs = %(path_to_cert)s/n6-CA/cacert.pem +ssl_certfile = %(path_to_cert)s/cert.pem +ssl_keyfile = %(path_to_cert)s/key.pem # AMQP heartbeat interval for most of the components -heartbeat_interval=30 +heartbeat_interval = 30 # AMQP heartbeat interval for parser components -heartbeat_interval_parsers=600 +heartbeat_interval_parsers = 600 diff --git a/etc/n6/21_recorder.conf b/etc/n6/21_recorder.conf index ca56de8..4c31604 100644 --- a/etc/n6/21_recorder.conf +++ b/etc/n6/21_recorder.conf @@ -10,3 +10,10 @@ uri = mysql://root:password@mysql/n6 ;echo = 0 ;wait_timeout = 28800 + +# Which database API 
exceptions' error codes should be considered *fatal*,
+# i.e., should make the n6recorder script requeue the AMQP input message
+# and then immediately exit with a non-zero status (by default, only one
+# error code is considered *fatal*: 1021 which represents the *disk full*
+# condition -- see: https://mariadb.com/kb/en/mariadb-error-codes/).
+;fatal_db_api_error_codes = 1021,
diff --git a/etc/n6/60_misp.conf b/etc/n6/60_misp.conf
index 436fdb6..d98092d 100644
--- a/etc/n6/60_misp.conf
+++ b/etc/n6/60_misp.conf
@@ -74,7 +74,7 @@ days_for_first_run = 15

 # A standard collector-state-loading-and-saving-related setting;
 # its default value should be OK in nearly all cases.
-;state_dir = ~/.n6state :: path
+;state_dir = ~/.n6state
diff --git a/etc/n6/60_spam404_com.conf b/etc/n6/60_spam404_com.conf
index 2ecda0b..b7dfff2 100644
--- a/etc/n6/60_spam404_com.conf
+++ b/etc/n6/60_spam404_com.conf
@@ -1,4 +1,4 @@
-# collectors
+# collector

 [Spam404ComScamListBlCollector]
 url = https://raw.githubusercontent.com/Dawsey21/Lists/master/main-blacklist.txt
diff --git a/etc/ssl/generate_ca.sh b/etc/ssl/generate_ca.sh
index 52598be..1eed76b 100755
--- a/etc/ssl/generate_ca.sh
+++ b/etc/ssl/generate_ca.sh
@@ -5,9 +5,9 @@ set -ex
 DAYS=1365
 OPENSSL_CNF=openssl.cnf

-mkdir -p n6-CA/certs n6-CA/private
-touch n6-CA/index.txt
-touch n6-CA/index.txt.attr
+mkdir -p generated_certs/n6-CA/certs generated_certs/n6-CA/private
+touch generated_certs/n6-CA/index.txt
+touch generated_certs/n6-CA/index.txt.attr

 # use the command below to generate a new CA file
-openssl req -x509 -config $OPENSSL_CNF -newkey rsa:2048 -days $DAYS -out n6-CA/cacert.pem -outform PEM -subj /CN=n6-CA/ -nodes
+openssl req -x509 -config $OPENSSL_CNF -newkey rsa:2048 -days $DAYS -out generated_certs/n6-CA/cacert.pem -outform PEM -subj /CN=n6-CA/ -nodes
diff --git a/etc/ssl/generate_certs.sh b/etc/ssl/generate_certs.sh
index c6211ab..eb3ec01 100755
--- a/etc/ssl/generate_certs.sh
+++ b/etc/ssl/generate_certs.sh
@@ -9,8 +9,8 @@ OPENSSL_CNF=openssl.cnf

 # increment the serial number if the certificate with the same serial
 # number already exists
-echo 12 > n6-CA/serial
+echo 12 > generated_certs/n6-CA/serial

-openssl genrsa -out key.pem 2048
-openssl req -new -key key.pem -out req.csr -outform PEM -subj /CN=$CN/O=$ORG/ -nodes
-openssl ca -config $OPENSSL_CNF -in req.csr -out cert.pem -days $DAYS -notext -batch -extensions server_and_client_ca_extensions
+openssl genrsa -out generated_certs/key.pem 2048
+openssl req -new -key generated_certs/key.pem -out generated_certs/req.csr -outform PEM -subj /CN=$CN/O=$ORG/ -nodes
+openssl ca -config $OPENSSL_CNF -in generated_certs/req.csr -out generated_certs/cert.pem -days $DAYS -notext -batch -extensions server_and_client_ca_extensions
diff --git a/etc/ssl/generated_certs/cert.pem b/etc/ssl/generated_certs/cert.pem
index 7744c20..08dc25d 100644
--- a/etc/ssl/generated_certs/cert.pem
+++ b/etc/ssl/generated_certs/cert.pem
@@ -1,18 +1,18 @@
 -----BEGIN CERTIFICATE-----
 MIIC9jCCAd6gAwIBAgIBEjANBgkqhkiG9w0BAQsFADAQMQ4wDAYDVQQDDAVuNi1D
-QTAeFw0xOTExMjYxNTE4NDVaFw0yMzA4MjIxNTE4NDVaMDIxGjAYBgNVBAMMEWxv
+QTAeFw0yMzA4MjUxMTMxMjlaFw0yNzA1MjExMTMxMjlaMDIxGjAYBgNVBAMMEWxv
 Z2luQGV4YW1wbGUuY29tMRQwEgYDVQQKDAtleGFtcGxlLmNvbTCCASIwDQYJKoZI
-hvcNAQEBBQADggEPADCCAQoCggEBAPGwKSn3AwRNcvv+KQvVaaQTwsKvjPEdYmst
-HweFvCJ+U7nquIQRs/HdqmNgUHGXH2jXWwGTRU0i/H+qmQCKaPLCI+mcvu57Cbu6
-LXIsvbO3h0GX+UZBGUrt2xKALKuMDQDuNCOYgEhlNgi4RmYkLBRZUwsY5/j5x8MY
-Tk9wbQPdw/t8U6JhnEdBNBLZsGFCsBvwqdTmQYmromIeFef4Q5AfVkHgx/6QuqGR 
-QVGn9smv7tdS5HmKof+g8p+avB2kYuDqLtg/v5rjOZ6SEYeRCzZIpFFwM2ANC7ji -ZOZzw1CjsaQLkOwGqa9BvNuMBRo4dkDg7odLxaMxbFIonMOwDlcCAwEAAaM5MDcw +hvcNAQEBBQADggEPADCCAQoCggEBAL3HcpserlIdt46ssyHQNuFFOX3ujkY2KEYT +acQPgBp7tk55oFoWiyqCqyEOO9Ririy7+c4qK/76CNFXrGy+DxgR1425SeRTBK/3 +gZilfHneWv7OZAUnyVgyqNU8/zeH9DOIL3O1HfCaxHX3rWNEWtXO6XOvdU3gFjLP +k2wPbpHZkL64qlbj5BCknJ8XE9wku46fqzBjV6sxnrQbhoxA1Mlr9+1k9d/31gNS +afT8p2SD6tQ9Y8xPvX0ODyAU29jo6z9Gv+RQMc3zAhCJwapZf+WqViyrYG7tGHDX +W+kTVakuvHLAg9hPxOboo4CvjAsQU14gJrrewoHbqEYd1mUGf+UCAwEAAaM5MDcw CQYDVR0TBAIwADALBgNVHQ8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsG -AQUFBwMBMA0GCSqGSIb3DQEBCwUAA4IBAQAdRy2FBeliMlB6Qg4CjBzUFoQ0SQzg -WGy2Gpq9LqFnrL3AFGk5BWI7hXqpSci+vw78ry6qgrIZluZbCxk6qNSJ4iiWiDNG -VCzbJoornZWzZOdxtSTN1+Ejdw4Q5eTHzgM4lU3n+ZkNiD1c2jO3xl6dyUVeBQwk -2OLCKSbd5I96UxfUUQWSxV3iH8prx2zN4kOcWuUlwKsanozxRAjO1btbXnwHxCSb -/axJSkzZtmsX3bBpfrwJJiEk7wvNkipxK1+KTfSP0kakVs2ktGLaqa9HTvgqwe6u -UiHt9zVHyXI013QeXn36F37/Q9e0vb7jz2v+Mb35h1dPUwYK4ThUO8AI +AQUFBwMBMA0GCSqGSIb3DQEBCwUAA4IBAQBBE2cgPQR8DbgmtWa43+y60e14DCJX +awWsN+PyDUmjGnozx8a4bes/GOMXqMmWYT/Ol6YKw42VDXJbtYSIZJUnc31IYVM6 +vVQTZrB3LKGkMEJQZejVcUYxw88mfXD76DLjvepX6sCyHkBXv3EYhw8KLJv0xgR0 +ZSVwbd4TurhL6fR0ZA6U/+SpNrRp472S17fLxb5HXmMx4c3WtlO+OEKK38os2D1e +YPDMroO1Fvr0/pkwkXSebba8nXNjaUKQ3ySsdO7sktmM1XL4euwqCwjKAnZ/T9Sf +KB3VgNAaoZbjIv/N2xGIjzBxf72n/GIc9ApKBKglMOqum7CbBtHjF+fS -----END CERTIFICATE----- diff --git a/etc/ssl/generated_certs/key.pem b/etc/ssl/generated_certs/key.pem index 3487b41..a5753ae 100644 --- a/etc/ssl/generated_certs/key.pem +++ b/etc/ssl/generated_certs/key.pem @@ -1,27 +1,27 @@ -----BEGIN RSA PRIVATE KEY----- -MIIEowIBAAKCAQEA8bApKfcDBE1y+/4pC9VppBPCwq+M8R1iay0fB4W8In5Tueq4 -hBGz8d2qY2BQcZcfaNdbAZNFTSL8f6qZAIpo8sIj6Zy+7nsJu7otciy9s7eHQZf5 -RkEZSu3bEoAsq4wNAO40I5iASGU2CLhGZiQsFFlTCxjn+PnHwxhOT3BtA93D+3xT -omGcR0E0EtmwYUKwG/Cp1OZBiauiYh4V5/hDkB9WQeDH/pC6oZFBUaf2ya/u11Lk -eYqh/6Dyn5q8HaRi4Oou2D+/muM5npIRh5ELNkikUXAzYA0LuOJk5nPDUKOxpAuQ -7Aapr0G824wFGjh2QODuh0vFozFsUiicw7AOVwIDAQABAoIBAHyiH1gorUGWvuj8 -FCaqEyQtnI3RAZmFUa97QTkb2fzfsEV7qVNR3b2oVamRjWpGSEhEZgXV8DLrC9K6 -ItSIi75EJ0jdMAjDIi3QwIbUU69NwU4uFLoJ8AUXy5Uqy95bBomoTPLePakXqFmu -zX72wFRuC8j5OwbFqCIPcrK8gzsuLMWKR52m9qyTVfv4Og2MnNvvJiZoCVOMx0wm -9cu+m23lmZfR5++B03Nrx96Vb7dkZw6z6lk0qwZ/jxqC6jSWsVpKCjIGN+xfign3 -9XSrt/Cf6+e9N5L8dPDFk/R6aq4M4gRJ2XAAFgNtNkPryZWpmeijyBya/C9kNLu6 -hqC3wAECgYEA/AFmXZIR1r9nbJO2oK72p37iW6OAVNv0eHi1TCpsC1VrSs7xqe47 -Jdv+TWcw+AJA+hUhkQRvJQS+SEIDHZOwU5wWt+RQUlgwfeH/DTzPBlauSJWmsdbp -vkwYpIkRfJ+0Suj3lEmnKWsQgy4FCzxI4RRsNuMYCSd2uwLxF4URSAECgYEA9YTl -DJOj4WEc34eDXjBaURXrmMdF3qcvZpBu3Q0M6FstL44dz4wYHIi9Ewu5ReKzt57+ -FOkxJorCBVbR+HitBeYdVL6LZMHO9EOSzxyaWjlxHeUbaRzgtG8KnfWXH+9DsZ9D -ooJ8iv7+hYI+fQWx5XNAv4Qus1hd6i6XCWqgllcCgYEAyP6Y1cK+Rait5dS+0dQa -2KcEBZEXtxckGr4z48bmG/gKNkVuTFm9hUm8v6GxVe3+Qzh9aDvAJidtWRaFg56b -AWS6XftU8QhzzMNm+PjqKiUSpsPti2RdVDE/amQEtYBvfVvos7Y3BHrnValrznVL -r3HpibGBJzP5p9kFz/uLOAECgYB/Hkupc1fKfYmBgpxVzBs3GG5fL+3RFibIp8d+ -1B72vx5qHN76csKZI4MhtVQ8BuCeFcff88zq87T5JraYO4L6JubQ1cc+Z8pLViFQ -8rJIPK2AmPrUNYtyYHvSxTF161/VO2y9W2o4XUZSwdiwyp4M+ttvTXUQjpQxh+XT -jk2PCQKBgApXleft99Lo1B2Ja9y9gVqFh02i537cTrjz+g3cPLEk0/ZCFsyBPicL -3EDF/DdbzQesOj2QDuWIdHrQlxjzHLkuWrxNdhD24ntesqqJ1JeUuEnrawgZBQ0S -kR8Bz0jxs7n/xRmf2Fm7e9qWCzEtqMT+71j7xNMECT3F8FuaHcv5 +MIIEowIBAAKCAQEAvcdymx6uUh23jqyzIdA24UU5fe6ORjYoRhNpxA+AGnu2Tnmg +WhaLKoKrIQ471GKuLLv5zior/voI0VesbL4PGBHXjblJ5FMEr/eBmKV8ed5a/s5k +BSfJWDKo1Tz/N4f0M4gvc7Ud8JrEdfetY0Ra1c7pc691TeAWMs+TbA9ukdmQvriq +VuPkEKScnxcT3CS7jp+rMGNXqzGetBuGjEDUyWv37WT13/fWA1Jp9PynZIPq1D1j 
+zE+9fQ4PIBTb2OjrP0a/5FAxzfMCEInBqll/5apWLKtgbu0YcNdb6RNVqS68csCD +2E/E5uijgK+MCxBTXiAmut7CgduoRh3WZQZ/5QIDAQABAoIBAD/hXuZwEWV6s0rH +PxTmrVJupseJAUMI/812w2dHgGtpsRgBQMSSSg3pJglebS00ekR8kb2f3GdbapRs +2pFP+Gy1tMTz1beRakaBHZJwQdIT0rVqa6iAl/mkM8hzW8Upcj3WXYKpOIbA0diT +oj2DyL+nglV6fhXUlCROUuVQ3HsjGHq3DwZo9+orWURPFQ0qJWwmzj9zyGfeqW4o +3ssncJ327c908WT5TYqnFQ2zbEUdAOWf8K1eVBj+qyVR0Qt0LcDinf8R5Uf6pavp +AGRHvPWx0e/RLE3LGv9QLQBttV/aFeOj1xmmNwalv/JN3MXQynMJkX0xVc8x7Pg0 +tYgl7gECgYEA34aCuZN5ldZovOxSaau0wO2Pet+h4fP8h0B0T0Pz4NQJelJJTEG3 +6ngC/3UH0ymxeOsFY9RgE5CTGCgE57fpy4rFmahPfieQKDOdK8ZXBG4q+jPz+8EN +rAmKOclZ3X5eywSyi7Mlnopvbhjakit5oPORcCc+CFclOVicyZQBmqECgYEA2VnS +xQUNNXMm5ieN1A5gEuEqP4+hM0tXPaUb5uJ6ZlF8KbMHOngbxuuTYGr4Mdmy8xxQ +8k5hL/FeXnux7ZGv9zcrVXgdzWQCDG4Lfzkv0bJHJnpf5G+V4wWlR3ALSPDvKatm +dVHN6GHNeaQEJfmciupw890rQK67HqUSOmPNQsUCgYEA1asoFSsjI5dUgZvJ39dS +LsYnzIYvoeVwNP3o8Mh9PSKTeMll5a5Al7Jm8zk05KbLTlIi0d32hV1DLuk6XyKQ +K5CY+RxJ+Mbq53MHQAwVrFd/X7L//Fz7q8NmzXxrGe6twJXZ8u6p/FZK1EyPywAi +ATgzg6kPhDuBisLpBUwlCGECgYBbBuJdvxLcGsDkzRX6BJc58PEXs5iIefosiBSa +aqlmZqQAdskFNL41Xf3X+JS/k/P9GxPPlwHe6VBjc5x96XDvsTxFzEt29HZjLEdG +XfXn9akUtVWpvw3gCUJCG+ut/bG8GtuLMNpyg1thoU5XdSWZjDwH9c2ihks5t1pd +9+REHQKBgAsxaXAhf+DDxsTaWqd0P3DYbxDbhdcTGyyuxPdhGbFHGfvFd8hy/Swh +fVce3uq5rjDMqOjfwTis+QyzxmdKtZAtB/96YycE4WEp8a+/yvEdtsU14eepFpn+ +oQoQVagfxzM+BakxRbgAAeZOLNSZNaRzH776Gfnd3tumKBvXLPXF -----END RSA PRIVATE KEY----- diff --git a/etc/ssl/generated_certs/n6-CA/cacert.pem b/etc/ssl/generated_certs/n6-CA/cacert.pem index 6178399..d9f3684 100644 --- a/etc/ssl/generated_certs/n6-CA/cacert.pem +++ b/etc/ssl/generated_certs/n6-CA/cacert.pem @@ -1,17 +1,17 @@ -----BEGIN CERTIFICATE----- -MIICyzCCAbOgAwIBAgIUB38iI1/opTI6eAPENxYqcPXe9PUwDQYJKoZIhvcNAQEL -BQAwEDEOMAwGA1UEAwwFbjYtQ0EwHhcNMTkxMTI2MTUxODQ1WhcNMjMwODIyMTUx -ODQ1WjAQMQ4wDAYDVQQDDAVuNi1DQTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC -AQoCggEBAJh2JlwSFwrmv/0fBBMWSgoRhW32tRLmdV1g4L5cN+ME17/WRUz3WYIX -rvu4RxEjvzLIp5zFJ3Hgct10nCdrV3IZlXNEBjv0mNRBV6qkBZ2/KfXAeszuJtkz -yUJxGrpSPTqQeJP6zFV4JBklra1Gim9S/HFtVavaP13QZnIySuJIsq3UGjJiucRI -2KEoYVvzjSRmV3vHThVatc1Sl4D0Iuxj9cUSI2g6Ra6DOpgYd8Cj3ebOsUXzrw5f -2Zs7DNii178M8nCwPnYdw03cuEHBmD72yX/mdL8x4jLnfZw4XZUgcMSsETScZi6E -wfMvnijRqZQzmI8XB/TTmPgiqv8w2/cCAwEAAaMdMBswDAYDVR0TBAUwAwEB/zAL -BgNVHQ8EBAMCAQYwDQYJKoZIhvcNAQELBQADggEBAFQalvZesccQXIUcVgTGV3a+ -FLb290aQr/rtxlVwUYcBf+zwLzH0qi57F+py8LeqAR/jT2If5BOT9Ow+6qqCfh3K -5VNYwJUImMKvCcBFd9ctlUVCl8CP1OYD/sof2gcnkug53z0sauhLB4ZMg/htH2Lg -DblVgeiE0tTjZmmUu9J83u2NXm7h+RL6Hp44M26kMmqRgaHA2Ps6CFIV5ns9pJIK -lhdbvi3jps+ZvpQQgOhATxL1+hhHq3EIIT9yXvgMk4EBnNAukvmWUeloWpfwWkox -vGonKwM/VfaxyTQJSddfhEWCKPDM/C1IsC9AH3fk29Od2oi1Qi6bJWoltOM5wys= +MIICyzCCAbOgAwIBAgIUV7lyh0ajr2D/pzTFOKd1qJGLoJwwDQYJKoZIhvcNAQEL +BQAwEDEOMAwGA1UEAwwFbjYtQ0EwHhcNMjMwODI1MTEzMTE3WhcNMjcwNTIxMTEz +MTE3WjAQMQ4wDAYDVQQDDAVuNi1DQTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC +AQoCggEBALbXJhi1uMw5BOUt96QFmhkAuBvMSkxjMN8b/iMemRbX/DJ6OwUSTyCn +3gCT0sy8xnWC7NXmug3TWl2Bh4Inh5P7ehoHX6y4rv5D9YWcFRleryGzPUCzIOAs +FEi216+wFW628EZ5eekQz/sc355jXipBLt+ed852VSaG4KZUyFUxH2XwSJ6JiYNd +av4UFIxT/n1SC3JJbAkv/UCJgxKDFrWRraN0yZFhu41M6GO6BeOVL1U7wFN8gwHB +XcRCroNwVHRwGfrpmwn80BRguuezuxuz0n8gTRgPH2tJ+plYOJbymo9PwEgOSnXR +zv1BytJbR4n7yixlh0Xyq1nftZcNLbkCAwEAAaMdMBswDAYDVR0TBAUwAwEB/zAL +BgNVHQ8EBAMCAQYwDQYJKoZIhvcNAQELBQADggEBAIwNu28hQlD6rCzatssQzBlJ +3SYeY97F6Y3uStqgUzCOylSJ/yNTKzJdfHhLXHaA7QV+zal+ceAarcnwuRZ5e0Kf +8Yzm2UM2SJkVN2Hs/u+JVAN9KdudyZ8ZYUZDcRoWRNjEYCYcUWb071ibFNEtZQpx +eoJeNiRQMPNi0GN4E/MA9GKsbZPkRB+JF151Vpn+k0kGjD7QEQ23BBIYtihSQqLU 
+AZ1VfMDi263vJiPTRHnyuSodBqTJLZFLH5qrBtq8QcEpXuYuHTx59AZnF0wcFTum +rhy3K+AveVkR8OFZcrrIIY56CN+sVXKa7SEVzM/zNNH7UOHWJe6Dwojta+cWEnU= -----END CERTIFICATE----- diff --git a/etc/ssl/generated_certs/n6-CA/certs/12.pem b/etc/ssl/generated_certs/n6-CA/certs/12.pem index 7744c20..08dc25d 100644 --- a/etc/ssl/generated_certs/n6-CA/certs/12.pem +++ b/etc/ssl/generated_certs/n6-CA/certs/12.pem @@ -1,18 +1,18 @@ -----BEGIN CERTIFICATE----- MIIC9jCCAd6gAwIBAgIBEjANBgkqhkiG9w0BAQsFADAQMQ4wDAYDVQQDDAVuNi1D -QTAeFw0xOTExMjYxNTE4NDVaFw0yMzA4MjIxNTE4NDVaMDIxGjAYBgNVBAMMEWxv +QTAeFw0yMzA4MjUxMTMxMjlaFw0yNzA1MjExMTMxMjlaMDIxGjAYBgNVBAMMEWxv Z2luQGV4YW1wbGUuY29tMRQwEgYDVQQKDAtleGFtcGxlLmNvbTCCASIwDQYJKoZI -hvcNAQEBBQADggEPADCCAQoCggEBAPGwKSn3AwRNcvv+KQvVaaQTwsKvjPEdYmst -HweFvCJ+U7nquIQRs/HdqmNgUHGXH2jXWwGTRU0i/H+qmQCKaPLCI+mcvu57Cbu6 -LXIsvbO3h0GX+UZBGUrt2xKALKuMDQDuNCOYgEhlNgi4RmYkLBRZUwsY5/j5x8MY -Tk9wbQPdw/t8U6JhnEdBNBLZsGFCsBvwqdTmQYmromIeFef4Q5AfVkHgx/6QuqGR -QVGn9smv7tdS5HmKof+g8p+avB2kYuDqLtg/v5rjOZ6SEYeRCzZIpFFwM2ANC7ji -ZOZzw1CjsaQLkOwGqa9BvNuMBRo4dkDg7odLxaMxbFIonMOwDlcCAwEAAaM5MDcw +hvcNAQEBBQADggEPADCCAQoCggEBAL3HcpserlIdt46ssyHQNuFFOX3ujkY2KEYT +acQPgBp7tk55oFoWiyqCqyEOO9Ririy7+c4qK/76CNFXrGy+DxgR1425SeRTBK/3 +gZilfHneWv7OZAUnyVgyqNU8/zeH9DOIL3O1HfCaxHX3rWNEWtXO6XOvdU3gFjLP +k2wPbpHZkL64qlbj5BCknJ8XE9wku46fqzBjV6sxnrQbhoxA1Mlr9+1k9d/31gNS +afT8p2SD6tQ9Y8xPvX0ODyAU29jo6z9Gv+RQMc3zAhCJwapZf+WqViyrYG7tGHDX +W+kTVakuvHLAg9hPxOboo4CvjAsQU14gJrrewoHbqEYd1mUGf+UCAwEAAaM5MDcw CQYDVR0TBAIwADALBgNVHQ8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsG -AQUFBwMBMA0GCSqGSIb3DQEBCwUAA4IBAQAdRy2FBeliMlB6Qg4CjBzUFoQ0SQzg -WGy2Gpq9LqFnrL3AFGk5BWI7hXqpSci+vw78ry6qgrIZluZbCxk6qNSJ4iiWiDNG -VCzbJoornZWzZOdxtSTN1+Ejdw4Q5eTHzgM4lU3n+ZkNiD1c2jO3xl6dyUVeBQwk -2OLCKSbd5I96UxfUUQWSxV3iH8prx2zN4kOcWuUlwKsanozxRAjO1btbXnwHxCSb -/axJSkzZtmsX3bBpfrwJJiEk7wvNkipxK1+KTfSP0kakVs2ktGLaqa9HTvgqwe6u -UiHt9zVHyXI013QeXn36F37/Q9e0vb7jz2v+Mb35h1dPUwYK4ThUO8AI +AQUFBwMBMA0GCSqGSIb3DQEBCwUAA4IBAQBBE2cgPQR8DbgmtWa43+y60e14DCJX +awWsN+PyDUmjGnozx8a4bes/GOMXqMmWYT/Ol6YKw42VDXJbtYSIZJUnc31IYVM6 +vVQTZrB3LKGkMEJQZejVcUYxw88mfXD76DLjvepX6sCyHkBXv3EYhw8KLJv0xgR0 +ZSVwbd4TurhL6fR0ZA6U/+SpNrRp472S17fLxb5HXmMx4c3WtlO+OEKK38os2D1e +YPDMroO1Fvr0/pkwkXSebba8nXNjaUKQ3ySsdO7sktmM1XL4euwqCwjKAnZ/T9Sf +KB3VgNAaoZbjIv/N2xGIjzBxf72n/GIc9ApKBKglMOqum7CbBtHjF+fS -----END CERTIFICATE----- diff --git a/etc/ssl/generated_certs/n6-CA/index.txt b/etc/ssl/generated_certs/n6-CA/index.txt index c366da2..1d05217 100644 --- a/etc/ssl/generated_certs/n6-CA/index.txt +++ b/etc/ssl/generated_certs/n6-CA/index.txt @@ -1 +1 @@ -V 230822151845Z 12 unknown /CN=login@example.com/O=example.com +V 270521113129Z 12 unknown /CN=login@example.com/O=example.com diff --git a/etc/ssl/generated_certs/n6-CA/private/cakey.pem b/etc/ssl/generated_certs/n6-CA/private/cakey.pem index ad69c1e..4db1278 100644 --- a/etc/ssl/generated_certs/n6-CA/private/cakey.pem +++ b/etc/ssl/generated_certs/n6-CA/private/cakey.pem @@ -1,28 +1,28 @@ -----BEGIN PRIVATE KEY----- -MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCYdiZcEhcK5r/9 -HwQTFkoKEYVt9rUS5nVdYOC+XDfjBNe/1kVM91mCF677uEcRI78yyKecxSdx4HLd -dJwna1dyGZVzRAY79JjUQVeqpAWdvyn1wHrM7ibZM8lCcRq6Uj06kHiT+sxVeCQZ -Ja2tRopvUvxxbVWr2j9d0GZyMkriSLKt1BoyYrnESNihKGFb840kZld7x04VWrXN -UpeA9CLsY/XFEiNoOkWugzqYGHfAo93mzrFF868OX9mbOwzYote/DPJwsD52HcNN -3LhBwZg+9sl/5nS/MeIy532cOF2VIHDErBE0nGYuhMHzL54o0amUM5iPFwf005j4 -Iqr/MNv3AgMBAAECggEAWUaWH8PYCmIkeiv3TtX2dP7diw6z6WVZxFw+Pjnx39Wu -IH9wBSmyGCOWK4A03Sx5gVtOCtbHyj8MA/GMnuiURBQHF3/1xpXsCB5bH2j8gOq3 
-v1f+kJHD3dwdNfLVG9WcVPbUJJLvo/y95pvRn213EskdWa4URJWAFYc69hPNFcNk -swnLwz3Cc+kaOyzNKt848dUfCdHt58nF4GILt3NH+s3wJIDYKGCfFS5aGzM9ZnYE -brYouuROsQmJ3EDHXIiRUtzom61DPiZFk0qw1eAscb9siD3/zMpfdx24SrsE5rW+ -B7JArfbmMBH8H81HLXyF5xdO3piZjzQzQTdAYiBYwQKBgQDJgMZLkdJEASedyXQk -F4XQh83PhtsvUCM3X6439uSOPBFF7a9raNd/ZQyL/H3pCSMt7EA4XNngOrc5CYmx -nA10AUuUxP9MJdfFk4+pAN5J8tvqKZdTwSucLAXBhRrbMdCY9a3h0BhIRuGwBFof -8AGrAmdwuMJCJTTXST1q+vIpgwKBgQDBsfJwInLbXHtg6vnTw3coPspDVZAeLADO -/2MUEZ5yflURl8QU9EfGTTdk2D7560yHY3VJdebismjjWHdSm/NGoi0yYunKDWUU -PPDeJv+TIKOAqrfjLNAGTBBT6l80x/9H8vXN1j5JFdn01g/9YFUfJweDUlvekxBb -nCWnKkpdfQKBgH7RGvuolKrUBzjY9s1YOJGbRr5bY0sNpnxNLXpvWjziNQTLqGFz -JF07HYBksmGdrJRUYb6XQVBL49Bz3kL3scfWoNjKetpT3s6sJff5Ye0seZeQAXtm -0amCU0UOHm7hlSUPShYaP44NfjCnLIl5JbOY2b0pqqiyfeUYZR0VPp89AoGBAJN7 -l/rT8BqhD3ybTkB068zkCoQ8qTCgFrmGcf188OWC1elAYtgFrIUMlGof0cvf4vSP -wWV+9Z+VcxHwcWKgRht5LurXr+XeTyGayViN3zo6tuQomT3MCFVTI3eR1I5O3kz9 -bTYetGxXzA6F08T8zbObtzfBxRvzZJgsi+r944PNAoGARW8QrC+nz+HS5dwXFCyO -EwFQ3cQuVv6dDBP/WRWQh4S+6IE8BBBAa19yp5/xRu8htMewY+l/xVzSXrjo1HOp -gTSXOhlAPJtqb7LwKcLlo9oVDfujjooebZmnZENyw1jR1n2vxCl52OB2gNpBcs3g -BwLMyCxImJDdxLa0Pk5kHXU= +MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC21yYYtbjMOQTl +LfekBZoZALgbzEpMYzDfG/4jHpkW1/wyejsFEk8gp94Ak9LMvMZ1guzV5roN01pd +gYeCJ4eT+3oaB1+suK7+Q/WFnBUZXq8hsz1AsyDgLBRIttevsBVutvBGeXnpEM/7 +HN+eY14qQS7fnnfOdlUmhuCmVMhVMR9l8EieiYmDXWr+FBSMU/59UgtySWwJL/1A +iYMSgxa1ka2jdMmRYbuNTOhjugXjlS9VO8BTfIMBwV3EQq6DcFR0cBn66ZsJ/NAU +YLrns7sbs9J/IE0YDx9rSfqZWDiW8pqPT8BIDkp10c79QcrSW0eJ+8osZYdF8qtZ +37WXDS25AgMBAAECggEBAKvVl6AxaPPFwqwAf3RPL3vACUdWv5z6u7ty+2zWHNoz +MnneFgm2I1d1bFbulnaEE5/s49hDdyf3Mj4etdPEgs640RAVTf1ttEiSZfSjs9Cc +A8uQQEjGEGeeBZIxBJPA0OO0WixhjglUG6LMh/y7NoxPplXTAJWw8GW87PRlScGF +SWCtgpc+OLsxjOqpUu9thUOtD1mmuqHgupIEGFj6tbLLyBk7rB0wFzxHDuD5IMzU +tqtTKuA4bXWacUIHzmtbL/aJYfkpTzb4BCXaSoxW3NgnhvBpD/IaOAQXi/UacRpJ +27qAyFED1ZCTXK/ctuFWMU3k/IWts8HtqIE0sSjNf2ECgYEA6Gz6faTlyXYqjOq0 +AxedNtgj30i5egti0SyjXW6udJSxT2AYtx/a+h5fFDWeACo/kCQSdcHgRCHhmGS2 +cUUNNVfMaAsGqk2+ICG5glpHfOL7R5Gi6l0FChVbjWFOVmfRC60UcmvX3M7Oafn8 +bTUiWQURfGFkfIhrSBqQhcamBksCgYEAyWKrfxOSdbI/XpLVfNqPZB6rjLqUBHF3 +zFYkBHexl5kjsInnGxXNbdJsXommTmLTVGCA0MoOkorZ/XL+yYzR3fdCd4EdD8pp +vAbgWDaQrmouJ853bqeUGYhGeasxl7+gwHFkW4TYRGcZ0TLH6iQq2wRfq4zINbxK +7Vf+Ru8ZaYsCgYEAhOEpJIQNy2v/T6kvWUU64IwZliIhyCCSUjxO+a+5lXUdGeA6 +wRc5Ph33BbrRpg6BYIr+8svwx4MHUvThSUjNEF4twp3rJZpkxEIDqP6sOD4cowIk +PhEIPIeRW/bxrnyUCzTcp734H4ksgXImWtkx1esL4CxeIsRrcUGetpyndpkCgYA0 +lSbiT2H2iUwyjXRg3VCDe96fKDht0JLPL87Hu9kLFFlVRyyozdCN1Func5mQ7gzw +AyKfYaLccJTqsJQGXFaP9nfMbFICRX/GMKVzYwvz/pV+n1Jf+jGZWRPNwP15+fcn +SHRD0TQG6ES9ctzwLfFiromsaV39aeTGhCtIqjWgcQKBgAXnsDzeScIEwmq/BmmM +awr7arldvMSV7LVxg1/O6LfXIQOfWXNMUpeo+rkJrhpeW0bh+NyQV19kO2VpWolQ +RpJxWhKTiTpYXK84d5JatuU3MFvROzdwd0KLaqBsZw54SCBVN9otrMmYZYqrAqA9 +vFQDMZAQ6wzpmNL2fjyXj0gK -----END PRIVATE KEY----- diff --git a/etc/ssl/generated_certs/req.csr b/etc/ssl/generated_certs/req.csr index 7bd4129..7694013 100644 --- a/etc/ssl/generated_certs/req.csr +++ b/etc/ssl/generated_certs/req.csr @@ -1,16 +1,16 @@ -----BEGIN CERTIFICATE REQUEST----- MIICdzCCAV8CAQAwMjEaMBgGA1UEAwwRbG9naW5AZXhhbXBsZS5jb20xFDASBgNV BAoMC2V4YW1wbGUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA -8bApKfcDBE1y+/4pC9VppBPCwq+M8R1iay0fB4W8In5Tueq4hBGz8d2qY2BQcZcf -aNdbAZNFTSL8f6qZAIpo8sIj6Zy+7nsJu7otciy9s7eHQZf5RkEZSu3bEoAsq4wN -AO40I5iASGU2CLhGZiQsFFlTCxjn+PnHwxhOT3BtA93D+3xTomGcR0E0EtmwYUKw -G/Cp1OZBiauiYh4V5/hDkB9WQeDH/pC6oZFBUaf2ya/u11LkeYqh/6Dyn5q8HaRi -4Oou2D+/muM5npIRh5ELNkikUXAzYA0LuOJk5nPDUKOxpAuQ7Aapr0G824wFGjh2 
-QODuh0vFozFsUiicw7AOVwIDAQABoAAwDQYJKoZIhvcNAQELBQADggEBAFFJtST7 -qq3TFhcojOqGrW7AcquNJysxmCluhzwkh3Dag7GI+tw8CzVuNDlk9NHIn2AN6OwV -x5+V+QWQ/dGuqrunHupaM1zg+bTgKj3tPcokI8HkQOy95M0GuDMU5FaiWs/rcV9o -Z4CtYCa8H+2XoeI5EiDSILPluz7d8dDmtz+rB4L+0Zo228JPYkX9mJZULKDm1BYr -K9+yPhfPLVOpNeJmpsAdakRnJhrb85QEaT1esr9Ur75u2BoK0DbCJbjTAeuB/V8P -4zZbY8YRAcERiCESwQDWnw1OQdLh4bfrp/YQWQTe09/x0vDBUkvkRrnKT5frinxj -CARuhGSR85l5u1E= +vcdymx6uUh23jqyzIdA24UU5fe6ORjYoRhNpxA+AGnu2TnmgWhaLKoKrIQ471GKu +LLv5zior/voI0VesbL4PGBHXjblJ5FMEr/eBmKV8ed5a/s5kBSfJWDKo1Tz/N4f0 +M4gvc7Ud8JrEdfetY0Ra1c7pc691TeAWMs+TbA9ukdmQvriqVuPkEKScnxcT3CS7 +jp+rMGNXqzGetBuGjEDUyWv37WT13/fWA1Jp9PynZIPq1D1jzE+9fQ4PIBTb2Ojr +P0a/5FAxzfMCEInBqll/5apWLKtgbu0YcNdb6RNVqS68csCD2E/E5uijgK+MCxBT +XiAmut7CgduoRh3WZQZ/5QIDAQABoAAwDQYJKoZIhvcNAQELBQADggEBAHfY35sg +215yyw7ZLsyXNILacyeHPEiCazYs3wT6HVhQlRODiztkn/lfLNGmJnFLaSzfKgzm +cyKyG7LXGC5ibTGT+Nk3+Aa2oEVCuFXYAPRa699L60DqCXPRN5ze6NdrEnw6lKrR +1n70sGDwOIJKCHas8FyA3Abte25MLH2P42zRcT9zOg6fBKeArfg9CHmdK9b8aaK5 +fyDmB67Ng2LzYkrfqYP4MfoT5xU1+ti0x5IySxqHoxnEAU3oWqjw8s5extRJ2H/7 +e+LMTFveeVsbExiJexbX3Ep2MN+BCB22cIUXrsCFU+X3CMWRG29rrXw/MiH+4NF6 +VMVL39/xLKrj9gc= -----END CERTIFICATE REQUEST----- diff --git a/etc/ssl/openssl.cnf b/etc/ssl/openssl.cnf index 13aefb6..0ca769b 100644 --- a/etc/ssl/openssl.cnf +++ b/etc/ssl/openssl.cnf @@ -2,8 +2,8 @@ default_ca = n6-CA [ n6-CA ] -dir = n6-CA -ca_dir = n6-CA +dir = generated_certs/n6-CA +ca_dir = generated_certs/n6-CA certificate = $ca_dir/cacert.pem database = $dir/index.txt new_certs_dir = $dir/certs @@ -29,7 +29,7 @@ organizationalUnitName = optional basicConstraints = CA:false [ req ] -dir = n6-CA +dir = generated_certs/n6-CA default_bits = 2048 default_keyfile = $dir/private/cakey.pem default_md = sha256 diff --git a/pytest.ini b/pytest.ini index 0fb6e5f..704b79c 100644 --- a/pytest.ini +++ b/pytest.ini @@ -1,2 +1,6 @@ [pytest] + doctest_optionflags = + +markers = + slow: marks tests as slow (deselect with '-m "not slow"')
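As for the `slow` marker registered in the `pytest.ini` hunk above: a test opts in with the `@pytest.mark.slow` decorator, and such tests can then be deselected with `pytest -m "not slow"`. A minimal, hypothetical sketch (the test name and body are illustrative only):

```python
import time

import pytest


@pytest.mark.slow             # the marker registered in pytest.ini above
def test_heavy_processing():  # hypothetical test name
    time.sleep(5)             # stand-in for genuinely slow work
```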