Implement TickerDetector class for validating and searching ticker symbols; update TickerOrchestrator to utilize new validation methods; enhance tests for ticker detection functionality.
zoharbabin committed Jan 19, 2025
1 parent ce9c84e commit 47848f0
Showing 3 changed files with 155 additions and 18 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -232,8 +232,8 @@ Below is a comprehensive table detailing the library’s primary methods across
| **Method** | **Internal Logic** | **Dependencies** | **Purpose** | **Usage (CLI vs Python)** |
|------------|---------------------|------------------|-------------|---------------------------|
| **1. `cli.main()`** <br/>*(in `cli.py`)* | 1. Uses Click to parse command-line arguments for the main ticker plus optional peers and CSV path.<br/>2. Instantiates a `TickerOrchestrator` and calls `analyze_company()`. | - Built-in Python <br/>- [Click](https://click.palletsprojects.com/) <br/>- `TickerOrchestrator.analyze_company()` from `orchestrator.py` | Provides the command-line entry point (`edgar-analytics`) for analyzing one or more tickers and outputting results. | **CLI**: Invoked automatically via `edgar-analytics TICKER [PEERS...] --csv file.csv`. <br/>**Python**: Not typically called directly; used by the console script defined in `setup.py`. |
-| **2. `orchestrator.validate_ticker_symbol(ticker)`** <br/>*(in `orchestrator.py`)* | Checks if the ticker is 1–10 alphanumeric characters. Returns `True` if valid, `False` otherwise. | None (pure Python) | Ensures malformed or suspicious ticker symbols are rejected. | **CLI**: Invoked internally before processing each ticker.<br/>**Python**: Not usually called by end users. Used by `analyze_company()`. |
-| **3. `orchestrator.TickerOrchestrator.analyze_company(ticker, peers, csv_path)`** <br/>*(public method)* | 1. Validates main ticker using `validate_ticker_symbol`.<br/>2. Creates a new `Company` object (from `edgartools`).<br/>3. Gathers metrics via `_analyze_ticker_for_metrics()` for main ticker and each valid peer.<br/>4. Aggregates results into a `metrics_map`.<br/>5. Uses `ReportingEngine.summarize_metrics_table()` to produce logs and optional CSV. | - `validate_ticker_symbol()`<br/>- `edgartools.Company`<br/>- `_analyze_ticker_for_metrics()`<br/>- `ReportingEngine` | Primary orchestration method for analyzing a main ticker (and optional peers). Fetches snapshots, multi-year data, forecasts, and runs final reporting. | **CLI**: Called implicitly when you run `edgar-analytics TICKER PEERS...`. <br/>**Python**: Manually instantiate `TickerOrchestrator` and call `.analyze_company()`. Example: <br/>```python<br/>orch = TickerOrchestrator()<br/>orch.analyze_company("AAPL", ["MSFT"], csv_path="out.csv")<br/>``` |
+| **2. `TickerDetector.validate_ticker_symbol(ticker)`** <br/>*(in `orchestrator.py`)* | Checks if the entire string matches the valid ticker pattern: 1–5 uppercase letters with optional “.” or “-” plus 1–4 alphanumeric chars (e.g. "BRK.B", "RY.TO"). <br/>Returns `True` if valid, `False` otherwise. | None (pure Python) | Ensures malformed or suspicious ticker symbols are rejected based on a stricter EDGAR-friendly regex. | **CLI**: Invoked internally before processing each ticker.<br/>**Python**: Not usually called by end users. Used by `analyze_company()`. |
+| **3. `orchestrator.TickerOrchestrator.analyze_company(ticker, peers, csv_path)`** <br/>*(public method)* | 1. Validates the main ticker using `TickerDetector.validate_ticker_symbol(ticker)`.<br/>2. Creates a new `Company` object (from `edgartools`).<br/>3. Gathers metrics via `_analyze_ticker_for_metrics()` for main ticker and each valid peer.<br/>4. Aggregates results into a `metrics_map`.<br/>5. Uses `ReportingEngine.summarize_metrics_table()` to produce logs and optional CSV. | - `TickerDetector.validate_ticker_symbol(ticker)`<br/>- `edgartools.Company`<br/>- `_analyze_ticker_for_metrics()`<br/>- `ReportingEngine` | Primary orchestration method for analyzing a main ticker (and optional peers). Fetches snapshots, multi-year data, forecasts, and runs final reporting. | **CLI**: Called implicitly when you run `edgar-analytics TICKER PEERS...`. <br/>**Python**: Manually instantiate `TickerOrchestrator` and call `.analyze_company()`. Example: <br/>```python<br/>orch = TickerOrchestrator()<br/>orch.analyze_company("AAPL", ["MSFT"], csv_path="out.csv")<br/>``` |
| **4. `orchestrator.TickerOrchestrator._analyze_ticker_for_metrics(ticker)`** <br/>*(private method)* | 1. Instantiates an `edgartools.Company` object.<br/>2. Fetches latest 10-K/10-Q with `get_single_filing_snapshot()`.<br/>3. Calls `retrieve_multi_year_data()` for multi-year stats.<br/>4. Runs `forecast_revenue_arima()` for forecasting.<br/>5. Gathers additional quarterly info (`analyze_quarterly_balance_sheets()`) and checks alerts. | - `edgartools.Company`<br/>- `metrics.get_single_filing_snapshot()`<br/>- `multi_period_analysis.retrieve_multi_year_data()`<br/>- `forecasting.forecast_revenue_arima()`<br/>- `multi_period_analysis.analyze_quarterly_balance_sheets()`<br/>- `multi_period_analysis.check_additional_alerts_quarterly()` | Consolidates data gathering, metric calculations, forecasts, and alert compilation for a single ticker. | **CLI**: Internally used by `analyze_company()`. <br/>**Python**: Not intended for direct use—rely on `analyze_company()`. |
| **5. `orchestrator.TickerOrchestrator.main()`** <br/>*(demonstration method)* | 1. Demonstrates usage by analyzing “AAPL” with peers “MSFT”/”GOOGL.”<br/>2. Writes CSV to `analysis_outputs/summary.csv`. | - `analyze_company()`<br/>- Built-in Python | Simple example if `python orchestrator.py` is run directly; not used in typical workflows. | **CLI**: Not a CLI command by default. <br/>**Python**: Run `python orchestrator.py` to see a usage example. |
| **6. `metrics.compute_ratios_and_metrics(balance_df, income_df, cash_df)`** | 1. Finds relevant rows (Revenue, Net Income, etc.) via `synonyms_utils.find_synonym_value()`. <br/>2. Calculates key ratios (Current Ratio, D/E, margins, FCF, etc.). <br/>3. Constructs a dictionary of standard metrics + any alerts. | - `synonyms_utils.find_synonym_value()` <br/>- `flip_sign_if_negative_expense()`<br/>- `config.ALERTS_CONFIG`<br/>- `numpy`, `pandas` | Core routine to compute financial metrics from 3 DataFrames (Balance, Income, Cash Flow). | **CLI**: Automatically called for each ticker filing snapshot. <br/>**Python**: Call if you have your own data frames: <br/>```python<br/>metrics_dict = compute_ratios_and_metrics(bs, inc, cf)<br/>``` |
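The README change to row 2 replaces a length-plus-`isalnum()` check with a full-match regex. A minimal sketch of how the two rules differ is given here, with both rules re-declared locally so that it runs without the library; the helper names `old_rule` and `new_rule` are illustrative assumptions and are not part of `edgar_analytics`.

```python
import re

# Pattern copied from the new TickerDetector class in this commit.
TICKER_FULLMATCH = re.compile(r"^[A-Z]{1,5}(?:[.\-][A-Z0-9]{1,4})?$")


def old_rule(ticker: str) -> bool:
    """Previous check: 1-10 alphanumeric characters (hypothetical local copy)."""
    return 1 <= len(ticker) <= 10 and ticker.isalnum()


def new_rule(ticker: str) -> bool:
    """New check: the whole string must match the ticker pattern."""
    return bool(TICKER_FULLMATCH.fullmatch(ticker))


samples = ["AAPL", "BRK.B", "aapl", "12345", "AAPL1"]
comparison = {s: (old_rule(s), new_rule(s)) for s in samples}
for symbol, (old, new) in comparison.items():
    print(f"{symbol:6} old={old!s:5} new={new}")
```

Under these assumptions the new rule starts accepting share-class symbols such as `BRK.B`, which the old rule rejected because of the dot, and stops accepting lowercase or digit-only strings.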
60 changes: 46 additions & 14 deletions edgar_analytics/orchestrator.py
@@ -6,6 +6,7 @@
 multi_period_analysis) to obtain results. It then delegates presentation/reporting
 responsibilities to reporting.py.
 """
+import re

 import logging
 from typing import Dict, Any, List, Optional
@@ -22,22 +23,53 @@
 )
 from .reporting import ReportingEngine

+class TickerDetector:
+    """
+    Manages detection of valid public company ticker symbols using regex patterns
+    (with class-level regex objects for efficient memory and speed usage).

-def validate_ticker_symbol(ticker: str) -> bool:
+    This class encapsulates ticker-detection logic:
+      - A search-based pattern to locate ticker-like substrings within larger text.
+      - A full-match pattern to validate if an entire string is a valid ticker.
     """
-    Validate ticker symbols to prevent misuse or injection-like strings.

-    Parameters
-    ----------
-    ticker : str
-        A company's ticker symbol.
+    # Allow exactly 1 to 5 uppercase letters, and at most ONE optional ".XYZ" or "-XYZ" group:
+    _TICKER_REGEX = re.compile(
+        r"\b[A-Z]{1,5}(?:[.\-][A-Z0-9]{1,4})?\b"
+    )
+    _TICKER_FULLMATCH_REGEX = re.compile(
+        r"^[A-Z]{1,5}(?:[.\-][A-Z0-9]{1,4})?$"
+    )

-    Returns
-    -------
-    bool
-        True if the ticker is deemed valid, False otherwise.
-    """
-    return 1 <= len(ticker) <= 10 and ticker.isalnum()
+    @classmethod
+    def search(cls, text: str) -> re.Match | None:
+        """
+        Perform a regex search on the given text to find a valid ticker substring.
+        :param text: The text string to search.
+        :type text: str
+        :return: A regex Match object if a valid ticker substring is found; otherwise None.
+        :rtype: re.Match or None
+        :raises ValueError: If the provided text is not a string.
+        """
+        if not isinstance(text, str):
+            raise ValueError("Input must be a string.")
+        return cls._TICKER_REGEX.search(text)
+
+    @classmethod
+    def validate_ticker_symbol(cls, ticker: str) -> bool:
+        """
+        Validate if the entire input string is exactly one valid ticker symbol.
+        :param ticker: The string to validate.
+        :type ticker: str
+        :return: True if `ticker` fully matches a valid ticker format; otherwise False.
+        :rtype: bool
+        :raises ValueError: If the provided ticker is not a string.
+        """
+        if not isinstance(ticker, str):
+            raise ValueError("Ticker must be a string.")
+        return bool(cls._TICKER_FULLMATCH_REGEX.fullmatch(ticker))


 class TickerOrchestrator:
@@ -79,7 +111,7 @@ def analyze_company(
         -------
         None
         """
-        if not validate_ticker_symbol(ticker):
+        if not TickerDetector.validate_ticker_symbol(ticker):
             self.logger.error("Invalid main ticker: %s", ticker)
             return

@@ -93,7 +125,7 @@ def analyze_company(

         self.logger.info("Comparing %s with peers: %s", ticker, peers)
         for peer in peers:
-            if validate_ticker_symbol(peer):
+            if TickerDetector.validate_ticker_symbol(peer):
                 peer_data = self._analyze_ticker_for_metrics(peer)
                 metrics_map[peer] = peer_data
             else:
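`validate_ticker_symbol` calls `fullmatch()` even though the pattern already carries `^` and `$` anchors. One likely reason, sketched here as an illustration and not as the author's stated rationale, is that `$` also matches just before a trailing newline, so `match()` alone would accept unstripped input. The pattern is copied from the diff; the variable names are assumptions.

```python
import re

# Anchored pattern copied from TickerDetector._TICKER_FULLMATCH_REGEX.
ANCHORED = re.compile(r"^[A-Z]{1,5}(?:[.\-][A-Z0-9]{1,4})?$")

candidate = "AAPL\n"  # e.g. a line read from a file without stripping

# "$" also matches just before a trailing newline, so match() accepts this input.
accepted_by_match = ANCHORED.match(candidate) is not None

# fullmatch() requires the pattern to consume the entire string, newline included.
accepted_by_fullmatch = ANCHORED.fullmatch(candidate) is not None

print(accepted_by_match, accepted_by_fullmatch)
```

Callers that pass raw user or file input may therefore want to `strip()` it first, since the validator rejects surrounding whitespace. Separately, the `re.Match | None` return annotation on `search` is evaluated when the function is defined, so as written the module appears to need Python 3.10 or newer unless annotation evaluation is deferred.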
109 changes: 107 additions & 2 deletions tests/test_orchestrator.py
@@ -1,7 +1,7 @@
 """
 tests/test_orchestrator.py
-Tests for orchestrator.py (TickerOrchestrator).
+Tests for orchestrator.py (TickerOrchestrator) and the TickerDetector class.
 We mock external Edgar calls, forecast calls, etc., to avoid real network operations.
 ReportingEngine tests have been moved to test_reporting.py.
 """
@@ -10,7 +10,7 @@
 from unittest.mock import patch
 from pathlib import Path

-from edgar_analytics.orchestrator import TickerOrchestrator
+from edgar_analytics.orchestrator import TickerOrchestrator, TickerDetector
 from edgar_analytics.reporting import ReportingEngine


@@ -103,3 +103,108 @@ def test_analyze_company_exception_in_creation(caplog):
         orchestrator.analyze_company("AAPL", peers=[])

     assert "Failed to create Company object for AAPL: Creation error" in caplog.text


+# ---------------------------------------------------------------------
+# Tests for TickerDetector
+# ---------------------------------------------------------------------
+
+class TestTickerDetector:
+    """
+    Test suite for the TickerDetector class, ensuring robust coverage of
+    validate_ticker_symbol(...) and search(...) functionalities.
+    """
+
+    @pytest.mark.parametrize("valid_ticker", [
+        "AAPL",
+        "MSFT",
+        "GOOG",
+        "TSLA",
+        "BRK.A",
+        "BRK.B",
+        "RY.TO",
+        "NGG.L",   # one-letter suffix: ".L" fits the optional [.\-][A-Z0-9]{1,4} group,
+                   # so the current pattern accepts it. Tighten the pattern (and move
+                   # this entry to the invalid list) if ".L" should be rejected.
+        "BABA",
+        "VTI",
+        "ABC-1",
+        "SHOP.TO",
+        "A-B",     # a dash with suffix
+        "BRK.A1",  # alphanumeric suffix
+        "A-123",   # multi-digit suffix
+    ])
+    def test_validate_ticker_symbol_valid(self, valid_ticker):
+        """
+        TickerDetector.validate_ticker_symbol should return True for valid tickers
+        that match the class-level regex pattern.
+        """
+        # Every entry above fully matches the current pattern; adjust this list
+        # if the pattern is tightened later.
+        assert TickerDetector.validate_ticker_symbol(valid_ticker) is True, (
+            f"Expected '{valid_ticker}' to be recognized as valid."
+        )

+    @pytest.mark.parametrize("invalid_ticker", [
+        "",        # empty
+        "aaaaaa",  # lowercase and 6 letters => invalid
+        "AAPL1",   # digit suffix without a '.' or '-' separator
+        "AAPL@",   # invalid character
+        "12345",   # no letters, only digits => does not match
+        "BRK..A",  # double dot not in pattern
+        "RY--TO",  # double dash not in pattern
+        "AB.C.D",  # two suffix groups; the pattern allows at most one
+        "A#B",     # invalid character
+        "aapl",    # lowercase is not allowed by the pattern
+        "A-BB-C",  # two suffix groups; the pattern allows at most one
+        None,      # not even a string => should raise ValueError
+    ])
+    def test_validate_ticker_symbol_invalid(self, invalid_ticker):
+        """
+        TickerDetector.validate_ticker_symbol should return False for strings that
+        do not match the allowed pattern and raise ValueError for non-string input.
+        """
+        if not isinstance(invalid_ticker, str):
+            # Non-string inputs should raise a ValueError
+            with pytest.raises(ValueError):
+                TickerDetector.validate_ticker_symbol(invalid_ticker)
+        else:
+            # For all other invalid string cases, the method should return False.
+            assert TickerDetector.validate_ticker_symbol(invalid_ticker) is False, (
+                f"Expected '{invalid_ticker}' to be recognized as invalid."
+            )

+    @pytest.mark.parametrize("sample_text,expected_match", [
+        # 'i' is lowercase on purpose: an uppercase 'I' is itself a valid one-letter
+        # ticker (IntelSat Global Holdings) and would be returned as the first match.
+        ("i like AAPL and MSFT", "AAPL"),
+        ("The quick brown fox jumps over the lazy dog", None),
+        ("BRK.A soared today", "BRK.A"),
+        ("Check out ABC-1 or SHOP.TO in the market", "ABC-1"),  # returns first match
+        ("No real ticker here!", None),
+    ])
+    def test_search(self, sample_text, expected_match):
+        """
+        TickerDetector.search(...) should return a re.Match if a valid ticker substring
+        is found; None otherwise. If multiple tickers are present, it returns the first match.
+        """
+        match = TickerDetector.search(sample_text)
+        if expected_match is None:
+            assert match is None, (
+                f"Expected no match for '{sample_text}', but got '{match.group(0)}'."
+            )
+        else:
+            assert match is not None, (
+                f"Expected a match for '{sample_text}' but got None."
+            )
+            assert match.group(0) == expected_match, (
+                f"Expected first match = '{expected_match}', got '{match.group(0)}'."
+            )
+
+    def test_search_non_string_raises_valueerror(self):
+        """
+        search(...) should raise ValueError if passed a non-string input.
+        """
+        with pytest.raises(ValueError):
+            TickerDetector.search(None)
+        with pytest.raises(ValueError):
+            TickerDetector.search(12345)
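The `test_search` cases rely on two properties of the search pattern: the `\b` word boundaries, and the fact that `search` returns only the first hit. A small sketch of both follows, with the pattern re-declared locally so it runs on its own. `first_ticker` mirrors what `search` does; `all_tickers` is a hypothetical extension based on `finditer` and is not part of this commit.

```python
import re
from typing import List, Optional

# Same search pattern as TickerDetector._TICKER_REGEX in this commit.
TICKER_SEARCH = re.compile(r"\b[A-Z]{1,5}(?:[.\-][A-Z0-9]{1,4})?\b")


def first_ticker(text: str) -> Optional[str]:
    """Mirror search(): return the first ticker-like substring, or None."""
    match = TICKER_SEARCH.search(text)
    return match.group(0) if match else None


def all_tickers(text: str) -> List[str]:
    """Hypothetical extension: collect every match instead of only the first."""
    return [m.group(0) for m in TICKER_SEARCH.finditer(text)]


print(first_ticker("I like AAPL and MSFT"))  # the pronoun "I" matches first
print(first_ticker("i like AAPL and MSFT"))
print(first_ticker("GOOGLE rose today"))     # six capitals: no word-bounded match
print(all_tickers("Check out ABC-1 or SHOP.TO in the market"))
```

Because the pattern is purely syntactic, any all-caps word of up to five letters in free text (an acronym, for instance) is reported as a candidate, so results from free text probably need a lookup against a real symbol list before use.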
