Shortened middle name prevents anonymization #200

Open
akaliuta opened this issue Oct 30, 2024 · 0 comments

If a full name contains an abbreviated middle name (e.g. 'L.') or a suffix such as 'jr.', Anonymize does not recognize it as a name to replace.

When running the following piece of code:

from llm_guard.util import configure_logger
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
from loguru import logger

configure_logger('CRITICAL')

# Create vault
vault = Vault()

# Init scanner
anonymize_scanner = Anonymize(vault)

name_list = {'Sam Jackson','Samuel Leroy Jackson','Samuel L. Jackson','Robert Downey jr.'}

for name in name_list:
    anon_this = f"My name is {name}" 
    sanitized_text, is_valid, risk_score = anonymize_scanner.scan(anon_this)
    if is_valid:
        logger.success(f"Text is clean: {sanitized_text}")
    else:
        logger.error(f"Sanitized text: {sanitized_text} ({name})")
        logger.info(f"Text is safe (risk estimation): {is_valid} ({risk_score})")

I get:

2024-10-30 13:46:19.002 | SUCCESS  | __main__:<module>:20 - Text is clean: My name is Robert Downey jr.
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.147 | SUCCESS  | __main__:<module>:20 - Text is clean: My name is Samuel L. Jackson
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.287 | ERROR    | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_1] (Sam Jackson)
2024-10-30 13:46:19.287 | INFO     | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.427 | ERROR    | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_2] (Samuel Leroy Jackson)
2024-10-30 13:46:19.427 | INFO     | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)

I would expect all four names to be replaced, but only 'Sam Jackson' and 'Samuel Leroy Jackson' are. Could you please take a look and let me know if I'm missing something? Thank you in advance.
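
In case it helps with triage, here is a rough, stdlib-only sketch that pins down the two name shapes that slip through in the output above: a middle initial and a generational suffix. The helper `find_abbreviated_names`, the regex, and the suffix list are my own illustration and assumptions, not part of llm_guard. It is a crude heuristic (it only looks at capitalized word runs), so treat it as a way to describe the failing pattern rather than as a fix.

```python
import re

# Hypothetical suffix list for illustration only; extend as needed.
SUFFIX = r"(?:[Jj]r|[Ss]r|II|III|IV)"

# Capitalized first name, optional middle names or initials,
# capitalized last name, optional generational suffix.
NAME_SHAPE = re.compile(
    r"\b[A-Z][a-z]+"
    r"(?P<middle>(?:\s+(?:[A-Z]\.|[A-Z][a-z]+))*)"
    r"\s+[A-Z][a-z]+"
    rf"(?P<suffix>,?\s+{SUFFIX}\b\.?)?"
)


def find_abbreviated_names(text: str) -> list[str]:
    """Return name-like spans that contain a middle initial or a suffix."""
    hits = []
    for match in NAME_SHAPE.finditer(text):
        has_initial = re.search(r"\b[A-Z]\.", match.group("middle")) is not None
        if has_initial or match.group("suffix"):
            hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    for name in ["Sam Jackson", "Samuel Leroy Jackson",
                 "Samuel L. Jackson", "Robert Downey jr."]:
        print(name, "->", find_abbreviated_names(f"My name is {name}"))
```

With the four names from the report, only the two that Anonymize left untouched are flagged by this sketch; the two plain names return an empty list.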
