
Risk score return value #190

Open
SamuelJanoska opened this issue Sep 23, 2024 · 1 comment

Comments

@SamuelJanoska

Hi,

I'd like to ask whether the current logic for returning risk score values is going to stay as it is, or whether there are plans to change it. I noticed that if the prompt is evaluated as valid, the score is set to zero, which I found misleading. I saw this in the BanTopics scanner, but it's true for multiple scanners.


@SamuelJanoska SamuelJanoska changed the title Risk score Risk score return value Sep 23, 2024
@rlangenfeld-twin

rlangenfeld-twin commented Nov 21, 2024

Going to bump this; we are encountering the same issue with version 3.15.

```python
def scan(self, prompt: str) -> tuple[str, bool, float]:
    if prompt.strip() == "":
        return prompt, True, 0.0

    output_model = self._classifier(prompt, self._topics, multi_label=False)
    label_score = dict(zip(output_model["labels"], output_model["scores"]))

    max_score = round(max(output_model["scores"]) if output_model["scores"] else 0, 2)
    if max_score > self._threshold:
        LOGGER.warning(
            "Topics detected for the prompt",
            scores=label_score,
        )

        return prompt, False, calculate_risk_score(max_score, self._threshold)

    LOGGER.debug(
        "No banned topics detected",
        scores=label_score,
    )

    return prompt, True, 0.0
```

When the prompt passes, this method always returns 0.0, which makes it harder for us to know how close certain topics were to being above the threshold (in our use case this would be extremely helpful).

I think it would be a fairly straightforward change to return the calculated risk score on the last line instead of a hard-coded 0.0.
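For illustration, here is a minimal, self-contained sketch of the proposed change. The `calculate_risk_score` below is a hypothetical stand-in, not the library's real implementation; only the shape of the fix (returning a graded score on the passing path instead of a flat 0.0) is the point.

```python
# Sketch of the proposed behavior: the scanner still reports the prompt
# as valid when below the threshold, but returns a graded risk score so
# callers can see how close the prompt came to being flagged.
# NOTE: calculate_risk_score here is a stand-in assumption, not the
# actual implementation from llm_guard.

def calculate_risk_score(score: float, threshold: float) -> float:
    # Stand-in scaling: scores at the threshold map to ~0.5, capped at 1.0.
    if threshold <= 0:
        return round(score, 2)
    return round(min(score / (threshold * 2), 1.0), 2)

def scan_tail(prompt: str, max_score: float, threshold: float) -> tuple[str, bool, float]:
    # Mirrors the end of scan(): the failing branch is unchanged, while
    # the passing branch now also returns the computed risk score.
    if max_score > threshold:
        return prompt, False, calculate_risk_score(max_score, threshold)
    return prompt, True, calculate_risk_score(max_score, threshold)
```

With this change, a prompt scoring 0.4 against a 0.6 threshold would still pass (`valid=True`) but carry a nonzero score, instead of being indistinguishable from a prompt that scored 0.01.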
