
Risk score return value #190

Open
SamuelJanoska opened this issue Sep 23, 2024 · 1 comment

Comments

@SamuelJanoska

Hi,

I'd like to ask whether the current logic for returning risk score values is going to stay as it is, or whether there are plans to change it. I noticed that if the prompt is evaluated as valid, the score is set to zero, which I found misleading. I saw this in the BanTopics scanner, but it's true for multiple scanners.


@SamuelJanoska SamuelJanoska changed the title Risk score Risk score return value Sep 23, 2024
@rlangenfeld-twin

rlangenfeld-twin commented Nov 21, 2024

Going to bump this; we are encountering the same issue with version 3.15.

```python
def scan(self, prompt: str) -> tuple[str, bool, float]:
    if prompt.strip() == "":
        return prompt, True, 0.0

    output_model = self._classifier(prompt, self._topics, multi_label=False)
    label_score = dict(zip(output_model["labels"], output_model["scores"]))

    max_score = round(max(output_model["scores"]) if output_model["scores"] else 0, 2)
    if max_score > self._threshold:
        LOGGER.warning(
            "Topics detected for the prompt",
            scores=label_score,
        )

        return prompt, False, calculate_risk_score(max_score, self._threshold)

    LOGGER.debug(
        "No banned topics detected",
        scores=label_score,
    )

    return prompt, True, 0.0
```

When the prompt passes, this method always returns 0.0, which makes it harder for us to know how close certain topics were to being above the threshold (in our use case this would be extremely helpful).

I think it would be a fairly straightforward change to return the calculated risk score on the last line instead of a hard-coded 0.0.
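For illustration, here is a minimal, self-contained sketch of the proposed change. The `calculate_risk_score` below is a hypothetical stand-in, not the library's real implementation; only the shape of the fix (returning a graded score on the passing path instead of a flat 0.0) is the point.

```python
# Sketch of the proposed behavior: the scanner still reports the prompt
# as valid when below the threshold, but returns a graded risk score so
# callers can see how close the prompt came to being flagged.
# NOTE: calculate_risk_score here is a stand-in assumption, not the
# actual implementation from llm_guard.

def calculate_risk_score(score: float, threshold: float) -> float:
    # Stand-in scaling: scores at the threshold map to ~0.5, capped at 1.0.
    if threshold <= 0:
        return round(score, 2)
    return round(min(score / (threshold * 2), 1.0), 2)

def scan_tail(prompt: str, max_score: float, threshold: float) -> tuple[str, bool, float]:
    # Mirrors the end of scan(): the failing branch is unchanged, while
    # the passing branch now also returns the computed risk score.
    if max_score > threshold:
        return prompt, False, calculate_risk_score(max_score, threshold)
    return prompt, True, calculate_risk_score(max_score, threshold)
```

With this change, a prompt scoring 0.4 against a 0.6 threshold would still pass (`valid=True`) but carry a nonzero score, instead of being indistinguishable from a prompt that scored 0.01.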
