We achieved a higher score when using Llama 3-70B directly as the judge #1

Open
Lucas-TY opened this issue Nov 17, 2024 · 0 comments

@Lucas-TY

Hi,

Our method, "You Know What I'm Saying: Jailbreak Attack via Implicit Reference", uses the "Past Tense" attack as one of its baselines. Evaluating it directly with Llama3-70B as the judge, we obtained a higher attack success rate (ASR) than the one reported in your paper. We have attached our results for "Past Tense" here for your reference.

If you like, you can also upload this zip file to the JailbreakBench leaderboard. We believe it follows their submission requirements, but please verify that all information in the zip file is correct, as we may have made mistakes.

submission.zip
