Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: rollback BUCC revision #1706

Merged
merged 5 commits into from
Jan 8, 2025
Merged

fix: rollback BUCC revision #1706

merged 5 commits into from
Jan 8, 2025

Conversation

Samoed
Copy link
Collaborator

@Samoed Samoed commented Jan 4, 2025

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Closing: #1694

Results for sentence-transformers/LaBSE, since I can't reproduce results for intfloat/multilingual-e5-small. Maybe they used for query: as prefix here.

Leaderboard (BUCC) PR (BUCC) BUCC.v2
de-en 99.35 99.3633 99.4294
fr-en 98.72 98.716 98.9489
ru-en 97.78 97.7755 97.4614
zh-en 99.16 99.1575 99.2277

Copy link
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally intfloat/multilingual-e5-small is a bit hard to reproduce if not run by our implementation and even if it is, the task prompts might change making it harder to reproduce (we could consider logging this).

Comment on lines +52 to +53
# BUCC outputs a dict instead of a Dataset
subsets = list(pair_elements)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we change the dataset instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to change it, but it resulted in a different number of sentence1 and sentence2, which prevented me from creating a dataset.Dataset.

Copy link
Collaborator

@isaac-chung isaac-chung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if any changes here address this comment in the linked issue. Could you please take a look at that and the paper to confirm what's missing? (i.e. wrong split used/uploaded, should be a cross-lingual retrieval task)

@Samoed
Copy link
Collaborator Author

Samoed commented Jan 8, 2025

For now, I've made changes to ensure that BUCC is working. After that, we can investigate how it’s supposed to work and deprecate this

@isaac-chung
Copy link
Collaborator

isaac-chung commented Jan 8, 2025

How it's supposed to work

It's ok to do this in a separate PR. As mentioned here, hope my comment in the issue would give you a head start.

@isaac-chung isaac-chung changed the title fix: bucc fix: rollback BUCC revision Jan 8, 2025
@isaac-chung isaac-chung merged commit 9bcb52f into main Jan 8, 2025
10 checks passed
@isaac-chung isaac-chung deleted the rollback_bucc branch January 8, 2025 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants