Coresets API proposal #89

Merged
merged 3 commits into from
Nov 5, 2020

Conversation

Dref360
Member

@Dref360 Dref360 commented Oct 2, 2020

Summary:

New coreset API that I would like to implement.

Features:

Related to #67

@Dref360 changed the title from "Create 02-coresets.md" to "Coresets API proposal" Oct 2, 2020
class BaseCoreset:
    def get_ranks(self, features: Array, logits: Array) -> Indices:
        # K-Means or something
        pass
Collaborator

Should we have an option for selecting the top n, or something similar?

Member Author

In heuristics we do not have that option.
I will add it.
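
To make the "top n" option discussed above concrete, here is a minimal sketch of what a coreset ranker with such an argument might look like. It swaps in greedy k-center (farthest-first traversal) for the "K-Means or something" placeholder. The class name `KCenterCoreset` and the `top_n` parameter are assumptions made for illustration, and `logits` is left out because this particular ranking only needs features.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


class KCenterCoreset:
    """Hypothetical coreset ranker; not the API that was merged."""

    def get_ranks(self, features: Sequence[Vector], top_n: Optional[int] = None) -> List[int]:
        """Rank items by farthest-first traversal, keeping at most `top_n`."""
        n = len(features)
        budget = n if top_n is None else min(top_n, n)
        if budget <= 0:
            return []
        # Assumption: start from index 0 for determinism. A real version
        # would probably start from the already labelled items.
        ranks = [0]
        # Distance from each item to its nearest selected item.
        min_dist = [math.dist(f, features[0]) for f in features]
        min_dist[0] = -1.0  # selected items can never be picked again
        while len(ranks) < budget:
            nxt = max(range(n), key=lambda i: min_dist[i])
            ranks.append(nxt)
            for i in range(n):
                min_dist[i] = min(min_dist[i], math.dist(features[i], features[nxt]))
            min_dist[nxt] = -1.0
        return ranks
```

With `top_n=None` every item is ranked, which matches how the heuristics are described above; passing a number truncates the ranking early and avoids the remaining distance updates.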


def filter(logits, heuristic, proportion=1.0) -> Indices:
    # Use `heuristic` to rank features and return the top `proportion` indices.
    pass
Collaborator

Should we have a way to give more importance to the coreset results or to the heuristic results?

Member Author

Not sure how to merge them.
The idea of this method is to use BALD to create a pool of potential candidates (i.e. the top 10%) and then select from that pool with a coreset. I will remove it, as we don't know if it works.
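
The two-stage idea described here could be sketched roughly as follows. This is only an illustration under stated assumptions: `scores` stands for precomputed uncertainty values (BALD or anything else, it is not computed here), the coreset step is a plain farthest-first traversal, and both function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def farthest_first(points: Sequence[Vector], k: int) -> List[int]:
    """Return up to `k` positions in `points`, each far from those before it."""
    if not points or k <= 0:
        return []
    chosen = [0]
    dist = [math.dist(p, points[0]) for p in points]
    dist[0] = -1.0  # mark as already chosen
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
        dist[nxt] = -1.0
    return chosen


def filter_then_coreset(
    features: Sequence[Vector],
    scores: Sequence[float],
    proportion: float,
    n_select: int,
) -> List[int]:
    """Keep the top `proportion` by score, then pick `n_select` diverse items."""
    n_keep = max(1, math.ceil(proportion * len(scores)))
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = by_score[:n_keep]
    local = farthest_first([features[i] for i in candidates], n_select)
    # Map positions inside the candidate pool back to original indices.
    return [candidates[j] for j in local]
```

Whether this beats either stage on its own is exactly the open question raised in the comment; the sketch only shows the mechanics.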

Collaborator

@parmidaatg parmidaatg left a comment

I think it is a nice proposal. We'll see in action what other changes are needed. I'll merge.

@parmidaatg parmidaatg merged commit d085f67 into master Nov 5, 2020
@parmidaatg parmidaatg deleted the Dref360-patch-1 branch November 5, 2020 18:51
@GeorgePearse
Collaborator

Did this idea get anywhere in the end? For example, a weighted balance between coreset selection and some other heuristic (e.g. BALD)?

@GeorgePearse
Collaborator

I'd be very tempted to work on it. With the view that we'd be creating a class to combine any two (or more?) strategies with some weighting.

Independently of that, I think being able to initialize with a coreset (or other diversity metrics) through a very simple API would be nice for beginners.

For example, at the moment I have

baal_data_module.active_set.label_randomly(hyperparams.initial_pool)

it would be nice to have

baal_data_module.active_set.label_init(hyperparams.initial_pool, strategy='coreset')
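
One possible shape for the weighted combination of strategies mentioned above, as a rough sketch only: each strategy is assumed to produce one score per pool item (higher meaning more worth labelling), scores are min-max normalised so the weights are comparable, and the weighted sum is ranked. The names `min_max` and `combine_ranks` are hypothetical and not part of any existing API.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_ranks(score_lists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[int]:
    """Rank items by a weighted sum of normalised per-strategy scores."""
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per strategy")
    if not score_lists or not score_lists[0]:
        return []
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("every strategy must score the same items")
    normalised = [min_max(s) for s in score_lists]
    combined = [
        sum(w * s[i] for w, s in zip(weights, normalised))
        for i in range(n)
    ]
    # Highest combined score first; ties keep the original order.
    return sorted(range(n), key=lambda i: combined[i], reverse=True)
```

A weight of zero switches a strategy off, so `weights=[1.0, 0.0]` reduces to the first strategy alone. Combining normalised scores rather than rank positions is a design choice made here for simplicity; rank-based fusion would be another option.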
