targetless datapacks #18

Open
kowey opened this issue Mar 20, 2015 · 2 comments

@kowey
Contributor

kowey commented Mar 20, 2015

An observation that @moreymat made: for hygienic reasons, we really should not expose the datapack targets at decoding time. This is actually a bit trickier to implement than I'd anticipated.

Sure we could slap in something like

    def targetless(self):
        '''
        Return a variant of the datapack in which the target has
        been set to None.

        This is information that really should not be visible
        at decoding time; the method is here for hygienic purposes

        :rtype: DataPack
        '''
        return DataPack(edus=self.edus,
                        pairings=self.pairings,
                        data=self.data,
                        target=None,
                        labels=self.labels)

The problem at the moment is the oracles. Our implementation of oracles is actually pretty crude. If the model is literally the string 'oracle', we return the datapack target… so the oracles for now need to see the targets.
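
To make the conflict concrete, here is a minimal sketch rather than the actual attelo code. The `DataPack` namedtuple is a stand-in that only borrows the field names from the snippet above, and the `decode` function and its error message are assumptions made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the real DataPack; only the field names
# are taken from the snippet above.
DataPack = namedtuple('DataPack',
                      ['edus', 'pairings', 'data', 'target', 'labels'])


def targetless(dpack):
    """Return a copy of the datapack with the target set to None."""
    return dpack._replace(target=None)


def decode(model, dpack):
    """Hypothetical decoder: the 'oracle' model just echoes the target."""
    if model == 'oracle':
        if dpack.target is None:
            # This is the snag: a targetless datapack starves the oracle.
            raise ValueError('oracle needs the datapack target')
        return list(dpack.target)
    raise NotImplementedError('only the oracle is sketched here')


# Toy values, invented for the example.
dpack = DataPack(edus=['e1', 'e2'],
                 pairings=[('e1', 'e2')],
                 data=[[0.5]],
                 target=[1],
                 labels=['elaboration'])
```

With this toy setup, `decode('oracle', dpack)` works, whereas `decode('oracle', targetless(dpack))` raises, which is exactly why the oracles currently need to see the targets.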

OK so in principle you could say that Oracles should really be learners like any other, implementing some sort of fit function that trivially memorises the target. That's fine except that you also need to introduce another layer of bureaucracy to your test harnesses that “learns” the oracle on the test data instead of the training data… yuck
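
A rough sketch of what that alternative might look like, under stated assumptions: the `Oracle` class, its `fit`/`predict` signatures, the `run_fold` helper and its `is_oracle` flag are all hypothetical names, not anything that exists in the codebase.

```python
class Oracle(object):
    """Hypothetical oracle 'learner' that memorises targets in fit()."""

    def __init__(self):
        self._memory = {}

    def fit(self, pairings, target):
        # "Learning" is just remembering the answer for each pairing.
        self._memory = dict(zip(pairings, target))
        return self

    def predict(self, pairings):
        # Raises KeyError for any pairing that was not seen in fit().
        return [self._memory[p] for p in pairings]


def run_fold(learner, train, test, is_oracle=False):
    """Hypothetical harness step; each fold is a (pairings, target) tuple."""
    # The extra bureaucracy: oracles are fitted on the test fold.
    fit_pairings, fit_target = test if is_oracle else train
    learner.fit(fit_pairings, fit_target)
    # Decoding itself only sees the test pairings, never the test target.
    return learner.predict(test[0])


# Toy folds, invented for the example.
train = ([('a', 'b'), ('b', 'c')], [1, 0])
test = ([('x', 'y'), ('y', 'z')], [0, 1])
```

Here `run_fold(Oracle(), train, test, is_oracle=True)` returns the test targets, but the harness has to know which learners are oracles, which is the yuck part.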

So I'm not sure which is worse.

@moreymat
Contributor

I think we can live with the possibility of contamination for now.
We can enforce the separation between data and targets later.


@kowey
Contributor Author

kowey commented May 21, 2015

Maybe one way to deal with this is to make target private by convention (._target).
The oracles can grab them, but it's clear from the API that we Don't Approve otherwise.
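
Something along these lines, as a sketch only: the cut-down `DataPack` class and the `oracle_decode` helper are hypothetical, and the constructor arguments simply mirror the field names used earlier in this issue.

```python
class DataPack(object):
    """Hypothetical, cut-down datapack with a private-by-convention target."""

    def __init__(self, edus, pairings, data, target, labels):
        self.edus = edus
        self.pairings = pairings
        self.data = data
        # Leading underscore: not enforced by Python, just a signal
        # that ordinary decoders should leave this alone.
        self._target = target
        self.labels = labels


def oracle_decode(dpack):
    """Hypothetical oracle: the one sanctioned reader of ``_target``."""
    return list(dpack._target)


# Toy values, invented for the example.
dpack = DataPack(edus=['e1', 'e2'],
                 pairings=[('e1', 'e2')],
                 data=[[0.5]],
                 target=[1],
                 labels=['elaboration'])
```

Nothing stops other code from reading `dpack._target`, but the public attribute `dpack.target` no longer exists, so accidental use fails loudly with an `AttributeError`.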
