targetless datapacks #18

kowey · 2015-03-20T08:23:34Z

Observation that @moreymat made: for hygienic reasons, we really should not expose the datapack targets during decoding time. This is actually a bit trickier to implement than I'd anticipated.

Sure we could slap in something like

    def targetless(self):
        '''
        Return a variant of the datapack in which the target has
        been set to None.

        This is information that really should not be visible
        during decoding time; this method is here for hygienic purposes

        :rtype: DataPack
        '''
        return DataPack(edus=self.edus,
                        pairings=self.pairings,
                        data=self.data,
                        target=None,
                        labels=self.labels)

The problem at the moment is the oracles. Our implementation of oracles is actually pretty crude. If the model is literally the string 'oracle', we return the datapack target… so the oracles for now need to see the targets.

OK so in principle you could say that Oracles should really be learners like any other, implementing some sort of fit function that trivially memorises the target. That's fine except that you also need to introduce another layer of bureaucracy to your test harnesses that “learns” the oracle on the test data instead of the training data… yuck
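A minimal sketch of that oracle-as-learner idea, to make the extra harness step concrete. Everything here is an assumption for illustration: `DataPack` is a cut-down stand-in that only borrows the field names from the snippet above, `targetless` is written as a free function, and `OracleLearner` with its `fit`/`predict` methods is hypothetical.

```python
from collections import namedtuple

# Simplified stand-in for the datapack; only the field names come
# from the snippet above, the rest is assumed for illustration.
DataPack = namedtuple('DataPack', 'edus pairings data target labels')


def targetless(dpack):
    """Return a copy of dpack with the target hidden."""
    return dpack._replace(target=None)


class OracleLearner(object):
    """Hypothetical oracle that 'learns' by memorising the targets."""

    def __init__(self):
        self._memo = {}

    def fit(self, dpack):
        # remember the gold value for every pairing
        self._memo = dict(zip(dpack.pairings, dpack.target))
        return self

    def predict(self, dpack):
        # looks up pairings only; never reads dpack.target
        return [self._memo[pair] for pair in dpack.pairings]


test_pack = DataPack(edus=['e1', 'e2', 'e3'],
                     pairings=[('e1', 'e2'), ('e2', 'e3')],
                     data=None,
                     target=[1, 0],
                     labels=['UNRELATED', 'elaboration'])

# the bureaucratic bit: the harness has to fit on the *test* pack
oracle = OracleLearner().fit(test_pack)
predictions = oracle.predict(targetless(test_pack))
```

The decoder side stays hygienic since `predict` only ever sees a targetless pack, but the harness now needs a special case that fits oracles on the test data.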

So I'm not sure which is worse.

moreymat · 2015-03-20T08:50:18Z

I think we can live with the possibility of contamination for now.
We can enforce the separation between data and targets later.


kowey · 2015-05-21T18:40:16Z

Maybe one way to deal with this is to make target private by convention (._target).
The oracles can grab them, but it's clear from the API that we Don't Approve otherwise.
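Roughly what that might look like, as a sketch only. The cut-down `DataPack` class and the `decode` function below are hypothetical stand-ins, not the actual code; the only things taken from this thread are the field names and the check for the literal string 'oracle'.

```python
class DataPack(object):
    """Hypothetical cut-down datapack with a private-by-convention target."""

    def __init__(self, edus, pairings, data, target, labels):
        self.edus = edus
        self.pairings = pairings
        self.data = data
        self._target = target  # leading underscore: hands off
        self.labels = labels


def decode(model, dpack):
    """Hypothetical decoder entry point."""
    if model == 'oracle':
        # the one sanctioned peek at the gold targets
        return list(dpack._target)
    # any real model would have to work from dpack.data alone
    raise NotImplementedError('only the oracle is sketched here')


dpack = DataPack(edus=['e1', 'e2'],
                 pairings=[('e1', 'e2')],
                 data=None,
                 target=[1],
                 labels=['UNRELATED', 'elaboration'])
oracle_output = decode('oracle', dpack)
```

Nothing stops other code from reading `_target`, of course; the underscore only signals intent, which is the point of doing it by convention.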

kowey added the enhancement label Mar 20, 2015
