Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in BPE cache #12

Merged
merged 1 commit into from
Jan 13, 2023
Merged

Conversation

andrew-healey
Copy link
Contributor

@andrew-healey andrew-healey commented Sep 20, 2022

bramses reported this bug first in #8.

Recreation

const {encode} = require('gpt-3-encoder')

const str = 'toString'
const encoded = encode(str)
// TypeError: bpe(...).split is not a function

Explanation

The bpe() method of the encoder uses a cache like so:

const cache = {};

// ...

function bpe(token){
  if(token in cache) {
    return cache[token];
  }

  // otherwise, find the right word for the token...

  cache[token] = word
  return word
}

But in has some unexpected behavior:

"toString" in {}; // true
({}).toString // f toString() { [native code] }

So calling this.bpe("toString") returns the function Object.prototype.toString. This causes an error.

Fix

I changed the implementation to use a JS Map, which is a better cache data structure that avoids this bug.

const cache = new Map;
cache.has("toString"); // false

@andrew-healey andrew-healey changed the title Fix bug with BPE cache Fix bug in BPE cache Sep 20, 2022
@NickHeiner
Copy link

I applied this PR locally, and confirmed that it fixes #8.

@NickHeiner
Copy link

I applied this in my fork: https://www.npmjs.com/package/@nick.heiner/gpt-3-encoder.

@asontha
Copy link

asontha commented Jan 13, 2023

Can we get this merged?

@nickwalton nickwalton merged commit 8d31919 into latitudegames:master Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants