Export and run LLMs in C++ #1197

Open
wants to merge 4 commits into
base: main

Conversation


@awni awni commented Jan 9, 2025

Comments:

  • So far, tested and working only with model_type="llama"
  • Requires the MLX PR "Dynamic broadcasting for shapeless compile/export" (mlx#1722)
  • C++ does tokenization and streaming detokenization, but no chat templates
  • C++ runs at the same speed as Python.
  • Compilation helps both C++ and Python. On an M1 Max the gain is about 0.7 toks/sec for 4-bit Llama 3 8B: throughput goes from 61.5 to 62.2 toks/sec.
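
For readers unfamiliar with streaming detokenization, here is a minimal sketch of the general idea, not the code in this PR. It assumes each generated token decodes to raw bytes that may end in the middle of a multi-byte UTF-8 character, so the incomplete tail is held back until the next token arrives. The class name `StreamingDetokenizer` and its methods are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the PR): buffers decoded token bytes and
// releases only text that does not end in a truncated UTF-8 sequence.
class StreamingDetokenizer {
 public:
  // Append the raw bytes of one token; return the text that is safe to print.
  std::string push(const std::string& token_bytes) {
    pending_ += token_bytes;
    std::size_t cut = complete_prefix_length(pending_);
    std::string out = pending_.substr(0, cut);
    pending_.erase(0, cut);
    return out;
  }

  // Return whatever is still buffered at the end of generation.
  std::string flush() {
    std::string out;
    out.swap(pending_);
    return out;
  }

 private:
  // Length of the longest prefix that does not end in a truncated sequence.
  static std::size_t complete_prefix_length(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = n;
    std::size_t back = 0;
    // Step back over at most three trailing continuation bytes (10xxxxxx).
    while (i > 0 && back < 3 &&
           (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
      --i;
      ++back;
    }
    if (i == 0) return n;  // empty or malformed input: nothing to hold back
    const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    std::size_t need = 0;
    if ((lead & 0x80) == 0x00) need = 1;       // ASCII
    else if ((lead & 0xE0) == 0xC0) need = 2;  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) need = 3;  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) need = 4;  // 11110xxx
    else return n;                             // malformed: pass it through
    const std::size_t have = n - (i - 1);
    return have >= need ? n : i - 1;
  }

  std::string pending_;
};
```

Calling `push` once per generated token and printing its return value gives incremental output; `flush` drains anything left when generation stops. The sketch only checks the final sequence for truncation and does not validate UTF-8 in general.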

@angeloskath angeloskath left a comment

Looks and works great!

(std::chrono::duration_cast<std::chrono::nanoseconds>(x).count() / 1e9)
#define time_now() std::chrono::high_resolution_clock::now()

// Maybe compile

You are already doing that :-)
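
The timing macros quoted in this thread convert a `std::chrono` duration to seconds. As a rough illustration of how a toks/sec figure could be measured with the same building blocks, here is a sketch using function templates instead of macros. The names `to_seconds` and `tokens_per_second` are hypothetical, and `std::chrono::steady_clock` is swapped in for the `high_resolution_clock` used in the PR because it is guaranteed to be monotonic.

```cpp
#include <chrono>
#include <cstddef>

// Convert any std::chrono duration to floating-point seconds.
template <typename Duration>
double to_seconds(Duration d) {
  return std::chrono::duration<double>(d).count();
}

// Hypothetical helper: call `step` once per token, n_tokens times, and
// return the measured throughput in tokens per second.
template <typename Step>
double tokens_per_second(std::size_t n_tokens, Step&& step) {
  const auto tic = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < n_tokens; ++i) {
    step();
  }
  const auto toc = std::chrono::steady_clock::now();
  // If the elapsed time rounds to zero, the result is infinity.
  return static_cast<double>(n_tokens) / to_seconds(toc - tic);
}
```

In practice `step` would wrap one generation step, including whatever synchronization is needed so that the work has actually finished before the clock is read.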

splits.push_back(segment.size());

while (one_step_merge(segment, splits)) {
};

Just curious: is there a particular reason you didn't write this on one line and omit the `;`?
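
For context on the loop being discussed, here is a sketch of a merge-until-nothing-changes BPE loop over split boundaries. This is an assumption about the general technique, not the implementation in `tokenizer.cpp`: the three-argument `one_step_merge`, the `MergeRanks` table keyed by the merged string, and the one-byte initial pieces are all hypothetical simplifications.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical merge table: merged piece -> rank (lower rank merges first).
using MergeRanks = std::unordered_map<std::string, int>;

// `splits` holds piece boundaries: piece k is segment[splits[k], splits[k+1]).
// Applies the best-ranked adjacent merge, if any, and reports whether it did.
bool one_step_merge(const std::string& segment,
                    std::vector<std::size_t>& splits,
                    const MergeRanks& ranks) {
  int best_rank = std::numeric_limits<int>::max();
  std::size_t best = 0;
  bool found = false;
  for (std::size_t k = 0; k + 2 < splits.size(); ++k) {
    // Candidate: pieces k and k+1 joined together.
    auto it = ranks.find(segment.substr(splits[k], splits[k + 2] - splits[k]));
    if (it != ranks.end() && it->second < best_rank) {
      best_rank = it->second;
      best = k;
      found = true;
    }
  }
  if (!found) return false;
  // Merging two pieces means dropping the boundary between them.
  splits.erase(splits.begin() + static_cast<std::ptrdiff_t>(best + 1));
  return true;
}

// Split `segment` into single bytes, then merge until no rule applies.
std::vector<std::string> bpe_pieces(const std::string& segment,
                                    const MergeRanks& ranks) {
  std::vector<std::size_t> splits;
  for (std::size_t i = 0; i < segment.size(); ++i) splits.push_back(i);
  splits.push_back(segment.size());

  while (one_step_merge(segment, splits, ranks)) {
  }

  std::vector<std::string> pieces;
  for (std::size_t k = 0; k + 1 < splits.size(); ++k) {
    pieces.push_back(segment.substr(splits[k], splits[k + 1] - splits[k]));
  }
  return pieces;
}
```

Because all the work happens in the loop condition, the body is empty; the trailing `;` after the closing brace is just an empty statement and can be dropped without changing behavior.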
