release-v1.0.1

yhcvb released this 09 May 09:37

d59d017

Reduce memory usage during model conversion
Reduce memory usage during inference
Increase prefill speed
Reduce initialization time
Improve quantization accuracy
Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
Add server invocation
Add inference interruption interface
Add logprob and token_id to the return value
