release-v1.0.1

yhcvb released this 09 May 09:37 · 6 commits to main since this release
  • Reduce memory usage during model conversion
  • Reduce memory usage during inference
  • Increase prefill speed
  • Reduce initialization time
  • Improve quantization accuracy
  • Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
  • Add server invocation
  • Add inference interruption interface
  • Add logprob and token_id to the return value