WIP as of Sep 2024
A collection of notable foundational models, including language models, multimodal models, and vision models.
https://huggingface.co/collections/dylanhogg
https://github.com/uncbiag/Awesome-Foundation-Models
https://huggingface.co/collections/taufiqdp/llms-658d4b6c8ea59eaa35665e60
https://github.com/Lightning-AI/litgpt#choose-from-20-llms
LMSYS Chatbot Arena Leaderboard
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Papers with Code: State-of-the-art (SOTA) results for all tasks
https://cloud.google.com/model-garden
https://ai.azure.com/explore/models
https://aws.amazon.com/bedrock/
https://en.wikipedia.org/wiki/Large_language_model#List
https://huggingface.co/facebook
https://arxiv.org/abs/2407.21783 - The Llama 3 Herd of Models
https://en.wikipedia.org/wiki/Llama_(language_model)
https://huggingface.co/papers/2407.21783
https://huggingface.co/models?other=llama-3
https://github.com/meta-llama/llama3
https://ai.meta.com/blog/meta-llama-3/
https://arxiv.org/abs/2405.09818 - Chameleon: Mixed-Modal Early-Fusion Foundation Models
https://huggingface.co/collections/facebook/chameleon-668da9663f80d483b4c61f58
https://github.com/facebookresearch/chameleon
https://huggingface.co/collections/apple
https://huggingface.co/EPFL-VILAB
https://arxiv.org/abs/2407.21075 - Apple Intelligence Foundation Language Models
https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models
https://machinelearning.apple.com/research/introducing-apple-foundation-models
https://arxiv.org/abs/2406.09406 - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
https://arxiv.org/abs/2312.06647 - 4M: Massively Multimodal Masked Modeling
https://github.com/apple/ml-4m/
https://huggingface.co/collections/EPFL-VILAB/4m-models-660193abe3faf4b4d98a2742
https://huggingface.co/collections/google
Gemini: a family of highly capable multimodal models from Google DeepMind.
https://arxiv.org/abs/2312.11805 - Gemini: A Family of Highly Capable Multimodal Models
https://arxiv.org/pdf/2403.05530 - Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
https://en.wikipedia.org/wiki/Gemini_(language_model)
https://deepmind.google/technologies/gemini/
A family of lightweight, open, text-to-text, decoder-only LLMs, built from the research and technology used to create Gemini models
https://arxiv.org/abs/2408.00118 - Gemma 2: Improving Open Language Models at a Practical Size
https://arxiv.org/abs/2403.08295 - Gemma: Open Models Based on Gemini Research and Technology
https://ai.google.dev/gemma/docs/model_card_2
https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
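"Decoder-only" above means each token may attend only to the tokens before it. A minimal sketch of the causal attention mask behind that property (illustrative only, not Gemma's actual implementation, where the mask is applied as additive -inf values inside scaled dot-product attention):

```python
# Causal (autoregressive) mask for a decoder-only LLM:
# position i may attend to positions j <= i, never to the future.
def causal_mask(n):
    """Boolean n x n mask; True means attention is allowed."""
    return [[j <= i for j in range(n)] for i in range(n)]

# Visualize for a 4-token sequence: "x" = allowed, "." = masked.
for row in causal_mask(4):
    print("".join("x" if ok else "." for ok in row))
```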
PaliGemma is a powerful open vision-language model (VLM) inspired by PaLI-3.
https://arxiv.org/abs/2407.07726 - PaliGemma: A versatile 3B VLM for transfer
https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda
https://ai.google.dev/gemma/docs/paligemma
https://www.kaggle.com/models/google/paligemma
https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/README.md
CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following.
https://arxiv.org/abs/2406.11409 - CodeGemma: Open Code Models Based on Gemma
https://huggingface.co/blog/codegemma
https://huggingface.co/collections/google/codegemma-release-66152ac7b683e2667abdee11
https://ai.google.dev/gemma/docs/codegemma
https://www.kaggle.com/models/google/codegemma
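CodeGemma's fill-in-the-middle completion works by wrapping the code before and after the insertion point in special control tokens, after which the model generates the missing middle. A minimal sketch of the prompt construction; the token names follow the CodeGemma report, but verify them against the model card before relying on them:

```python
# Assumed CodeGemma fill-in-the-middle (FIM) control tokens.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt: the model is asked to generate the code
    that belongs between `prefix` and `suffix`."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(prompt)
```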
https://huggingface.co/microsoft
Phi-3 is a family of open AI models developed by Microsoft.
https://arxiv.org/abs/2404.14219 - Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3
https://azure.microsoft.com/en-us/products/phi-3
https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/
https://onnxruntime.ai/blogs/accelerating-phi-3
Florence-2: a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
https://arxiv.org/abs/2311.06242 - Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de
(Florence uses DaViT as the vision encoder) https://arxiv.org/abs/2204.03645 - DaViT: Dual Attention Vision Transformers
DeepSeek (深度求索), founded in 2023, is a Chinese company dedicated to making AGI a reality. Its motto: "Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism."
https://github.com/deepseek-ai
https://huggingface.co/deepseek-ai
https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI
https://github.com/huggingface/open-r1 (fully open reproduction of DeepSeek-R1 by Hugging Face)
https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas
https://composio.dev/blog/notes-on-the-new-deepseek-r1/
DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
https://github.com/deepseek-ai/DeepSeek-V3
https://huggingface.co/deepseek-ai/DeepSeek-V3
https://huggingface.co/collections/deepseek-ai/deepseek-v3-676bc4546fb4876383c4208b
https://arxiv.org/abs/2412.19437 - DeepSeek-V3 Technical Report
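The 671B-total / 37B-active split comes from Mixture-of-Experts routing: a learned gate picks a few experts per token, so only a fraction of the parameters run per forward pass. A toy sketch of top-k routing; the expert count and k below are illustrative, not DeepSeek-V3's actual configuration:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, k):
    """Top-k MoE routing: return (expert_index, weight) pairs for the
    k highest-scoring experts, with weights renormalized over them."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]

# One token's router logits over 8 illustrative experts; only 2 activate.
print(route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.2], k=2))
```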
DeepSeek's first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1.
https://github.com/deepseek-ai/DeepSeek-R1
https://huggingface.co/deepseek-ai/DeepSeek-R1
https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
https://arxiv.org/abs/2501.12948 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Idefics2: an open 8B multimodal model from Hugging Face that takes image and text inputs and produces text outputs.
https://huggingface.co/blog/idefics2
https://huggingface.co/HuggingFaceM4/idefics2-8b
Molmo: a family of open vision-language models from the Allen Institute for AI (Ai2).
https://molmo.allenai.org/blog
https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19
https://www.arxiv.org/abs/2409.17146 - Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Nous Research is an applied AI research group known for open models such as the Hermes series.
https://huggingface.co/NousResearch
https://nousresearch.com/releases/
Explainable & Accessible AI
https://huggingface.co/nomic-ai
https://huggingface.co/nomic-ai/modernbert-embed-base
https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
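The embedding models above map text to dense vectors that are typically compared with cosine similarity. A minimal sketch of that comparison; the vectors here are illustrative stand-ins for real model outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 for identical
    direction, 0.0 for orthogonal (unrelated) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stand-in embeddings; a real pipeline would obtain these from the model.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))
```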
Vicuna is an open chatbot fine-tuned from LLaMA on user-shared conversations, released by LMSYS.
https://lmsys.org/blog/2023-03-30-vicuna/
https://huggingface.co/lmsys/vicuna-7b-v1.5
https://en.wikipedia.org/wiki/Vicuna_LLM
Qwen is a family of open LLMs and multimodal models from Alibaba Cloud.
https://huggingface.co/collections/Qwen/qwen-65c0e50c3f1ab89cb8704144
https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f
https://qwenlm.github.io/blog/qwen2/
https://github.com/QwenLM/Qwen2
https://arxiv.org/abs/2407.10671 (Qwen2 Technical Report)
https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d
https://github.com/QwenLM/Qwen2-VL
https://qwenlm.github.io/blog/qwen2-vl/
Mistral AI is a French company that releases open-weight LLMs alongside commercial models.
https://en.wikipedia.org/wiki/Mistral_AI
https://mistral.ai/technology/#models
https://docs.mistral.ai/getting-started/open_weight_models/
https://arxiv.org/abs/2310.06825 - Mistral 7B
https://huggingface.co/papers/2310.06825
https://huggingface.co/Anthropic
https://www.anthropic.com/research
Cohere builds LLMs for enterprise use, including the Command model family.
https://en.wikipedia.org/wiki/Cohere
https://docs.cohere.com/docs/the-cohere-platform
TODO
TODO
https://huggingface.co/Twitter
Grok-1.5 is xAI's model capable of long-context understanding and advanced reasoning; open weights have been released for its predecessor, Grok-1.
https://huggingface.co/xai-org/grok-1
https://github.com/xai-org/grok-1
The Technology Innovation Institute (TII) is a leading global research center dedicated to pushing the frontiers of knowledge.
https://arxiv.org/abs/2311.16867 - The Falcon Series of Open Language Models
https://arxiv.org/abs/2407.14885 - Falcon2-11B Technical Report
https://huggingface.co/collections/tiiuae/falcon2-6641c2f0b98ddf3fe49b4012
Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models.
https://arxiv.org/abs/2406.11704 - Nemotron-4 340B Technical Report
https://huggingface.co/collections/nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
https://research.nvidia.com/publication/2024-06_nemotron-4-340b
The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise.
https://huggingface.co/collections/nvidia/nemotron-3-8b-6553adeb226f6ab4ffc356f9
Minitron: a family of compressed models obtained from Nemotron-4 via pruning and knowledge distillation.
https://arxiv.org/abs/2407.14679 - Compact Language Models via Pruning and Knowledge Distillation
https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e
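Knowledge distillation, one half of the Minitron recipe, trains a compressed "student" model to match a larger "teacher's" softened output distribution. A toy sketch of the classic temperature-scaled KL objective; the exact loss Minitron uses may differ, and the logits below are illustrative:

```python
import math

def softmax(xs, t=1.0):
    """Softmax with temperature t; t > 1 softens the distribution."""
    m = max(xs)
    exps = [math.exp((x - m) / t) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, t=2.0):
    """KL(teacher || student) over temperature-softened distributions:
    zero when the student exactly matches the teacher."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy logits: the student roughly, but not exactly, tracks the teacher.
print(distillation_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.1]))
```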
https://en.wikipedia.org/wiki/AI21_Labs
Jamba is an open-weights large language model (LLM) developed by AI21 Labs on a hybrid architecture that combines Mamba (a state-space model) with Transformer layers.
https://en.wikipedia.org/wiki/Jamba_(language_model)
https://www.ai21.com/jamba
TODO
Cartesia is building next-gen foundation models using new subquadratic architectures
TODO
https://huggingface.co/cartesia-ai
https://huggingface.co/cartesia-ai/Rene-v0.1-1.3b-pytorch
TODO