WIP as of Sep 2024
A collection of notable foundational models, including language models, multimodal models, and vision models.
https://huggingface.co/collections/dylanhogg
https://github.com/uncbiag/Awesome-Foundation-Models
https://huggingface.co/collections/taufiqdp/llms-658d4b6c8ea59eaa35665e60
https://github.com/Lightning-AI/litgpt#choose-from-20-llms
LMSYS Chatbot Arena Leaderboard
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Papers with Code: State-of-the-art (SOTA) results for all tasks
https://cloud.google.com/model-garden
https://ai.azure.com/explore/models
https://aws.amazon.com/bedrock/
https://en.wikipedia.org/wiki/Large_language_model#List
https://huggingface.co/facebook
https://arxiv.org/abs/2407.21783 - The Llama 3 Herd of Models
https://en.wikipedia.org/wiki/Llama_(language_model)
https://huggingface.co/papers/2407.21783
https://huggingface.co/models?other=llama-3
https://github.com/meta-llama/llama3
https://ai.meta.com/blog/meta-llama-3/
https://arxiv.org/abs/2405.09818 - Chameleon: Mixed-Modal Early-Fusion Foundation Models
https://huggingface.co/collections/facebook/chameleon-668da9663f80d483b4c61f58
https://github.com/facebookresearch/chameleon
https://huggingface.co/collections/apple
https://huggingface.co/EPFL-VILAB
https://arxiv.org/abs/2407.21075 - Apple Intelligence Foundation Language Models
https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models
https://machinelearning.apple.com/research/introducing-apple-foundation-models
https://arxiv.org/abs/2406.09406 - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
https://arxiv.org/abs/2312.06647 - 4M: Massively Multimodal Masked Modeling
https://github.com/apple/ml-4m/
https://huggingface.co/collections/EPFL-VILAB/4m-models-660193abe3faf4b4d98a2742
https://huggingface.co/collections/google
Gemini: a family of highly capable multimodal models from Google DeepMind.
https://arxiv.org/abs/2312.11805 - Gemini: A Family of Highly Capable Multimodal Models
https://arxiv.org/pdf/2403.05530 - Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
https://en.wikipedia.org/wiki/Gemini_(language_model)
https://deepmind.google/technologies/gemini/
A family of lightweight, open, text-to-text, decoder-only LLMs, built from the research and technology used to create Gemini models
https://arxiv.org/abs/2408.00118 - Gemma 2: Improving Open Language Models at a Practical Size
https://arxiv.org/abs/2403.08295 - Gemma: Open Models Based on Gemini Research and Technology
https://ai.google.dev/gemma/docs/model_card_2
https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
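"Decoder-only" above means each token may attend only to the tokens before it. A minimal sketch of the causal attention mask behind that property (illustrative only, not Gemma's actual implementation, where the mask is applied as additive -inf values inside scaled dot-product attention):

```python
# Causal (autoregressive) mask for a decoder-only LLM:
# position i may attend to positions j <= i, never to the future.
def causal_mask(n):
    """Boolean n x n mask; True means attention is allowed."""
    return [[j <= i for j in range(n)] for i in range(n)]

# Visualize for a 4-token sequence: "x" = allowed, "." = masked.
for row in causal_mask(4):
    print("".join("x" if ok else "." for ok in row))
```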
PaliGemma is a powerful open vision-language model (VLM) inspired by PaLI-3.
https://arxiv.org/abs/2407.07726 - PaliGemma: A versatile 3B VLM for transfer
https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda
https://ai.google.dev/gemma/docs/paligemma
https://www.kaggle.com/models/google/paligemma
https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/README.md
CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following.
https://arxiv.org/abs/2406.11409 - CodeGemma: Open Code Models Based on Gemma
https://huggingface.co/blog/codegemma
https://huggingface.co/collections/google/codegemma-release-66152ac7b683e2667abdee11
https://ai.google.dev/gemma/docs/codegemma
https://www.kaggle.com/models/google/codegemma
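CodeGemma's fill-in-the-middle completion works by wrapping the code before and after the insertion point in special control tokens, after which the model generates the missing middle. A minimal sketch of the prompt construction; the token names follow the CodeGemma report, but verify them against the model card before relying on them:

```python
# Assumed CodeGemma fill-in-the-middle (FIM) control tokens.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt: the model is asked to generate the code
    that belongs between `prefix` and `suffix`."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(prompt)
```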
https://huggingface.co/microsoft
Phi-3 is a family of open AI models developed by Microsoft.
https://arxiv.org/abs/2404.14219 - Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3
https://azure.microsoft.com/en-us/products/phi-3
https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/
https://onnxruntime.ai/blogs/accelerating-phi-3
Florence-2: a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
https://arxiv.org/abs/2311.06242 - Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de
(Florence uses DaViT as the vision encoder) https://arxiv.org/abs/2204.03645 - DaViT: Dual Attention Vision Transformers
DeepSeek (深度求索), founded in 2023, is a Chinese company dedicated to making AGI a reality. Its motto: "Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism."
https://github.com/deepseek-ai
https://huggingface.co/deepseek-ai
https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI
https://github.com/huggingface/open-r1 (fully open reproduction of DeepSeek-R1 by Hugging Face)
https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas
https://composio.dev/blog/notes-on-the-new-deepseek-r1/
DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
https://github.com/deepseek-ai/DeepSeek-V3
https://huggingface.co/deepseek-ai/DeepSeek-V3
https://huggingface.co/collections/deepseek-ai/deepseek-v3-676bc4546fb4876383c4208b
https://arxiv.org/abs/2412.19437 - DeepSeek-V3 Technical Report
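The 671B-total / 37B-active split comes from Mixture-of-Experts routing: a learned gate picks a few experts per token, so only a fraction of the parameters run per forward pass. A toy sketch of top-k routing; the expert count and k below are illustrative, not DeepSeek-V3's actual configuration:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, k):
    """Top-k MoE routing: return (expert_index, weight) pairs for the
    k highest-scoring experts, with weights renormalized over them."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]

# One token's router logits over 8 illustrative experts; only 2 activate.
print(route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.2], k=2))
```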
DeepSeek's first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1.
https://github.com/deepseek-ai/DeepSeek-R1
https://huggingface.co/deepseek-ai/DeepSeek-R1
https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
https://arxiv.org/abs/2501.12948 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Idefics2: an open 8B multimodal model from Hugging Face that takes image and text inputs and produces text outputs.
https://huggingface.co/blog/idefics2
https://huggingface.co/HuggingFaceM4/idefics2-8b
Molmo: a family of open vision-language models from the Allen Institute for AI (Ai2).
https://molmo.allenai.org/blog
https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19
https://www.arxiv.org/abs/2409.17146 - Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Nous Research is an applied AI research group known for open models such as the Hermes series.
https://huggingface.co/NousResearch
https://nousresearch.com/releases/
Explainable & Accessible AI
https://huggingface.co/nomic-ai
https://huggingface.co/nomic-ai/modernbert-embed-base
https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
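The embedding models above map text to dense vectors that are typically compared with cosine similarity. A minimal sketch of that comparison; the vectors here are illustrative stand-ins for real model outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 for identical
    direction, 0.0 for orthogonal (unrelated) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stand-in embeddings; a real pipeline would obtain these from the model.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))
```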
Vicuna is an open chatbot fine-tuned from LLaMA on user-shared conversations, released by LMSYS.
https://lmsys.org/blog/2023-03-30-vicuna/
https://huggingface.co/lmsys/vicuna-7b-v1.5
https://en.wikipedia.org/wiki/Vicuna_LLM
Qwen is a family of open LLMs and multimodal models from Alibaba Cloud.
https://huggingface.co/collections/Qwen/qwen-65c0e50c3f1ab89cb8704144
https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f
https://qwenlm.github.io/blog/qwen2/
https://github.com/QwenLM/Qwen2
https://arxiv.org/abs/2407.10671 (Qwen2 Technical Report)
https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d
https://github.com/QwenLM/Qwen2-VL
https://qwenlm.github.io/blog/qwen2-vl/
Mistral AI is a French company that releases open-weight LLMs alongside commercial models.
https://en.wikipedia.org/wiki/Mistral_AI
https://mistral.ai/technology/#models
https://docs.mistral.ai/getting-started/open_weight_models/
https://arxiv.org/abs/2310.06825 - Mistral 7B
https://huggingface.co/papers/2310.06825
https://huggingface.co/Anthropic
https://www.anthropic.com/research
Cohere builds LLMs for enterprise use, including the Command model family.
https://en.wikipedia.org/wiki/Cohere
https://docs.cohere.com/docs/the-cohere-platform
TODO
TODO
https://huggingface.co/Twitter
Grok-1.5 is xAI's model capable of long-context understanding and advanced reasoning; open weights have been released for its predecessor, Grok-1.
https://huggingface.co/xai-org/grok-1
https://github.com/xai-org/grok-1
The Technology Innovation Institute (TII) is a leading global research center dedicated to pushing the frontiers of knowledge.
https://arxiv.org/abs/2311.16867 - The Falcon Series of Open Language Models
https://arxiv.org/abs/2407.14885 - Falcon2-11B Technical Report
https://huggingface.co/collections/tiiuae/falcon2-6641c2f0b98ddf3fe49b4012
Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models.
https://arxiv.org/abs/2406.11704 - Nemotron-4 340B Technical Report
https://huggingface.co/collections/nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
https://research.nvidia.com/publication/2024-06_nemotron-4-340b
The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise.
https://huggingface.co/collections/nvidia/nemotron-3-8b-6553adeb226f6ab4ffc356f9
Minitron: a family of compressed models obtained from Nemotron-4 via pruning and knowledge distillation.
https://arxiv.org/abs/2407.14679 - Compact Language Models via Pruning and Knowledge Distillation
https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e
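Knowledge distillation, one half of the Minitron recipe, trains a compressed "student" model to match a larger "teacher's" softened output distribution. A toy sketch of the classic temperature-scaled KL objective; the exact loss Minitron uses may differ, and the logits below are illustrative:

```python
import math

def softmax(xs, t=1.0):
    """Softmax with temperature t; t > 1 softens the distribution."""
    m = max(xs)
    exps = [math.exp((x - m) / t) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, t=2.0):
    """KL(teacher || student) over temperature-softened distributions:
    zero when the student exactly matches the teacher."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy logits: the student roughly, but not exactly, tracks the teacher.
print(distillation_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.1]))
```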
https://en.wikipedia.org/wiki/AI21_Labs
Jamba is an open-weights large language model (LLM) developed by AI21 Labs on a hybrid architecture that combines Mamba (a state-space model) with Transformer layers.
https://en.wikipedia.org/wiki/Jamba_(language_model)
https://www.ai21.com/jamba
TODO
Cartesia is building next-gen foundation models using new subquadratic architectures
TODO
https://huggingface.co/cartesia-ai
https://huggingface.co/cartesia-ai/Rene-v0.1-1.3b-pytorch
TODO