Foundational Models

WIP as at Sep 2024

A collection of notable foundational models, including language models, multimodal models, and vision models.

https://huggingface.co/collections/dylanhogg

External links

https://github.com/uncbiag/Awesome-Foundation-Models

https://huggingface.co/collections/taufiqdp/llms-658d4b6c8ea59eaa35665e60

https://github.com/Lightning-AI/litgpt#choose-from-20-llms

Leaderboards

LMSYS Chatbot Arena Leaderboard

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Papers with Code: State-of-the-art (SOTA) results for all tasks

Serving Providers

https://cloud.google.com/model-garden

https://ai.azure.com/explore/models

https://aws.amazon.com/bedrock/

Models By Company

https://en.wikipedia.org/wiki/Large_language_model#List


Company: Meta

https://huggingface.co/facebook

The Llama 3 Herd of Models

https://arxiv.org/abs/2407.21783 - The Llama 3 Herd of Models

https://en.wikipedia.org/wiki/Llama_(language_model)

https://huggingface.co/papers/2407.21783

https://huggingface.co/models?other=llama-3

https://github.com/meta-llama/llama3

https://ai.meta.com/blog/meta-llama-3/

https://llama.meta.com/

Chameleon

https://arxiv.org/abs/2405.09818 - Chameleon: Mixed-Modal Early-Fusion Foundation Models

https://huggingface.co/collections/facebook/chameleon-668da9663f80d483b4c61f58

https://github.com/facebookresearch/chameleon


Company: Apple

https://huggingface.co/apple

https://huggingface.co/collections/apple

https://huggingface.co/EPFL-VILAB

Apple Intelligence Foundation Language Models

https://arxiv.org/abs/2407.21075 - Apple Intelligence Foundation Language Models

https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models

https://machinelearning.apple.com/research/introducing-apple-foundation-models

4M: Massively Multimodal Masked Modeling

https://arxiv.org/abs/2406.09406 - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

https://arxiv.org/abs/2312.06647 - 4M: Massively Multimodal Masked Modeling

https://4m.epfl.ch/

https://github.com/apple/ml-4m/

https://huggingface.co/collections/EPFL-VILAB/4m-models-660193abe3faf4b4d98a2742


Company: Google

https://huggingface.co/google

https://huggingface.co/collections/google

Gemini

A Family of Highly Capable Multimodal Models

https://arxiv.org/abs/2312.11805 - Gemini: A Family of Highly Capable Multimodal Models

https://arxiv.org/abs/2403.05530 - Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

https://en.wikipedia.org/wiki/Gemini_(language_model)

https://deepmind.google/technologies/gemini/

Gemma

A family of lightweight, open, text-to-text, decoder-only LLMs, built from the research and technology used to create the Gemini models.

https://ai.google.dev/gemma/

https://arxiv.org/abs/2408.00118 - Gemma 2: Improving Open Language Models at a Practical Size

https://arxiv.org/abs/2403.08295 - Gemma: Open Models Based on Gemini Research and Technology

https://ai.google.dev/gemma/docs/model_card_2

https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315

PaliGemma

PaliGemma is a powerful open VLM inspired by PaLI-3.

https://arxiv.org/abs/2407.07726 - PaliGemma: A versatile 3B VLM for transfer

https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda

https://ai.google.dev/gemma/docs/paligemma

https://www.kaggle.com/models/google/paligemma

https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/README.md

CodeGemma

CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following.
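
A rough sketch of how a fill-in-the-middle (FIM) prompt is assembled: the code before and after the gap is wrapped in sentinel tokens and the model generates the missing middle. The sentinel strings below reflect my understanding of CodeGemma's format and should be verified against the model card.

```python
# Sketch of a prefix-suffix-middle (PSM) fill-in-the-middle prompt. The sentinel
# tokens are assumptions based on the CodeGemma docs; verify before relying on them.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """The model is asked to generate the code that belongs between prefix and suffix."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    prefix="def mean(xs):\n    ",
    suffix="\n    return total / len(xs)\n",
)
print(prompt)  # a completion such as "total = sum(xs)" would fill the gap
```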

https://arxiv.org/abs/2406.11409 - CodeGemma: Open Code Models Based on Gemma

https://huggingface.co/blog/codegemma

https://huggingface.co/collections/google/codegemma-release-66152ac7b683e2667abdee11

https://ai.google.dev/gemma/docs/codegemma

https://www.kaggle.com/models/google/codegemma


Company: Microsoft

https://huggingface.co/microsoft

Phi-3

Phi-3 is a family of open AI models developed by Microsoft.

https://arxiv.org/abs/2404.14219 - Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3

https://azure.microsoft.com/en-us/products/phi-3

https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/

https://onnxruntime.ai/blogs/accelerating-phi-3

Florence-2

A novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

https://arxiv.org/abs/2311.06242 - Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de

https://arxiv.org/abs/2204.03645 - DaViT: Dual Attention Vision Transformers (Florence-2 uses DaViT as its vision encoder)


Company: DeepSeek

DeepSeek (深度求索), founded in 2023, is a Chinese company dedicated to making AGI a reality.

Company motto: "Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism."

https://www.deepseek.com/

https://github.com/deepseek-ai

https://huggingface.co/deepseek-ai

https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI

DeepSeek Articles

https://github.com/huggingface/open-r1 (Fully open reproduction of DeepSeek-R1 by Huggingface)

https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas

https://composio.dev/blog/notes-on-the-new-deepseek-r1/

DeepSeek-V3

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
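
A minimal, generic sketch of top-k expert routing, illustrating why an MoE model touches only a fraction of its parameters per token. This is not DeepSeek-V3's actual routing; the expert count, top-k, and dimensions below are made up for illustration.

```python
# Generic top-k Mixture-of-Experts routing sketch (illustrative only; not
# DeepSeek-V3's implementation). Sizes below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

# Each "expert" is a tiny feed-forward layer (here a single weight matrix).
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector x through its top-k experts only."""
    logits = x @ router_w                # router scores, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]    # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                 # softmax over the selected experts only
    # Only the selected experts' parameters are used for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)          # (64,)
```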

https://github.com/deepseek-ai/DeepSeek-V3

https://huggingface.co/deepseek-ai/DeepSeek-V3

https://huggingface.co/collections/deepseek-ai/deepseek-v3-676bc4546fb4876383c4208b

https://arxiv.org/abs/2412.19437 - DeepSeek-V3 Technical Report

DeepSeek-R1

DeepSeek-R1-Zero and DeepSeek-R1 are DeepSeek's first-generation reasoning models.

https://github.com/deepseek-ai/DeepSeek-R1

https://huggingface.co/deepseek-ai/DeepSeek-R1

https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d

https://arxiv.org/abs/2501.12948 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning


Company: Huggingface

Idefics2

https://huggingface.co/blog/idefics2

https://huggingface.co/HuggingFaceM4/idefics2-8b


Company: AllenAI

Molmo

https://molmo.org/

https://molmo.allenai.org/blog

https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19

https://www.arxiv.org/abs/2409.17146


Company: NousResearch

TODO

https://huggingface.co/NousResearch

https://nousresearch.com/releases/


Company: Nomic AI

Explainable & Accessible AI

https://huggingface.co/nomic-ai

https://github.com/nomic-ai

https://www.nomic.ai/

https://x.com/nomic_ai

ModernBERT Embed

https://huggingface.co/nomic-ai/modernbert-embed-base

nomic-embed-text-v1.5: Resizable Production Embeddings with Matryoshka Representation Learning

https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
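
A minimal sketch of the Matryoshka idea behind resizable embeddings: keep the leading dimensions of a full embedding and re-normalize to get a smaller vector that still works for similarity search. The dimensions are illustrative and the random vector stands in for a real model output.

```python
# Matryoshka-style embedding truncation (illustrative only).
import numpy as np

def resize_embedding(full_emb: np.ndarray, target_dim: int) -> np.ndarray:
    """Keep the first target_dim components and L2-normalize the result."""
    small = full_emb[:target_dim]
    return small / np.linalg.norm(small)

rng = np.random.default_rng(0)
full = rng.standard_normal(768)          # stand-in for a model-produced embedding
full /= np.linalg.norm(full)
for dim in (768, 512, 256, 128):         # hypothetical target sizes
    emb = resize_embedding(full, dim)
    print(dim, emb.shape, round(float(np.linalg.norm(emb)), 3))
```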


Company: lmsys.org

TODO

https://huggingface.co/lmsys

https://lmsys.org/

Vicuna

https://lmsys.org/blog/2023-03-30-vicuna/

https://huggingface.co/lmsys/vicuna-7b-v1.5

https://en.wikipedia.org/wiki/Vicuna_LLM


Company: Alibaba

TODO

Qwen

https://huggingface.co/Qwen

https://huggingface.co/collections/Qwen/qwen-65c0e50c3f1ab89cb8704144

Qwen2

https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f

https://qwenlm.github.io/blog/qwen2/

https://github.com/QwenLM/Qwen2

https://arxiv.org/abs/2407.10671 - Qwen2 Technical Report

Qwen2-VL

https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d

https://github.com/QwenLM/Qwen2-VL

https://qwenlm.github.io/blog/qwen2-vl/


Company: AWS

https://huggingface.co/amazon

TODO


Company: Mistral

https://en.wikipedia.org/wiki/Mistral_AI

https://mistral.ai/technology/#models

https://docs.mistral.ai/getting-started/open_weight_models/


Mistral

https://arxiv.org/abs/2310.06825 - Mistral 7B

https://huggingface.co/papers/2310.06825


Company: Anthropic

https://huggingface.co/Anthropic

https://www.anthropic.com/research

TODO


Company: Cohere

https://huggingface.co/Cohere

https://en.wikipedia.org/wiki/Cohere

https://docs.cohere.com/docs/the-cohere-platform

TODO


Company: OpenAI

https://huggingface.co/openai

TODO


Company: xAI

https://x.ai/

https://huggingface.co/xai-org

Grok-1.5

Grok-1.5, xAI's model capable of long-context understanding and advanced reasoning (the earlier Grok-1 weights are open-sourced; see the links below).

https://huggingface.co/xai-org/grok-1

https://github.com/xai-org/grok-1

https://x.ai/blog/grok-1.5


Company: TII

The Technology Innovation Institute (TII) is a leading global research center dedicated to pushing the frontiers of knowledge.

https://huggingface.co/tiiuae

https://www.tii.ae/

Falcon2

https://arxiv.org/abs/2311.16867 - The Falcon Series of Open Language Models

https://arxiv.org/abs/2407.14885 - Falcon2-11B Technical Report

https://falconllm.tii.ae/

https://huggingface.co/collections/tiiuae/falcon2-6641c2f0b98ddf3fe49b4012


Company: NVIDIA

https://huggingface.co/nvidia

Nemotron-4 340B

Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models.

https://arxiv.org/abs/2406.11704 - Nemotron-4 340B Technical Report

https://huggingface.co/collections/nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911

https://developer.nvidia.com/blog/leverage-our-latest-open-models-for-synthetic-data-generation-with-nvidia-nemotron-4-340b/

https://research.nvidia.com/publication/2024-06_nemotron-4-340b

Nemotron 3 8B

The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise.

https://huggingface.co/collections/nvidia/nemotron-3-8b-6553adeb226f6ab4ffc356f9

Minitron

A family of compressed models obtained via pruning and knowledge distillation.
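
A minimal sketch of the knowledge-distillation objective commonly used for this kind of compression: a temperature-softened KL divergence between teacher and student logits. This is a generic illustration, not Minitron's exact recipe; shapes and the temperature are made up.

```python
# Generic knowledge-distillation loss sketch (illustrative only).
import numpy as np

def softmax(z: np.ndarray, temperature: float) -> np.ndarray:
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary, averaged over tokens."""
    p = softmax(teacher_logits, temperature)
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits, temperature) + 1e-12)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 32))    # 4 tokens, toy 32-word vocabulary
student = teacher + 0.5 * rng.standard_normal((4, 32))
print(distillation_loss(student, teacher))
```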

https://arxiv.org/abs/2407.14679

https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e


Company: AI21 Labs

https://en.wikipedia.org/wiki/AI21_Labs


Jamba

Jamba is an open-weights large language model (LLM) developed by AI21 Labs, built on a hybrid Mamba-Transformer architecture.

https://en.wikipedia.org/wiki/Jamba_(language_model)

https://www.ai21.com/jamba


Company: Stability AI

TODO


Company: Cartesia.AI

Cartesia is building next-gen foundation models using new subquadratic architectures.

TODO

https://huggingface.co/cartesia-ai

https://cartesia.ai/

Rene

https://huggingface.co/cartesia-ai/Rene-v0.1-1.3b-pytorch


Company: Databricks

TODO


Company: Salesforce

TODO


Company: IBM

TODO


Company: Intel

TODO