From 86a7bf11ef9e2333eaa3137a87a07643fb87e96b Mon Sep 17 00:00:00 2001
From: Yuan-Man <68322456+Yuan-ManX@users.noreply.github.com>
Date: Sun, 28 Jul 2024 10:46:58 +0800
Subject: [PATCH] Update index.md

---
 index.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/index.md b/index.md
index 2e1ae19..ce1bc28 100644
--- a/index.md
+++ b/index.md
@@ -55,6 +55,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [CogVLM](https://www.modelscope.cn/models/ZhipuAI/CogVLM/summary) | CogVLM, a powerful open-source visual language foundation model. |[arXiv](https://arxiv.org/abs/2311.03079) | | Tool |
 | [CoreNet](https://github.com/apple/corenet) | A library for training deep neural networks. | | | Tool |
 | [DBRX](https://github.com/databricks/dbrx) | DBRX is a large language model trained by Databricks. | | | Tool |
+| [DCLM](https://github.com/mlfoundations/dclm) | DataComp for Language Models. |[arXiv](https://arxiv.org/abs/2406.11794) | | Tool |
 | [DemoGPT](https://github.com/melih-unsal/DemoGPT) | Auto Gen-AI App Generator with the Power of Llama 2 | | | Tool |
 | [Design2Code](https://github.com/NoviScl/Design2Code) | Automating Front-End Engineering | | | Tool |
 | [Devika](https://github.com/stitionai/devika) | Devika is an Agentic AI Software Engineer. | | | Tool |
@@ -88,6 +89,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama) | Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. | | | Tool |
 | [llama2-webui](https://github.com/liltom-eth/llama2-webui) | Run Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). | | | Tool |
 | [Llama 3](https://github.com/meta-llama/llama3) | The official Meta Llama 3 GitHub site. | | | Tool |
+| [Llama 3.1](https://github.com/meta-llama/llama-models) | Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. | | | Tool |
 | [LLaSM](https://github.com/LinkSoul-AI/LLaSM) | Large Language and Speech Model. | | | Tool |
 | [LLM Answer Engine](https://github.com/developersdigest/llm-answer-engine) | Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool |
 | [llm.c](https://github.com/karpathy/llm.c) | LLM training in simple, raw C/CUDA. | | | Tool |
@@ -185,10 +187,12 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [gigax](https://github.com/GigaxGames/gigax) | Runtime, LLM-powered NPCs. | | | Game |
 | [HippoRAG](https://github.com/OSU-NLP-Group/HippoRAG) | Neurobiologically Inspired Long-Term Memory for Large Language Models. |[arXiv](https://arxiv.org/abs/2405.14831) | | Agent |
 | [Interactive LLM Powered NPCs](https://github.com/AkshitIreddy/Interactive-LLM-Powered-NPCs) | Interactive LLM Powered NPCs, is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
+| [IoA](https://github.com/OpenBMB/IoA) | An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
 | [KwaiAgents](https://github.com/KwaiKEG/KwaiAgents) | A generalized information-seeking agent system with Large Language Models (LLMs). |[arXiv](https://arxiv.org/abs/2312.04889) | | Agent |
 | [LangChain](https://github.com/langchain-ai/langchain) | Get your LLM application from prototype to production. | | | Agent |
 | [Langflow](https://github.com/logspace-ai/langflow) | Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent |
 | [LARP](https://github.com/MiAO-AI-Lab/LARP) | Language-Agent Role Play for open-world games. |[arXiv](https://arxiv.org/abs/2312.17653) | | Agent |
+| [Llama Agentic System](https://github.com/meta-llama/llama-agentic-system) | Agentic components of the Llama Stack APIs. | | | Agent |
 | [LlamaIndex](https://github.com/run-llama/llama_index) | LlamaIndex is a data framework for your LLM application. | | | Agent |
 | [Mixture of Agents (MoA)](https://github.com/togethercomputer/MoA) | Mixture-of-Agents Enhances Large Language Model Capabilities. |[arXiv](https://arxiv.org/abs/2406.04692) | | Agent |
 | [Moonlander.ai](https://www.moonlander.ai/) | Start building 3D games without any coding using generative AI. | | | Framework |
@@ -314,6 +318,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [PuLID](https://github.com/ToTheBeginning/PuLID) | Pure and Lightning ID Customization via Contrastive Alignment. |[arXiv](https://arxiv.org/abs/2404.16022) | | Image |
 | [Rich-Text-to-Image](https://github.com/SongweiGe/rich-text-to-image) | Expressive Text-to-Image Generation with Rich Text. |[arXiv](https://arxiv.org/abs/2304.06720) | | Image |
 | [RPG-DiffusionMaster](https://github.com/YangLing0818/RPG-DiffusionMaster) | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG). | | | Image |
+| [SEED-Story](https://github.com/TencentARC/SEED-Story) | SEED-Story: Multimodal Long Story Generation with Large Language Model. |[arXiv](https://arxiv.org/abs/2407.08683) | | Image |
 | [Segment Anything](https://segment-anything.com/) | Segment Anything Model (SAM): a new AI model from Meta AI that can "cut out" any object , in any image , with a single click. |[arXiv](https://arxiv.org/abs/2304.02643) | | Image |
 | [sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet) | WebUI extension for ControlNet. | | | Image |
 | [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) | Progressive Adversarial Diffusion Distillation. |[arXiv](https://arxiv.org/abs/2402.13929) | | Image |
@@ -332,6 +337,8 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion) | A Pipeline-Level Solution for Real-Time Interactive Generation. | | | Image |
 | [StyleDrop](https://styledrop.github.io/) | Text-To-Image Generation in Any Style. |[arXiv](https://arxiv.org/abs/2306.00983) | | Image |
 | [SyncDreamer](https://github.com/liuyuan-pal/SyncDreamer) | Generating Multiview-consistent Images from a Single-view Image. |[arXiv](https://arxiv.org/abs/2309.03453) | | Image |
+| [UltraEdit](https://github.com/HaozheZhao/UltraEdit) | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale. |[arXiv](https://arxiv.org/abs/2407.05282) | | Image |
+| [UltraPixel](https://github.com/catcathh/UltraPixel) | UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks. |[arXiv](https://arxiv.org/abs/2407.02158) | | Image |
 | [Unity ML Stable Diffusion](https://github.com/keijiro/UnityMLStableDiffusion) | Core ML Stable Diffusion on Unity. | | Unity | Image |
 | [Vispunk Visions](https://vispunk.com/image) | Text-to-Image generation platform. | | | Image |
@@ -379,14 +386,17 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [BlenderGPT](https://github.com/gd3kr/BlenderGPT) | Use commands in English to control Blender with OpenAI's GPT-4. | | Blender | Model |
 | [Blender-GPT](https://github.com/TREE-Ind/Blender-GPT) | An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. | | Blender | Model |
 | [Blockade Labs](https://www.blockadelabs.com/) | Digital alchemy is real with Skybox Lab - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | | | Model |
+| [CF-3DGS](https://github.com/NVlabs/CF-3DGS) | COLMAP-Free 3D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.07504) | | 3D |
 | [CharacterGen](https://github.com/zjp-shadow/CharacterGen) | CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. |[arXiv](https://arxiv.org/abs/2402.17214) | | 3D |
 | [chatGPT-maya](https://github.com/LouisRossouw/chatGPT-maya) | Simple Maya tool that utilizes open AI to perform basic tasks based on descriptive instructions. | | Maya | Model |
 | [CityDreamer](https://github.com/hzxie/city-dreamer) | Compositional Generative Model of Unbounded 3D Cities. |[arXiv](https://arxiv.org/abs/2309.00610) | | 3D |
 | [CSM](https://www.csm.ai/) | Generate 3D worlds from images and videos. | | | 3D |
 | [Dash](https://www.polygonflow.io/) | Your Copilot for World Building in Unreal Engine. | | Unreal Engine | 3D |
+| [DreamCatalyst](https://github.com/kaist-cvml-lab/DreamCatalyst) | DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. |[arXiv](https://arxiv.org/abs/2407.11394) | | 3D |
 | [DreamGaussian4D](https://github.com/jiawei-ren/dreamgaussian4d) | Generative 4D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.17142) | | 4D |
 | [DUSt3R](https://github.com/naver/dust3r) | Geometric 3D Vision Made Easy. |[arXiv](https://arxiv.org/abs/2312.14132) | | 3D |
 | [GALA3D](https://github.com/VDIGPKU/GALA3D) | GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2402.07207) | | 3D |
+| [GaussCtrl](https://github.com/ActiveVisionLab/gaussctrl) | GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. |[arXiv](https://arxiv.org/abs/2403.08733) | | 3D |
 | [GaussianCube](https://github.com/GaussianCube/GaussianCube) | A Structured and Explicit Radiance Representation for 3D Generative Modeling. |[arXiv](https://arxiv.org/abs/2403.19655) | | 3D |
 | [GaussianDreamer](https://github.com/hustvl/GaussianDreamer) | Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors. |[arXiv](https://arxiv.org/abs/2310.08529) | | 3D |
 | [GenieLabs](https://www.genielabs.tech/) | Empower your game with AI-UGC. | | | 3D |
@@ -448,6 +458,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [GeneFace++](https://github.com/yerfor/GeneFacePlusPlus) | Generalized and Stable Real-Time 3D Talking Face Generation. | | | Avatar |
 | [Hallo](https://github.com/fudan-generative-vision/hallo) | Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2406.08801) | | Avatar |
 | [HeadSculpt](https://brandonhan.uk/HeadSculpt/) | Crafting 3D Head Avatars with Text. |[arXiv](https://arxiv.org/abs/2306.03038) | | Avatar |
+| [IntrinsicAvatar](https://github.com/taconite/IntrinsicAvatar) | IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing. |[arXiv](https://arxiv.org/abs/2312.05210) | | Avatar |
 | [Linly-Talker](https://github.com/Kedreamix/Linly-Talker) | Digital Avatar Conversational System. | | | Avatar |
 | [LivePortrait](https://github.com/KwaiVGI/LivePortrait) | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control. |[arXiv](https://arxiv.org/abs/2407.03168) | | Avatar |
 | [MotionGPT](https://github.com/OpenMotionLab/MotionGPT) | Human Motion as a Foreign Language, a unified motion-language generation model using LLMs. |[arXiv](https://arxiv.org/abs/2306.14795) | | Avatar |
@@ -510,8 +521,10 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [CoTracker](https://co-tracker.github.io/) | It is Better to Track Together. |[arXiv](https://arxiv.org/abs/2307.07635) | | Visual |
 | [FaceHi](https://m.facehi.ai/) | It is Better to Track Together. | | | Visual |
 | [InternLM-XComposer2](https://github.com/InternLM/InternLM-XComposer) | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2404.06512) | | Visual |
+| [Kangaroo](https://github.com/KangarooGroup/Kangaroo) | Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input. | | | Visual |
 | [LGVI](https://jianzongwu.github.io/projects/rovi/) | Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | | | Visual |
 | [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) | Extending Visual Capabilities with LLaMA-3 and Phi-3. | | | Visual |
+| [LongVA](https://github.com/EvolvingLMMs-Lab/LongVA) | Long Context Transfer from Language to Vision. |[arXiv](https://arxiv.org/abs/2406.16852) | | Visual |
 | [MaskViT](https://maskedvit.github.io/) | Masked Visual Pre-Training for Video Prediction. |[arXiv](https://arxiv.org/abs/2206.11894) | | Visual |
 | [MiniCPM-Llama3-V 2.5](https://github.com/OpenBMB/MiniCPM-V) | A GPT-4V Level MLLM on Your Phone. | | | Visual |
 | [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA) | Mixture of Experts for Large Vision-Language Models. |[arXiv](https://arxiv.org/abs/2401.15947) | | Visual |
@@ -519,10 +532,13 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [PLLaVA](https://github.com/magic-research/PLLaVA) | Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. |[arXiv](https://arxiv.org/abs/2404.16994) | | Visual |
 | [Qwen-VL](https://github.com/QwenLM/Qwen-VL) | A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. |[arXiv](https://arxiv.org/abs/2308.12966) | | Visual |
 | [ShareGPT4V](https://github.com/ShareGPT4Omni/ShareGPT4V) | Improving Large Multi-modal Models with Better Captions. |[arXiv](https://arxiv.org/abs/2311.12793) | | Visual |
+| [SOLO](https://github.com/Yangyi-Chen/SOLO) | SOLO: A Single Transformer for Scalable Vision-Language Modeling. |[arXiv](https://arxiv.org/abs/2407.06438) | | Visual |
+| [Video-CCAM](https://github.com/QQ-MM/Video-CCAM) | Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks. | | | Visual |
 | [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA) | Learning United Visual Representation by Alignment Before Projection. |[arXiv](https://arxiv.org/abs/2311.10122) | | Visual |
 | [VideoLLaMA 2](https://github.com/DAMO-NLP-SG/VideoLLaMA2) | Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. |[arXiv](https://arxiv.org/abs/2406.07476) | | Visual |
 | [Video-MME](https://github.com/BradyFU/Video-MME) | The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual |
 | [Vitron](https://github.com/SkyworkAI/Vitron) | A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | | | Visual |
+| [VILA](https://github.com/NVlabs/VILA) | VILA: On Pre-training for Visual Language Models. |[arXiv](https://arxiv.org/abs/2312.07533) | | Visual |
 
 ^ Back to Contents ^
 
@@ -664,12 +680,14 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [Make-An-Audio 3](https://github.com/Text-to-Audio/Make-An-Audio-3) | Transforming Text into Audio via Flow-based Large Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2305.18474) | | Audio |
 | [NeuralSound](https://github.com/hellojxt/NeuralSound) | Learning-based Modal Sound Synthesis with Acoustic Transfer. |[arXiv](https://arxiv.org/abs/2108.07425) | | Audio |
 | [OptimizerAI](https://www.optimizerai.xyz/) | Sounds for Creators, Game makers, Artists, Video makers. | | | Audio |
+| [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio) | Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. |[arXiv](https://arxiv.org/abs/2407.10759) | | Audio |
 | [SEE-2-SOUND](https://github.com/see2sound/see2sound) | Zero-Shot Spatial Environment-to-Spatial Sound. |[arXiv](https://arxiv.org/abs/2406.06612) | | Audio |
 | [SoundStorm](https://google-research.github.io/seanet/soundstorm/examples/) | Efficient Parallel Audio Generation. |[arXiv](https://arxiv.org/abs/2305.09636) | | Audio |
 | [Stable Audio](https://www.stableaudio.com/) | Fast Timing-Conditioned Latent Audio Diffusion. | | | Audio |
 | [Stable Audio Open](https://huggingface.co/stabilityai/stable-audio-open-1.0) | Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
 | [SyncFusion](https://github.com/mcomunita/syncfusion) | SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. |[arXiv](https://arxiv.org/abs/2310.15247) | | Audio |
 | [TANGO](https://github.com/declare-lab/tango) | Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | | | Audio |
+| [VTA-LDM](https://github.com/ariesssxu/vta-ldm) | Video-to-Audio Generation with Hidden Alignment. |[arXiv](https://arxiv.org/abs/2407.07464) | | Audio |
 | [WavJourney](https://github.com/Audio-AGI/WavJourney) | Compositional Audio Creation with Large Language Models. |[arXiv](https://arxiv.org/abs/2307.14335) | | Audio |
 
 ^ Back to Contents ^