From 86a7bf11ef9e2333eaa3137a87a07643fb87e96b Mon Sep 17 00:00:00 2001
From: Yuan-Man <68322456+Yuan-ManX@users.noreply.github.com>
Date: Sun, 28 Jul 2024 10:46:58 +0800
Subject: [PATCH] Update index.md

---
 index.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/index.md b/index.md
index 2e1ae19..ce1bc28 100644
--- a/index.md
+++ b/index.md
@@ -55,6 +55,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [CogVLM](https://www.modelscope.cn/models/ZhipuAI/CogVLM/summary) | CogVLM, a powerful open-source visual language foundation model. |[arXiv](https://arxiv.org/abs/2311.03079) | | Tool |
 | [CoreNet](https://github.com/apple/corenet) | A library for training deep neural networks. | | | Tool |
 | [DBRX](https://github.com/databricks/dbrx) | DBRX is a large language model trained by Databricks. | | | Tool |
+| [DCLM](https://github.com/mlfoundations/dclm) | DataComp for Language Models. |[arXiv](https://arxiv.org/abs/2406.11794) | | Tool |
 | [DemoGPT](https://github.com/melih-unsal/DemoGPT) | Auto Gen-AI App Generator with the Power of Llama 2 | | | Tool |
 | [Design2Code](https://github.com/NoviScl/Design2Code) | Automating Front-End Engineering | | | Tool |
 | [Devika](https://github.com/stitionai/devika) | Devika is an Agentic AI Software Engineer. | | | Tool |
@@ -88,6 +89,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama) | Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. | | | Tool |
 | [llama2-webui](https://github.com/liltom-eth/llama2-webui) | Run Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). | | | Tool |
 | [Llama 3](https://github.com/meta-llama/llama3) | The official Meta Llama 3 GitHub site. | | | Tool |
+| [Llama 3.1](https://github.com/meta-llama/llama-models) | Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. | | | Tool |
 | [LLaSM](https://github.com/LinkSoul-AI/LLaSM) | Large Language and Speech Model. | | | Tool |
 | [LLM Answer Engine](https://github.com/developersdigest/llm-answer-engine) | Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool |
 | [llm.c](https://github.com/karpathy/llm.c) | LLM training in simple, raw C/CUDA. | | | Tool |
@@ -185,10 +187,12 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [gigax](https://github.com/GigaxGames/gigax) | Runtime, LLM-powered NPCs. | | | Game |
 | [HippoRAG](https://github.com/OSU-NLP-Group/HippoRAG) | Neurobiologically Inspired Long-Term Memory for Large Language Models. |[arXiv](https://arxiv.org/abs/2405.14831) | | Agent |
 | [Interactive LLM Powered NPCs](https://github.com/AkshitIreddy/Interactive-LLM-Powered-NPCs) | Interactive LLM Powered NPCs, is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
+| [IoA](https://github.com/OpenBMB/IoA) | An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
 | [KwaiAgents](https://github.com/KwaiKEG/KwaiAgents) | A generalized information-seeking agent system with Large Language Models (LLMs). |[arXiv](https://arxiv.org/abs/2312.04889) | | Agent |
 | [LangChain](https://github.com/langchain-ai/langchain) | Get your LLM application from prototype to production. | | | Agent |
 | [Langflow](https://github.com/logspace-ai/langflow) | Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent |
 | [LARP](https://github.com/MiAO-AI-Lab/LARP) | Language-Agent Role Play for open-world games. |[arXiv](https://arxiv.org/abs/2312.17653) | | Agent |
+| [Llama Agentic System](https://github.com/meta-llama/llama-agentic-system) | Agentic components of the Llama Stack APIs. | | | Agent |
 | [LlamaIndex](https://github.com/run-llama/llama_index) | LlamaIndex is a data framework for your LLM application. | | | Agent |
 | [Mixture of Agents (MoA)](https://github.com/togethercomputer/MoA) | Mixture-of-Agents Enhances Large Language Model Capabilities. |[arXiv](https://arxiv.org/abs/2406.04692) | | Agent |
 | [Moonlander.ai](https://www.moonlander.ai/) | Start building 3D games without any coding using generative AI. | | | Framework |
@@ -314,6 +318,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [PuLID](https://github.com/ToTheBeginning/PuLID) | Pure and Lightning ID Customization via Contrastive Alignment. |[arXiv](https://arxiv.org/abs/2404.16022) | | Image |
 | [Rich-Text-to-Image](https://github.com/SongweiGe/rich-text-to-image) | Expressive Text-to-Image Generation with Rich Text. |[arXiv](https://arxiv.org/abs/2304.06720) | | Image |
 | [RPG-DiffusionMaster](https://github.com/YangLing0818/RPG-DiffusionMaster) | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG). | | | Image |
+| [SEED-Story](https://github.com/TencentARC/SEED-Story) | SEED-Story: Multimodal Long Story Generation with Large Language Model. |[arXiv](https://arxiv.org/abs/2407.08683) | | Image |
 | [Segment Anything](https://segment-anything.com/) | Segment Anything Model (SAM): a new AI model from Meta AI that can "cut out" any object , in any image , with a single click. |[arXiv](https://arxiv.org/abs/2304.02643) | | Image |
 | [sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet) | WebUI extension for ControlNet. | | | Image |
 | [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) | Progressive Adversarial Diffusion Distillation. |[arXiv](https://arxiv.org/abs/2402.13929) | | Image |
@@ -332,6 +337,8 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion) | A Pipeline-Level Solution for Real-Time Interactive Generation. | | | Image |
 | [StyleDrop](https://styledrop.github.io/) | Text-To-Image Generation in Any Style. |[arXiv](https://arxiv.org/abs/2306.00983) | | Image |
 | [SyncDreamer](https://github.com/liuyuan-pal/SyncDreamer) | Generating Multiview-consistent Images from a Single-view Image. |[arXiv](https://arxiv.org/abs/2309.03453) | | Image |
+| [UltraEdit](https://github.com/HaozheZhao/UltraEdit) | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale. |[arXiv](https://arxiv.org/abs/2407.05282) | | Image |
+| [UltraPixel](https://github.com/catcathh/UltraPixel) | UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks. |[arXiv](https://arxiv.org/abs/2407.02158) | | Image |
 | [Unity ML Stable Diffusion](https://github.com/keijiro/UnityMLStableDiffusion) | Core ML Stable Diffusion on Unity. | | Unity | Image |
 | [Vispunk Visions](https://vispunk.com/image) | Text-to-Image generation platform. | | | Image |
@@ -379,14 +386,17 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [BlenderGPT](https://github.com/gd3kr/BlenderGPT) | Use commands in English to control Blender with OpenAI's GPT-4. | | Blender | Model |
 | [Blender-GPT](https://github.com/TREE-Ind/Blender-GPT) | An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. | | Blender | Model |
 | [Blockade Labs](https://www.blockadelabs.com/) | Digital alchemy is real with Skybox Lab - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | | | Model |
+| [CF-3DGS](https://github.com/NVlabs/CF-3DGS) | COLMAP-Free 3D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.07504) | | 3D |
 | [CharacterGen](https://github.com/zjp-shadow/CharacterGen) | CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. |[arXiv](https://arxiv.org/abs/2402.17214) | | 3D |
 | [chatGPT-maya](https://github.com/LouisRossouw/chatGPT-maya) | Simple Maya tool that utilizes open AI to perform basic tasks based on descriptive instructions. | | Maya | Model |
 | [CityDreamer](https://github.com/hzxie/city-dreamer) | Compositional Generative Model of Unbounded 3D Cities. |[arXiv](https://arxiv.org/abs/2309.00610) | | 3D |
 | [CSM](https://www.csm.ai/) | Generate 3D worlds from images and videos. | | | 3D |
 | [Dash](https://www.polygonflow.io/) | Your Copilot for World Building in Unreal Engine. | | Unreal Engine | 3D |
+| [DreamCatalyst](https://github.com/kaist-cvml-lab/DreamCatalyst) | DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. |[arXiv](https://arxiv.org/abs/2407.11394) | | 3D |
 | [DreamGaussian4D](https://github.com/jiawei-ren/dreamgaussian4d) | Generative 4D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.17142) | | 4D |
 | [DUSt3R](https://github.com/naver/dust3r) | Geometric 3D Vision Made Easy. |[arXiv](https://arxiv.org/abs/2312.14132) | | 3D |
 | [GALA3D](https://github.com/VDIGPKU/GALA3D) | GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2402.07207) | | 3D |
+| [GaussCtrl](https://github.com/ActiveVisionLab/gaussctrl) | GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. |[arXiv](https://arxiv.org/abs/2403.08733) | | 3D |
 | [GaussianCube](https://github.com/GaussianCube/GaussianCube) | A Structured and Explicit Radiance Representation for 3D Generative Modeling. |[arXiv](https://arxiv.org/abs/2403.19655) | | 3D |
 | [GaussianDreamer](https://github.com/hustvl/GaussianDreamer) | Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors. |[arXiv](https://arxiv.org/abs/2310.08529) | | 3D |
 | [GenieLabs](https://www.genielabs.tech/) | Empower your game with AI-UGC. | | | 3D |
@@ -448,6 +458,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [GeneFace++](https://github.com/yerfor/GeneFacePlusPlus) | Generalized and Stable Real-Time 3D Talking Face Generation. | | | Avatar |
 | [Hallo](https://github.com/fudan-generative-vision/hallo) | Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2406.08801) | | Avatar |
 | [HeadSculpt](https://brandonhan.uk/HeadSculpt/) | Crafting 3D Head Avatars with Text. |[arXiv](https://arxiv.org/abs/2306.03038) | | Avatar |
+| [IntrinsicAvatar](https://github.com/taconite/IntrinsicAvatar) | IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing. |[arXiv](https://arxiv.org/abs/2312.05210) | | Avatar |
 | [Linly-Talker](https://github.com/Kedreamix/Linly-Talker) | Digital Avatar Conversational System. | | | Avatar |
 | [LivePortrait](https://github.com/KwaiVGI/LivePortrait) | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control. |[arXiv](https://arxiv.org/abs/2407.03168) | | Avatar |
 | [MotionGPT](https://github.com/OpenMotionLab/MotionGPT) | Human Motion as a Foreign Language, a unified motion-language generation model using LLMs. |[arXiv](https://arxiv.org/abs/2306.14795) | | Avatar |
@@ -510,8 +521,10 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [CoTracker](https://co-tracker.github.io/) | It is Better to Track Together. |[arXiv](https://arxiv.org/abs/2307.07635) | | Visual |
 | [FaceHi](https://m.facehi.ai/) | It is Better to Track Together. | | | Visual |
 | [InternLM-XComposer2](https://github.com/InternLM/InternLM-XComposer) | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2404.06512) | | Visual |
+| [Kangaroo](https://github.com/KangarooGroup/Kangaroo) | Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input. | | | Visual |
 | [LGVI](https://jianzongwu.github.io/projects/rovi/) | Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | | | Visual |
 | [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) | Extending Visual Capabilities with LLaMA-3 and Phi-3. | | | Visual |
+| [LongVA](https://github.com/EvolvingLMMs-Lab/LongVA) | Long Context Transfer from Language to Vision. |[arXiv](https://arxiv.org/abs/2406.16852) | | Visual |
 | [MaskViT](https://maskedvit.github.io/) | Masked Visual Pre-Training for Video Prediction. |[arXiv](https://arxiv.org/abs/2206.11894) | | Visual |
 | [MiniCPM-Llama3-V 2.5](https://github.com/OpenBMB/MiniCPM-V) | A GPT-4V Level MLLM on Your Phone. | | | Visual |
 | [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA) | Mixture of Experts for Large Vision-Language Models. |[arXiv](https://arxiv.org/abs/2401.15947) | | Visual |
@@ -519,10 +532,13 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [PLLaVA](https://github.com/magic-research/PLLaVA) | Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. |[arXiv](https://arxiv.org/abs/2404.16994) | | Visual |
 | [Qwen-VL](https://github.com/QwenLM/Qwen-VL) | A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. |[arXiv](https://arxiv.org/abs/2308.12966) | | Visual |
 | [ShareGPT4V](https://github.com/ShareGPT4Omni/ShareGPT4V) | Improving Large Multi-modal Models with Better Captions. |[arXiv](https://arxiv.org/abs/2311.12793) | | Visual |
+| [SOLO](https://github.com/Yangyi-Chen/SOLO) | SOLO: A Single Transformer for Scalable Vision-Language Modeling. |[arXiv](https://arxiv.org/abs/2407.06438) | | Visual |
+| [Video-CCAM](https://github.com/QQ-MM/Video-CCAM) | Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks. | | | Visual |
 | [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA) | Learning United Visual Representation by Alignment Before Projection. |[arXiv](https://arxiv.org/abs/2311.10122) | | Visual |
 | [VideoLLaMA 2](https://github.com/DAMO-NLP-SG/VideoLLaMA2) | Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. |[arXiv](https://arxiv.org/abs/2406.07476) | | Visual |
 | [Video-MME](https://github.com/BradyFU/Video-MME) | The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual |
 | [Vitron](https://github.com/SkyworkAI/Vitron) | A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | | | Visual |
+| [VILA](https://github.com/NVlabs/VILA) | VILA: On Pre-training for Visual Language Models. |[arXiv](https://arxiv.org/abs/2312.07533) | | Visual |
 
 ^ Back to Contents ^
 
@@ -664,12 +680,14 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
 | [Make-An-Audio 3](https://github.com/Text-to-Audio/Make-An-Audio-3) | Transforming Text into Audio via Flow-based Large Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2305.18474) | | Audio |
 | [NeuralSound](https://github.com/hellojxt/NeuralSound) | Learning-based Modal Sound Synthesis with Acoustic Transfer. |[arXiv](https://arxiv.org/abs/2108.07425) | | Audio |
 | [OptimizerAI](https://www.optimizerai.xyz/) | Sounds for Creators, Game makers, Artists, Video makers. | | | Audio |
+| [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio) | Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. |[arXiv](https://arxiv.org/abs/2407.10759) | | Audio |
 | [SEE-2-SOUND](https://github.com/see2sound/see2sound) | Zero-Shot Spatial Environment-to-Spatial Sound. |[arXiv](https://arxiv.org/abs/2406.06612) | | Audio |
 | [SoundStorm](https://google-research.github.io/seanet/soundstorm/examples/) | Efficient Parallel Audio Generation. |[arXiv](https://arxiv.org/abs/2305.09636) | | Audio |
 | [Stable Audio](https://www.stableaudio.com/) | Fast Timing-Conditioned Latent Audio Diffusion. | | | Audio |
 | [Stable Audio Open](https://huggingface.co/stabilityai/stable-audio-open-1.0) | Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
 | [SyncFusion](https://github.com/mcomunita/syncfusion) | SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. |[arXiv](https://arxiv.org/abs/2310.15247) | | Audio |
 | [TANGO](https://github.com/declare-lab/tango) | Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | | | Audio |
+| [VTA-LDM](https://github.com/ariesssxu/vta-ldm) | Video-to-Audio Generation with Hidden Alignment. |[arXiv](https://arxiv.org/abs/2407.07464) | | Audio |
 | [WavJourney](https://github.com/Audio-AGI/WavJourney) | Compositional Audio Creation with Large Language Models. |[arXiv](https://arxiv.org/abs/2307.14335) | | Audio |
 
 ^ Back to Contents ^