A curated list of awesome projects, resources, and research related to **Embodied AI** and **Humanoid Robotics**.

## Contents

- [Introduction](#introduction)
- [Scene Understanding](#scene-understanding)
- [Data Collection](#data-collection)
- [Action Output](#action-output)
- [Open Source Projects](#open-source-projects)
- [Research Frameworks](#research-frameworks)
- [Datasets](#datasets)
- [Companies & Robots](#companies--robots)
- [Research Papers](#research-papers)
- [Community](#community)

## Introduction

Embodied Intelligence refers to AI systems that learn and interact with the world through a physical body, typically a robotic platform. This list focuses on projects that combine artificial intelligence with physical interaction capabilities.

## Scene Understanding

### Image Processing

| Name | Description | Paper | Code |
| ---------- | ------------------------- | ----------------------------------------- | ------------------------------------------------------------ |
| SAM | Segmentation | [Paper](https://arxiv.org/abs/2304.02643) | [Code](https://github.com/facebookresearch/segment-anything) |
| YOLO-World | Open-Vocabulary Detection | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World) |
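
As a quick orientation for this table, below is a minimal sketch of running SAM's automatic mask generator from Python. It assumes the `segment-anything` package is installed and a ViT-H checkpoint (`sam_vit_h_4b8939.pth`) has been downloaded; the image path is a placeholder.

```python
# Minimal sketch: automatic mask generation with SAM.
# Assumes `segment-anything` and OpenCV are installed and a checkpoint is local.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)  # HxWx3 uint8

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

masks = mask_generator.generate(image)  # list of dicts: "segmentation", "area", "bbox", ...
print(f"SAM proposed {len(masks)} masks")
```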

### Point Cloud Processing

| Name | Description | Paper | Code |
| ---------- | ------------- | ----------------------------------------- | ------------------------------------------------------------ |
| SAM3D | Segmentation | [Paper](https://arxiv.org/abs/2306.03908) | [Code](https://github.com/Pointcept/SegmentAnything3D) |
| PointMixer | Understanding | [Paper](https://arxiv.org/abs/2111.11187) | [Code](https://github.com/LifeBeyondExpectations/PointMixer) |

### Multi-Modal Grounding

| Name | Description | Paper | Code |
| ------------ | ----------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------ |
| GPT-4V | Multimodal LLM (Image+Language → Language) | [Paper](https://arxiv.org/abs/2303.08774) | |
| Claude 3 Opus | Multimodal LLM (Image+Language → Language) | [Paper](https://www.anthropic.com/news/claude-3-family) | |
| GLaMM | Pixel Grounding | [Paper](https://arxiv.org/abs/2311.03356) | [Code](https://github.com/mbzuai-oryx/groundingLMM) |
| All-Seeing | Pixel Grounding | [Paper](https://arxiv.org/abs/2402.19474) | [Code](https://github.com/OpenGVLab/all-seeing) |
| LEO | 3D | [Paper](https://arxiv.org/abs/2311.12871) | [Code](https://github.com/embodied-generalist/embodied-generalist) |
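
For the Image+Language → Language entries above, a typical integration is a single chat call that mixes text and an image. The sketch below uses the OpenAI Python client as one example; the model name, prompt, and image URL are placeholders, not a recommendation of a specific model or endpoint.

```python
# Illustrative only: asking a hosted multimodal model a grounding question
# about an image. Model name and URL are placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which objects on the table can the robot grasp?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/scene.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```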

## Data Collection

### From Video

| Name | Description | Paper | Code |
| ------------- | ----------- | ---------------------------------------------------------- | ----------------------------------------- |
| Vid2Robot | | [Paper](https://vid2robot.github.io/vid2robot.pdf) | |
| RT-Trajectory | | [Paper](https://arxiv.org/abs/2311.01977) | |
| MimicPlay | | [Paper](https://mimic-play.github.io/assets/MimicPlay.pdf) | [Code](https://github.com/j96w/MimicPlay) |

### Hardware

| Name | Description | Paper | Code |
| --------- | -------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------- |
| UMI | Two-Fingers | [Paper](https://arxiv.org/abs/2402.10329) | [Code](https://github.com/real-stanford/universal_manipulation_interface) |
| DexCap | Five-Fingers | [Paper](https://dex-cap.github.io/assets/DexCap_paper.pdf) | [Code](https://github.com/j96w/DexCap) |
| HIRO Hand | Hand-over-hand | [Paper](https://sites.google.com/view/hiro-hand) | |

### Generative Simulation

| Name | Description | Paper | Code |
| -------- | ----------- | ----------------------------------------- | ------------------------------------------------------- |
| MimicGen | | [Paper](https://arxiv.org/abs/2310.17596) | [Code](https://github.com/NVlabs/mimicgen_environments) |
| RoboGen | | [Paper](https://arxiv.org/abs/2311.01455) | [Code](https://github.com/Genesis-Embodied-AI/RoboGen) |

## Action Output

### Generative Imitation Learning

| Name | Description | Paper | Code |
| ---------------- | ----------- | ----------------------------------------- | --------------------------------------------------------- |
| Diffusion Policy | | [Paper](https://arxiv.org/abs/2303.04137) | [Code](https://github.com/real-stanford/diffusion_policy) |
| ACT | | [Paper](https://arxiv.org/abs/2304.13705) | [Code](https://github.com/tonyzhaozh/act) |
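
At inference time, a diffusion policy samples an action chunk by starting from Gaussian noise and repeatedly denoising it with a noise-prediction network conditioned on the current observation. The toy sketch below illustrates that loop with an untrained stand-in network and made-up dimensions; it is not the code of the repositories linked above.

```python
# Toy sketch of diffusion-policy-style action sampling (untrained network,
# illustrative shapes). Real systems condition a trained noise predictor on
# image/state observations and use tuned noise schedules.
import torch

obs_dim, act_dim, horizon, T = 10, 7, 16, 50          # illustrative sizes
betas = torch.linspace(1e-4, 0.02, T)                 # DDPM noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Stand-in noise predictor: eps_theta(noisy_actions, t, obs)
eps_net = torch.nn.Sequential(
    torch.nn.Linear(horizon * act_dim + obs_dim + 1, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, horizon * act_dim),
)

@torch.no_grad()
def sample_action_chunk(obs: torch.Tensor) -> torch.Tensor:
    x = torch.randn(horizon * act_dim)                # start from pure noise
    for t in reversed(range(T)):
        inp = torch.cat([x, obs, torch.tensor([t / T])])
        eps = eps_net(inp)
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])  # DDPM posterior mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x.view(horizon, act_dim)                   # action chunk to execute

actions = sample_action_chunk(torch.zeros(obs_dim))
print(actions.shape)  # torch.Size([16, 7])
```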

### Affordance Map

| Name | Description | Paper | Code |
| ---------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| CLIPort | Pick&Place | [Paper](https://arxiv.org/pdf/2109.12098.pdf) | [Code](https://github.com/cliport/cliport) |
| Robo-Affordances | Contact&Post-contact trajectories | [Paper](https://arxiv.org/abs/2304.08488) | [Code](https://github.com/shikharbahl/vrb) |
| Robo-ABC | | [Paper](https://arxiv.org/abs/2401.07487) | [Code](https://github.com/TEA-Lab/Robo-ABC) |
| Where2Explore | Few-shot learning from semantic similarity | [Paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/0e7e2af2e5ba822c9ad35a37b31b5dd4-Paper-Conference.pdf) | |
| Move as You Say | Affordance to motion from diffusion model | [Paper](https://arxiv.org/pdf/2403.18036.pdf) | |
| AffordanceLLM | Grounding affordance with LLM | [Paper](https://arxiv.org/pdf/2401.06341.pdf) | |
| Environment-aware Affordance | | [Paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/bf78fc727cf882df66e6dbc826161e86-Paper-Conference.pdf) | |
| OpenAD | Open-Voc Affordance Detection | [Paper](https://www.csc.liv.ac.uk/~anguyen/assets/pdfs/2023_OpenAD.pdf) | [Code](https://github.com/Fsoft-AIC/Open-Vocabulary-Affordance-Detection-in-3D-Point-Clouds) |
| RLAfford | End-to-End affordance learning | [Paper](https://gengyiran.github.io/pdf/RLAfford.pdf) | |
| General Flow | Collect affordance from video | [Paper](https://general-flow.github.io/general_flow.pdf) | [Code](https://github.com/michaelyuancb/general_flow) |
| PreAffordance | Pre-grasping planning | [Paper](https://arxiv.org/pdf/2404.03634.pdf) | |
| SceneFun3D | Fine-grained functionality | [Paper](https://aycatakmaz.github.io/data/SceneFun3D-preprint.pdf) | [Code](https://github.com/SceneFun3D/scenefun3d) |
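
Many of the entries above share a common pattern: predict a dense affordance score for every pixel (or point), then act at the maximizer. The sketch below illustrates that pattern for a language-conditioned pick-and-place; `predict_heatmaps` is a hypothetical stand-in for any of the models in the table.

```python
# Generic affordance-map pattern: score every pixel, act at the argmax.
# `predict_heatmaps` is a hypothetical stand-in that returns random scores
# just to show the data flow.
import numpy as np

def predict_heatmaps(rgb: np.ndarray, instruction: str):
    """Return (pick_heatmap, place_heatmap), each HxW with higher = better."""
    h, w = rgb.shape[:2]
    rng = np.random.default_rng(0)
    return rng.random((h, w)), rng.random((h, w))

rgb = np.zeros((224, 224, 3), dtype=np.uint8)          # placeholder observation
pick_map, place_map = predict_heatmaps(rgb, "put the red block in the bowl")

pick_uv = np.unravel_index(np.argmax(pick_map), pick_map.shape)
place_uv = np.unravel_index(np.argmax(place_map), place_map.shape)
print("pick pixel:", pick_uv, "place pixel:", place_uv)
# A real system would back-project these pixels to 3D using depth and camera
# intrinsics before commanding the arm.
```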

### Question & Answer from LLM

| Name | Description | Paper | Code |
| -------- | ----------- | --------------------------------------------- | ------------------------------------------------- |
| COPA | | [Paper](https://arxiv.org/abs/2403.08248) | |
| ManipLLM | | [Paper](https://arxiv.org/abs/2312.16217) | |
| ManipVQA | | [Paper](https://arxiv.org/pdf/2403.11289.pdf) | [Code](https://github.com/SiyuanHuang95/ManipVQA) |

### Language Corrections

| Name | Description | Paper | Code |
| -------- | ----------- | ----------------------------------------- | ---------------------------------------------- |
| OLAF | | [Paper](https://arxiv.org/pdf/2310.17555) | |
| YAYRobot | | [Paper](https://arxiv.org/abs/2403.12910) | [Code](https://github.com/yay-robot/yay_robot) |

### Planning from LLM

| Name | Description | Paper | Code |
| ------ | ------------ | ----------------------------------------- | ----------------------------------------------------------------------------- |
| SayCan | API Level | [Paper](https://arxiv.org/abs/2204.01691) | [Code](https://github.com/google-research/google-research/tree/master/saycan) |
| VILA | Prompt Level | [Paper](https://arxiv.org/abs/2311.17842) | |
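
SayCan's core recipe is to score each candidate skill as (LLM usefulness for the instruction) × (learned affordance/value of the skill in the current state) and execute the argmax. The sketch below shows only that combination step with hypothetical stand-in scorers; it is not the released SayCan code.

```python
# SayCan-style skill selection: usefulness (LLM) x feasibility (affordance).
# Both scoring functions are hypothetical stand-ins for illustration.
from typing import Dict, List

def llm_usefulness(instruction: str, history: List[str], skill: str) -> float:
    """Stand-in for an LLM's estimate that `skill` helps with `instruction`."""
    return 0.9 if "pick" in skill else 0.1

def affordance_value(skill: str, state: Dict) -> float:
    """Stand-in for a learned value function: can `skill` succeed from `state`?"""
    return 1.0 if state.get(skill.split()[-1], False) else 0.0

skills = ["pick up the sponge", "pick up the apple", "go to the drawer"]
state = {"sponge": True, "apple": False, "drawer": True}
instruction = "wipe the table"

scores = {
    s: llm_usefulness(instruction, [], s) * affordance_value(s, state)
    for s in skills
}
best = max(scores, key=scores.get)
print(scores, "->", best)  # executes the highest-scoring feasible skill
```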

## Open Source Projects

### Simulation Environments

- [Isaac Gym](https://developer.nvidia.com/isaac-gym) - NVIDIA's physics simulation environment
- [MuJoCo](https://mujoco.org/) - Physics engine for robotics, biomechanics, and graphics
- [PyBullet](https://pybullet.org/) - Physics simulation for robotics and machine learning

## Research Frameworks

### Learning Frameworks

- [MoveIt](https://moveit.ros.org/) - Motion planning framework
- [OMPL](https://ompl.kavrakilab.org/) - Open Motion Planning Library
- [Drake](https://drake.mit.edu/) - Planning and control toolkit
- [GR-1](https://gr1-manipulation.github.io/) - ByteDance Research: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
- [Universal Manipulation Interface (UMI)](https://umi-gripper.github.io/)

## Datasets

### Robot Learning Datasets

- [RoboNet](https://www.robonet.wiki/) - Large-scale robot manipulation dataset
- [BEHAVIOR-1K](https://behavior.stanford.edu/) - Dataset of human household activities
- [Google Robot Data](https://github.com/google-research/robotics_datasets) - Various robotics datasets
- [Contact-GraspNet](https://github.com/NVlabs/contact-graspnet) - Dataset for robotic grasping

### Human Motion Datasets

- [AMASS](https://amass.is.tue.mpg.de/) - Large human motion dataset
- [Human3.6M](http://vision.imar.ro/human3.6m/) - Large-scale dataset for human sensing
- [CMU Graphics Lab Motion Capture Database](http://mocap.cs.cmu.edu/)

## Companies & Robots

### Research Labs

- [Berkeley AI Research (BAIR)](https://bair.berkeley.edu/)
- [Stanford Robotics & Embodied Artificial Intelligence Lab (REAL)](http://real.stanford.edu/)
- [MIT CSAIL](https://www.csail.mit.edu/)
- [Google Research](https://research.google/research-areas/robotics/)
- [UT Austin Robot Perception and Learning Lab (RPL)](https://rpl.cs.utexas.edu/)

### Companies

- [Boston Dynamics](https://www.bostondynamics.com/)
- [Agility Robotics](https://www.agilityrobotics.com/)
- [Figure AI](https://www.figure.ai/)
- [1X Technologies](https://www.1x.tech/)
- [Sanctuary AI](https://www.sanctuary.ai/)

### Humanoid Robotics

- [Atlas](https://www.bostondynamics.com/atlas) - Advanced humanoid robot by Boston Dynamics
- [Digit](https://www.agilityrobotics.com/digit) - Bipedal robot by Agility Robotics
- [Tesla Optimus](https://www.tesla.com/optimus) - Tesla's humanoid robot project

### Researchers

- [Yuke Zhu](https://yukezhu.me)
- [Yuzhe Qin](https://yzqin.github.io/)

## Community

### Conferences

- [RSS](http://www.roboticsconference.org/) - Robotics: Science and Systems
- [ICRA](https://www.icra2024.org/) - International Conference on Robotics and Automation
- [CoRL](https://www.corl2024.org/) - Conference on Robot Learning
- [IROS](https://iros2024.org/) - International Conference on Intelligent Robots and Systems

### Forums & Discussion

- [ROS Discourse](https://discourse.ros.org/)
- [Robotics StackExchange](https://robotics.stackexchange.com/)
- [Reddit r/robotics](https://www.reddit.com/r/robotics/)
