diff --git a/README.md b/README.md
index c581dc8..d7267fc 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,11 @@
 A curated list of awesome projects, resources, and research related to **Embodied AI** and **Humanoid Robotics**.

 ## Contents
+
 - [Introduction](#introduction)
+- [Scene Understanding](#scene-understanding)
+- [Data Collection](#data-collection)
+- [Action Output](#action-output)
 - [Open Source Projects](#open-source-projects)
 - [Datasets](#datasets)
 - [Companies & Robots](#companies--robots)
@@ -11,11 +15,112 @@ A curated list of awesome projects, resources, and research related to **Embodie
 - [Community](#community)

 ## Introduction
+
 Embodied Intelligence refers to AI systems that learn and interact through physical embodiment, often in robotics platforms. This list focuses on projects combining artificial intelligence with physical interaction capabilities.

+## Scene Understanding
+
+### Image Processing
+
+|            | Description               | Paper                                     | Code                                                          |
+| ---------- | ------------------------- | ----------------------------------------- | ------------------------------------------------------------- |
+| SAM        | Segmentation              | [Paper](https://arxiv.org/abs/2304.02643) | [Code](https://github.com/facebookresearch/segment-anything)  |
+| YOLO-World | Open-Vocabulary Detection | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World)               |
+
+### Point Cloud Processing
+
+|            | Description   | Paper                                     | Code                                                         |
+| ---------- | ------------- | ----------------------------------------- | ------------------------------------------------------------ |
+| SAM3D      | Segmentation  | [Paper](https://arxiv.org/abs/2306.03908) | [Code](https://github.com/Pointcept/SegmentAnything3D)       |
+| PointMixer | Understanding | [Paper](https://arxiv.org/abs/2111.11187) | [Code](https://github.com/LifeBeyondExpectations/PointMixer) |
+
+### Multi-Modal Grounding
+
+|               | Description                       | Paper                                                    | Code                                                                |
+| ------------- | --------------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------- |
+| GPT-4V        | MLLM (Image+Language -> Language) | [Paper](https://arxiv.org/abs/2303.08774)                |                                                                     |
+| Claude 3 Opus | MLLM (Image+Language -> Language) | [Paper](https://www.anthropic.com/news/claude-3-family)  |                                                                     |
+| GLaMM         | Pixel Grounding                   | [Paper](https://arxiv.org/abs/2311.03356)                | [Code](https://github.com/mbzuai-oryx/groundingLMM)                |
+| All-Seeing    | Pixel Grounding                   | [Paper](https://arxiv.org/abs/2402.19474)                | [Code](https://github.com/OpenGVLab/all-seeing)                    |
+| LEO           | 3D                                | [Paper](https://arxiv.org/abs/2311.12871)                | [Code](https://github.com/embodied-generalist/embodied-generalist) |
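+
+As a concrete starting point for the image models above, here is a minimal sketch of prompting SAM for a mask with the `segment-anything` package linked in the table; the checkpoint file, input image, and click coordinates are placeholders you would substitute.
+
+```python
+import cv2
+import numpy as np
+from segment_anything import SamPredictor, sam_model_registry
+
+# Load a pretrained SAM checkpoint (downloaded separately; the path is a placeholder).
+sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
+predictor = SamPredictor(sam)
+
+# SAM expects an RGB uint8 image of shape (H, W, 3).
+image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
+predictor.set_image(image)
+
+# Prompt with one foreground click; returned masks are ranked by predicted quality.
+masks, scores, _ = predictor.predict(
+    point_coords=np.array([[320, 240]]),
+    point_labels=np.array([1]),
+    multimask_output=True,
+)
+best_mask = masks[scores.argmax()]
+```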
+
+## Data Collection
+
+### From Video
+
+|               | Description | Paper                                                       | Code                                       |
+| ------------- | ----------- | ------------------------------------------------------------ | -------------------------------------------- |
+| Vid2Robot     |             | [Paper](https://vid2robot.github.io/vid2robot.pdf)          |                                            |
+| RT-Trajectory |             | [Paper](https://arxiv.org/abs/2311.01977)                   |                                            |
+| MimicPlay     |             | [Paper](https://mimic-play.github.io/assets/MimicPlay.pdf)  | [Code](https://github.com/j96w/MimicPlay)  |
+
+### Hardware
+
+|           | Description    | Paper                                                       | Code                                                                        |
+| --------- | -------------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------- |
+| UMI       | Two-Fingers    | [Paper](https://arxiv.org/abs/2402.10329)                   | [Code](https://github.com/real-stanford/universal_manipulation_interface)  |
+| DexCap    | Five-Fingers   | [Paper](https://dex-cap.github.io/assets/DexCap_paper.pdf)  | [Code](https://github.com/j96w/DexCap)                                     |
+| HIRO Hand | Hand-over-hand | [Paper](https://sites.google.com/view/hiro-hand)            |                                                                             |
+
+### Generative Simulation
+
+|          | Description | Paper                                     | Code                                                      |
+| -------- | ----------- | ------------------------------------------ | ----------------------------------------------------------- |
+| MimicGen |             | [Paper](https://arxiv.org/abs/2310.17596) | [Code](https://github.com/NVlabs/mimicgen_environments)  |
+| RoboGen  |             | [Paper](https://arxiv.org/abs/2311.01455) | [Code](https://github.com/Genesis-Embodied-AI/RoboGen)   |
+
+## Action Output
+
+### Generative Imitation Learning
+
+|                  | Description | Paper                                     | Code                                                        |
+| ---------------- | ----------- | ------------------------------------------ | -------------------------------------------------------------- |
+| Diffusion Policy |             | [Paper](https://arxiv.org/abs/2303.04137) | [Code](https://github.com/real-stanford/diffusion_policy)  |
+| ACT              |             | [Paper](https://arxiv.org/abs/2304.13705) | [Code](https://github.com/tonyzhaozh/act)                  |
+
+### Affordance Map
+
+|                              | Description                                 | Paper                                                                                                                       | Code                                                                                           |
+| ---------------------------- | ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
+| CLIPort                      | Pick & Place                                | [Paper](https://arxiv.org/pdf/2109.12098.pdf)                                                                               | [Code](https://github.com/cliport/cliport)                                                    |
+| Robo-Affordances             | Contact & post-contact trajectories         | [Paper](https://arxiv.org/abs/2304.08488)                                                                                   | [Code](https://github.com/shikharbahl/vrb)                                                    |
+| Robo-ABC                     |                                             | [Paper](https://arxiv.org/abs/2401.07487)                                                                                   | [Code](https://github.com/TEA-Lab/Robo-ABC)                                                   |
+| Where2Explore                | Few-shot learning from semantic similarity  | [Paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/0e7e2af2e5ba822c9ad35a37b31b5dd4-Paper-Conference.pdf)   |                                                                                                |
+| Move as You Say              | Affordance to motion from a diffusion model | [Paper](https://arxiv.org/pdf/2403.18036.pdf)                                                                               |                                                                                                |
+| AffordanceLLM                | Grounding affordance with LLM               | [Paper](https://arxiv.org/pdf/2401.06341.pdf)                                                                               |                                                                                                |
+| Environment-aware Affordance |                                             | [Paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/bf78fc727cf882df66e6dbc826161e86-Paper-Conference.pdf)   |                                                                                                |
+| OpenAD                       | Open-vocabulary affordance detection        | [Paper](https://www.csc.liv.ac.uk/~anguyen/assets/pdfs/2023_OpenAD.pdf)                                                     | [Code](https://github.com/Fsoft-AIC/Open-Vocabulary-Affordance-Detection-in-3D-Point-Clouds)  |
+| RLAfford                     | End-to-end affordance learning              | [Paper](https://gengyiran.github.io/pdf/RLAfford.pdf)                                                                       |                                                                                                |
+| General Flow                 | Collect affordance from video               | [Paper](https://general-flow.github.io/general_flow.pdf)                                                                    | [Code](https://github.com/michaelyuancb/general_flow)                                         |
+| PreAffordance                | Pre-grasping planning                       | [Paper](https://arxiv.org/pdf/2404.03634.pdf)                                                                               |                                                                                                |
+| SceneFun3D                   | Fine-grained functionality                  | [Paper](https://aycatakmaz.github.io/data/SceneFun3D-preprint.pdf)                                                          | [Code](https://github.com/SceneFun3D/scenefun3d)                                              |
+
+### Question & Answer from LLM
+
+|          | Description | Paper                                         | Code                                                |
+| -------- | ----------- | ---------------------------------------------- | ------------------------------------------------------ |
+| COPA     |             | [Paper](https://arxiv.org/abs/2403.08248)     |                                                     |
+| ManipLLM |             | [Paper](https://arxiv.org/abs/2312.16217)     |                                                     |
+| ManipVQA |             | [Paper](https://arxiv.org/pdf/2403.11289.pdf) | [Code](https://github.com/SiyuanHuang95/ManipVQA)  |
+
+### Language Corrections
+
+|          | Description | Paper                                     | Code                                             |
+| -------- | ----------- | ------------------------------------------ | ---------------------------------------------------- |
+| OLAF     |             | [Paper](https://arxiv.org/pdf/2310.17555) |                                                  |
+| YAYRobot |             | [Paper](https://arxiv.org/abs/2403.12910) | [Code](https://github.com/yay-robot/yay_robot)  |
+
+### Planning from LLM
+
+|        | Description  | Paper                                     | Code                                                                            |
+| ------ | ------------ | ------------------------------------------ | ----------------------------------------------------------------------------------- |
+| SayCan | API Level    | [Paper](https://arxiv.org/abs/2204.01691) | [Code](https://github.com/google-research/google-research/tree/master/saycan)  |
+| VILA   | Prompt Level | [Paper](https://arxiv.org/abs/2311.17842) |                                                                                 |
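+
+To make the planning-from-LLM idea concrete, below is a toy sketch in the spirit of SayCan's decoding loop: each candidate skill is scored by a language model and by an affordance/value estimate, the combined score is maximized, and the chosen skill is appended to the plan. The `llm_score` and `affordance` callables and the skill strings are hypothetical stand-ins, not the released SayCan API.
+
+```python
+from typing import Callable, List
+
+def saycan_style_plan(
+    instruction: str,
+    skills: List[str],
+    llm_score: Callable[[str, List[str], str], float],  # p(skill | instruction, history), in [0, 1]
+    affordance: Callable[[str], float],                  # value of the skill in the current state
+    max_steps: int = 5,
+) -> List[str]:
+    """Greedily pick the skill with the highest combined language x affordance score."""
+    plan: List[str] = []
+    for _ in range(max_steps):
+        best = max(skills, key=lambda s: llm_score(instruction, plan, s) * affordance(s))
+        plan.append(best)
+        if best == "done()":
+            break
+    return plan
+
+# Toy usage with stand-in scoring functions.
+print(saycan_style_plan(
+    "put the apple on the table",
+    ["pick(apple)", "place(apple, table)", "done()"],
+    llm_score=lambda instr, history, s: 0.1 if s in history else 0.9,
+    affordance=lambda s: 1.0,
+))  # -> ['pick(apple)', 'place(apple, table)', 'done()']
+```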
+
 ## Open Source Projects

 ### Simulation Environments
+
 - [Isaac Gym](https://developer.nvidia.com/isaac-gym) - NVIDIA's physics simulation environment
 - [MuJoCo](https://mujoco.org/) - Physics engine for robotics, biomechanics, and graphics
 - [PyBullet](https://pybullet.org/) - Physics simulation for robotics and machine learning
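+
+For a first experiment with the simulators above, here is a minimal PyBullet loop (assumes `pip install pybullet`); the URDF assets ship with the bundled `pybullet_data` package.
+
+```python
+import pybullet as p
+import pybullet_data
+
+# Headless physics hello-world: load a ground plane and a sample robot, then step.
+p.connect(p.DIRECT)  # use p.GUI instead for a visualization window
+p.setAdditionalSearchPath(pybullet_data.getDataPath())
+p.setGravity(0, 0, -9.81)
+
+plane = p.loadURDF("plane.urdf")
+robot = p.loadURDF("r2d2.urdf", basePosition=[0, 0, 0.5])
+
+for _ in range(240):  # ~1 second at the default 240 Hz timestep
+    p.stepSimulation()
+
+print("robot base pose:", p.getBasePositionAndOrientation(robot))
+p.disconnect()
+```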
@@ -38,22 +143,25 @@ Embodied Intelligence refers to AI systems that learn and interact through physi
 ## Research Frameworks

 ### Learning Frameworks
+
 - [MoveIt](https://moveit.ros.org/) - Motion planning framework
 - [OMPL](https://ompl.kavrakilab.org/) - Open Motion Planning Library
 - [Drake](https://drake.mit.edu/) - Planning and control toolkit
 - [GR-1](https://gr1-manipulation.github.io/) - ByteDance Research: Unleashing Large-Scale Video Generative Pre-training
-for Visual Robot Manipulation
+  for Visual Robot Manipulation
 - [Universal Manipulation Interface(UMI)](https://umi-gripper.github.io/)

 ## Datasets

 ### Robot Learning Datasets
+
 - [RoboNet](https://www.robonet.wiki/) - Large-scale robot manipulation dataset
 - [BEHAVIOR-1K](https://behavior.stanford.edu/) - Dataset of human household activities
 - [Google Robot Data](https://github.com/google-research/robotics_datasets) - Various robotics datasets
 - [Contact-GraspNet](https://github.com/NVlabs/contact-graspnet) - Dataset for robotic grasping

 ### Human Motion Datasets
+
 - [AMASS](https://amass.is.tue.mpg.de/) - Large human motion dataset
 - [Human3.6M](http://vision.imar.ro/human3.6m/) - Large-scale dataset for human sensing
 - [CMU Graphics Lab Motion Capture Database](http://mocap.cs.cmu.edu/)
@@ -66,6 +174,7 @@ for Visual Robot Manipulation
 ## Companies & Robots

 ### Research Labs
+
 - [Berkeley AI Research (BAIR)](https://bair.berkeley.edu/)
 - [Stanford Robotics & Embodied Artificial Intelligence Lab (REAL)](http://real.stanford.edu/)
 - [MIT CSAIL](https://www.csail.mit.edu/)
@@ -73,6 +182,7 @@ for Visual Robot Manipulation
 - [UT Austin Robot Perception and Learning Lab (RPL)](https://rpl.cs.utexas.edu/)

 ### Companies
+
 - [Boston Dynamics](https://www.bostondynamics.com/)
 - [Agility Robotics](https://www.agilityrobotics.com/)
 - [Figure AI](https://www.figure.ai/)
@@ -80,6 +190,7 @@ for Visual Robot Manipulation
 - [Sanctuary AI](https://www.sanctuary.ai/)

 ### Humanoid Robotics
+
 - [Atlas](https://www.bostondynamics.com/atlas) - Advanced humanoid robot by Boston Dynamics
 - [Digit](https://www.agilityrobotics.com/digit) - Bipedal robot by Agility Robotics
 - [Tesla Optimus](https://www.tesla.com/optimus) - Tesla's humanoid robot project
@@ -94,17 +205,19 @@ for Visual Robot Manipulation
 ### Researchers

-[Yuke Zhu](https://yukezhu.me) [Yuzhe Qin](https://yzqin.github.io/)
+[Yuke Zhu](https://yukezhu.me) [Yuzhe Qin](https://yzqin.github.io/)

 ## Community

 ### Conferences
+
 - [RSS](http://www.roboticsconference.org/) - Robotics: Science and Systems
 - [ICRA](https://www.icra2024.org/) - International Conference on Robotics and Automation
 - [CoRL](https://www.corl2024.org/) - Conference on Robot Learning
 - [IROS](https://iros2024.org/) - International Conference on Intelligent Robots and Systems

 ### Forums & Discussion
+
 - [ROS Discourse](https://discourse.ros.org/)
 - [Robotics StackExchange](https://robotics.stackexchange.com/)
 - [Reddit r/robotics](https://www.reddit.com/r/robotics/)